From: Vincent Blondeau (vincent.blondeau@polytech-lille.net)
To: moose-dev@list.inf.unibe.ch
Subject: [Moose-dev] Re: Getting some tag in an HTML file
Date: Fri, 14 Aug 2015 02:31:34 +0200

Hi,

Look at the class side: there is a method parse:namespace:validation:. Call this method instead of parse:, passing false for the last two arguments. It should work.

Anyway, you should use the SAX parser. It is faster and uses less memory, and it is very simple to get only one tag.

Cheers,
Vincent

On 14 August 2015 at 01:31, Alexandre Bergel wrote:
>
> Hi!
>
> Together with Nicolas, we are trying to get all the from HTML files.
> We have tried to use XMLDOMParser, but many web pages are not well-formed, so the parser complains.
>
> Has anyone tried to get particular tags from HTML files? This seems like a classic thing to do; maybe some of you have done it.
> Is there a way to configure the parser to accept broken XML/HTML content?
>
> Cheers,
> Alexandre
> --
> Alexandre Bergel  http://www.bergel.eu
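The SAX suggestion works because an event-driven parser reports each tag as it meets it instead of building (and validating) a whole document tree, so you only react to the one tag you care about. As a rough analogue outside Pharo, here is a minimal sketch using Python's standard-library `html.parser.HTMLParser`, which is also event-driven and does not raise errors on malformed HTML. This is not the Pharo XMLParser SAX API; `TagCollector`, `collect_tags`, and the sample markup are hypothetical names chosen for illustration, and the tag to extract is a parameter because the question does not say which tag is wanted.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """SAX-style collector for every occurrence of one tag.

    Illustrative Python analogue only (not the Pharo XMLParser API).
    HTMLParser tolerates broken markup: unclosed or stray tags just
    produce (or omit) events instead of aborting the parse.
    """

    def __init__(self, tag):
        super().__init__(convert_charrefs=True)
        # HTMLParser lowercases tag names, so compare in lowercase.
        self.tag = tag.lower()
        # One dict of attributes per matching start tag.
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # Called for every opening tag; keep only the wanted one.
        # Self-closing tags like <img/> also arrive here by default.
        if tag == self.tag:
            self.matches.append(dict(attrs))


def collect_tags(html_text, tag):
    """Return the attributes of every `tag` start tag in html_text."""
    parser = TagCollector(tag)
    parser.feed(html_text)
    parser.close()
    return parser.matches
```

For example, `collect_tags(page, "a")` returns one attribute dictionary per `<a>` start tag in `page`, even when the page leaves paragraphs unclosed or uses unquoted, uppercase attributes.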