From hannes.hirzel@gmail.com Fri Aug 14 06:05:50 2015
From: "H. Hirzel"
To: moose-dev@list.inf.unibe.ch
Subject: [Moose-dev] Re: Getting some tag in an HTML file
Date: Fri, 14 Aug 2015 04:05:40 +0000
Message-ID:
In-Reply-To: <20150814003140.2D58C8056F@mailhub-lb1.unibe.ch>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============5250822005054626376=="

--===============5250822005054626376==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

http://ss3.gemtalksystems.com/ss/Tabular.html contains an application
example of a SAX parser. With a SAX parser you only handle the elements
that are of interest.

On 8/14/15, Vincent Blondeau wrote:
> Hi,
>
> Look at the class side, there is the method parse: namespace: validation: .
> Call this method instead of parse:, with false for the last two arguments.
> It should work.
>
> Anyway, you should use the SAX parser. It is faster and consumes less
> memory. It is very simple to get only one tag.
>
> Cheers
> Vincent
>
> On 14 August 2015 at 01:31, Alexandre Bergel wrote:
>>
>> Hi!
>>
>> Together with Nicolas we are trying to get all the
>> from html files.
>> We have tried to use XMLDOMParser, but many web pages are not well
>> formed, so the parser complains.
>>
>> Has anyone tried to get particular tags from HTML files? This looks
>> like a classic thing to do. Maybe some of you have done it.
>> Is there a way to configure the parser to accept broken XML/HTML
>> content?
>>
>> Cheers,
>> Alexandre
>> --
>> _,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
>> Alexandre Bergel  http://www.bergel.eu
>> ^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.
>>
>>
>> _______________________________________________
>> Moose-dev mailing list
>> Moose-dev(a)iam.unibe.ch
>> https://www.iam.unibe.ch/mailman/listinfo/moose-dev
>

--===============5250822005054626376==--