[Moose-dev] Re: [Pharo-dev] Getting some tag in an HTML file

14 Aug 2015


Hi,
You could also consider island parsing, a very cool addition to
PetitParser developed by Jan:
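
"Literal parsers for the opening and closing tags."
beginScript := '<script>' asParser.
endScript := '</script>' asParser.
"Everything up to the closing tag, flattened into one string; #second keeps only that string."
script := beginScript , endScript negate star flatten , endScript ==> #second.
"island skips the surrounding text; #second keeps the island's own result."
islandScripts := (script island ==> #second) star.
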
If you apply it to:
code := 'uninteresting part
<script>
some code
</script>
another
uninteresting part
<script>
some other
code
</script>
yet another
uninteresting part
'.
You get:
islandScripts parse: code
==>  "#('some code' 'some other
code')"
Quite cool, no? :)
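
The parser above only matches a bare <script> tag. A minimal sketch of one way to also accept attributes, as in <script type="...">, could look like the following. It is untested, and beginScriptWithAttributes is a hypothetical name introduced here for illustration; only the PetitParser operators already used above (asParser, negate, star, flatten, island) are assumed.

```smalltalk
"Sketch only: beginScriptWithAttributes is a hypothetical name."
"Match '<script', then any characters that are not '>', then the closing '>'."
"flatten wraps the three parts into a single parser, so the next , starts a fresh sequence
 and #second still refers to the script body."
beginScriptWithAttributes := ('<script' asParser , $> asParser negate star , $> asParser) flatten.
endScript := '</script>' asParser.
script := beginScriptWithAttributes , endScript negate star flatten , endScript ==> #second.
islandScripts := (script island ==> #second) star.
```

Parsing something like '<p>text</p><script type="text/javascript">alert(1)</script>' with islandScripts should then collect the script body in the same way as before. Note that this sketch would also accept any tag whose name merely starts with "script", so it may need tightening for real pages.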
Doru
On Fri, Aug 14, 2015 at 1:31 AM, Alexandre Bergel <alexandre.bergel@me.com> wrote:
Hi!
Together with Nicolas, we are trying to get all the <script …> … </script>
elements from HTML files.
We have tried to use XMLDOMParser, but many web pages are not actually well
formed, so the parser complains.
Has anyone tried to get particular tags from HTML files? This looks
like a classic thing to do. Maybe some of you have done it.
Is there a way to configure the parser to accept broken XML/HTML content?
Cheers,
Alexandre
--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel  http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.
-- 
www.tudorgirba.com

"Every thing has its own flow"
