Moose-dev

moose-dev@list.inf.unibe.ch

6621 discussions

Re: [Pharo-dev] Getting some tag in an HTML file

by Floyd May

If your scripts contain string literals with '<script>' or '</script>' in them (I've seen this before), then your mileage may vary with Tudor's approach. Also consider that script tags may have attributes, and those attributes may use single or double quotes. Also, script tags may or may not contain JavaScript: many JavaScript libraries use script tags for HTML template sources, for instance.

These tags you'd probably want to keep (and perhaps follow the reference for the third):

    <script type='text/javascript'> [code here] </script>
    <script type='text/javascript'> document.write('<script src="somewhere.js"></script>');</script>
    <script type='text/javascript' src="path/to/javascript/source.js"></script>

However, something like this you might want to ignore:

    <script type='text/html' id='someTemplate'> <span>{{some template syntax}}</span> </script>

If you can make some assumptions about what you're parsing, you might be able to adapt Tudor's solution to be more robust. However, if you're trying for a general-purpose solution, I'd highly recommend using an existing HTML parsing library, not an XML parser. In general, parsing HTML as XML is the wrong approach. HTML is technically not a subset of XML (closing tags aren't always required), so most true XML parsers are going to barf on it.

Some further reading:

https://en.wikipedia.org/wiki/Tag_soup
https://en.wikipedia.org/wiki/HTML5#XHTML5_.28XML-serialized_HTML5.29

I'm new to Smalltalk so I can't recommend a library, but in Java I've used Tag Soup and in Python I've used Beautiful Soup.

Hope this helps,
Floyd

On Fri, Aug 14, 2015 at 9:40 AM, Tudor Girba <tudor(a)tudorgirba.com> wrote:

> Hi,
>
> You can also consider using island parsing, this very cool addition to PetitParser developed by Jan:
>
>     beginScript := '<script>' asParser.
>     endScript := '</script>' asParser.
>     script := beginScript , endScript negate star flatten , endScript ==> #second.
>     islandScripts := (script island ==> #second) star.
>
> If you apply it on:
>
>     code := 'uninteresting part
>     <script>
>     some code
>     </script>
>     another
>     uninteresting part
>     <script>
>     some other
>     code
>     </script>
>     yet another
>     uninteresting part
>     '.
>
> You get:
>
>     islandScripts parse: code
>     ==> "#('some code' 'some other
>     code')"
>
> Quite cool, no? :)
>
> Doru
>
> On Fri, Aug 14, 2015 at 1:31 AM, Alexandre Bergel <alexandre.bergel(a)me.com> wrote:
>
>> Hi!
>>
>> Together with Nicolas we are trying to get all the <script …> … </script> from html files.
>> We have tried to use XMLDOMParser, but many webpages are actually not well formed, therefore the parser is complaining.
>>
>> Anyone has tried to get some particular tags from HTML files? This looks like a classical thing to do. Maybe some of you have done it.
>> Is there a way to configure the parser to accept a broken XML/HTML content?
>>
>> Cheers,
>> Alexandre
>> --
>> _,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
>> Alexandre Bergel http://www.bergel.eu
>> ^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.
>
> --
> www.tudorgirba.com
>
> "Every thing has its own flow"
>
> _______________________________________________
> Moose-dev mailing list
> Moose-dev(a)iam.unibe.ch
> https://www.iam.unibe.ch/mailman/listinfo/moose-dev
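
To illustrate the "use a real HTML parser" advice above, here is a minimal sketch in Python using the standard library's html.parser, which does not choke on unclosed tags. The names ScriptCollector, extract_scripts and JS_TYPES are made up for this example, and the set of type values treated as JavaScript is an assumption, not a complete list. It keeps inline and external JavaScript and skips template-style script tags such as type='text/html'.

```python
from html.parser import HTMLParser

# Assumption for this sketch: which "type" values count as JavaScript.
# An absent or empty type attribute is treated as JavaScript too.
JS_TYPES = {"", "text/javascript", "application/javascript", "module"}


class ScriptCollector(HTMLParser):
    """Collects (attributes, body) pairs for JavaScript <script> elements."""

    def __init__(self):
        super().__init__()
        self.scripts = []    # results: list of (attrs dict, body string)
        self._attrs = None   # attributes of the <script> currently open
        self._chunks = []    # text seen so far inside that <script>

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._attrs = dict(attrs)
            self._chunks = []

    def handle_data(self, data):
        # Script content may arrive in several pieces, so accumulate it.
        if self._attrs is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._attrs is not None:
            script_type = (self._attrs.get("type") or "").strip().lower()
            if script_type in JS_TYPES:
                self.scripts.append((self._attrs, "".join(self._chunks)))
            self._attrs = None


def extract_scripts(html):
    """Return the JavaScript <script> elements found in an HTML string."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    return collector.scripts
```

Calling extract_scripts on a page returns one pair per kept script; for an external script the body is empty and the "src" attribute says where to fetch it. This is still a simplification: a literal '</script>' inside a JavaScript string will, as far as I know, end the element early here, much as it would in a browser.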

9 years, 10 months

Is there a JavaScript parser somewhere?

by Alexandre Bergel

Just wondering… Cheers, Alexandre -- _,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;: Alexandre Bergel http://www.bergel.eu ^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.

9 years, 10 months

Presentation Context Update?

by Sean P. DeNigris

"A moldable presentation consists of a set of basic presentations selected according to the current development context... Currently defining a development context and filtering presentations based on it is not supported" [1] Is there a timeline for this feature? [1]. http://scg.unibe.ch/archive/papers/Chis14a-MoldableInspector.pdf ----- Cheers, Sean -- View this message in context: http://forum.world.st/Presentation-Context-Update-tp4843171.html Sent from the Moose mailing list archive at Nabble.com.

9 years, 10 months

sQuick?

by Alexandre Bergel

Hi! I see the project: github.com/jig08/sQuick_new Can I use it as a replacement for Spotlight on OS X? Is it meant to replace it? Cheers, Alexandre -- _,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;: Alexandre Bergel http://www.bergel.eu ^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.

9 years, 10 months

Re: [Pharo-dev] PetitParser speed and cost of PPFailure

by Tudor Girba

You will probably have a better chance of getting a response on the Moose mailing list.

Cheers,
Doru

On Sun, Aug 16, 2015 at 5:58 PM, Holger Freyther <holger(a)freyther.de> wrote:

> Hi,
>
> once again I am not sure if this is the right list. The first parser I wrote using PetitParser was a SIP (and then MGCP) parser. I have recently ported[1] the code to Pharo, and with Pharo it is very tempting to use BlockClosure>>#bench to get an idea of the speed.
>
> I have two performance "issues" and wonder if others had similar issues with PetitParser and if there is a general approach to this.
>
> 1.) Combining two PPCharsetPredicates does not combine the "classification" table it had. One could create a PPPredicateObjectParser subclass that is special casing >>#/ to build a combined classification table.
>
> 2.) When blindly following a BNF enumeration of "A or B or C or D or E or CatchAll", where each of "A, B" follows a common pattern (e.g. token COLON value), one pays a high cost in the backtracking and in constructing the PPFailure for each failed case.
>
> In my SIPGrammar I have action parsers for To ==>.. From ==> and would like to keep that. At the same time I would be happy if the token in front of the colon is only consumed once and then delegated to the right parser, and if that one failed, use the 'catch all' one.
>
> I don't know which abstraction would be needed to allow creating optimized PetitParsers for such grammars.
>
> Sorry for the long mail; long details and context are below.
>
> kind regards
> holger
>
>
> Full details:
>
> 1.) CharSetPredicate
>
>     | aParser bParser combinedParser aTime bTime cTime |
>
>     aParser := #digit asParser.
>     bParser := #letter asParser.
>     combinedParser := aParser / bParser.
>
>     aTime := [ aParser parse: 'b'] bench.
>     bTime := [ bParser parse: 'b'] bench.
>     cTime := [ combinedParser parse: 'b'] bench.
>     { aTime. bTime. cTime }
>
> cTime is bounded by the execution time of the slowest of these parsers plus overhead (e.g. PPFailure creation).
>
> e.g.
>
>     #('559,000 per second.' '1,010,000 per second.' '429,000 per second.')
>
> With a proof of concept PPPredicateCharSetParser
>
>     #('1,330,000 per second.' '1,550,000 per second.' '1,580,000 per second.')
>
> The noise is pretty strong here, but what is important is that bParser and the combinedParser are in the same ballpark.
>
> 2.) Choice Parser
>
> The BNF grammar of the parser is roughly:
>
>     Request = Request-Line
>               *( message-header )
>               CRLF
>               [ message-body ]
>
>     message-header = (Accept
>                       …
>                       / To
>                       / From
>                       / Via
>                       / extension-header) CRLF
>
>     Alert-Info = "Alert-Info" HCOLON alert-param *(COMMA alert-param)
>     Accept = "Accept" HCOLON
>              [ accept-range *(COMMA accept-range) ]
>
> So there can be several lines of "message-header". And each message header starts with a token/word, a colon and then the parameter. "extension-header" is kind of a catch all if no other rule matched. E.g. if a client sends a To which is wrongly encoded, it would end up with the extension-header.
>
> I transferred the above naively to PetitParser and ended up with something like parsing ~500 messages a second. The main cost appears to come from the choice parser, which needs to create a PPFailure all the time. E.g. if you have a line like this:
>
>     'From: "Holger Freyther" <sip:323234@foo.de>'
>
> The choice parser will start with the "Accept" rule, parse the token ("From") and then create a PPFailure, then the … rules, then "To", parse the token again, and so on. So we end up parsing the same token more than once and creating PPFailures all the time. I ended up creating a custom parser that will peek the token, have a horrible chain of token = 'XYZ' ifTrue and then dispatch to the other rule.
>
> It would be nice if PetitParser could be taught to only parse the token once and then delegate to the param rule. E.g. a PPAnyOfParser that allows one to specify the token to match, the parser to continue with and a fallback parser?
>
>
> [1]
> http://smalltalkhub.com/#!/~osmocom/SIP
> http://smalltalkhub.com/#!/~osmocom/MGCP

--
www.tudorgirba.com

"Every thing has its own flow"
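
The "parse the token once, then delegate, with a fallback" idea from Holger's mail can be sketched outside PetitParser. The following Python fragment is only an illustration of the dispatch shape, not PetitParser code and not Holger's implementation: parse_header, the handlers table and the fallback callable are all hypothetical names, handlers are assumed to return None on failure, and matching the header name case-insensitively is an assumption of the sketch.

```python
def parse_header(line, handlers, fallback):
    """Dispatch one header line on its leading token.

    handlers maps a lower-cased header name to a function taking the text
    after the colon and returning a result, or None if it cannot parse it.
    fallback takes the whole line and plays the role of the catch-all rule.
    """
    # Read the token in front of the colon exactly once.
    name, colon, value = line.partition(":")
    if colon:
        handler = handlers.get(name.strip().lower())
        if handler is not None:
            result = handler(value.strip())
            if result is not None:
                return result
    # Unknown token, no colon, or the specific rule failed: use the catch-all.
    return fallback(line)
```

Compared with an ordered choice over every rule, the header name is read once and a single dictionary lookup picks the rule, so no failure object is built for each alternative that does not match. Whether something like this could be expressed as a reusable parser inside PetitParser is exactly the open question in the mail.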

9 years, 10 months

Slowdown?

by Alexandre Bergel

Hi! I have noticed a slowdown in Moose. Opening a menu is now particularly slow. Here is a recording: https://dl.dropboxusercontent.com/u/31543901/TMP/slowdown.mov I have just tried on Pharo 5, and it seems to have a similar problem. Cheers, Alexandre -- _,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;: Alexandre Bergel http://www.bergel.eu ^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.

9 years, 10 months

Tiny Blog Update

by Sean P. DeNigris

From http://www.humane-assessment.com/blog/the-moldable-gtinspector-deconstructed : > In the GTInspector, you have another option: spawn it to the right (select > and Cmd+o). It should now be Cmd+g, right? ----- Cheers, Sean -- View this message in context: http://forum.world.st/Tiny-Blog-Update-tp4843167.html Sent from the Moose mailing list archive at Nabble.com.

9 years, 10 months

Worried...

by Alexandre Bergel

Hi! I am a bit worried. The Jenkins build for Moose 6.0 has not been green for ages. I am using a fresh 6.0, and I get an error when tracing in the debugger:

-=-=-=-=-=-=-=-=-=-=-=-=
DebuggerMethodMapOpal >> tempNamesForContext: aContext
    "Answer an Array of all the temp names in scope in aContext starting with the home's first local (the first argument or first temporary if no arguments)."
    ^ aContext sourceNode scope allTempNames.
-=-=-=-=-=-=-=-=-=-=-=-=

#scope is sent to nil. A good starting point would be to get Jenkins back to green. Having it yellow for too long is not constructive.

cheers,
Alexandre
--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.

9 years, 10 months

Stack plot

by Alexandre Bergel

Hi! Milton worked on a StackPlot builder. This is currently still very much a prototype. Inspect the expression: RTExperimentalExample new exampleStackOnRoassal It gives a stack plot that shows the amount of code in the subclasses of RTShape. Pretty cool! Many other examples are contained in the class RTExperimentalExample. It is worth having a look at them. And yes, it is exportable to HTML. Cheers, Alexandre -- _,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;: Alexandre Bergel http://www.bergel.eu ^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.

9 years, 10 months

very strange error with do: !!!

by Nicolas Anquetil

How is this possible? aCollection is an array with 1 element (an AdaParameter ...), but each in the do: block is nil (so the add: gives a DNU)!? I have no idea how this can be possible. Any clue? nicolas

9 years, 11 months
