Moose-dev - INF Mailman

Re: [Pharo-dev] Getting some tag in an HTML file

by Floyd May

If your scripts contain string literals with '<script>' or '</script>' in them (I've seen this before), then your mileage may vary with Tudor's approach. Also consider that script tags may have attributes, and those attributes may use single or double quotes. Script tags also may or may not contain JavaScript: many JavaScript libraries use script tags to hold HTML template sources, for instance.

These tags you'd probably want to keep (and perhaps follow the reference for the third):

    <script type='text/javascript'> [code here] </script>
    <script type='text/javascript'> document.write('<script src="somewhere.js"></script>');</script>
    <script type='text/javascript' src="path/to/javascript/source.js"></script>

However, something like this you might want to ignore:

    <script type='text/html' id='someTemplate'>
      <span>{{some template syntax}}</span>
    </script>

If you can make some assumptions about what you're parsing, you might be able to adapt Tudor's solution to be more robust. However, if you're after a general-purpose solution, I'd highly recommend using an existing HTML parsing library, not an XML parser. In general, parsing HTML as XML is the wrong approach. HTML is technically not a subset of XML (closing tags aren't required), so most true XML parsers are going to barf on it.

Some further reading:
https://en.wikipedia.org/wiki/Tag_soup
https://en.wikipedia.org/wiki/HTML5#XHTML5_.28XML-serialized_HTML5.29

I'm new to Smalltalk so I can't recommend a library, but in Java I've used Tag Soup and in Python I've used Beautiful Soup.

Hope this helps,
Floyd

On Fri, Aug 14, 2015 at 9:40 AM, Tudor Girba <tudor(a)tudorgirba.com> wrote:

> Hi,
>
> You can also consider using island parsing, this very cool addition to
> PetitParser developed by Jan:
>
> beginScript := '<script>' asParser.
> endScript := '</script>' asParser.
> script := beginScript , endScript negate star flatten , endScript ==> #second.
> islandScripts := (script island ==> #second) star.
>
> If you apply it on:
>
> code := 'uninteresting part
> <script>
> some code
> </script>
> another
> uninteresting part
> <script>
> some other
> code
> </script>
> yet another
> uninteresting part
> '.
>
> You get:
> islandScripts parse: code
> ==> "#('some code' 'some other
> code')"
>
> Quite cool, no? :)
>
> Doru
>
> On Fri, Aug 14, 2015 at 1:31 AM, Alexandre Bergel <alexandre.bergel(a)me.com> wrote:
>
>> Hi!
>>
>> Together with Nicolas we are trying to get all the <script …> … </script>
>> from html files.
>> We have tried to use XMLDOMParser, but many webpages are actually not
>> well formed, therefore the parser is complaining.
>>
>> Anyone has tried to get some particular tags from HTML files? This looks
>> like a classical thing to do. Maybe some of you have done it.
>> Is there a way to configure the parser to accept a broken XML/HTML
>> content?
>>
>> Cheers,
>> Alexandre
>> --
>> _,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
>> Alexandre Bergel http://www.bergel.eu
>> ^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.
>
> --
> www.tudorgirba.com
>
> "Every thing has its own flow"
>
> _______________________________________________
> Moose-dev mailing list
> Moose-dev(a)iam.unibe.ch
> https://www.iam.unibe.ch/mailman/listinfo/moose-dev

9 years, 10 months
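For the script-extraction thread above, here is a minimal sketch of the "use a lenient HTML parser" route that Floyd recommends. It is written in Python (the language of his Beautiful Soup example), not Pharo, and uses only the standard library's html.parser. The names ScriptCollector, extract_scripts and JS_TYPES are made up for this illustration, and the set of types treated as JavaScript is an assumption, not something stated in the thread.

```python
from html.parser import HTMLParser

# Assumption of this sketch: which "type" values count as JavaScript.
# An absent type attribute (empty string here) is treated as JavaScript.
JS_TYPES = {"", "text/javascript", "application/javascript", "module"}


class ScriptCollector(HTMLParser):
    """Collect (attributes, body) for each <script> element holding JavaScript."""

    def __init__(self):
        super().__init__()
        self.scripts = []    # list of (attrs dict, body text)
        self._attrs = None   # attributes of the <script> currently open, if any
        self._chunks = []    # body text may arrive in several pieces

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._attrs = dict(attrs)
            self._chunks = []

    def handle_data(self, data):
        if self._attrs is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._attrs is not None:
            script_type = (self._attrs.get("type") or "").strip().lower()
            if script_type in JS_TYPES:
                body = "".join(self._chunks).strip()
                self.scripts.append((self._attrs, body))
            self._attrs = None


def extract_scripts(html):
    """Return the JavaScript <script> elements found in a possibly malformed page."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    return collector.scripts
```

Unclosed tags elsewhere in the page do not stop this parser, and template blocks such as type='text/html' are skipped. Like a browser, it ends a script body at the first '</script>' it sees, so the document.write case from the message above would still be cut short.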


Is there a JavaScript parser somewhere?

by Alexandre Bergel

Just wondering… Cheers, Alexandre -- _,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;: Alexandre Bergel http://www.bergel.eu ^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.

9 years, 10 months


Presentation Context Update?

by Sean P. DeNigris

"A moldable presentation consists of a set of basic presentations selected according to the current development context... Currently defining a development context and filtering presentations based on it is not supported" [1] Is there a timeline for this feature? [1]. http://scg.unibe.ch/archive/papers/Chis14a-MoldableInspector.pdf ----- Cheers, Sean -- View this message in context: http://forum.world.st/Presentation-Context-Update-tp4843171.html Sent from the Moose mailing list archive at Nabble.com.

9 years, 10 months


sQuick?

by Alexandre Bergel

Hi!

I see the project: github.com/jig08/sQuick_new

Can I use it as a replacement for Spotlight on OS X? Is it meant to replace it?

Cheers,
Alexandre
--
Alexandre Bergel http://www.bergel.eu

9 years, 10 months


Re: [Pharo-dev] PetitParser speed and cost of PPFailure

by Tudor Girba

You will probably have a better chance of getting a response on the Moose mailing list.

Cheers,
Doru

On Sun, Aug 16, 2015 at 5:58 PM, Holger Freyther <holger(a)freyther.de> wrote:

> Hi,
>
> once again I am not sure if this is the right list. The first parser I wrote using
> PetitParser was a SIP (and then MGCP) parser. I have recently ported[1] the
> code to Pharo, and with Pharo it is very tempting to use BlockClosure>>#bench
> to get an idea of the speed.
>
> I have two performance “issues” and wonder if others had similar issues with
> PetitParser and if there is a general approach to this.
>
> 1.) Combining two PPCharsetPredicates does not combine the “classification”
> tables they have. One could create a PPPredicateObjectParser subclass that
> special-cases >>#/ to build a combined classification table.
>
> 2.) When blindly following a BNF enumeration of “A or B or C or D or E
> or CatchAll”, where each of “A, B” follows a common pattern (e.g. token COLON value),
> one pays a high cost in backtracking and in constructing the PPFailure for
> each failed case.
>
> In my SIPGrammar I have action parsers for To ==>.. From ==> and would
> like to keep that. At the same time I would be happy if the token in front of the
> colon were only consumed once and then delegated to the right parser and, if that
> one failed, to the “catch all” one.
>
> I don't know which abstraction would be needed to allow creating optimized
> PetitParsers for such grammars.
>
> Sorry for the long mail; the long details and context are below.
>
> kind regards
> holger
>
>
> Full details:
>
> 1.) CharSetPredicate
>
> | aParser bParser combinedParser aTime bTime cTime |
>
> aParser := #digit asParser.
> bParser := #letter asParser.
> combinedParser := aParser / bParser.
>
> aTime := [ aParser parse: 'b' ] bench.
> bTime := [ bParser parse: 'b' ] bench.
> cTime := [ combinedParser parse: 'b' ] bench.
> { aTime. bTime. cTime }
>
> cTime is bounded by the execution time of the slowest
> of these parsers + overhead (e.g. PPFailure creation).
>
> e.g.
>
> #('559,000 per second.' '1,010,000 per second.' '429,000 per second.')
>
> With a proof of concept PPPredicateCharSetParser:
>
> #('1,330,000 per second.' '1,550,000 per second.' '1,580,000 per second.')
>
> The noise is pretty strong here, but what is important is that bParser and the
> combinedParser are in the same ballpark.
>
> 2.) Choice Parser
>
> The BNF grammar of the parser is roughly:
>
> Request = Request-Line
>           *( message-header )
>           CRLF
>           [ message-body ]
>
> message-header = (Accept
>                   …
>                   / To
>                   / From
>                   / Via
>                   / extension-header) CRLF
>
> Alert-Info = "Alert-Info" HCOLON alert-param *(COMMA alert-param)
> Accept = "Accept" HCOLON
>          [ accept-range *(COMMA accept-range) ]
>
> So there can be several lines of “message-header”. Each message header
> starts with a token/word, a colon and then the parameter. “extension-header”
> is kind of a catch-all if no other rule matched. E.g. if a client sends a To which is
> wrongly encoded, it would end up with the extension-header.
>
> I transferred the above naively to PetitParser and ended up with something like
> parsing ~500 messages a second. The main cost appears to come from the
> choice parser, which needs to create a PPFailure all the time. E.g. if you have a
> line like this:
>
> 'From: "Holger Freyther" <sip:323234@foo.de>'
>
> the choice parser will start with the “Accept” rule, parse the token (“From”) and
> then create a PPFailure, then try the “…” rules, then “To”, parse the token again, and so on.
> So we end up parsing the same token more than once and creating PPFailures all the time. I
> ended up creating a custom parser that peeks the token, has a horrible chain
> of token = 'XYZ' ifTrue: tests and then dispatches to the other rule.
>
> It would be nice if PetitParser could be taught to parse the token only once and
> then delegate to the param rule. E.g. a PPAnyOfParser that allows one to specify the
> token to match, the parser to continue with and a fallback parser?
>
> [1]
> http://smalltalkhub.com/#!/~osmocom/SIP
> http://smalltalkhub.com/#!/~osmocom/MGCP

--
www.tudorgirba.com

"Every thing has its own flow"
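The "parse the token once, then delegate" idea from Holger's second point can be sketched independently of PetitParser. The Python below is only an illustration of the dispatch shape, not a proposal for the PetitParser API: every name in it (parse_address, HEADER_PARSERS, parse_header and the tuple results) is hypothetical, and the "does it contain sip:" check is a stand-in for a real To/From rule.

```python
def parse_address(value):
    """Stand-in for a real To/From value rule: accept only values with a SIP URI."""
    value = value.strip()
    if "sip:" not in value:
        raise ValueError("no SIP URI in header value")
    return value


def parse_to(value):
    return ("To", parse_address(value))


def parse_from(value):
    return ("From", parse_address(value))


def parse_extension(name, value):
    """Catch-all rule, playing the role of extension-header in the grammar."""
    return ("extension-header", name, value.strip())


# One table lookup replaces the ordered choice "Accept / ... / To / From / ...".
# Keys are lowercased because SIP header names are case-insensitive.
HEADER_PARSERS = {"to": parse_to, "from": parse_from}


def parse_header(line):
    # The token before the colon is read exactly once.
    name, colon, value = line.partition(":")
    if not colon:
        raise ValueError(f"not a header line: {line!r}")
    name = name.strip()
    handler = HEADER_PARSERS.get(name.lower())
    if handler is not None:
        try:
            return handler(value)
        except ValueError:
            pass  # specific rule failed: fall through to the catch-all
    return parse_extension(name, value)
```

With this shape a header costs one token scan plus at most one failed specific rule, instead of one failure object per alternative that precedes the matching one in the choice.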

9 years, 10 months


Slowdown?

by Alexandre Bergel

Hi!

I have noticed a slowdown in Moose. Opening a menu is now particularly slow. Here is a video: https://dl.dropboxusercontent.com/u/31543901/TMP/slowdown.mov

I have just tried on Pharo 5, and it seems to have a similar problem.

Cheers,
Alexandre
--
Alexandre Bergel http://www.bergel.eu

9 years, 10 months


Tiny Blog Update

by Sean P. DeNigris

From http://www.humane-assessment.com/blog/the-moldable-gtinspector-deconstructed :

> In the GTInspector, you have another option: spawn it to the right (select
> and Cmd+o).

It should now be Cmd+g, right?

-----
Cheers,
Sean
--
View this message in context: http://forum.world.st/Tiny-Blog-Update-tp4843167.html

9 years, 10 months


Worried...

by Alexandre Bergel

Hi!

I am a bit worried. The Jenkins build for Moose 6.0 has not been green for ages. I am using a fresh 6.0, and I get an error when tracing in the debugger:

-=-=-=-=-=-=-=-=-=-=-=-=
DebuggerMethodMapOpal >> tempNamesForContext: aContext
    "Answer an Array of all the temp names in scope in aContext starting with the home's first local (the first argument or first temporary if no arguments)."
    ^ aContext sourceNode scope allTempNames.
-=-=-=-=-=-=-=-=-=-=-=-=

#scope is sent to nil.

A good starting point would be to get Jenkins back to green. Having it yellow for too long is not constructive.

cheers,
Alexandre
--
Alexandre Bergel http://www.bergel.eu

9 years, 10 months


Stack plot

by Alexandre Bergel

Hi!

Milton worked on a StackPlot builder. It is currently very much a prototype. Inspect the expression:

    RTExperimentalExample new exampleStackOnRoassal

It gives something like the following, which shows the amount of code in the subclasses of RTShape:

[image not preserved in the archive]

Pretty cool! Many other examples are contained in the class RTExperimentalExample. It is worth having a look at them. And yes, it is exportable to HTML.

Cheers,
Alexandre
--
Alexandre Bergel http://www.bergel.eu

9 years, 10 months


very strange error with do: !!!

by Nicolas Anquetil

How is this possible? aCollection is an array with 1 element (an AdaParameter ...), but `each` in the do: block is nil (so the add: gives a DNU)!?

I have no idea how this can be possible. Any clue?

nicolas

9 years, 10 months


simple question about OSProcess

by Alexandre Bergel

Hi!

I am considering using OSProcess to run an external application. I have tried:

    OSProcess waitForCommand: 'ls -l'

but this expression never returns a value. I also tried:

    OSProcess command: 'ls -l /etc'

but it returns an ExternalUnixOSProcess, and I am not sure what I can do with it. How can I get the output of the Unix command?

Sorry if these questions look naive :-)

Cheers,
Alexandre
--
Alexandre Bergel http://www.bergel.eu

9 years, 10 months


Getting some tag in an HTML file

by Alexandre Bergel

Hi!

Together with Nicolas, we are trying to get all the <script …> … </script> elements from HTML files. We have tried to use XMLDOMParser, but many web pages are not actually well formed, so the parser complains.

Has anyone tried to get particular tags from HTML files? This looks like a classic thing to do. Maybe some of you have done it. Is there a way to configure the parser to accept broken XML/HTML content?

Cheers,
Alexandre
--
Alexandre Bergel http://www.bergel.eu

9 years, 10 months


Re: Getting some tag in an HTML file

by Vincent Blondeau

Hi,

Look at the class side: there is the method parse: namespace: validation: . Call this method instead of parse:, with false for the last two arguments. It should work.

Anyway, you should use the SAX parser. It is faster and consumes less memory, and it makes it very simple to get only one tag.

Cheers,
Vincent

On 14 August 2015 at 01:31, Alexandre Bergel <alexandre.bergel(a)me.com> wrote:

> Hi!
>
> Together with Nicolas we are trying to get all the <script …> … </script> from html files.
> We have tried to use XMLDOMParser, but many webpages are actually not well formed, therefore the parser is complaining.
>
> Anyone has tried to get some particular tags from HTML files? This looks like a classical thing to do. Maybe some of you have done it.
> Is there a way to configure the parser to accept a broken XML/HTML content?
>
> Cheers,
> Alexandre

9 years, 10 months


unstable image

by Alexandre Bergel

Hi!

Am I the only one experiencing image freezes? Especially when I load from Monticello.

cheers,
Alexandre
--
Alexandre Bergel http://www.bergel.eu

9 years, 10 months


DSM

by Alexandre Bergel

Hi!

Roassal has a new builder called RTDSM. It is currently very basic. Here is an example:

    dsm := RTDSM new.
    dsm objects: RTShape withAllSubclasses.
    dsm dependency: #dependentClasses.

produces the following:

[image not preserved in the archive]

Are there some algorithms to find an optimal ordering of the elements to display?

Here is a slightly bigger matrix:

[image not preserved in the archive]

Cheers,
Alexandre
--
Alexandre Bergel http://www.bergel.eu

9 years, 10 months


Moving Issues to GitHub

by Sean P. DeNigris

In progress:

x. Export Google Issues to JSON, since we have too many to use the Export button
x. Create a GH access token for the Python import script
x. Request that GH raise our API rate limit so that the script doesn't fall over so frequently
4. Re-run the script to import the remaining (almost all) issues

-----
Cheers,
Sean
--
View this message in context: http://forum.world.st/Moving-Issues-to-GitHub-tp4840767.html
Sent from the Moose mailing list archive at Nabble.com.
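On the rate-limit step above: one way an import script can avoid falling over is to pause once the quota is used up. The sketch below is not the script Sean mentions, which is not shown in the thread. It assumes a hypothetical send(issue) callable that performs one API request and returns the response headers as a plain dict with the keys X-RateLimit-Remaining and X-RateLimit-Reset, the header names GitHub uses for the remaining quota and the reset time in epoch seconds.

```python
import time


def seconds_until_reset(headers, now):
    """Return how long to wait before the next request (0.0 if quota remains)."""
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset_at - now)


def import_all(issues, send, sleep=time.sleep, clock=time.time):
    """Send every issue, sleeping whenever the reported quota hits zero.

    `send` is a hypothetical callable: it takes one issue and returns the
    response headers of the request it made.
    """
    imported = 0
    for issue in issues:
        headers = send(issue)
        imported += 1
        wait = seconds_until_reset(headers, clock())
        if wait > 0:
            sleep(wait)
    return imported
```

Passing sleep and clock as parameters keeps the loop testable without real waiting; a real script would also need to handle requests that fail outright.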

9 years, 10 months


Why Moose Inspector on a collection doesn't show its content but only its size?

by Blondeau Vincent

Hi,

I would like to know why it is implemented like this:

    Collection>>mooseDisplayStringOn: stream
        stream print: self size.
        self isEmpty ifTrue: [ stream nextPutAll: ' items' ].
        self size = 1 ifTrue: [ stream nextPutAll: ' item' ].
        self size > 1 ifTrue: [ stream nextPutAll: ' items' ].

Because we get this kind of result:

[image not preserved in the archive]

which is not very easy to read...

Can I change it, or is there a performance issue?

Cheers,
Vincent BLONDEAU
RMOD Team
vincent.blondeau(a)inria.fr
vincent.blondeau(a)worldline.com

9 years, 10 months


Cannot open the Setting browser

by Alexandre Bergel

Big big issue! I get an error with the BLFormatter or something.

Juraj, how can I manually choose to send spotter data?

I have not checked whether the same error occurs in Pharo 5.

Cheers,
Alexandre
--
Alexandre Bergel http://www.bergel.eu

9 years, 10 months


jdt2famix works on moose 6 / pharo 5

by Tudor Girba

Hi, Thanks to Joachim, JNIPort now works on Pharo 5, and Jdt2Famix now works on Moose 6. Cheers, Doru -- www.tudorgirba.com "Every thing has its own flow"

9 years, 10 months


Basic Woden Loading tutorial

by Ronie Salgado

Hi,

I started writing a basic Woden loading tutorial: http://woden.ronie.cl/

Please bear in mind that this is an early draft, so expect a lot of mistakes.

Greetings,
Ronie

9 years, 11 months


[ann] jdt2famix - an open-source java importer project

by Tudor Girba

Hi,

I would like to announce the jdt2famix project. This aims to be an open-source solution for importing Java projects into Moose:
http://www.smalltalkhub.com/#!/~Moose/Jdt2Famix

The project is based on:
- JDT for raw parsing. This is implemented in Java.
- JNIPort for delegating to Pharo the Java methods that visit the Java AST.

Installation details can be found on the main project page.

The current importing logic is rudimentary, but the first goal was to set up the whole ping-pong between Pharo and Java. This one works, and I am quite happy about that. You can take a look at the JdtImporterTest.

CAVEATS:
- Due to a problem in JNIPort, currently, this project only works in Moose 5.0.
- Also, for now it works out of the box only for Mac OS X.
- And, on top of that, it requires Java 1.6 for now (until we get the Spur VM on 64 bits).

There are still quite some challenges left, but once we get this going, we would also be able to use deep AST analysis live, and to do incremental model updates when something changes on disk. Furthermore, if it scales, this would no longer be based on an intermediary MSE file.

I would like to ask for help in several directions:
1. Implement the full model import. This would require diving into JDT and implementing the corresponding mapping logic. I spent a few days on this. It is hairy, but it is not impossible (only it has a ton of edge cases). This should be test driven, in that, for each case, we need to have a corresponding sample.
2. Fix JNIPort to work in Pharo 5.
3. Get the whole thing to work out of the box for Linux and Windows.
4. Check scalability.

Please let me know your opinions, and let me know if you would like to participate.

Cheers,
Doru

--
www.tudorgirba.com

"Every thing has its own flow"

9 years, 11 months


how to increase memory for the jvm through jniport?

by Tudor Girba

Hi,

When using JNIPort I am getting a "java.lang.OutOfMemoryError: Java heap space". To get around this, I would need to give the JVM more memory. Essentially, I would like to provide the equivalent of a command line like:

    java -Xmx4000m

How should I do this?

Cheers,
Doru

--
www.tudorgirba.com

"Every thing has its own flow"

9 years, 11 months


Important problem due to syntax highlighting

by Alexandre Bergel

Hi!

Here is a serious bug. Type the following (in a playground or code browser):

    #(‘hello’)

I get a violent #isNumberLiteralToken DNU on RBErrorToken.

Also, consider the following:

[image not preserved in the archive]

The closing ) has the same color as the symbol, which is not correct. It should have the same color as the opening (.

Apparently these errors do not occur in Pharo 5.

https://github.com/moosetechnology/Moose/issues/1125

Cheers,
Alexandre
--
Alexandre Bergel http://www.bergel.eu

9 years, 11 months


RTEdge class "edge construction" deprecated

by Peter Uhnák

Hi,

The "edge construction" protocol of RTEdge is deprecated, but I don't see what replaces it. RTEdgeBuilder, perhaps?

Thanks,
Peter

9 years, 11 months


interview on .Net Rocks! about Moose, humane assessment, Pharo and GT

by Tudor Girba

Hi, I had the pleasure of giving an interview for the .Net Rocks! podcast. It was quite fun and I managed to mention Moose, humane assessment, Pharo and GT in 45 minutes. It was not as smooth as a presentation but I still think it's a useful advertisement. Let me know what you think, and feel free to promote it further: https://dotnetrocks.com/default.aspx?showNum=1172 http://www.tudorgirba.com/blog/dotnetrocks-interview-moose-humane-assessmen… Cheers, Doru -- www.tudorgirba.com "Every thing has its own flow"

9 years, 11 months

