If your scripts contain string literals with '<script>' or '</script>' in
them (I've seen this before), then your mileage may vary with Tudor's
approach. Also consider that script tags may have attributes, and those
attributes may use single or double quotes. Script tags also may or may
not contain JavaScript: many JavaScript libraries use script tags for HTML
template sources, for instance. These tags you'd probably want to keep (and
perhaps follow the reference for the third):
<script type='text/javascript'> [code here] </script>
<script type='text/javascript'> document.write('<script src="somewhere.js"></script>');</script> <!-- here be dragons! -->
<script type='text/javascript' src="path/to/javascript/source.js"></script>
However, something like this you might want to ignore:
<script type='text/html' id='someTemplate'>
<span>{{some template syntax}}</span>
</script>
If you can make some assumptions about what you're parsing you might be
able to adapt Tudor's solution to be more robust. However, if you're trying
for a general-purpose solution, I'd highly recommend using an existing HTML
parsing library, not an XML parser.
In general, parsing HTML as XML is the wrong approach. HTML is technically
not a subset of XML (closing tags aren't required), so most true XML
parsers are going to barf on it.
Some further reading:
https://en.wikipedia.org/wiki/Tag_soup
https://en.wikipedia.org/wiki/HTML5#XHTML5_.28XML-serialized_HTML5.29
I'm new to Smalltalk so I can't recommend a library, but I've used Tag Soup
in Java and Beautiful Soup in Python.
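To illustrate the library route, here is a minimal sketch using Python's standard-library html.parser (not Beautiful Soup or Tag Soup, and not a Smalltalk solution). The names ScriptCollector, extract_scripts and JS_TYPES are made up for this example, and the set of types treated as JavaScript is an assumption. It collects script elements from tag soup and skips those whose type marks them as something else, such as templates.

```python
from html.parser import HTMLParser

# Values of the type attribute treated as JavaScript (an assumption for this
# sketch); an absent or empty type counts as JavaScript too.
JS_TYPES = {"", "text/javascript", "application/javascript", "module"}


class ScriptCollector(HTMLParser):
    """Collect (attributes, inline source) pairs for JavaScript script tags."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._current = None  # (attrs, chunks) while inside a script element

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._current = (dict(attrs), [])

    def handle_data(self, data):
        # Text outside script elements is ignored.
        if self._current is not None:
            self._current[1].append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._current is not None:
            attrs, chunks = self._current
            script_type = (attrs.get("type") or "").strip().lower()
            if script_type in JS_TYPES:
                self.scripts.append((attrs, "".join(chunks)))
            self._current = None


def extract_scripts(html):
    """Return the JavaScript script elements found in possibly broken HTML."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    return collector.scripts
```

Scripts with a src attribute come back with empty inline source, so the caller can decide whether to follow the reference. Like a browser, this parser ends a script element at the first closing script tag it sees, so the document.write case above still needs care.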
Hope this helps,
Floyd
On Fri, Aug 14, 2015 at 9:40 AM, Tudor Girba <tudor(a)tudorgirba.com> wrote:
> Hi,
>
> You can also consider using island parsing, this very cool addition to
> PetitParser developed by Jan:
>
> beginScript := '<script>' asParser.
> endScript := '</script>' asParser.
> script := beginScript , endScript negate star flatten , endScript ==> #second.
> islandScripts := (script island ==> #second) star.
>
> If you apply it on:
>
> code := 'uninteresting part
> <script>
> some code
> </script>
> another
> uninteresting part
> <script>
> some other
> code
> </script>
> yet another
> uninteresting part
> '.
>
> You get:
> islandScripts parse: code
> ==> "#('some code' 'some other
> code')"
>
> Quite cool, no? :)
>
> Doru
>
>
> On Fri, Aug 14, 2015 at 1:31 AM, Alexandre Bergel <alexandre.bergel(a)me.com>
> wrote:
>
>> Hi!
>>
>> Together with Nicolas, we are trying to get all the <script …> … </script>
>> elements from HTML files.
>> We have tried to use XMLDOMParser, but many web pages are not well formed,
>> so the parser complains.
>>
>> Has anyone tried to get particular tags from HTML files? This looks like a
>> classic thing to do. Maybe some of you have done it.
>> Is there a way to configure the parser to accept broken XML/HTML content?
>>
>> Cheers,
>> Alexandre
>> --
>> _,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
>> Alexandre Bergel http://www.bergel.eu
>> ^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.
>>
> --
> www.tudorgirba.com
>
> "Every thing has its own flow"
>
> _______________________________________________
>
> Moose-dev mailing list
>
> Moose-dev(a)iam.unibe.ch
>
> https://www.iam.unibe.ch/mailman/listinfo/moose-dev
>
>
Hi!
I see the project: github.com/jig08/sQuick_new
Can I use it as a replacement for Spotlight on OS X? Is it meant to replace it?
Cheers,
Alexandre
--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.
You will probably have a better chance of getting a response on the Moose
mailing list.
Cheers,
Doru
On Sun, Aug 16, 2015 at 5:58 PM, Holger Freyther <holger(a)freyther.de> wrote:
> Hi,
>
> once again I am not sure if this is the right list. The first parser I
> wrote using PetitParser was a SIP (and then MGCP) parser. I have recently
> ported[1] the code to Pharo, and with Pharo it is very tempting to use
> BlockClosure>>#bench to get an idea of the speed.
>
>
> I have two performance “issues” and wonder if others had similar issues
> with PetitParser and if there is a general approach to this.
>
>
>
> 1.) Combining two PPCharsetPredicates does not combine the “classification”
> table it had. One could create a PPPredicateObjectParser subclass that is
> special casing >>#/ to build a combined classification table.
>
>
> 2.) When blindly following a BNF enumeration of "A or B or C or D or E
> or CatchAll" where each of "A, B" follows a common pattern (e.g. token COLON
> value), one pays a high cost in backtracking and in constructing the
> PPFailure for each failed case.
>
> In my SIPGrammar I have action parsers for To ==>.. From ==> and would
> like to keep that. At the same time I would be happy if the token in front
> of the colon were consumed only once and then delegated to the right parser,
> and if that one failed, the 'catch all' one were used.
>
> I don’t know which abstraction would be needed to allow creating optimized
> PetitParsers for such grammars.
>
> Sorry for the long mail; the full details and context are below.
>
>
> kind regards
> holger
>
>
>
>
>
>
> Full details:
>
>
> 1.) CharSetPredicate
>
> | aParser bParser combinedParser aTime bTime cTime |
>
> aParser := #digit asParser.
> bParser := #letter asParser.
> combinedParser := aParser / bParser.
>
> aTime := [ aParser parse: 'b'] bench.
> bTime := [ bParser parse: 'b'] bench.
> cTime := [ combinedParser parse: 'b'] bench.
> { aTime. bTime. cTime }
>
> cTime is bounded by the execution time of the slowest
> of these parsers + overhead (e.g. PPFailure creation).
>
> e.g.
>
> #('559,000 per second.' '1,010,000 per second.' '429,000 per second.')
>
> With a proof of concept PPPredicateCharSetParser
>
> #('1,330,000 per second.' '1,550,000 per second.' '1,580,000 per second.')
>
> The noise is pretty strong here, but what is important is that bParser and
> the combinedParser are in the same ballpark.
>
> 2.) Choice Parser
>
>
>
> The BNF grammar of the parser is roughly:
>
> Request = Request-Line
> *( message-header )
> CRLF
> [ message-body ]
>
> message-header = (Accept
> …
> / To
> / From
> / Via
> / extension-header) CRLF
>
> Alert-Info = "Alert-Info" HCOLON alert-param *(COMMA alert-param)
> Accept = "Accept" HCOLON
> [ accept-range *(COMMA accept-range) ]
>
>
> So there can be several lines of "message-header". Each message header
> starts with a token/word, a colon and then the parameter. "extension-header"
> is kind of a catch-all if no other rule matched. E.g. if a client sends a
> To which is wrongly encoded, it would end up as an extension-header.
>
> I transferred the above naively to PetitParser and ended up with something
> that parses ~500 messages a second. The main cost appears to come from the
> choice parser, which needs to create a PPFailure all the time. E.g. if you
> have a line like this:
>
> 'From: "Holger Freyther" <sip:323234@foo.de>'
>
> The choice parser will start with the "Accept" rule, parse the token
> ("From") and then create a PPFailure, then try the … rules, then "To", parse
> the token again, and so on. So we parse the same token more than once and
> create PPFailures all the time. I ended up creating a custom parser that
> peeks at the token, has a horrible chain of token = 'XYZ' ifTrue: and then
> dispatches to the other rule.
>
> It would be nice if PetitParser could be taught to parse the token only once
> and then delegate to the param rule. E.g. a PPAnyOfParser that allows one to
> specify the token to match, the parser to continue with and a fallback
> parser?
>
>
>
> [1]
> http://smalltalkhub.com/#!/~osmocom/SIP
> http://smalltalkhub.com/#!/~osmocom/MGCP
>
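The "parse the token once, then delegate" idea from point 2 can be sketched outside PetitParser. The following is a language-neutral illustration in Python, not a PetitParser API: parse_header, parse_address, extension_header and HANDLERS are hypothetical names, and the address parsing is deliberately simplistic. The header name is read once, looked up in a table, and the catch-all rule is used when there is no specific rule or the specific rule fails.

```python
def parse_address(kind):
    """Build a toy parser for 'display-name <uri>' header values."""
    def parse(value):
        if "<" not in value or not value.endswith(">"):
            raise ValueError(f"malformed {kind} header: {value!r}")
        display, _, uri = value.rpartition("<")
        return (kind, display.strip().strip('"'), uri[:-1])
    return parse


def extension_header(line):
    """Catch-all rule: keep the raw line."""
    return ("extension-header", line)


# Header name -> value parser; a table lookup replaces the ordered choice.
HANDLERS = {"to": parse_address("To"), "from": parse_address("From")}


def parse_header(line, handlers=HANDLERS, fallback=extension_header):
    """Read the header name once, then dispatch on it."""
    name, colon, value = line.partition(":")
    handler = handlers.get(name.strip().lower()) if colon else None
    if handler is not None:
        try:
            return handler(value.strip())
        except ValueError:
            pass  # known header with a malformed value: use the catch-all
    return fallback(line)
```

With this shape a line costs one lookup plus at most one failed specific rule, instead of one failure object per alternative that precedes the matching one.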
--
www.tudorgirba.com
"Every thing has its own flow"
Hi!
I have noticed a slowdown in Moose. Opening a menu is now particularly slow. Here is a recording:
https://dl.dropboxusercontent.com/u/31543901/TMP/slowdown.mov
I have just tried on Pharo 5, and it looks like it has a similar problem.
Cheers,
Alexandre
--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.
Hi!
I am a bit worried.
The Jenkins job for Moose 6.0 has not been green for ages.
I am using a fresh 6.0, and I get an error when tracing in the debugger:
-=-=-=-=-=-=-=-=-=-=-=-=
DebuggerMethodMapOpal >> tempNamesForContext: aContext
    "Answer an Array of all the temp names in scope in aContext starting with
    the home's first local (the first argument or first temporary if no arguments)."
    ^ aContext sourceNode scope allTempNames.
-=-=-=-=-=-=-=-=-=-=-=-=
#scope is sent to nil. A good starting point would be to get Jenkins back to green. Leaving it yellow for too long is not constructive.
cheers,
Alexandre
--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.
Hi!
Milton worked on a StackPlot builder.
This is currently very much a prototype.
Inspect the expression:
RTExperimentalExample new exampleStackOnRoassal
It produces a plot showing the amount of code in the subclasses of RTShape.
Pretty cool!
Many other examples are contained in the class RTExperimentalExample.
It is worth having a look at them.
And yes, it is exportable to HTML.
Cheers,
Alexandre
--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.
How is this possible?
aCollection is an array with 1 element (an AdaParameter ...),
but each in the do: block is nil (so the add: raises a DNU)!
I have no idea how this can be possible.
Any clue?
nicolas
Hi!
I am considering using OSProcess to run an external application.
I have tried:
OSProcess waitForCommand: 'ls -l'
but this expression never returns a value.
I also tried:
OSProcess command: 'ls -l /etc'
But it returns an ExternalUnixOSProcess. I am not sure what I can do with this. How can I get the output of the Unix command?
Sorry if these questions look naive :-)
Cheers,
Alexandre
--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.
Hi!
Together with Nicolas, we are trying to get all the <script …> … </script> elements from HTML files.
We have tried to use XMLDOMParser, but many web pages are not well formed, so the parser complains.
Has anyone tried to get particular tags from HTML files? This looks like a classic thing to do. Maybe some of you have done it.
Is there a way to configure the parser to accept broken XML/HTML content?
Cheers,
Alexandre
--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.
Hi,
Look at the class side: there is the method parse: namespace: validation:. Call this method instead of parse:, passing false for the last two arguments. It should work.
In any case, you should use the SAX parser. It is faster and uses less memory, and it makes it very simple to get only one kind of tag.
Cheers
Vincent
On 14 August 2015 at 01:31, Alexandre Bergel <alexandre.bergel(a)me.com> wrote:
>
> Hi!
>
> Together with Nicolas, we are trying to get all the <script …> … </script> elements from HTML files.
> We have tried to use XMLDOMParser, but many web pages are not well formed, so the parser complains.
>
> Has anyone tried to get particular tags from HTML files? This looks like a classic thing to do. Maybe some of you have done it.
> Is there a way to configure the parser to accept broken XML/HTML content?
>
> Cheers,
> Alexandre
> --
> _,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
> Alexandre Bergel http://www.bergel.eu
> ^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.
Hi!
Am I the only one experiencing image freezes, especially when loading from Monticello?
cheers,
Alexandre
--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.
Hi!
Roassal has a new builder called RTDSM.
It is currently very basic. Here is an example:
dsm := RTDSM new.
dsm objects: RTShape withAllSubclasses.
dsm dependency: #dependentClasses.
produces the following:
Are there some algorithms to find an optimal ordering of the elements to display?
Here is a slightly bigger matrix
Cheers,
Alexandre
--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.
In progress:
x. Export Google Issues to JSON since we have too many to use the Export
button
x. Create GH access token for Python import script
x. Request GH raise our API rate limit so that script doesn't fall over so
frequently
4. Re-run script to import remaining (almost all) issues
-----
Cheers,
Sean
--
View this message in context: http://forum.world.st/Moving-Issues-to-GitHub-tp4840767.html
Hi,
I would like to know why it is implemented like this:
Collection>>mooseDisplayStringOn: stream
    stream print: self size.
    self isEmpty
        ifTrue: [ stream nextPutAll: ' items' ].
    self size = 1
        ifTrue: [ stream nextPutAll: ' item' ].
    self size > 1
        ifTrue: [ stream nextPutAll: ' items' ].
Because the result we get is not very easy to read. Can I change it, or is there a performance issue?
Cheers,
Vincent BLONDEAU
Big big issue! I get an error with the BLFormatter or something.
Juraj, how can I manually choose to send spotter data?
I have not checked whether I get the same error in Pharo 5.
Cheers,
Alexandre
--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.
Hi,
Thanks to Joachim, JNIPort now works on Pharo 5, and Jdt2Famix now works on
Moose 6.
Cheers,
Doru
--
www.tudorgirba.com
"Every thing has its own flow"
Hi,
I started writing a basic Woden loading tutorial: http://woden.ronie.cl/
Please bear in mind that this is an early draft, so expect a lot of mistakes.
Greetings,
Ronie
Hi,
I would like to announce the jdt2famix project. This aims to be an
open-source solution for importing Java projects into Moose:
http://www.smalltalkhub.com/#!/~Moose/Jdt2Famix
The project is based on:
- JDT for raw parsing. This is implemented in Java.
- JNIPort for delegating to Pharo the Java methods that visit the Java AST.
Installation details can be found on the main project page.
The current importing logic is rudimentary, but the first goal was to set up
the whole ping-pong between Pharo and Java. This one works, and I am quite
happy about that. You can take a look at the JdtImporterTest.
CAVEATS:
- Due to a problem in JNIPort, currently, this project only works in Moose
5.0.
- Also, for now it works out of the box only for Mac OS X.
- And, on top of that, it requires Java 1.6 for now (until we get the
Spur VM on 64 bits).
There are still quite some challenges left, but once we get this going, we
would also be able to use deep AST analysis live, and to do incremental
model update when something changes on disk. Furthermore, if it scales,
this would not be based on an intermediary MSE file anymore.
I would like to ask for help in several directions:
1. Implement the full model import. This would require diving into JDT and
implementing the corresponding mapping logic. I spent a few days on this.
It is hairy, but it is not impossible (it just has a ton of edge cases).
This should be test driven, in that, for each case, we need to have
a corresponding sample.
2. Fix JNIPort to work in Pharo 5.
3. Get the whole thing to work out of the box for Linux and Windows.
4. Check scalability.
Please let me know your opinions, and let me know if you would like to
participate.
Cheers,
Doru
--
www.tudorgirba.com
"Every thing has its own flow"
Hi,
When using JNIPort I am getting a "java.lang.OutOfMemoryError: Java heap space".
To work around this, I would need to provide more memory to the JVM.
Essentially, I would like to provide the equivalent of a command line
like:
java -Xmx4000m
How should I do this?
Cheers,
Doru
--
www.tudorgirba.com
"Every thing has its own flow"
Hi!
Here is a serious bug: Type (in a playground or code browser) the following: #(‘hello’)
I get a violent #isNumberLiteralToken DNU on RBErrorToken
Also, consider the following:
The closing ) has the same color as the symbol, which is not correct. It should have the same color as the opening (
Apparently these errors do not occur in Pharo 5.
https://github.com/moosetechnology/Moose/issues/1125
Cheers,
Alexandre
--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.