Hi Offray,
2015-03-31 18:17 GMT+02:00 Offray Vladimir Luna Cárdenas <offray(a)riseup.net>:
Hi,
Following Peter Uhnák's advice on tag clouds and avatars, I made some
progress on my intended visualization. If you run the code at [1] you will
get something similar to [2] (the only difference is that the screenshot was
taken from code inside a Grafoscopio document instead of a plain playground).
[1]
http://ws.stfx.eu/9G5PEGYFL1MW
[2]
http://mutabit.com/deltas/repos.fossil/datapolis/doc/tip/Figures/personal-tagcloud.png
I will prioritize scraping and cleaning the data, leaving the
position of the avatar for the end (hopefully Alexandre will read this and,
in his quest to make Roassal the best visualization engine in the
universe and its users happier, will implement my suggestion at the end).
So, to clean the data, I'm trying to split originalText (see [1]) into
single words. I start by copying that text, replacing every punctuation
character and parenthesis with a space, and then applying #splitOn: ' ' to
the new string. I did this with the chunk of code at [3], but it seems
inelegant, and trying to use cascades ending in #yourself didn't do the
trick.
=[3]==========================
cookedText1 := originalText.
cookedText1 := cookedText1 copyReplaceAll: ',' with: ' '.
cookedText1 := cookedText1 copyReplaceAll: ';' with: ' '.
cookedText1 := cookedText1 copyReplaceAll: '.' with: ' '.
cookedText1 := cookedText1 copyReplaceAll: ':' with: ' '.
cookedText1 := cookedText1 copyReplaceAll: ')' with: ' '.
cookedText1 := cookedText1 copyReplaceAll: '(' with: ' '.
==============================
So here come my questions:
a) Is there a way to replace the code at [3] with something more elegant
and Smalltalk-ish, so that I get only words, no matter whether they are
separated by spaces or punctuation marks, or start/end with parentheses?
Did you try RxMatcher? It is probably much slower, but more flexible.
cookedText1 := (RxMatcher forString: '\w+') matchesIn: originalText.
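As a minimal sketch of the regex approach (the sample text is made up for illustration and is not the data from [1]; `asRegex` is assumed to be available as the String shorthand for building the matcher in Pharo's Regex package), note that #matchesIn: answers a collection of words rather than a single string, so no further #splitOn: is needed:

```smalltalk
| originalText words |
"Hypothetical sample text, only for illustration"
originalText := 'Hola (mundo); esto: es una prueba.'.
"Collect every run of one or more word characters;
 punctuation, parentheses and whitespace are skipped"
words := '\w+' asRegex matchesIn: originalText.
"Open an inspector to look at the resulting collection"
words inspect.
```

How `\w` treats accented letters depends on the Regex implementation in the image, so it is worth checking with a sample of the real text.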
b) Why are some uninteresting words, like the Spanish 'La' or 'Se', still
making their way into the final visualization even though I try to exclude
them with the code at [4]?
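Because your copyReplaceAll:with: calls only replace punctuation, not
invisible characters such as line breaks ('\n'). A word next to a line
break stays glued to its neighbour after #splitOn: ' ', so it never matches
an entry in uninterestingWords.
(The RxMatcher result does not include the line break characters, so
this problem shouldn't occur.)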
=[4]==========================
(cookedText1 splitOn: ' ') do: [:word |
    ((word size > 1) & (uninterestingWords includes: word asLowercase) not)
        ifTrue: [cookedText2 := cookedText2, word, ' ']].
==============================
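To illustrate the line-break point, here is a minimal sketch that treats line breaks and tabs as separators alongside the punctuation from [3], and then filters with #select: instead of concatenating inside #do:. The sample text and the stop-word list are made up for illustration, and the sketch assumes String>>substrings: and String>>join: as found in current Pharo images:

```smalltalk
| originalText uninterestingWords separators words cookedText2 |
"Hypothetical sample text containing a line break"
originalText := 'La casa (grande), se ve: bien.', String cr, 'La otra no.'.
"Hypothetical stop-word list, in lowercase"
uninterestingWords := #('la' 'se').
"Every character in this string acts as a word separator"
separators := ' ,;.:()', String cr, String lf, String tab.
words := originalText substrings: separators.
"Keep words longer than one character that are not stop words"
words := words select: [:word |
    word size > 1 and: [(uninterestingWords includes: word asLowercase) not]].
"Rebuild a space-separated string, as code [4] does"
cookedText2 := ' ' join: words.
```

Because the line break is a separator here, the second 'La' is seen as a word of its own and gets filtered out like the first one.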
And my suggestion:
Please consider making tag clouds with variable layouts and shapes. Python
has something similar; see [5].
[5]
http://sebastianraschka.com/Articles/2014_twitter_wordcloud.html
Yes, that looks great.
nicolai
I will be waiting for your suggestions and thanks for keeping Pharo/Moose
awesome!
Cheers,
Offray