On 21 February 2010 15:26, Nick Ager <nick.ager(a)gmail.com> wrote:
Hi,
I'm using a PRTagCloudWidget and I've noticed that "div" appears as a
popular tag, which comes from the verbatim sections of my page's
environment. I'd have thought that, ideally, verbatim sections shouldn't
be included in tag cloud tokenisation?
I think the PRTextWriter should probably not include the verbatim
text. I'll remove it from there.
Lukas
Assuming I haven't missed some simple mechanism for extracting a page's
contents without the verbatim sections, I've been pondering how I'd
implement it. I can see that by creating a visitor, say
PRTagCloudTextWriter, and leaving the implementation of
PRTagCloudTextWriter>>visitVerbatim: blank, the resulting text will be
free of any PRVerbatim objects. However, I'm unsure how to wire it all
together. Currently PRTagCloudWidget reads a page's contents via PRCase
class>>descriptionDocument, which uses the PRCase>>document accessor, which
gives wiki text output. My initial implementation thoughts would be:
1) Add a new accessor to PRCase, say cloudText (as a PRTagCloudWidget
extension).
2) PRCase>>cloudText would use PRTagCloudTextWriter to extract plain text
without PRVerbatim content.
3) PRTagCloudWidget then needs a mechanism for tagging PRCase>>cloudText as
the accessor it should use, in preference to PRCase
class>>descriptionDocument. How about introducing PRCase
class>>cloudTextdescriptionDocument (as a PRTagCloudWidget extension),
which PRTagCloudWidget would find and realise that it should be used in
preference to PRCase class>>descriptionDocument?
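
To make the visitor idea above concrete, here is a minimal,
language-neutral sketch in Python rather than Smalltalk. Every class and
function name in it (Text, Verbatim, Document, TextWriter,
TagCloudTextWriter, tag_counts) is a hypothetical stand-in invented for
illustration; none of it is Pier's actual API. It only shows the shape of
the approach: a text-writing visitor whose subclass overrides the
verbatim hook with an empty body, so verbatim content never reaches the
tokeniser.

```python
from collections import Counter


class Text:
    """Hypothetical stand-in for an ordinary text node."""
    def __init__(self, text):
        self.text = text

    def accept(self, visitor):
        visitor.visit_text(self)


class Verbatim:
    """Hypothetical stand-in for a verbatim node (PRVerbatim in the thread)."""
    def __init__(self, text):
        self.text = text

    def accept(self, visitor):
        visitor.visit_verbatim(self)


class Document:
    """Hypothetical stand-in for a page's document tree."""
    def __init__(self, children):
        self.children = children

    def accept(self, visitor):
        visitor.visit_document(self)


class TextWriter:
    """Collects the text of every node it visits, verbatim included."""
    def __init__(self):
        self.parts = []

    def visit_document(self, node):
        for child in node.children:
            child.accept(self)

    def visit_text(self, node):
        self.parts.append(node.text)

    def visit_verbatim(self, node):
        self.parts.append(node.text)

    def write(self, node):
        self.parts = []
        node.accept(self)
        return " ".join(self.parts)


class TagCloudTextWriter(TextWriter):
    """Same traversal, but verbatim nodes contribute nothing."""
    def visit_verbatim(self, node):
        pass  # intentionally blank: skip verbatim content


def tag_counts(text):
    """Naive whitespace tokeniser, assumed here only for the example."""
    return Counter(word.lower() for word in text.split())


page = Document([
    Text("Pier tag cloud"),
    Verbatim("<div> div </div>"),
    Text("cloud"),
])
plain = TagCloudTextWriter().write(page)
```

With the plain writer, "div" ends up among the counted words; with the
subclass it does not, because the only change is the empty verbatim hook.
How PRTagCloudWidget would be told to use such a writer is the open
question in point 3 and is not addressed by this sketch.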
Or perhaps I've missed something really simple?
Nick
_______________________________________________
Magritte, Pier and Related Tools ...
https://www.iam.unibe.ch/mailman/listinfo/smallwiki
--
Lukas Renggli
http://www.lukas-renggli.ch