PRTagCloudWidget and PRVerbatim

21 Feb 2010

Hi,
I'm using a PRTagCloudWidget and I've noticed that "div" appears as a
popular tag. It comes from the verbatim sections of my page's
environment. I'd have thought that, ideally, verbatim sections shouldn't
be included in tag cloud tokenisation?

Assuming I haven't missed some simple mechanism for extracting a page's
contents without the verbatim sections, I've been pondering how I'd
implement it. I can see that by creating a visitor, say
PRTagCloudTextWriter, and leaving the implementation of
PRTagCloudTextWriter>>visitVerbatim: blank, the resulting text will be
free of any PRVerbatim content. However, I'm unsure how to wire it all
together. Currently PRTagCloudWidget reads a page's contents via PRCase
class>>descriptionDocument, which uses the PRCase>>document accessor,
which gives wiki text output. My initial implementation thoughts would be:
1) Add a new accessor to PRCase, say cloudText (as a PRTagCloudWidget
extension).
2) PRCase>>cloudText would use PRTagCloudTextWriter to extract plain text
without PRVerbatim content.
3) PRTagCloudWidget then needs a mechanism for tagging PRCase>>cloudText as
the accessor it should use, in preference to PRCase
class>>descriptionDocument. How about introducing PRCase
class>>cloudTextdescriptionDocument (as a PRTagCloudWidget extension),
which PRTagCloudWidget would find and recognise as the description to use
in preference to PRCase class>>descriptionDocument?
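
A minimal sketch of the idea in the three steps above, written in Python
rather than Smalltalk so that it runs standalone. None of the names below
are Pier classes or selectors: Text, Verbatim, PlainTextWriter,
TagCloudTextWriter, Page, document_text, cloud_text and text_for_cloud are
hypothetical stand-ins for PRVerbatim, the proposed PRTagCloudTextWriter,
PRCase>>cloudText and so on. It only illustrates an empty verbatim visit
method plus a widget-side preference for the cloud accessor, not how Pier
actually wires descriptions together.

```python
import re
from collections import Counter


class Text:
    """Hypothetical stand-in for an ordinary prose node."""
    def __init__(self, text):
        self.text = text

    def accept(self, visitor):
        return visitor.visit_text(self)


class Verbatim:
    """Hypothetical stand-in for a verbatim node holding raw markup."""
    def __init__(self, text):
        self.text = text

    def accept(self, visitor):
        return visitor.visit_verbatim(self)


class PlainTextWriter:
    """Visitor that collects the text of every node, verbatim included."""
    def __init__(self):
        self.parts = []

    def visit_text(self, node):
        self.parts.append(node.text)

    def visit_verbatim(self, node):
        self.parts.append(node.text)

    def write(self, nodes):
        for node in nodes:
            node.accept(self)  # double dispatch back into visit_*
        return " ".join(self.parts)


class TagCloudTextWriter(PlainTextWriter):
    """Same traversal, but verbatim nodes contribute nothing."""
    def visit_verbatim(self, node):
        pass  # intentionally blank, as proposed for visitVerbatim:


class Page:
    """Hypothetical stand-in for a page with two text accessors."""
    def __init__(self, nodes):
        self.nodes = nodes

    def document_text(self):
        return PlainTextWriter().write(self.nodes)

    def cloud_text(self):
        return TagCloudTextWriter().write(self.nodes)


def text_for_cloud(page):
    """Prefer cloud_text when the page offers it, else fall back."""
    accessor = getattr(page, "cloud_text", None) or page.document_text
    return accessor()


def tag_counts(text):
    """Naive tokeniser: count lower-cased alphabetic words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


page = Page([
    Text("Seaside widgets render a tag cloud"),
    Verbatim('<div class="cloud"><div>tag</div></div>'),
    Text("The cloud lists popular words"),
])

with_verbatim = tag_counts(page.document_text())
without_verbatim = tag_counts(text_for_cloud(page))
```

Here with_verbatim counts "div" because the raw markup is tokenised along
with the prose, while without_verbatim has no "div" entry at all. The
getattr fallback in text_for_cloud is only one way to express "use the
cloud accessor if present"; in Pier the equivalent would presumably be a
lookup on the class-side descriptions.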

Or perhaps I've missed something really simple?

Nick
