Thursday, July 19, 2007

Lucene Tag Cloud Generator

I wrote a tag cloud generator for Lucene; examples of it include
The generator is written as a Nutch plugin, for no good reason ;)

I build the cloud by reading the Lucene index and pruning the term list with a junk words file, which controls what gets left out.
Once the list is built, I run a JavaScript file, passing in the results, and the JavaScript outputs the cloud.
There are a few files to all of this:
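The post doesn't show how the results are handed to the JavaScript. One possible approach, sketched here in Java as an assumption, is to serialize the term counts into a JavaScript array literal and emit a call to the page script. The `CloudData` class, the `drawCloud` function and the `term`/`count` field names are hypothetical, and the counts in `main` are made-up sample values, not data read from a Lucene index.

```java
import java.util.Map;
import java.util.TreeMap;

public class CloudData {
    // Quotes a term as a JavaScript string literal, escaping characters that would break it.
    static String jsString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '<':  sb.append("\\u003c"); break; // keeps "</script>" from appearing in the page
                default:   sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Serializes term -> count pairs as a JavaScript array of objects, sorted by term.
    static String toJsArray(Map<String, Integer> counts) {
        StringBuilder sb = new StringBuilder("[");
        boolean first = true;
        for (Map.Entry<String, Integer> e : new TreeMap<>(counts).entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append("{term:").append(jsString(e.getKey()))
              .append(",count:").append(e.getValue()).append('}');
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new TreeMap<>();
        counts.put("nutch", 7);
        counts.put("lucene", 12);
        // drawCloud is a hypothetical entry point in the page's JavaScript.
        System.out.println("drawCloud(" + toJsArray(counts) + ");");
    }
}
```

Emitting a literal like this keeps the Java side free of any HTML concerns; the script on the page decides how the cloud looks.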

JavaSourceCode
The source code requires Lucene. Although I wrote it as a Nutch plugin, it does not depend on Nutch.

JunkWordsFile
The junk words file contains terms and a few options.
The options are baked into the code.

Words are matched literally; regular expressions are not supported.
Options include:

-numbers - ignore numbers
-smallwords - ignore words with three or fewer characters
-dashes - ignore terms containing dashes
# - comments are also supported, marked with #
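A minimal sketch of reading the junk words file and applying it to candidate terms. It assumes one entry per line, whole-line `#` comments, options written exactly as listed above (leading dash included), all-digit terms counting as "numbers", and case-insensitive word matching; none of that is confirmed by the post. The `JunkWords` class and its `parse`/`keep` methods are hypothetical, not the plugin's actual code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class JunkWords {
    final Set<String> words = new HashSet<>();
    boolean skipNumbers, skipSmallWords, skipDashes;

    // Parses the file's lines: blank lines and '#' comments are skipped, the three
    // known options set flags, and anything else is a literal word to exclude.
    static JunkWords parse(List<String> lines) {
        JunkWords jw = new JunkWords();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) continue;
            switch (line) {
                case "-numbers":    jw.skipNumbers = true; break;
                case "-smallwords": jw.skipSmallWords = true; break;
                case "-dashes":     jw.skipDashes = true; break;
                default:            jw.words.add(line.toLowerCase()); // no regex, exact match
            }
        }
        return jw;
    }

    // Returns true if the term survives pruning and should appear in the cloud.
    boolean keep(String term) {
        if (words.contains(term.toLowerCase())) return false;
        if (skipNumbers && term.matches("\\d+")) return false;
        if (skipSmallWords && term.length() <= 3) return false;
        if (skipDashes && term.contains("-")) return false;
        return true;
    }

    public static void main(String[] args) {
        JunkWords jw = parse(Arrays.asList("# site chrome", "-smallwords", "click", "here"));
        for (String t : Arrays.asList("lucene", "click", "tag", "cloud")) {
            System.out.println(t + " -> " + (jw.keep(t) ? "keep" : "drop"));
        }
    }
}
```

Because unknown lines fall through to the word list, a misspelled option such as `-number` would silently become a junk word rather than an error; a stricter parser could reject unrecognized dash-prefixed lines instead.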

JavaScriptFile
This file converts the term list into HTML and uses CSS to style the cloud.
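The JavaScript file itself isn't reproduced in the post, so here is an illustration (in Java, to match the other examples) of one common way to turn counts into cloud markup: scale each count linearly into one of five size classes and wrap the term in a span. The `CloudHtml` class, the `cloud` container class and the `tag1`..`tag5` class names are assumptions; the actual CSS file's class names may differ.

```java
import java.util.Collections;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

public class CloudHtml {
    // Hypothetical number of CSS size classes, named tag1 (smallest) .. tag5 (largest).
    static final int BUCKETS = 5;

    // Maps a count linearly onto 1..BUCKETS between the smallest and largest counts.
    static int bucket(int count, int min, int max) {
        if (max == min) return 1;
        return 1 + (int) ((long) (count - min) * (BUCKETS - 1) / (max - min));
    }

    // Escapes the characters that are special in HTML text and attribute values.
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Renders terms alphabetically, each in a span whose class selects its size.
    static String render(Map<String, Integer> counts) {
        StringJoiner html = new StringJoiner(" ", "<div class=\"cloud\">", "</div>");
        if (counts.isEmpty()) return html.toString();
        int min = Collections.min(counts.values());
        int max = Collections.max(counts.values());
        for (Map.Entry<String, Integer> e : new TreeMap<>(counts).entrySet()) {
            html.add("<span class=\"tag" + bucket(e.getValue(), min, max) + "\">"
                    + escapeHtml(e.getKey()) + "</span>");
        }
        return html.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new TreeMap<>();
        counts.put("lucene", 12);
        counts.put("nutch", 7);
        counts.put("plugin", 2);
        System.out.println(render(counts));
    }
}
```

Linear scaling lets a single very frequent term push most others into the smallest class; scaling the logarithm of the counts instead is a common alternative.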

CloudCssFile
The CSS file is adapted from an example I found in a PHP tag cloud project.

3 comments:

ShaveGuild said...

Any chance of getting the code? All the links to the code seem to be broken.

Unknown said...

Fixed.

Anonymous said...

Some of the code links are not working... any ideas?