I wrote a tag cloud generator for Lucene, examples of it include
The generator is written as a Nutch plugin for no good reason ;)
I build the cloud by reading the Lucene index and pruning the term list. A junk words file controls what gets pruned.
Once the list is built, I pass the results to a JavaScript file, which outputs the cloud.
There are a few files to all of this....
JavaSourceCode
The source code requires Lucene. Although I wrote it as a Nutch plugin, it does not depend on Nutch.
JunkWordsFile
The junk words file contains terms and some options.
The options are baked into the code.
The words do not support regex; they are just matched as-is.
Options include
-numbers - ignore numbers
-smallwords - skip words with three or fewer characters
-dashes - ignore terms with dashes
-# : comments are also supported with #
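To make the junk words rules above a bit more concrete, here is a minimal Java sketch of how such a file could be parsed and applied. It is not the plugin's actual code: the class name JunkWords and its methods are hypothetical, and I am assuming "numbers" means terms made up only of digits and that a comment is any line starting with #.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical junk-words filter; a sketch, not the plugin's real implementation. */
final class JunkWords {
    private final Set<String> words = new HashSet<>();
    private boolean skipNumbers;
    private boolean skipSmallWords;
    private boolean skipDashes;

    /** Parses junk-file lines: '#' comments, '-option' switches, anything else is a literal term. */
    static JunkWords parse(List<String> lines) {
        JunkWords junk = new JunkWords();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // blank line or comment
            }
            switch (line) {
                case "-numbers":    junk.skipNumbers = true;    break;
                case "-smallwords": junk.skipSmallWords = true; break;
                case "-dashes":     junk.skipDashes = true;     break;
                default:            junk.words.add(line);       // exact match, no regex
            }
        }
        return junk;
    }

    /** Returns true if the term should be dropped from the cloud. */
    boolean isJunk(String term) {
        if (skipSmallWords && term.length() <= 3) return true;      // three or fewer characters
        if (skipDashes && term.indexOf('-') >= 0) return true;      // contains a dash
        if (skipNumbers && term.matches("\\d+")) return true;       // assumption: digits only
        return words.contains(term);
    }

    /** Keeps only the terms that are not junk, preserving their order. */
    List<String> prune(List<String> terms) {
        List<String> kept = new ArrayList<>();
        for (String term : terms) {
            if (!isJunk(term)) {
                kept.add(term);
            }
        }
        return kept;
    }
}
```

With a junk file containing -numbers, -smallwords, -dashes and the word copyright, pruning a term list would drop entries such as 2007, the, open-source and copyright and keep the rest.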
JavaScriptFile
This file converts the results into HTML and uses CSS to dress the cloud.
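For illustration, here is roughly what that conversion step looks like, sketched in Java rather than the JavaScript I actually use. Everything here is an assumption for the example: the class name CloudHtml, the "cloud" wrapper class and the size1..sizeN CSS class names are hypothetical and not taken from the real files.

```java
import java.util.Map;

/** Hypothetical HTML renderer for a tag cloud; a sketch, not the actual JavaScript file. */
final class CloudHtml {

    /** Escapes the characters that would otherwise break the generated markup. */
    static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;");
    }

    /**
     * Renders each term as a SPAN whose CSS class encodes its size level.
     * The map is term -> level; iteration order decides the order in the cloud.
     */
    static String render(Map<String, Integer> levels) {
        StringBuilder html = new StringBuilder("<div class=\"cloud\">");
        for (Map.Entry<String, Integer> entry : levels.entrySet()) {
            html.append("<span class=\"size").append(entry.getValue()).append("\">")
                .append(escape(entry.getKey()))
                .append("</span> ");
        }
        // Drop the trailing space left by the last SPAN before closing the wrapper.
        return html.toString().trim() + "</div>";
    }
}
```

The CSS file then only has to define one rule per size class (font size, colour and so on), which keeps the styling out of the generator.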
CloudCssFile
The CSS file is modified from an example I found in a PHP tag cloud project.
Showing posts with label tag cloud. Show all posts
Thursday, July 19, 2007
Tuesday, June 19, 2007
Creating a Tag Cloud
At first I was very frustrated: I was trying to figure out how to generate a 'field' cloud from a Lucene index. I was googling '"tag cloud" generator lucene' and just could not find my way. My view is that Google at times is losing the simplicity of its core strength. I wasted some time, but eventually tried Google's blog search and finally found a collection of "tag cloud" generators, and from there a Java cloud generator as a starting point. That one, while interesting, is fully integrated into Hibernate and is used to generate a cloud from the index created behind the data within your application.
So... I gave up and built it myself, but got a head start from two projects:
a. Luke, the Lucene Index Toolbox. A great tool for working with Lucene. On the first page there is a list of terms, so I started by reviewing that code base.
b. A good PHP example with some basic CSS and use of SPAN tags. I also reviewed the PHP code to see how they randomize and work out the strength of each item within the cloud.
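The "randomize and strength" idea can be sketched in Java like this. It is only my guess at a simple approach, not the PHP project's code: the class name CloudWeights, the linear scaling between the smallest and largest count, and the seeded shuffle are all assumptions for the example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical strength calculation for cloud items; a sketch under stated assumptions. */
final class CloudWeights {

    /** Maps a count onto a level from 1 to levels, scaled linearly between min and max. */
    static int level(int count, int min, int max, int levels) {
        if (max <= min) {
            return 1; // every term has the same count, so there is nothing to scale
        }
        return 1 + (int) ((long) (count - min) * (levels - 1) / (max - min));
    }

    /** Converts term -> count into term -> level, keeping the input order. */
    static Map<String, Integer> levels(Map<String, Integer> counts, int levels) {
        Map<String, Integer> result = new LinkedHashMap<>();
        if (counts.isEmpty()) {
            return result;
        }
        int min = Collections.min(counts.values());
        int max = Collections.max(counts.values());
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            result.put(entry.getKey(), level(entry.getValue(), min, max, levels));
        }
        return result;
    }

    /** Returns the terms in random order; a fixed seed makes the order repeatable. */
    static List<String> shuffled(Collection<String> terms, long seed) {
        List<String> copy = new ArrayList<>(terms);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }
}
```

With five levels, a term seen 100 times next to one seen 10 times ends up at level 5 and level 1 respectively, and a term seen 55 times lands in the middle at level 3. Shuffling afterwards keeps the big terms from clumping together at one end of the cloud.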
I chose to crawl from my blog, indexing a few times with -topN 2000, so not a very large crawl, but enough to generate the dataset. From that data comes the cloud on top, which is based on the 'content' field, and if you scroll all the way to the bottom there is a cloud based on the site field.
I will post the code once I get it cleaned up. If you need it sooner just let me know.
Here is the example cloud I am able to generate.
products software real main blog tools links source business news services search policy privacy documentation community contact service view