Tuesday, June 19, 2007

Creating a Tag Cloud

At first I was very frustrated. I was trying to figure out how to generate a 'field' cloud from a Lucene index. I googled '"tag cloud" generator lucene' and just could not find my way. My view is that Google is at times losing the simplicity of its core strength. I wasted some time, but eventually tried Google's blog search and finally found a collection of "tag cloud" generators, and among them a Java cloud generator as a starting point. It is interesting, but it is fully integrated into Hibernate and generates its cloud from the index created behind the data within your application.

So... I gave up and built it myself, but got a head start from two projects:
a. Luke, the Lucene Index Toolbox. A great tool for working with Lucene. Its first page shows a list of terms, so I started by reviewing that code base.
b. A good PHP example with some basic CSS that leverages SPAN tags. I also reviewed the PHP code to see how it randomizes the items and determines the strength of each item within the cloud.
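
To make the "randomize and determine strength" idea concrete, here is a minimal Java sketch of that step. It is not the code from this post or from the PHP example; it assumes the term counts have already been pulled out of the index into a map, scales each count linearly into one of a few size levels, shuffles the terms, and wraps each one in a SPAN. The class name TagCloudSketch, the CSS class names cloud1..cloud5, and the sample counts are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class TagCloudSketch {

    /** Maps a count onto a level 1..levels by linear scaling between min and max. */
    static int bucket(int count, int min, int max, int levels) {
        if (max <= min) {
            return 1; // every term is equally frequent, so use the smallest size
        }
        double ratio = (double) (count - min) / (max - min);
        return 1 + (int) Math.round(ratio * (levels - 1));
    }

    /** Renders the terms in shuffled order, one SPAN per term. */
    static String render(Map<String, Integer> counts, int levels, long seed) {
        if (counts.isEmpty()) {
            return "";
        }
        int min = Collections.min(counts.values());
        int max = Collections.max(counts.values());

        // Shuffle so that big and small terms are mixed across the cloud.
        List<String> terms = new ArrayList<>(counts.keySet());
        Collections.shuffle(terms, new Random(seed));

        StringBuilder html = new StringBuilder();
        for (String term : terms) {
            int level = bucket(counts.get(term), min, max, levels);
            // Hypothetical CSS classes cloud1..cloudN set the font size.
            // Terms are not HTML-escaped here; real input would need that.
            html.append("<span class=\"cloud").append(level).append("\">")
                .append(term).append("</span> ");
        }
        return html.toString().trim();
    }

    public static void main(String[] args) {
        // Made-up counts, standing in for term frequencies read from an index.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("software", 120);
        counts.put("blog", 60);
        counts.put("privacy", 10);
        System.out.println(render(counts, 5, 42L));
    }
}
```

Linear scaling is the simplest choice; if a handful of terms dominate the index, scaling on the logarithm of the count may give a more even cloud.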

I chose to crawl starting from my blog, indexing a few times with -topN 2000, so not a very large crawl, but enough to generate the dataset. From that data comes the cloud on top, which is based on the 'content' field, and if you scroll all the way to the bottom there is a cloud based on the 'site' field.

I will post the code once I get it cleaned up. If you need it sooner just let me know.
Here is the example cloud I am able to generate.


 products    software    real    main    blog    tools    links    source    business    news    services    search    policy    privacy    documentation    community    contact    service    view  

3 comments:

Sascha said...

Hi,

I'm currently looking for exactly the same thing. I want to integrate a tag cloud based on Lucene into OpenCMS and I'm very interested in your code. Could you post your code in the "unclean" version you have at the moment?

Thanks in advance, and maybe it results in a nice opencms extension module...

greets
Sascha

Paul said...

Almost a year after you posted this message and still in the process of cleaning up your code? Dude, I am glad you don't work on my projects.

Richard Friedman said...

Paul,

I totally posted a blog entry with URLs to the code and examples of using it! http://richardfriedman.blogspot.com/2007/07/lucene-tag-cloud-generator.html

The only issue I found was that when my servers were moved (http://www.osadvisors.com/twiki/bin/view.pl/TagClouds/WebHome) session login was broken on twiki. I fixed that.

Is that what you are looking for?

Rich