Badger: An Entropy-Based Web Search Clustering System with Randomization and Voting
Date
2005
Author
Wang, Lidan
Schulze, Chloe Whyte
Publisher
University of Wisconsin-Madison Department of Computer Sciences
Abstract
We implemented and improved an entropy-based clustering algorithm. In addition to using entropy as a clustering mechanism, our algorithm, Badger, uses randomization and a voting scheme to improve the quality of the resulting clusters. We tested the algorithm on parsed web search result snippets and compared it against EigenCluster, a clustering meta-search engine developed by a research group at MIT. Badger performs comparably to EigenCluster, but with slightly more overhead due to the extra work of the randomization step. We found entropy to be a valid and interesting measure of document similarity, and one that produces cohesive clusters.
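As a rough illustration of the three ideas named in the abstract (entropy as a clustering criterion, randomized runs, and voting), the following Python sketch may help. It is not the authors' implementation: the function names `entropy`, `cluster_once`, and `vote`, the greedy assignment rule, the size-weighted entropy cost, and the majority co-occurrence vote are all assumptions made for this example.

```python
import math
import random
from collections import Counter
from itertools import combinations


def entropy(term_counts):
    """Shannon entropy (in bits) of a term-frequency distribution."""
    total = sum(term_counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in term_counts.values() if c > 0)


def cluster_once(docs, k, rng):
    """One randomized greedy pass (assumed rule, not from the report):
    visit snippets in random order and put each one into the cluster
    whose size-weighted entropy increases the least."""
    order = list(range(len(docs)))
    rng.shuffle(order)
    clusters = [Counter() for _ in range(k)]
    labels = [None] * len(docs)
    for i in order:
        terms = Counter(docs[i].lower().split())

        def cost(c):
            before = sum(clusters[c].values()) * entropy(clusters[c])
            merged = clusters[c] + terms
            return sum(merged.values()) * entropy(merged) - before

        best = min(range(k), key=cost)
        clusters[best] += terms
        labels[i] = best
    return labels


def vote(docs, k, runs=11, seed=0):
    """Repeat the randomized pass and keep two snippets together only if
    they shared a cluster in a majority of runs (assumed voting scheme)."""
    rng = random.Random(seed)
    together = Counter()
    for _ in range(runs):
        labels = cluster_once(docs, k, rng)
        for i, j in combinations(range(len(docs)), 2):
            if labels[i] == labels[j]:
                together[(i, j)] += 1

    # Merge majority-linked pairs with a small union-find structure.
    parent = list(range(len(docs)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), n in together.items():
        if 2 * n > runs:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(docs))]


# Hypothetical snippets for the ambiguous query "jaguar".
snippets = [
    "jaguar car speed",
    "jaguar car engine",
    "jaguar cat jungle",
    "jaguar cat prey",
]
final_labels = vote(snippets, k=2, runs=11, seed=0)
```

Snippets that receive the same value in `final_labels` ended up in the same voted cluster. The repeated passes also suggest why a randomization step adds overhead: each extra run repeats the full assignment loop.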
Permanent Link
http://digital.library.wisc.edu/1793/60458
Citation
TR1537