About This Item

TagLDA: Bringing document structure knowledge into topic models

Author(s)
Zhu, Xiaojin (Jerry); Blei, David; Lafferty, John
Publisher
University of Wisconsin-Madison Department of Computer Sciences
Citation
TR1553
Date
2006
Abstract
Latent Dirichlet allocation models a document as a mixture of topics, where each topic is typically modeled by a unigram word distribution. Documents, however, often have known structure, and the same topic can exhibit different word distributions in different parts of that structure. We extend the latent Dirichlet allocation model by replacing the unigram word distributions with a factored representation conditioned on both the topic and the structure. In the resulting model, each topic is equivalent to a set of unigram distributions, reflecting the structure a word is in. The proposed model is more flexible than standard latent Dirichlet allocation in modeling the corpus. The factored representation prevents combinatorial explosion and leads to efficient parameterization. We derive a variational optimization algorithm for the new model. The model shows improved perplexity on text and image data, but no significant accuracy improvement when used for classification.
Permanent link
http://digital.library.wisc.edu/1793/60486 