
TagLDA: Bringing document structure knowledge into topic models

File(s):

TR1553.pdf (1.968 MB, application/pdf)
Key Value Language
dc.contributor.author Zhu, Xiaojin en_US
dc.contributor.author Blei, David en_US
dc.contributor.author Lafferty, John en_US
dc.date.accessioned 2012-03-15T17:20:05Z
dc.date.available 2012-03-15T17:20:05Z
dc.date.created 2006 en_US
dc.date.issued 2012-03-15T17:20:05Z
dc.identifier.uri http://digital.library.wisc.edu/1793/60486
dc.description.abstract Latent Dirichlet allocation models a document as a mixture of topics, where each topic is itself typically modeled by a unigram word distribution. Documents, however, often have known structure, and the same topic can exhibit different word distributions in different parts of that structure. We extend the latent Dirichlet allocation model by replacing the unigram word distributions with a factored representation conditioned on both the topic and the structure. In the resulting model each topic is equivalent to a set of unigram distributions, reflecting the structural element a word appears in. The proposed model is more flexible in modeling the corpus. The factored representation prevents combinatorial explosion and leads to efficient parameterization. We derive a variational optimization algorithm for the new model. The model shows improved perplexity on text and image data, but no significant accuracy improvement when used for classification. en_US
dc.description.provenance Made available in DSpace on 2012-03-15T17:20:05Z (GMT). No. of bitstreams: 1 TR1553.pdf: 1968306 bytes, checksum: 632bc49c1091f9c91f0c7d52354e7df2 (MD5) en
dc.format.mimetype application/pdf en_US
dc.publisher University of Wisconsin-Madison Department of Computer Sciences en_US
dc.title TagLDA: Bringing document structure knowledge into topic models en_US
dc.type Technical Report en_US
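
The factored representation described in the abstract can be sketched with a short worked equation. This is a minimal illustration consistent with the abstract, not a formula quoted from the report; the factor symbols beta (topic) and pi (tag) and the normalization are assumptions. With K topics, T structural tags, and a vocabulary of size V, giving every topic-tag pair its own unigram distribution would require on the order of K x T x V parameters. A factored form instead combines one factor per topic with one factor per tag:

\[
p(w \mid z = k, t) \;=\; \frac{\beta_{kw}\,\pi_{tw}}{\sum_{v=1}^{V} \beta_{kv}\,\pi_{tv}}
\]

Because only the (K + T) x V factor entries are free parameters, the combinatorial explosion over topic-tag pairs mentioned in the abstract is avoided. This factored view is also why each topic behaves as a set of unigram distributions, one per tag, as the abstract states.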
