Document Recovery from Bag-of-Word Indices

Author(s)
Fillmore, Nathanael; Goldberg, Andrew B.; Zhu, Xiaojin
Publisher
University of Wisconsin-Madison Department of Computer Sciences
Date
Mar 15, 2012
Abstract
Motivated by computer privacy issues, we present the novel problem of document recovery from an index: given only a document's bag-of-words (BOW) vector or other type of index, reconstruct the original ordered document. We investigate a variety of index types, including count-based BOW vectors, stopwords-removed count BOW vectors, indicator BOW vectors, and bigram count vectors. We formulate the problem as hypothesis rescoring using A* search with the Google Web 1T 5-gram corpus. Our experiments on five domains indicate that if original documents are short, the documents can be recovered with high accuracy.
Permanent link
http://digital.library.wisc.edu/1793/60654 