Scalable Anonymization Algorithms for Large Data Sets
File(s)
Date
2007Author
LeFevre, Kristen
DeWitt, David
Publisher
University of Wisconsin-Madison Department of Computer Sciences
Metadata
Show full item recordAbstract
k-Anonymity is a widely-studied mechanism for protecting identity
when distributing non-aggregate personal data. This basic mechanism
can also be extended to protect an individual-level sensitive attribute.
Numerous algorithms have been developed in recent years
for generalizing, clustering, or otherwise manipulating data to satisfy
one or more anonymity requirements. However, few have considered
large-scale input data sets that do not fit in main memory.
This paper proposes two techniques for incorporating (external)
scalability into an existing algorithmic framework. The first technique
is based on ideas from scalable decision tree construction,
and the second technique is based on sampling. In both cases, the
resulting algorithms are guaranteed to produce output data that satisfies
the given anonymity requirements. We evaluate the performance
of each algorithm both analytically and experimentally.
Permanent Link
http://digital.library.wisc.edu/1793/60548Citation
TR1590