End-Biased Samples for Join Cardinality Estimation

File(s)
Date
2005Author
Estan, Cristian
Naughton, Jeffrey F.
Publisher
University of Wisconsin-Madison Department of Computer Sciences
Metadata
Show full item recordAbstract
We present a new technique for using samples to estimate join cardinalities. This technique, which we term �end-biased samples,� is inspired by recent work in network traffic measurement. It improves on random samples by using coordinated pseudo-random samples and retaining the sampled values in proportion to their frequency. We show that end-biased samples always provide more accurate
estimates than random samples with the same sample size. The comparison with histograms is more interesting � while end-biased histograms are somewhat better than end-biased samples for uncorrelated data sets, end-biased samples dominate by a large margin when the data is correlated. Finally, we compare end-biased samples to the recently proposed �skimmed sketches� and show that neither dominates the other, that each has different and compelling strengths and weaknesses. These results suggest that endbiased samples may be a useful addition to the repertoire of techniques used for data summarization.
Permanent Link
http://digital.library.wisc.edu/1793/60464Type
Technical Report
Citation
TR1541