
dc.contributor.author: Shen, Warren [en_US]
dc.contributor.author: DeRose, Pedro [en_US]
dc.contributor.author: Vu, Long [en_US]
dc.contributor.author: Doan, AnHai [en_US]
dc.contributor.author: Ramakrishnan, Raghu [en_US]
dc.date.accessioned: 2012-03-15T17:20:18Z
dc.date.available: 2012-03-15T17:20:18Z
dc.date.created: 2006 [en_US]
dc.date.issued: 2006 [en_US]
dc.identifier.citation: TR1559 [en_US]
dc.identifier.uri: http://digital.library.wisc.edu/1793/60494
dc.description.abstract: Entity matching (a.k.a. record linkage) plays a crucial role in integrating multiple data sources, and numerous matching solutions have been developed. However, these solutions have largely exploited only the information available in the mentions and have employed a single matching technique. We show how to exploit information about data sources to significantly improve matching accuracy. In particular, we observe that different sources often vary substantially in their level of semantic ambiguity, thus requiring different matching techniques. In addition, it is often beneficial to group and match mentions in related sources first, before considering other sources. These observations lead to a large space of matching strategies, analogous to the space of query evaluation plans considered by a relational optimizer. We propose viewing entity matching as a composition of basic steps into a "match execution plan". We analyze formal properties of the plan space and show how to find a good match plan. To do so, we employ ideas from social network analysis to infer the ambiguity and relatedness of data sources. We conducted extensive experiments on several real-world data sets on the Web and in the domain of personal information management (PIM). The results show that our solution significantly outperforms current best matching methods. [en_US]
dc.format.mimetype: application/pdf [en_US]
dc.publisher: University of Wisconsin-Madison Department of Computer Sciences [en_US]
dc.title: Source-Aware Entity Matching: A Compositional Approach [en_US]
dc.type: Technical Report [en_US]


This item appears in the following Collection(s)

  • CS Technical Reports
    Technical Reports Archive for the Department of Computer Sciences at the University of Wisconsin-Madison
