Show simple item record

dc.contributor.authorWild, Edward
dc.contributor.authorMangasarian, Olvi
dc.description.abstractAn e ective method for selecting features in clustering unlabeled data is proposed based on changing the objective function of the standard k-median clustering algorithm. The change consists of perturbing the objective function by a term that drives the medians of each of the k clusters toward the (shifted) global median of zero for the entire dataset. As the perturbation parameter is increased, more and more features are driven automatically toward the global zero median and are eliminated from the problem until one last feature remains. An error curve for unlabeled data clustering as a function of the number of features used gives reducedfeature clustering error relative to the \gold standard" of the full-feature clustering. This clustering error curve parallels a classi cation error curve based on real data labels. This justi es the utility of the former error curve for unlabeled data as a means of choosing an appropriate number of reduced features in order to achieve a correctness comparable to that obtained by the full set of original features. For example, on the 3-class Wine dataset, clustering with 4 selected input space features is comparable to within 4% to clustering using the original 13 features of the problem.en
dc.subjectcentered dataen
dc.subjectnon-smooth optimizationen
dc.subjectfeature selectionen
dc.titleFeature Selection in k-Median Clusteringen
dc.typeTechnical Reporten

Files in this item


This item appears in the following Collection(s)

  • DMI Technical Reports
    DMI Technical Reports Archive for the Department of Computer Sciences at the University of Wisconsin-Madison

Show simple item record