Feature Selection in k-Median Clustering
Abstract
An effective method for selecting features in clustering
unlabeled data is proposed based on changing the objective
function of the standard k-median clustering algorithm. The
change consists of perturbing the objective function by a
term that drives the medians of each of the k clusters toward
the (shifted) global median of zero for the entire dataset.
As the perturbation parameter is increased, more and more
features are driven automatically toward the global zero
median and are eliminated from the problem until one last
feature remains. An error curve for unlabeled data clustering
as a function of the number of features used gives reduced-feature
clustering error relative to the "gold standard" of the
full-feature clustering. This clustering error curve parallels
a classification error curve based on real data labels. This
justifies the utility of the former error curve for unlabeled
data as a means of choosing an appropriate number of
reduced features in order to achieve a correctness comparable
to that obtained by the full set of original features. For
example, on the 3-class Wine dataset, clustering with 4
selected input space features is comparable to within 4%
to clustering using the original 13 features of the problem.
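The mechanism the abstract describes can be sketched in code: shift the data so the global median is zero, then run Lloyd-style k-median iterations in which each cluster-median coordinate minimizes the usual 1-norm fit plus a perturbation term lam * |c_lj| pulling it toward zero. That scalar subproblem reduces to a weighted median of the cluster's values augmented with a zero of weight lam, so increasing lam drives more feature medians exactly to zero. This is a minimal illustrative sketch, not the paper's actual formulation or solver; all function names (`penalized_median`, `perturbed_kmedian`) and the pseudo-point trick are assumptions introduced here.

```python
import numpy as np

def penalized_median(x, lam):
    """Minimize sum_i |x_i - c| + lam * |c| over scalar c.

    Equivalent to the weighted median of the sample augmented with a
    zero of weight lam, so a larger lam pulls the minimizer toward 0."""
    vals = np.append(x, 0.0)
    wts = np.append(np.ones(len(x)), lam)
    order = np.argsort(vals)
    v, w = vals[order], wts[order]
    cw = np.cumsum(w)
    # first value whose cumulative weight reaches half the total weight
    return v[np.searchsorted(cw, 0.5 * w.sum())]

def perturbed_kmedian(X, k, lam, init=None, iters=50, seed=0):
    """Lloyd-style k-median (1-norm distances) with a perturbation
    lam * sum_{l,j} |c_lj| on the cluster medians.  Data are first
    shifted so the global median is zero; features whose medians are
    zero in every cluster are treated as eliminated."""
    X = X - np.median(X, axis=0)          # shift global median to zero
    rng = np.random.default_rng(seed)
    C = (X[rng.choice(len(X), k, replace=False)] if init is None
         else np.asarray(init, dtype=float)).copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # assign each point to the nearest median in the 1-norm
        labels = np.abs(X[:, None, :] - C[None, :, :]).sum(-1).argmin(1)
        for l in range(k):
            pts = X[labels == l]
            if len(pts):
                C[l] = [penalized_median(pts[:, j], lam)
                        for j in range(X.shape[1])]
    # features with a nonzero median in at least one cluster survive
    kept = np.where(np.abs(C).max(axis=0) > 1e-9)[0]
    return C, labels, kept
```

Sweeping `lam` from 0 upward and recording the clustering error of each reduced feature set against the full-feature ("gold standard") clustering yields the error curve the abstract uses to pick the number of features.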
Subject
regularization
centered data
non-smooth optimization
feature selection
k-median
clustering
Permanent Link
http://digital.library.wisc.edu/1793/64328
Citation
04-01