While feature selection is a well-studied problem in supervised learning, the important issue of determining which attributes of the data best reveal its cluster structure is rarely touched upon. Feature selection for clustering is difficult because, in the absence of class labels, there is no obvious optimality criterion. In the second half of my talk I will describe two recently proposed approaches to feature selection for mixture-based clustering. One of the approaches uses a new concept of "feature saliency," which can be estimated using an EM algorithm. The second approach extends the mutual-information-based feature relevance criterion to the unsupervised learning case. The result is an algorithm that "wraps" mixture estimation in an outer layer that performs feature selection. Experiments show that both methods have promising performance.
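To give a flavor of the second idea, here is a minimal sketch, not the algorithm from the talk: one simple way to make a mutual-information relevance criterion unsupervised is to fit a mixture model first, treat its component labels as surrogate "classes," and then score each feature by its mutual information with those labels. The function names, parameters, and scoring choice below are illustrative assumptions only.

import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.feature_selection import mutual_info_classif

def rank_features_by_cluster_relevance(X, n_components=3, random_state=0):
    """Rank features by mutual information with mixture-component labels."""
    # Fit a Gaussian mixture and use its component assignments as surrogate labels.
    labels = GaussianMixture(n_components=n_components,
                             random_state=random_state).fit_predict(X)
    # Score each feature by its mutual information with the mixture labels.
    scores = mutual_info_classif(X, labels, random_state=random_state)
    return np.argsort(scores)[::-1], scores  # most relevant feature first

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Feature 0 carries the cluster structure; features 1 and 2 are pure noise.
    informative = np.concatenate([rng.normal(m, 0.3, 100) for m in (0, 3, 6)])
    X = np.column_stack([informative,
                         rng.normal(0, 1, 300),
                         rng.normal(0, 1, 300)])
    order, scores = rank_features_by_cluster_relevance(X)
    print("Features ranked by relevance:", order)
    print("Mutual-information scores:", np.round(scores, 3))

The approaches described in the talk differ in important ways (e.g., the wrapper re-estimates the mixture as features are selected, and feature saliency is estimated inside EM itself); the sketch only illustrates the general idea of replacing class labels with mixture structure.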
Time and Place: Tues., Jan. 20, at 4 pm in 3609 Engr. Hall. *** NOTE SPECIAL DAY, TIME, & PLACE ***
SYSTEMS SEMINAR WEB PAGE: http://www.cae.wisc.edu/~gubner/seminar/