In many data analysis tasks, one is often confronted with very high dimensional data. Feature selection techniques are designed to find the relevant feature subset of the original features which can facilitate clustering, classification and retrieval. The feature selection problem is essentially a combinatorial optimization problem which is computationally expensive. Traditional feature selection methods address this issue by selecting the top ranked features based on certain scores computed independently for each feature. These approaches neglect the possible correlation between different features and thus cannot produce an optimal feature subset.

Inspired by recent developments in manifold learning and L1-regularized models for subset selection, we propose here a new approach, called Multi-Cluster/Class Feature Selection (MCFS), for feature selection. Specifically, we select those features such that the multi-cluster/class structure of the data can be best preserved. The corresponding optimization problem can be efficiently solved since it only involves a sparse eigen-problem and an L1-regularized least squares problem. It is important to note that MCFS can be applied in supervised, unsupervised and semi-supervised cases.

If you find these algorithms useful, we would appreciate it very much if you could cite the following works:

Papers

Deng Cai, Chiyuan Zhang, Xiaofei He, "Unsupervised Feature Selection for Multi-cluster Data", 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'10), July 2010.

Xiaofei He, Deng Cai, and Partha Niyogi, "Laplacian Score for Feature Selection", Advances in Neural Information Processing Systems 18 (NIPS'05), Vancouver, Canada, 2005.
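The description above names two computational steps: an eigen-problem that captures the cluster structure, and an L1-regularized least squares fit that relates the features to that structure. The following is a minimal Python sketch of that idea for the unsupervised case, not the authors' released code. Several choices are assumptions made only for illustration: the function names are hypothetical, the graph is a dense heat-kernel affinity solved with a dense eigensolver (rather than a sparse eigen-problem), the L1-regularized fit uses plain coordinate descent, and each feature is scored by its largest absolute coefficient across the embedding dimensions.

```python
import numpy as np


def heat_kernel_affinity(X, sigma2=None):
    """Dense Gaussian affinity between rows of X (assumption: median-distance bandwidth)."""
    n = X.shape[0]
    sq = (X ** 2).sum(axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    if sigma2 is None:
        sigma2 = np.median(dist2[np.triu_indices(n, k=1)])
    W = np.exp(-dist2 / (2.0 * sigma2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def spectral_embedding(W, n_components):
    """Eigenvectors of L y = lambda D y with the smallest non-trivial eigenvalues."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)  # eigenvalues come back in ascending order
    # Skip the first eigenvector (constant for a connected graph) and rescale.
    return d_inv_sqrt[:, None] * vecs[:, 1:n_components + 1]


def lasso_cd(X, y, alpha, n_iter=200):
    """Minimise (1 / 2n) * ||y - X a||^2 + alpha * ||a||_1 by coordinate descent."""
    n, m = X.shape
    a = np.zeros(m)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(m):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * a[j]  # remove feature j's current contribution
            rho = X[:, j] @ resid / n
            a[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            resid -= X[:, j] * a[j]  # add the updated contribution back
    return a


def mcfs_scores(X, n_components=1, alpha=0.05):
    """Score each feature by its largest absolute sparse-regression coefficient."""
    Xc = X - X.mean(axis=0)
    Y = spectral_embedding(heat_kernel_affinity(X), n_components)
    Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)  # put every target on a common scale
    coefs = np.column_stack(
        [lasso_cd(Xc, Y[:, c], alpha) for c in range(Y.shape[1])]
    )
    return np.abs(coefs).max(axis=1)
```

Under these assumptions, a feature subset of size `d` would be obtained with `np.argsort(mcfs_scores(X))[-d:]`. For large data sets, a sparse nearest-neighbour graph and a sparse eigensolver would be the natural replacement for the dense steps used here.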