We study the problem of supervised linear dimensionality reduction, taking an information-theoretic viewpoint. The linear projection matrix is designed by maximizing the mutual information between the projected signal and the class label. By harnessing a recent theoretical result on the gradient of mutual information, the above optimization problem can be solved directly using gradient descent, without requiring simplification of the objective function. Theoretical analysis and empirical comparison are made between the proposed method and two closely related methods, and comparisons are also made with a method in which Rényi entropy is used to define the mutual information (in this case the gradient may be computed simply, under a special parameter setting). Relative to these alternative approaches, the proposed method achieves promising results on real datasets. Copyright 2012 by the author(s)/owner(s).
|Original language||English (US)|
|Title of host publication||Proceedings of the 29th International Conference on Machine Learning, ICML 2012|
|Number of pages||8|
|State||Published - Oct 10 2012|