Cluster Indicator Decomposition for Efficient Matrix Factorization
Dijun Luo, Chris Ding and Heng Huang
We propose a new clustering based low-rank matrix approximation method, Cluster Indicator Decomposition (CID), which yields more accurate low-rank approximations than previous commonly used singular value decomposition and other Nystr¨om style decompositions. Meantime, our cluster indicator decomposition gives out more interpretable decompositions, because the basis vectors are cluster indicators. We present theoretical analysis on our new approaches with several novel error bounds. Our model utilizes the intrinsic structures of data and theoretically be more compact and accurate than the traditional low rank approximation approaches. Further more, the reconstruction of the matrices is much more efficient than any other factorization-based methods, which leads to a desirable advantage of our method in large scale kernel machines (like Support Vector Machines) in which the reconstruction of the kernels needs to be frequently computed. Experimental results indicate that our approach compress images much more efficiently than other factorization based methods. We also show that combining our method with Support Vector Machines obtains more accurate approximation and more accurate prediction while consuming much less computation resources.