Missing Data and Statistical Model Estimation
Benjamin M. Marlin, Richard S. Zemel, Sam T. Roweis, Malcolm Slaney
This work describes what we believe to be the most important problem in rating-based collaborative filtering: a fundamental incompatibility between the properties of collaborative filtering data sets and the assumptions required for valid estimation and evaluation of statistical models in the presence of missing data. This problem affects essentially all known rating-based collaborative filtering models and methods. We discuss its implications and describe extended modelling and evaluation frameworks for circumventing it. We present results showing that models developed and tested under these alternative frameworks significantly outperform standard models on both rating prediction and ranking tasks.