Similarity-Based Approach for Positive and Unlabeled Learning
Yanshan Xiao, Bo Liu, Jie Yin, Longbing Cao and Chengqi Zhang
Positive and unlabeled learning (PU learning) addresses the situation where only positive examples and unlabeled examples are available. Most previous work identifies negative examples from the unlabeled data so that supervised learning methods can be applied to build a classifier. The remaining unlabeled data, which cannot be identified as positive or negative (we call them ambiguous examples here), are either excluded from training or simply treated as positive or negative during training; consequently, performance is constrained. This paper proposes a novel method, called Similarity-Based PU Learning (SPUL), which handles ambiguous examples by assigning each of them similarity weights that indicate its degree of membership in the positive and negative classes. We put forward local similarity-based and global similarity-based mechanisms to generate these weights. The ambiguous examples and their similarity weights are then incorporated into an SVM-based learning phase to build a more accurate classifier. Extensive experiments on 32 sub-datasets show that SPUL outperforms state-of-the-art PU learning methods.
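To illustrate the general idea of weighting ambiguous examples in an SVM-based learner, the sketch below is a minimal, hypothetical Python example and is not the authors' implementation. The function name `train_spul_like_svm` and the inverse-distance weighting scheme are illustrative assumptions standing in for the paper's local and global similarity mechanisms; only the overall pattern (per-instance weights on ambiguous examples fed into a weighted SVM) mirrors the approach described above.

```python
# Minimal sketch of similarity-weighted SVM training for PU-style data.
# NOTE: the weight computation (inverse distance to class means) is a
# simple stand-in, not the local/global similarity mechanisms of SPUL.
import numpy as np
from sklearn.svm import SVC

def train_spul_like_svm(X_pos, X_neg, X_amb):
    """X_pos: identified positives, X_neg: identified negatives,
    X_amb: ambiguous unlabeled examples (rows are feature vectors)."""
    # Stand-in similarity weights: closeness to each class mean.
    mu_p, mu_n = X_pos.mean(axis=0), X_neg.mean(axis=0)
    d_p = np.linalg.norm(X_amb - mu_p, axis=1)
    d_n = np.linalg.norm(X_amb - mu_n, axis=1)
    w_pos = d_n / (d_p + d_n)   # degree of membership in the positive class
    w_neg = 1.0 - w_pos         # degree of membership in the negative class

    # Each ambiguous example appears twice, once per class,
    # weighted by its similarity to that class.
    X = np.vstack([X_pos, X_neg, X_amb, X_amb])
    y = np.concatenate([np.ones(len(X_pos)), -np.ones(len(X_neg)),
                        np.ones(len(X_amb)), -np.ones(len(X_amb))])
    w = np.concatenate([np.ones(len(X_pos)), np.ones(len(X_neg)),
                        w_pos, w_neg])

    clf = SVC(kernel="rbf", C=1.0)
    clf.fit(X, y, sample_weight=w)  # per-instance weights enter the SVM loss
    return clf
```

In this pattern, identified positives and negatives keep full weight, while each ambiguous example contributes to both classes in proportion to its similarity weights, which is the core intuition the abstract describes.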