Learning to Identify Review Spam
Fangtao Li, Minlie Huang, Yi Yang and Xiaoyan Zhu
In the past few years, sentiment analysis and opinion mining have become popular and important tasks. Existing studies all assume that their opinion resources are genuine and trustworthy. However, they may encounter fake opinions, known as the opinion spam problem. In this paper, we study this issue in the context of our product review mining system. On the review site we crawled, people may write fake reviews, called review spam, to promote their own products or to defame their competitors' products. It is important to identify and filter out review spam. Previous work focuses only on heuristic rules, such as helpfulness votes or rating deviation, which limits the performance of this task. In this paper, we exploit machine learning methods to identify review spam. To this end, we manually build a spam collection from our crawled reviews. With this collection, we first analyze the effect of various features. We also observe that a review spammer consistently writes spam. This provides another view for identifying review spam: we can determine whether the author of a review is a spammer. Based on this observation, we propose a two-view semi-supervised method, co-training, to exploit the large amount of unlabeled data. The experimental results show that our proposed method is effective. Our machine learning methods achieve significant improvement over the heuristic baselines.
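The two-view co-training idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the base learner, the features, or the selection schedule, so everything here is an assumption made for illustration: the `CentroidClassifier` stand-in, the `co_train` function, the `rounds` and `per_round` parameters, and the toy feature values are all hypothetical. One view holds review-level features and the other holds reviewer-level features; in each round, a classifier trained on each view labels the unlabeled reviews it is most confident about, and those labels join the shared training set.

```python
from math import exp


class CentroidClassifier:
    """Hypothetical stand-in learner: scores an item by its distance to class centroids."""

    def fit(self, X, y):
        self.centroids = {}
        for label in (0, 1):  # 0 = non-spam, 1 = spam
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def prob_spam(self, x):
        d0 = sum((a - b) ** 2 for a, b in zip(x, self.centroids[0]))
        d1 = sum((a - b) ** 2 for a, b in zip(x, self.centroids[1]))
        # Closer to the spam centroid (small d1) gives a score near 1.
        return 1.0 / (1.0 + exp(d1 - d0))


def co_train(review_view, reviewer_view, labels, rounds=5, per_round=2):
    """labels holds 0/1 for labeled items and None for unlabeled ones.

    Returns one classifier per view and the augmented label list.
    """
    labels = list(labels)
    views = (review_view, reviewer_view)
    for _ in range(rounds):
        unlabeled = [i for i, t in enumerate(labels) if t is None]
        if not unlabeled:
            break
        labeled = [i for i, t in enumerate(labels) if t is not None]
        y = [labels[i] for i in labeled]
        newly = {}
        for view in views:
            clf = CentroidClassifier().fit([view[i] for i in labeled], y)
            # Rank unlabeled items by how far the score is from 0.5 (confidence).
            ranked = sorted(
                unlabeled,
                key=lambda i: abs(clf.prob_spam(view[i]) - 0.5),
                reverse=True,
            )
            for i in ranked[:per_round]:
                # If both views pick the same item, the first view's label is kept.
                newly.setdefault(i, int(clf.prob_spam(view[i]) >= 0.5))
        for i, t in newly.items():
            labels[i] = t
    labeled = [i for i, t in enumerate(labels) if t is not None]
    y = [labels[i] for i in labeled]
    classifiers = [CentroidClassifier().fit([v[i] for i in labeled], y) for v in views]
    return classifiers, labels


# Toy data (invented for illustration): six reviews, only the first two labeled.
review_view = [[0.9, 0.8], [0.1, 0.2], [0.8, 0.9], [0.2, 0.1], [0.85, 0.7], [0.15, 0.3]]
reviewer_view = [[0.9], [0.1], [0.8], [0.2], [0.95], [0.05]]
initial_labels = [1, 0, None, None, None, None]

classifiers, final_labels = co_train(review_view, reviewer_view, initial_labels)
print(final_labels)
```

In practice the stand-in learner would be replaced by whatever classifier and feature sets the full method uses; the sketch only shows how confident predictions from each view enlarge the shared labeled set.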