Predicting Epidemic Tendency through Search Behavior Analysis
Danqing Xu
In recent years, seasonal epidemics have posed a tremendous threat to public health, causing hundreds of thousands of deaths worldwide each year. It is believed that early detection of latent patients would greatly reduce these losses. With the wide adoption of search engines, some social events, including the spread of epidemics, can be traced in users’ search logs. This paper presents an effort to predict the number of future epidemic cases based on search behavior. To effectively separate healthy users from latent patients, we propose two new features: epidemic related pages (ERPs) and epidemic related news (ERNs). Given the collected search logs, epidemic related queries, pages, and news are extracted by query clustering and adopted as our key features. We apply multivariate linear regression on these features to model the tendency of future epidemics. In addition, we develop a real-time system to demonstrate the prediction results; it dynamically predicts the number of hand-foot-and-mouth disease cases in Beijing during the following week. The results show that our proposed features and model are effective for seasonal epidemics.
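The regression step can be illustrated with a minimal sketch. It assumes that each week is summarized by three counts (epidemic related queries, pages, and news) and that the target is the number of cases reported in the following week. The function names, the feature layout, and all numbers below are hypothetical and synthetic, chosen only for illustration; they are not data or code from this paper.

```python
import numpy as np


def fit_linear_model(features, targets):
    """Ordinary least-squares fit of targets ~ intercept + features."""
    features = np.asarray(features, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return coef


def predict(coef, features):
    """Apply fitted coefficients to one or more feature rows."""
    features = np.atleast_2d(np.asarray(features, dtype=float))
    design = np.column_stack([np.ones(len(features)), features])
    return design @ coef


# Synthetic weekly counts (hypothetical): queries, pages, news.
weekly_features = np.array([
    [120, 80, 3],
    [150, 95, 5],
    [200, 140, 4],
    [260, 170, 9],
    [310, 220, 12],
    [280, 190, 7],
])

# Synthetic targets generated from known coefficients, so the fit can be
# checked against them. Real case counts would be noisy.
true_coef = np.array([5.0, 0.1, 0.2, 0.5])
next_week_cases = true_coef[0] + weekly_features @ true_coef[1:]

coef = fit_linear_model(weekly_features, next_week_cases)
forecast = predict(coef, [300, 210, 10])
print("fitted coefficients:", np.round(coef, 3))
print("forecast for next week:", np.round(forecast, 1))
```

Because the synthetic targets are exactly linear in the features, the fit recovers the generating coefficients; on real search logs the residual error and the choice of lag between search activity and reported cases would need to be evaluated.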