A Wikipedia-Based Semantic Graph Model for Topic Tracking in the Blogosphere
Jintao Tang, Ting Wang, Qin Lu and Ji Wang
There are two key issues for information diffusion in the blogosphere: (1) blog posts are usually short, noisy and multi-themed, and (2) information diffusion through the blogosphere is primarily driven by the “word-of-mouth” effect, so the topics discussed there usually shift very fast. This paper presents a novel topic tracking method that addresses these issues by modeling a topic as a semantic graph in which the semantic relatedness between keywords is learned from Wikipedia. For a given blog post, the named entities, the Wikipedia concepts, and the relatedness between them are extracted to generate a graph representation of the post. Noise terms and irrelevant topics are then filtered out by a graph-clustering algorithm. To cope with topic evolution, the topic model is enriched by using Wikipedia as a background graph. Moreover, graph edit distance is adopted to measure the semantic similarity between the topic model and the post graphs. The effectiveness of the proposed method is tested experimentally on real-world blog data. The experimental results show the advantage of the Wikipedia-based semantic graph model when tracking topics in short, noisy blog posts.
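The pipeline outlined above (build a post graph, filter noise, compare against the topic graph) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the `RELATEDNESS` table is a hypothetical toy stand-in for relatedness scores learned from Wikipedia, keeping the largest connected component is a simple substitute for the graph-clustering step, and the distance is a simplified graph edit distance that assumes unique node labels and counts only unit-cost insertions and deletions of nodes and edges. All function names and the `threshold` parameter are illustrative.

```python
from itertools import combinations

# Hypothetical toy scores; the paper learns relatedness from Wikipedia.
RELATEDNESS = {
    frozenset({"obama", "white house"}): 0.9,
    frozenset({"obama", "election"}): 0.8,
    frozenset({"election", "white house"}): 0.6,
    frozenset({"recipe", "pasta"}): 0.7,
}


def build_graph(terms, threshold=0.5):
    """Return (nodes, edges); an edge links two terms whose
    relatedness score reaches the (assumed) threshold."""
    nodes = set(terms)
    edges = {
        frozenset(pair)
        for pair in combinations(sorted(nodes), 2)
        if RELATEDNESS.get(frozenset(pair), 0.0) >= threshold
    }
    return nodes, edges


def largest_component(nodes, edges):
    """Keep only the largest connected component, a simple stand-in
    for filtering noise terms and side topics by graph clustering."""
    adjacency = {node: set() for node in nodes}
    for edge in edges:
        a, b = tuple(edge)
        adjacency[a].add(b)
        adjacency[b].add(a)

    best, seen = set(), set()
    for start in sorted(nodes):
        if start in seen:
            continue
        # Depth-first traversal collecting one component.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        if len(component) > len(best):
            best = component

    kept_edges = {edge for edge in edges if edge <= best}
    return best, kept_edges


def edit_distance(graph_a, graph_b):
    """Simplified graph edit distance for uniquely labelled nodes:
    the number of node and edge insertions/deletions needed."""
    (nodes_a, edges_a), (nodes_b, edges_b) = graph_a, graph_b
    return len(nodes_a ^ nodes_b) + len(edges_a ^ edges_b)


topic_graph = build_graph(["obama", "election"])
post_terms = ["obama", "white house", "election", "recipe", "pasta", "weather"]
post_graph = largest_component(*build_graph(post_terms))
distance = edit_distance(topic_graph, post_graph)
```

In this toy run the off-topic terms ("recipe", "pasta", "weather") are dropped before the comparison, so the distance reflects only the terms related to the tracked topic. A lower distance would indicate a post closer to the topic model; where to set a tracking cutoff is left open here.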