Learning Inter-Related Statistical Query Translation Models for English-Chinese Bi-Directional CLIR
Yuejie Zhang, Lei Cen, Cheng Jin, Xiangyang Xue, Jianping Fan
To support more precise query translation for English-Chinese bi-directional Cross-Language Information Retrieval (CLIR), we have developed a novel framework that integrates a semantic network to characterize the correlations among multiple inter-related text terms of interest and to learn their inter-related statistical query translation models. First, a semantic network is automatically generated from large-scale English-Chinese bilingual parallel corpora to characterize the correlations among a large number of text terms of interest. Second, the semantic network is exploited to learn the statistical query translation models for these terms. Finally, the inter-related query translation models are used to translate queries more precisely and thereby achieve more effective CLIR. The major difference between our approach and existing query translation methods is that ours exploits inter-term correlations to learn the statistical translation models of related terms simultaneously, whereas the others treat all query terms equally and independently. Our experiments on a large body of official public data have obtained very positive results.
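The three steps outlined above can be illustrated with a minimal sketch. It is not the authors' model: the toy parallel corpus, the function names (`build_models`, `translate_independent`, `translate_joint`), the co-occurrence estimate of translation probabilities, the Dice-weighted term network, and the smoothing constant `eps` are all assumptions chosen only to contrast independent term translation with translation that uses inter-term correlations.

```python
from collections import Counter, defaultdict
from itertools import combinations, product

# Hypothetical toy parallel corpus: (source tokens, target tokens) per aligned pair.
TOY_CORPUS = [
    (["bank", "interest"], ["银行", "利息"]),
    (["bank", "loan"], ["银行", "贷款"]),
    (["bank", "river"], ["河岸", "河流"]),
    (["river", "water"], ["河流", "水"]),
    (["interest", "rate"], ["利息", "率"]),
    (["bank", "deposit"], ["银行", "存款"]),
]


def build_models(parallel_corpus):
    """Return (trans, network): co-occurrence estimates of p(target | source)
    and a target-side term network weighted by the Dice coefficient."""
    src_count, tgt_count, cooc = Counter(), Counter(), Counter()
    pair_count = defaultdict(Counter)
    for src_tokens, tgt_tokens in parallel_corpus:
        src_set, tgt_set = set(src_tokens), set(tgt_tokens)
        for s in src_set:
            src_count[s] += 1
            for t in tgt_set:
                pair_count[s][t] += 1
        tgt_count.update(tgt_set)
        # Sorted so that every edge key is stored as (smaller, larger).
        for a, b in combinations(sorted(tgt_set), 2):
            cooc[(a, b)] += 1
    trans = {s: {t: n / src_count[s] for t, n in cands.items()}
             for s, cands in pair_count.items()}
    network = {(a, b): 2 * n / (tgt_count[a] + tgt_count[b])
               for (a, b), n in cooc.items()}
    return trans, network


def correlation(network, a, b):
    """Edge weight between two target terms; 0.0 if they never co-occur."""
    return network.get((a, b) if a < b else (b, a), 0.0)


def translate_independent(trans, query):
    """Baseline: pick the most probable translation of each term on its own."""
    return [max(trans[q], key=trans[q].get) for q in query]


def translate_joint(trans, network, query, eps=0.1):
    """Pick the combination of translations that maximizes the product of
    translation probabilities and smoothed pairwise correlations.
    Exhaustive search, so only suitable for short queries."""
    best, best_score = None, -1.0
    for combo in product(*(trans[q].items() for q in query)):
        score = 1.0
        for _, prob in combo:
            score *= prob
        for (a, _), (b, _) in combinations(combo, 2):
            score *= correlation(network, a, b) + eps
        if score > best_score:
            best, best_score = combo, score
    return [term for term, _ in best]


if __name__ == "__main__":
    trans, network = build_models(TOY_CORPUS)
    print(translate_independent(trans, ["bank", "river"]))
    print(translate_joint(trans, network, ["bank", "river"]))
```

On this toy corpus the independent baseline maps "bank" to its most frequent translation regardless of context, while the joint variant selects the river-bank sense when "river" is also in the query. Swapping the two sides of each corpus pair gives the Chinese-to-English direction with the same functions. Query terms absent from the corpus are not handled in this sketch.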