A Framework for Incorporating General Domain Knowledge into Latent Dirichlet Allocation using First-Order Logic
David Andrzejewski, Xiaojin Zhu, Mark Craven and Benjamin Recht
Topic models have been used successfully for a variety of problems, often in the form of application-specific extensions of the basic Latent Dirichlet Allocation (LDA) model. Because deriving a new model for each kind of domain knowledge can be difficult and time-consuming, we propose the fold·all model, which allows the user to specify general domain knowledge in First-Order Logic (FOL). However, combining topic modeling with FOL can result in inference problems beyond the capabilities of existing techniques. We have therefore developed a scalable inference technique using stochastic gradient descent, which may also be useful to the Markov Logic Network (MLN) research community. Experiments demonstrate the expressive power of fold·all, as well as the scalability of our proposed inference method.