Source-Selection-Free Transfer Learning
Evan Xiang, Sinno Pan, Weike Pan, Jian Su and Qiang Yang
Transfer learning has been proposed to address machine learning problems in which the amount of labeled training data is insufficient. Typical transfer learning approaches require that one or more source datasets be given by the designers of the learning problem. However, how to identify the right source data to enable effective knowledge transfer remains an unsolved problem, which limits the applicability of many transfer learning approaches. In this paper, we propose a novel transfer learning approach that requires no specific source data to be given; instead, for a given target learning task, the system finds the right subset to use as auxiliary data from an extremely large collection on the World Wide Web. In our approach, which we call source-selection-free transfer learning (SSFTL), we are given a target dataset to classify whose training data may contain only an insufficient amount of labels. To build a classifier, SSFTL turns to very large knowledge sources, such as Wikipedia, for help, identifying a portion of the knowledge base as the potential source data. Since the labels provided by the world-wide knowledge base may not match exactly those of the target task, knowledge transfer requires a translation, performed by a translator. This is done by consulting social media, which we compile into a graph Laplacian based representation. One advantage of our approach is its source-selection-free nature: users of a learning task no longer need to find the necessary source data before learning can start. Another advantage is scalability: unlike many previous transfer learning approaches, which are difficult to scale up to the size of the WWW, our approach is highly scalable and can offload much of the training work to an offline stage. We demonstrate these advantages through extensive experiments on several real-world datasets, both in terms of learning efficiency and classification performance.
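The abstract mentions compiling social media into a graph Laplacian based representation to bridge source and target label spaces. As a rough illustration only (the label set and co-occurrence counts below are hypothetical, not taken from the paper), a graph Laplacian over labels can be built from pairwise co-occurrence statistics like so:

```python
import numpy as np

# Hypothetical co-occurrence counts among 4 labels (e.g., how often two
# tags appear together in social-media annotations). W is symmetric with
# a zero diagonal; W[i, j] measures the affinity between labels i and j.
W = np.array([[0, 3, 1, 0],
              [3, 0, 2, 1],
              [1, 2, 0, 4],
              [0, 1, 4, 0]], dtype=float)

degrees = W.sum(axis=1)
D = np.diag(degrees)                # degree matrix
L = D - W                           # unnormalized graph Laplacian

# Symmetric normalized Laplacian, a common choice when propagating
# label information across the graph:
d_inv_sqrt = np.diag(1.0 / np.sqrt(degrees))
L_sym = np.eye(len(W)) - d_inv_sqrt @ W @ d_inv_sqrt
```

Smoothness of a labeling over this graph can then be measured by the quadratic form f^T L f, which penalizes assigning very different values to strongly co-occurring labels.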