This paper addresses the problem of using unlabeled data in transfer learning. Specifically, we focus on transfer learning for a new, unlabeled dataset using partially labeled training datasets, each consisting of a small number of labeled data points and a large number of unlabeled data points.
To enable transfer learning, we assume that the training and testing datasets are drawn from similar probability distributions and that the unlabeled data in each dataset can be described by similar underlying manifolds. The solution offered is a distribution-free, kernel- and graph-Laplacian-based approach that minimizes empirical risk in an appropriate reproducing kernel Hilbert space. The approach is tested on a synthetic dataset for classification accuracy and on the Parkinson's Telemonitoring dataset from the UCI Machine Learning Repository for prediction accuracy.
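To make the kernel and graph-Laplacian machinery concrete, the sketch below implements standard Laplacian Regularized Least Squares in the style of Belkin, Niyogi, and Sindhwani's manifold regularization framework, which combines an RKHS empirical-risk term with a graph-Laplacian smoothness penalty over labeled and unlabeled points. It is a minimal illustration of that general technique, not the paper's exact algorithm; the function names, the RBF kernel choice, and the hyperparameters (gamma_A, gamma_I, n_neighbors) are assumptions made for the example.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.neighbors import kneighbors_graph

def laprls_fit(X_lab, y_lab, X_unlab, gamma_A=1e-2, gamma_I=1e-2,
               rbf_gamma=1.0, n_neighbors=5):
    """Laplacian Regularized Least Squares sketch (hypothetical parameters)."""
    X = np.vstack([X_lab, X_unlab])
    l, u = len(X_lab), len(X_unlab)
    n = l + u

    # Gram matrix of the RKHS kernel over all labeled and unlabeled points.
    K = rbf_kernel(X, X, gamma=rbf_gamma)

    # Graph Laplacian L = D - W from a symmetrized kNN adjacency graph,
    # encoding the assumed manifold structure of the unlabeled data.
    W = kneighbors_graph(X, n_neighbors=n_neighbors, mode='connectivity')
    W = 0.5 * (W + W.T).toarray()
    L = np.diag(W.sum(axis=1)) - W

    # J selects labeled points; Y pads labels with zeros for unlabeled points.
    J = np.diag(np.concatenate([np.ones(l), np.zeros(u)]))
    Y = np.concatenate([y_lab, np.zeros(u)])

    # Representer-theorem solution: f(x) = sum_i alpha_i k(x_i, x), with an
    # ambient RKHS penalty (gamma_A) and a manifold penalty (gamma_I).
    A = J @ K + gamma_A * l * np.eye(n) + (gamma_I * l / n**2) * (L @ K)
    alpha = np.linalg.solve(A, Y)
    return X, alpha

def laprls_predict(X_train, alpha, X_test, rbf_gamma=1.0):
    # Evaluate the learned function at new points via the kernel expansion.
    return rbf_kernel(X_test, X_train, gamma=rbf_gamma) @ alpha
```

Setting gamma_I = 0 recovers ordinary kernel ridge regression on the labeled points alone, which is the natural supervised baseline for the comparisons reported below.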
Our results show a 27.3% reduction in misclassification error and a 5.9% reduction in prediction error compared with standard supervised learning algorithms. These results can be applied broadly in domains ranging from medicine to machine reliability to the prediction of human actions.