In this paper, we revisited the classical technique of Regularized Least Squares (RLS) for the classification of large-scale nonlinear data. Specifically, we focus on a low-rank formulation of RLS and show that it has linear time complexity in the data size only and does not rely on the number of labels and features for problems with moderate feature dimension. This makes low-rank RLS particularly suitable for classification with large data sets. Moreover, we have proposed a general theorem for the closed-form solutions to the Leave-One-Out Cross Validation (LOOCV) estimation problem in empirical risk minimization which encompasses all types of RLS classifiers as special cases. This eliminates the reliance on cross validation, a computationally expensive process for parameter selection, and greatly accelerate the training process of RLS classifiers. Experimental results on real and synthetic large-scale benchmark data sets have shown that low-rank RLS achieves comparable classification performance while being much more efficient than standard kernel SVM for nonlinear classification. The improvement in efficiency is more evident for data sets with higher dimensions.
The increasing number of classification applications in large data sets demands that efficient classifiers be designed not only in training but also for prediction. In this paper, we address the problem of learning kernel classifiers with reduced complexity and improved efficiency for prediction in comparison to those trained by standard methods. A single optimisation problem is formulated for classifier learning which optimises both classifier weights and eXpansion Vectors (XVs) that define the classification function in a joint fashion. Unlike the existing approach of Wu et al, which performs optimisation in the dual formulation, our approach solves the primal problem directly. The primal problem is much more efficient to solve, as it can be converted to the training of a linear classifier in each iteration, which scales linearly to the size of the data set and the number of expansions. This makes our primal approach highly desirable for large-scale applications, where the dual approach is inadequate and prohibitively slow due to the solution of cubic-time kernel SVM involved in each iteration. Experimental results have demonstrated the efficiency and effectiveness of the proposed primal approach for learning sparse kernel classifiers that clearly outperform the alternatives.