Semi-Supervised Sparse Metric Learning Using Alternating Linearization Optimization

Wei Liu, Columbia University, New York, NY, USA, wl2223@columbia.edu
Shiqian Ma, Columbia University, New York, NY, USA, sm2756@columbia.edu
Dacheng Tao, Nanyang Technological University, Singapore, dctao@ntu.edu.sg
Jianzhuang Liu, The Chinese University of Hong Kong; Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China, jzliu@ie.cuhk.edu.hk
Peng Liu, Barclays Capital, New York, NY, USA, liup1024@gmail.com

ABSTRACT

In many scenarios, data can be represented as vectors and then mathematically abstracted as points in a Euclidean space. Because a great number of machine learning and data mining applications need proximity measures over data, a simple and universal distance metric is desirable, and metric learning methods have been explored to produce sensible distance measures consistent with the relationships among data. However, most existing methods suffer from limited labeled data and expensive training. In this paper, we address these two issues by employing abundant unlabeled data and pursuing sparsity of metrics, resulting in a novel metric learning approach called semi-supervised sparse metric learning. Two important contributions of our approach are: 1) it propagates scarce prior affinities between data to the global scope and incorporates the full affinities into the metric learning, and 2) it uses an efficient alternating linearization method to directly optimize the sparse metric. Compared with conventional methods, ours can effectively take advantage of semi-supervision and automatically discover the sparse metric structure underlying input data patterns. We demonstrate the efficacy of the proposed approach with extensive experiments carried out on six datasets, obtaining clear performance gains over the state of the art.
Categories and Subject Descriptors

H.2.8 [Database Management]: Database Applications - Data Mining
H.3 [Information Storage and Retrieval]: Information Search and Retrieval

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. KDD’10, July 25–28, 2010, Washington, DC, USA. Copyright 2010 ACM 978-1-4503-0055-1/10/07 ...$10.00.

General Terms

Algorithms

Keywords

Metric learning, semi-supervised sparse metric learning, sparse inverse covariance estimation, alternating linearization

1. INTRODUCTION

Vector data occur frequently in a variety of fields and are easy to handle because they can be mathematically abstracted as points residing in a Euclidean space. An appropriate distance metric in the space spanned by the input data vectors is in high demand for a great number of applications, including classification, clustering and retrieval. The two most commonly used distance metrics are the Euclidean distance and the Mahalanobis distance; the former is independent of the input data, while the latter is related to second-order statistics of the input data. In practice, we need to seek distance metrics suited to the requirements of different tasks.

In the context of classification, distance metrics are frequently applied in concert with kNN classifiers. As such, the goal of metric learning for kNN classification is to keep the distances among nearby points as small as possible and to push differently labeled neighbors out of the neighborhood of any of these points.
Neighbourhood Components Analysis (NCA) [11] and its follow-up work Maximally Collapsing Metric Learning (MCML) [10] emphasize that the target metric should support tight neighborhoods and even reach zero distances within all neighborhoods. In the spirit of SVMs, Large Margin Nearest Neighbor classification (LMNN) [28] not only narrows the distance gap within all neighborhoods, but also maximizes the soft margin of distances over each neighborhood.

As for clustering, metric learning usually cooperates with constrained clustering, namely, semi-supervised clustering [26][3][15], where some background knowledge about data proximities is given beforehand. Specifically, two kinds of
pairwise links, i.e., must-links and cannot-links, are given. The must-links indicate that two data points must be in the same cluster, while the cannot-links require that two data points not be grouped into the same cluster. Therefore, the purpose of metric learning applied to semi-supervised clustering is to minimize the distances associated with must-links and simultaneously maximize the distances associated with cannot-links. Several works [29][4][8][25] learn metrics to obtain better clusterings.

In the field of content-based image retrieval (CBIR), choosing appropriate distance metrics plays a key role in establishing effective systems. Regular CBIR systems usually adopt the Euclidean distance measure for images represented in vector form. Unfortunately, Euclidean distance is generally not effective enough in retrieving relevant images. A main reason is the well-known semantic gap between low-level visual features and high-level semantic concepts [24]. The commonly used relevance feedback scheme [23] may remedy the semantic gap: aided by users, it produces a set of pairwise constraints about relevance (similarity) or irrelevance (dissimilarity) between two images. These constraints, along with the image examples involved, are called log data. The key to CBIR is then to find an effective way of utilizing the log data in relevance feedback so that the semantic gap can be successfully reduced. Many ways of utilizing the log data to boost the performance of CBIR have been studied. In particular, one can use a metric learning technique devoted to semi-supervised clustering for tackling CBIR, since these relevance constraints are essentially must-links and cannot-links. Recent works [2][14][18] have recommended learning proper distance metrics for image retrieval tasks.
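
As a minimal illustration of the must-link and cannot-link objective described above (not the S3ML algorithm itself), the following Python sketch evaluates the summed squared Mahalanobis distance d_M(x, y)^2 = (x - y)^T M (x - y) over both kinds of links. The toy points, the link lists, the function names, and the matrix `M_sparse` are all hypothetical and chosen only for illustration; a learned metric would come from an actual optimization procedure.

```python
def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y) for list-based inputs."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))


def link_distance_sums(X, M, must_links, cannot_links):
    """Return (sum over must-links, sum over cannot-links) of squared distances."""
    must = sum(mahalanobis_sq(X[i], X[j], M) for i, j in must_links)
    cannot = sum(mahalanobis_sq(X[i], X[j], M) for i, j in cannot_links)
    return must, cannot


# Hypothetical toy data: the first feature separates the two groups,
# the second feature only adds within-group spread.
X = [[0.0, 0.0], [0.0, 2.0], [3.0, 0.0], [3.0, 2.0]]
must_links = [(0, 1), (2, 3)]    # pairs that should stay close
cannot_links = [(0, 2), (1, 3)]  # pairs that should stay apart

M_euclid = [[1.0, 0.0], [0.0, 1.0]]  # identity matrix: plain Euclidean distance
M_sparse = [[1.0, 0.0], [0.0, 0.0]]  # assumed sparse PSD metric ignoring feature 2

euclid_sums = link_distance_sums(X, M_euclid, must_links, cannot_links)
sparse_sums = link_distance_sums(X, M_sparse, must_links, cannot_links)
print("Euclidean (must, cannot):", euclid_sums)
print("Sparse    (must, cannot):", sparse_sums)
```

In this toy setting the assumed sparse metric shrinks the must-link distances while leaving the cannot-link distances unchanged, which is the qualitative behavior a learned metric is meant to achieve.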

In this paper, we pose metric learning under the semi-supervised setting, where only a few pairwise constraints, both similar and dissimilar, exist and most data instances are not involved in such constraints. We propose a novel metric learning technique called semi-supervised sparse metric learning (S3ML). The major features of S3ML include: 1) it is capable of propagating scarce pairwise constraints to all data pairs; 2) it generates a sparse metric matrix, which coincides with the sparse spirit of feature correlations in the high-dimensional feature space; and 3) it is quite efficient, using the alternating linearization method in contrast to existing metric learning approaches that use expensive optimizations such as semidefinite programming. The proposed S3ML technique has widespread applicability and is not limited to particular backgrounds. Quantitative experiments are performed for classification and retrieval tasks, demonstrating the effectiveness and efficiency of S3ML.

The remainder of this paper is arranged as follows. Section 2 reviews the related work on recent metric learning. Section 3 describes and addresses the semi-supervised metric learning problem. Section 4 presents the S3ML algorithm using the alternating linearization optimization method. Section 5 validates the efficacy of the proposed S3ML through extensive experiments. Section 6 includes conclusions.

2. RELATED WORK

Recent years have seen growing research interest in learning data representations in some intrinsic low-dimensional space embedded in the ambient high-dimensional space, such that regular Euclidean distance is more meaningful in the low-dimensional space. The early efforts learn linear representations by Principal Component Analysis (PCA) and nonlinear representations by manifold learning. However, these methods are unsupervised and only loosely related to the distance outcome.
This paper investigates distance metric learning, which is vital to many machine learning and data mining applications. Recent metric learning research can be classified into three main categories as follows.

2.1 Supervised Metric Learning

The first category is supervised metric learning approaches for classification, where distance metrics are usually learned from training data associated with explicit class labels. The representative techniques include Neighbourhood Components Analysis (NCA) [11], Maximally Collapsing Metric Learning (MCML) [10], and metric learning for Large Margin Nearest Neighbor classification (LMNN) [28]. Nevertheless, the performance of these supervised approaches depends heavily on the amount of labeled data, which are often difficult and expensive to gather in practice. Moreover, all of them require nontrivial optimizations such as semidefinite programming [5], which is inefficient for real-world datasets.

2.2 Weakly Supervised Metric Learning

Our work is closer to the second category, weakly supervised metric learning, which learns distance metrics from pairwise constraints present in input data, also known as side information [29]. Such side information is manifestly weaker than exact label information. In detail, each constraint indicates whether two data points are relevant (similar) or irrelevant (dissimilar) in a particular learning task. A well-known metric learning method with these constraints was proposed by Xing et al. [29], which casts the learning task into a convex optimization problem and applies the generated solution to data clustering. Following this work, several metric learning techniques have emerged in the “weakly supervised” direction. For instance, Relevant Component Analysis (RCA) learns a global linear transformation by exploiting only the equivalence (relevant) constraints [2].
Recently, Information-Theoretic Metric Learning (ITML) [8][7] expresses the weakly supervised metric learning problem as a Bregman optimization problem where the pairwise constraints are treated as inequality constraints.

2.3 Sparse Metric Learning

In [21], to reduce the training time of metric learning, ℓ1 regularization is incorporated into the original non-sparse metric learning objective, resulting in a much faster learning procedure: Sparse Distance Metric Learning (SDML). Besides, the sparsity of desirable metric matrices makes sense, since the Mahalanobis matrix is nearly sparse in a high-dimensional data space. This sparsity spirit stems from the weak correlations among different feature dimensions in the high-dimensional feature space, because most distinct features are measured by distinct mechanisms and are relatively independent of each other. From another perspective, [22][30] attempt to learn a low-rank sparse metric matrix by inducing sparsity in a low-rank linear mapping whose outer product constitutes the sparse metric. Nonetheless, learning low-rank sparse metrics often involves complex optimization