
A Practical Guide to Support Vector Classification

by Chih-wei Hsu, Chih-chung Chang, Chih-jen Lin
Bioinformatics ()

Abstract

The support vector machine (SVM) is a popular classification technique. However, beginners who are not familiar with SVM often get unsatisfactory results since they miss some easy but significant steps. In this guide, we propose a simple procedure which usually gives reasonable results.

Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin
Department of Computer Science, National Taiwan University, Taipei 106, Taiwan
http://www.csie.ntu.edu.tw/~cjlin
Initial version: 2003. Last updated: April 15, 2010.

1 Introduction

SVMs (Support Vector Machines) are a useful technique for data classification. Although SVM is considered easier to use than Neural Networks, users not familiar with it often get unsatisfactory results at first. Here we outline a "cookbook" approach which usually gives reasonable results.

Note that this guide is not for SVM researchers, nor do we guarantee you will achieve the highest accuracy. Also, we do not intend to solve challenging or difficult problems. Our purpose is to give SVM novices a recipe for rapidly obtaining acceptable results.

Although users do not need to understand the underlying theory behind SVM, we briefly introduce the basics necessary for explaining our procedure. A classification task usually involves separating data into training and testing sets. Each instance in the training set contains one "target value" (i.e., the class label) and several "attributes" (i.e., the features or observed variables). The goal of SVM is to produce a model (based on the training data) which predicts the target values of the test data given only the test data attributes.

Given a training set of instance-label pairs $(x_i, y_i)$, $i = 1, \ldots, l$, where $x_i \in \mathbb{R}^n$ and $y \in \{1, -1\}^l$, the support vector machines (SVM) (Boser et al., 1992; Cortes and Vapnik, 1995) require the solution of the following optimization problem:

$$
\min_{w, b, \xi} \; \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i
\quad \text{subject to} \quad y_i \left( w^T \phi(x_i) + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0. \tag{1}
$$
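To make problem (1) concrete, the sketch below evaluates its objective (it does not solve the problem) for the special case where $\phi$ is the identity map, i.e. a linear SVM. It uses the fact that, for fixed $w$ and $b$, the smallest slacks satisfying both constraints are $\xi_i = \max(0, 1 - y_i(w^T x_i + b))$. The function name `primal_objective` and the toy data are hypothetical, chosen only for illustration.

```python
import numpy as np


def primal_objective(w, b, X, y, C):
    """Evaluate the objective of problem (1) for a linear SVM, i.e. phi(x) = x.

    For fixed w and b, the smallest slacks satisfying both constraints of (1)
    are xi_i = max(0, 1 - y_i * (w^T x_i + b)); those values are plugged in.
    """
    margins = y * (X @ w + b)            # y_i (w^T x_i + b) for every instance
    xi = np.maximum(0.0, 1.0 - margins)  # minimal feasible slack variables
    return 0.5 * (w @ w) + C * xi.sum()


# Hypothetical toy data: two instances per class in R^2, labels in {1, -1}.
X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1, 1, -1, -1])

# All instances satisfy y_i (w^T x_i + b) >= 1, so every slack is zero.
print(primal_objective(np.array([0.5, 0.5]), 0.0, X, y, C=1.0))    # → 0.25
# A smaller w (wider margin) lets two instances violate it, each with slack 0.5.
print(primal_objective(np.array([0.25, 0.25]), 0.0, X, y, C=1.0))  # → 1.0625
```

Raising $C$ makes each unit of slack more expensive, which is why $C$ acts as the penalty parameter of the error term.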

In problem (1), the training vectors $x_i$ are mapped into a higher (maybe infinite) dimensional space by the function $\phi$. SVM finds a linear separating hyperplane with the maximal margin in this higher dimensional space. $C > 0$ is the penalty parameter of the error term. Furthermore, $K(x_i, x_j) \equiv \phi(x_i)^T \phi(x_j)$ is called the kernel function. Though new kernels are being proposed by researchers, beginners may find in SVM books the following four basic kernels:

• linear: $K(x_i, x_j) = x_i^T x_j$.
• polynomial: $K(x_i, x_j) = (\gamma x_i^T x_j + r)^d$, $\gamma > 0$.
• radial basis function (RBF): $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$, $\gamma > 0$.
• sigmoid: $K(x_i, x_j) = \tanh(\gamma x_i^T x_j + r)$.

Here, $\gamma$, $r$, and $d$ are kernel parameters.

1.1 Real-World Examples

Table 1 presents some real-world examples. These data sets are supplied by our users who could not obtain reasonable accuracy in the beginning. Using the procedure illustrated in this guide, we help them to achieve better performance. Details are in Appendix A. These data sets are at http://www.csie.ntu.edu.tw/~cjlin/papers/guide/data/

Table 1: Problem characteristics and performance comparisons.

Application         #training  #testing  #features  #classes  Accuracy   Accuracy by
                    data       data                           by users   our procedure
Astroparticle [1]   3,089      4,000     4          2         75.2%      96.9%
Bioinformatics [2]  391        0 [4]     20         3         36%        85.2%
Vehicle [3]         1,243      41        21         2         4.88%      87.8%

[1] Courtesy of Jan Conrad from Uppsala University, Sweden.
[2] Courtesy of Cory Spencer from Simon Fraser University, Canada (Gardy et al., 2003).
[3] Courtesy of a user from Germany.
[4] As there are no testing data, cross-validation instead of testing accuracy is presented here. Details of cross-validation are in Section 3.2.
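The four basic kernels listed above translate directly into code. The minimal NumPy sketch below works on single vectors; the function names are hypothetical. The last line checks the defining identity $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$ for the polynomial kernel with $\gamma = 1$, $r = 0$, $d = 2$ on $\mathbb{R}^2$, using the explicit feature map $\phi(x) = (x_1^2, \sqrt{2} x_1 x_2, x_2^2)$, a standard textbook example rather than something taken from this guide.

```python
import numpy as np


def linear_kernel(xi, xj):
    return xi @ xj


def polynomial_kernel(xi, xj, gamma, r, d):
    # Requires gamma > 0 per the definition above.
    return (gamma * (xi @ xj) + r) ** d


def rbf_kernel(xi, xj, gamma):
    diff = xi - xj
    return np.exp(-gamma * (diff @ diff))  # diff @ diff = ||xi - xj||^2


def sigmoid_kernel(xi, xj, gamma, r):
    return np.tanh(gamma * (xi @ xj) + r)


def phi_quadratic(x):
    # Explicit feature map whose inner product equals (x^T z)^2 in R^2.
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])


p = np.array([1.0, 2.0])
q = np.array([2.0, 0.0])
print(linear_kernel(p, q))                                # → 2.0
print(polynomial_kernel(p, q, gamma=1.0, r=1.0, d=2))     # → 9.0
print(rbf_kernel(p, q, gamma=0.5))                        # exp(-2.5) ≈ 0.0821
print(sigmoid_kernel(p, q, gamma=0.5, r=0.0))             # tanh(1.0) ≈ 0.7616
print(np.isclose(phi_quadratic(p) @ phi_quadratic(q),
                 polynomial_kernel(p, q, 1.0, 0.0, 2)))   # → True
```

The same parameterization appears in scikit-learn's `SVC`, which calls $\gamma$ `gamma`, $r$ `coef0`, and $d$ `degree`.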

1.2 Proposed Procedure

Many beginners use the following procedure now:

• Transform data to the format of an SVM package
• Randomly try a few kernels and parameters
• Test

We propose that beginners try the following procedure first:

• Transform data to the format of an SVM package
• Conduct simple scaling on the data
• Consider the RBF kernel $K(x, y) = e^{-\gamma \|x - y\|^2}$
• Use cross-validation to find the best parameters $C$ and $\gamma$
• Use the best parameters $C$ and $\gamma$ to train the whole training set [5]
• Test

We discuss this procedure in detail in the following sections.

[5] The best parameters might be affected by the size of the data set, but in practice the ones obtained from cross-validation are already suitable for the whole training set.

2 Data Preprocessing

2.1 Categorical Feature

SVM requires that each data instance is represented as a vector of real numbers. Hence, if there are categorical attributes, we first have to convert them into numeric data. We recommend using m numbers to represent an m-category attribute. Only one of the m numbers is one, and the others are zero. For example, a three-category attribute such as {red, green, blue} can be represented as (0,0,1), (0,1,0), and (1,0,0). Our experience indicates that if the number of values in an attribute is not too large, this coding might be more stable than using a single number.
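The m-number coding described above can be sketched in a few lines of plain Python. The helper names `one_hot` and `encode_instance` are hypothetical. The category order is chosen so the output reproduces the {red, green, blue} example; any fixed order works as long as it is used consistently for training and test data.

```python
def one_hot(value, categories):
    """Represent one value of an m-category attribute by m numbers,
    exactly one of which is 1 and the others 0."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


# Order chosen so that red -> (0,0,1), green -> (0,1,0), blue -> (1,0,0).
categories = ["blue", "green", "red"]
for color in ["red", "green", "blue"]:
    print(color, one_hot(color, categories))
# red [0, 0, 1]
# green [0, 1, 0]
# blue [1, 0, 0]


def encode_instance(numeric, color):
    # Hypothetical instance: numeric attributes followed by one categorical attribute.
    return list(numeric) + one_hot(color, categories)


print(encode_instance([3.5, -1.0], "green"))  # → [3.5, -1.0, 0, 1, 0]
```

For larger data sets, scikit-learn's `OneHotEncoder` performs the same transformation column by column.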

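The steps proposed in Section 1.2 can be sketched end to end with scikit-learn, whose `SVC` class is built on LIBSVM. This is a hedged illustration, not the authors' own tooling. The synthetic data set, the [-1, 1] scaling range, the 5-fold split, and the exponentially spaced grid of $C$ and $\gamma$ values are all assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Step 1: data as a 2-D numeric array. A synthetic stand-in (assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Steps 2-3: simple scaling followed by an RBF-kernel SVM. Because the scaler
# sits inside the pipeline, its min/max are learned from the training folds only.
model = make_pipeline(MinMaxScaler(feature_range=(-1, 1)), SVC(kernel="rbf"))

# Step 4: cross-validation over an exponentially spaced (C, gamma) grid.
# These ranges are illustrative, not prescribed here.
param_grid = {
    "svc__C": [2.0 ** k for k in range(-5, 16, 2)],
    "svc__gamma": [2.0 ** k for k in range(-15, 4, 2)],
}
search = GridSearchCV(model, param_grid, cv=5)

# Step 5: with refit=True (the default), the best (C, gamma) found by
# cross-validation is used to retrain on the whole training set.
search.fit(X_train, y_train)

# Step 6: test on data never seen during parameter selection.
print("best parameters:", search.best_params_)
print("cross-validation accuracy:", round(search.best_score_, 3))
print("test accuracy:", round(search.score(X_test, y_test), 3))
```

Keeping the scaler inside the pipeline means each cross-validation fold is scaled using only its own training part, so the cross-validation accuracy is not inflated by information from the held-out fold.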