Classification, Clustering, Features and Distances of Sequence Data

Guozhu Dong; Jian Pei

Book Chapter

Springer US, (2007), 47-65

DOI: 10.1007/978-0-387-69937-0_3

Abstract

This chapter is concerned with the classification and clustering of sequence data, together with sequence features and sequence distance functions. It is organized as follows:

• Section 3.1 provides a general categorization of sequence classification and sequence clustering tasks. There are three general tasks. Two of those tasks concern whole sequences and will be presented here. The third topic, namely sequence motifs (site/position-based identification and characterization of sequence families), will be presented in Chapter 4. The existence of a third task is due to the facts that (a) positions inside sequences are important, a factor which is not present for non-sequence data, and (b) succinct characterizations of sequence families are desired for identifying future members of the families.

• Section 3.2 discusses sequence features, concerning various feature types and general feature selection criteria. Section 3.3 is about sequence similarity/distance functions. The materials in Sections 3.2 and 3.3 will be useful not only for classification and clustering, but also for other topics (such as identification and characterization of sequence families).

• Section 3.4 discusses sequence classification. Several popular sequence classification algorithms are presented and a brief discussion on the evaluation of classifiers and classification algorithms is given.

• Section 3.5 discusses sequence clustering; it includes several popular sequence clustering approaches and a brief discussion on clustering quality analysis.

3.1 Three Tasks on Sequence Classification/Clustering

On sequence data, the following three data mining tasks are related to the general data mining tasks of classification and clustering.

Cite

Dong, G., & Pei, J. (2007). Classification, Clustering, Features and Distances of Sequence Data. In Sequence Data Mining (pp. 47–65). Springer US. https://doi.org/10.1007/978-0-387-69937-0_3
