The spectral study of cancer dates back 50 years, but it is still not known whether spectral measurements suffice to distinguish cancerous from normal tissue. An objective approach to that question is to design automatic classifiers that discriminate between these two classes and then to estimate their generalization error rates. Previous studies have not estimated errors adequately: it is not a priori clear whether unseen spectra from patients in the algorithm’s test set are sufficiently independent of the training data to provide a fair evaluation. We show experimentally that to obtain accurate error estimates, spectra from unseen patients are necessary. Our results suggest that although spectra are not sufficient to distinguish fully between cancerous and normal tissue, a high degree of discrimination is possible. This leads us to ask how discriminatory spectral features should be selected. The features in previous work on cancer spectroscopy have been chosen according to heuristics. We use the “best basis” algorithm to select a Haar wavelet packet basis which is optimal for the discrimination task at hand. The resulting basis provides interpretable spectral features consisting of contiguous wavelength bands. However, these features are outperformed by features which use information from all parts of the spectrum, combined linearly at random.
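The two ideas above, evaluating on spectra from unseen patients and forming features as random linear combinations of the whole spectrum, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy data, and the choice of Gaussian random weights are assumptions made here for illustration only, since the abstract says only that the features are "combined linearly at random".

```python
import numpy as np


def patient_wise_split(patient_ids, test_fraction=0.25, seed=0):
    """Split spectrum indices so that no patient appears in both sets.

    Hypothetical helper: holds out whole patients rather than single spectra.
    """
    patient_ids = np.asarray(patient_ids)
    rng = np.random.default_rng(seed)
    patients = np.unique(patient_ids)
    rng.shuffle(patients)  # random order of patients, not of spectra
    n_test = max(1, int(round(test_fraction * len(patients))))
    test_patients = patients[:n_test]
    test_mask = np.isin(patient_ids, test_patients)
    return np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


def random_linear_features(spectra, n_features=8, seed=0):
    """Project each spectrum onto random directions spanning all wavelengths.

    Assumption: Gaussian weights; the source does not specify the distribution.
    """
    rng = np.random.default_rng(seed)
    weights = rng.standard_normal((spectra.shape[1], n_features))
    return spectra @ weights


# Hypothetical toy data: 6 patients, 5 spectra each, 64 wavelength bins.
toy_rng = np.random.default_rng(1)
patient_ids = np.repeat(np.arange(6), 5)
spectra = toy_rng.standard_normal((30, 64))

train_idx, test_idx = patient_wise_split(patient_ids, test_fraction=1 / 3)
features = random_linear_features(spectra, n_features=8)
```

Splitting by patient rather than by spectrum means the test set contains only patients the classifier has never seen, which is the condition the abstract reports as necessary for accurate error estimates.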