Abstract
Many applications in surveillance, monitoring, scientific discovery, and data cleaning require the identification of anomalies. Although many methods have been developed to identify statistically significant anomalies, a more difficult task is to identify anomalies that are both interesting and statistically significant. Category detection is an emerging area of machine learning that can help address this issue using a "human-in-the-loop" approach. In this interactive setting, the algorithm asks the user to label a query data point under an existing category or declare the query data point to belong to a previously undiscovered category. The goal of category detection is to bring to the user's attention a representative data point from each category in the data in as few queries as possible. In a data set with imbalanced categories, the main challenge is in identifying the rare categories or anomalies; hence, the task is often referred to as rare category detection. We present a new approach to rare category detection based on hierarchical mean shift. In our approach, a hierarchy is created by repeatedly applying mean shift with an increasing bandwidth on the data. This hierarchy allows us to identify anomalies in the data set at different scales, which are then posed as queries to the user. The main advantage of this methodology over existing approaches is that it does not require any knowledge of the data set properties such as the total number of categories or the prior probabilities of the categories. Results on real-world data sets show that our hierarchical mean shift approach performs consistently better than previous techniques. Copyright 2009 ACM.
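The idea of building a hierarchy by rerunning mean shift at increasing bandwidths can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel, the rule for merging converged points (`merge_modes` with radius `h / 2`), the toy data, and all function names are assumptions made for illustration. The abstract does not say how clusters are scored or selected as queries, so that step is left out.

```python
import numpy as np

def mean_shift(points, bandwidth, n_iter=50, tol=1e-6):
    """Move each point uphill to a mode of a Gaussian kernel density estimate."""
    data = np.asarray(points, dtype=float)
    modes = data.copy()
    for _ in range(n_iter):
        # Squared distances from current positions to the original data.
        d2 = ((modes[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))
        # Kernel-weighted mean of the data around each current position.
        new_modes = (w @ data) / w.sum(axis=1, keepdims=True)
        shift = np.abs(new_modes - modes).max()
        modes = new_modes
        if shift < tol:
            break
    return modes

def merge_modes(modes, radius):
    """Assumed merging rule: join points whose converged positions lie within `radius`."""
    labels = -np.ones(len(modes), dtype=int)
    centers = []
    for i, m in enumerate(modes):
        for k, c in enumerate(centers):
            if np.linalg.norm(m - c) < radius:
                labels[i] = k
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)

def hierarchical_mean_shift(points, bandwidths):
    """Run mean shift once per bandwidth, smallest first; one clustering per level."""
    levels = []
    for h in sorted(bandwidths):
        modes = mean_shift(points, h)
        labels, centers = merge_modes(modes, radius=h / 2.0)
        levels.append((h, labels, centers))
    return levels

# Toy data (assumed): a dense 5x5 grid near the origin plus three points far away.
grid = np.array([[x, y] for x in np.linspace(-0.2, 0.2, 5)
                        for y in np.linspace(-0.2, 0.2, 5)])
rare = np.array([[3.0, 3.0], [3.1, 3.0], [3.0, 3.1]])
data = np.vstack([grid, rare])

levels = hierarchical_mean_shift(data, [0.3, 10.0])
for h, labels, centers in levels:
    print(h, np.bincount(labels))
```

In this toy case the small group stays a separate cluster at the narrow bandwidth and is absorbed at the wide one, which is the kind of scale-dependent structure a hierarchy exposes. Running mean shift independently at each bandwidth, as done here, does not by itself guarantee nested clusters.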
Vatturi, P., & Wong, W. K. (2009). Category detection using hierarchical mean shift. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 847–855). https://doi.org/10.1145/1557019.1557112