Distance metrics in open-set classification of text documents by local outlier factor and doc2vec

Abstract

In this paper, we investigate how the choice of distance metric influences the results of open-set subject classification of text documents. We use the Local Outlier Factor (LOF) algorithm to extend a closed-set classifier (i.e., a multilayer perceptron) with an additional class that identifies outliers. The analyzed text documents are represented by averaged word embeddings calculated using the fastText method on training data. In experiments on two different text corpora, we show how the distance metric chosen for LOF (Euclidean or cosine) and a transformation of the feature space (the vector representation of documents) both influence the open-set classification results. The general conclusion seems to be that the cosine distance outperforms the Euclidean distance in open-set classification of text documents.
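
The idea of extending a closed-set classifier with an LOF-based "unknown" class can be illustrated with a minimal sketch. It is not the authors' implementation: it is a simplified, standard-library LOF that scores new points against the training data, with ties cut at exactly k neighbours. The 2-D toy vectors, the `vec` helper, the label `"sports"` (a stand-in for the multilayer perceptron's prediction), the threshold of 1.5, and fitting LOF on all training vectors at once are all assumptions made for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    """Cosine distance (1 - cosine similarity); assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def _nearest(point, data, k, dist, skip=None):
    """k nearest (distance, index) pairs of `point` in `data`; ties cut at k."""
    pairs = [(dist(point, q), j) for j, q in enumerate(data) if j != skip]
    return sorted(pairs)[:k]

def _lrd(neighbors, k_dist):
    """Local reachability density: inverse of the mean reachability distance."""
    reach = [max(d, k_dist[j]) for d, j in neighbors]
    return 1.0 / max(sum(reach) / len(reach), 1e-12)

def fit_lof(train, k, dist):
    """Precompute k-distances and densities of the training vectors."""
    knn = [_nearest(p, train, k, dist, skip=i) for i, p in enumerate(train)]
    k_dist = [nb[-1][0] for nb in knn]
    lrd = [_lrd(nb, k_dist) for nb in knn]
    return {"train": train, "k": k, "dist": dist, "k_dist": k_dist, "lrd": lrd}

def lof_score(model, x):
    """LOF of a new vector x: mean neighbour density divided by x's density."""
    nb = _nearest(x, model["train"], model["k"], model["dist"])
    lrd_x = _lrd(nb, model["k_dist"])
    return sum(model["lrd"][j] for _, j in nb) / (len(nb) * lrd_x)

def open_set_predict(model, closed_set_label, x, threshold=1.5):
    """Keep the closed-set label unless LOF flags x as an outlier."""
    return "unknown" if lof_score(model, x) > threshold else closed_set_label

def vec(angle_deg, norm):
    """Hypothetical 2-D 'document vector' given by direction and length."""
    a = math.radians(angle_deg)
    return (norm * math.cos(a), norm * math.sin(a))

# Toy training vectors: similar directions, modest norms.
train = [vec(0, 1), vec(10, 2), vec(20, 3), vec(30, 2), vec(40, 1)]
long_doc = vec(20, 50)   # same direction as the training data, much larger norm
off_topic = vec(200, 2)  # roughly opposite direction

for name, dist in (("euclidean", euclidean), ("cosine", cosine)):
    model = fit_lof(train, k=2, dist=dist)
    print(name,
          open_set_predict(model, "sports", long_doc),
          open_set_predict(model, "sports", off_topic))
```

On this constructed data the two metrics disagree about `long_doc`: the Euclidean variant rejects it as "unknown" because of its norm, while the cosine variant keeps the closed-set label because only direction matters. Both reject `off_topic`. The data is built to make that difference visible and says nothing about the results reported in the paper.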

Walkowiak, T., Datko, S., & Maciejewski, H. (2019). Distance metrics in open-set classification of text documents by local outlier factor and doc2vec. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11606 LNAI, pp. 102–109). Springer Verlag. https://doi.org/10.1007/978-3-030-22999-3_10
