Open-set web genre identification using distributional features and nearest neighbors distance ratio

Abstract

Web genre identification can boost information retrieval systems by providing rich descriptions of documents and enabling more specialized queries. The open-set scenario is more realistic for this task, as web genres evolve over time and it is not feasible to define a universally agreed genre palette. In this work, we bring to bear a novel approach to web genre identification underpinned by distributional features acquired by doc2vec and a recently proposed open-set classification algorithm, the nearest neighbors distance ratio classifier. We present experimental results using a benchmark corpus and a strong baseline and demonstrate that the proposed approach is highly competitive, especially when the emphasis is on precision.
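The open-set decision rule named in the abstract can be illustrated with a minimal sketch. It assumes the commonly described form of the nearest neighbors distance ratio rule: take the nearest training sample, then the nearest sample of a different class, and accept the first sample's label only if the ratio of the two distances falls below a threshold. The function name `nndr_predict`, the toy 2-D vectors (standing in for doc2vec embeddings), the Euclidean distance, and the fixed threshold of 0.5 are all assumptions made for illustration, not details taken from the paper.

```python
import math
from typing import Hashable, Optional, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nndr_predict(
    query: Vector,
    train_vectors: Sequence[Vector],
    train_labels: Sequence[Hashable],
    threshold: float = 0.5,  # assumed value; normally tuned on held-out data
) -> Optional[Hashable]:
    """Return a known genre label, or None when the query is rejected as unknown."""
    dists = [euclidean(query, v) for v in train_vectors]

    # t: index of the nearest training sample overall.
    t = min(range(len(dists)), key=dists.__getitem__)
    label_t = train_labels[t]

    # u: index of the nearest training sample whose class differs from t's.
    others = [i for i, lab in enumerate(train_labels) if lab != label_t]
    if not others:
        raise ValueError("the ratio needs at least two known classes")
    u = min(others, key=dists.__getitem__)

    # A zero distance to the other class means the query is fully ambiguous.
    if dists[u] == 0.0:
        return None

    ratio = dists[t] / dists[u]
    return label_t if ratio <= threshold else None


# Toy stand-ins for document embeddings of two known genres.
TRAIN_VECTORS = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
TRAIN_LABELS = ["blog", "blog", "shop", "shop"]

near_blog = nndr_predict((0.5, 0.5), TRAIN_VECTORS, TRAIN_LABELS)
in_between = nndr_predict((5.0, 0.5), TRAIN_VECTORS, TRAIN_LABELS)
```

In this sketch, a query close to one genre is assigned that genre, while a query equidistant from both genres is rejected as unknown. Lowering the threshold rejects more documents, which is one way such a rule can trade recall for precision.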

Cite

Pritsos, D., Rocha, A., & Stamatatos, E. (2019). Open-set web genre identification using distributional features and nearest neighbors distance ratio. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11438 LNCS, pp. 3–11). Springer Verlag. https://doi.org/10.1007/978-3-030-15719-7_1
