Multilayer Identification: Combining N-Grams, TF-IDF and Monge-Elkan in Massive Real Time Processing


Abstract

In modern societies, control is based on information. In many countries, companies are now obliged to provide tax administrations with all their invoices, and withholders and financial entities must provide information that is used to offer prefilled tax declarations. In Spain, the Tax Agency (AEAT) receives 180 million invoices per month and, within a few days at the end of January, must process more than 500 million records to prefill income tax forms. Hundreds of thousands of these records are not correctly identified by the provider and must be returned to the sender or stored as unidentified and analyzed afterwards. Traditionally, this process consumed many technical and human resources. AEAT has, for the first time, been able to provide a real-time identification solution with enormous throughput that fulfils its needs. It is based on a combination of six algorithms built on three different ideas (n-grams, TF-IDF, and Monge-Elkan) and has surpassed all previous expectations.
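To make the matching idea concrete, here is a minimal Python sketch of a Monge-Elkan score that uses character bigram overlap (Dice coefficient) as the inner token similarity. It is an illustration under stated assumptions, not the six-algorithm pipeline the abstract describes: the function names, the bigram size, the `#` padding, and the choice of Dice as the inner measure are all hypothetical, and TF-IDF weighting of tokens is left out.

```python
from collections import Counter


def char_ngrams(token: str, n: int = 2) -> Counter:
    """Multiset of character n-grams of a token, padded with '#' (assumed convention)."""
    padded = f"#{token.lower()}#"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def dice(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient over character n-grams, in [0, 1]."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    overlap = sum((ga & gb).values())  # multiset intersection size
    total = sum(ga.values()) + sum(gb.values())
    return 2 * overlap / total if total else 0.0


def monge_elkan(name_a: str, name_b: str) -> float:
    """Average, over tokens of name_a, of the best inner similarity in name_b.

    Note that this basic form is asymmetric:
    monge_elkan(a, b) may differ from monge_elkan(b, a).
    """
    tokens_a, tokens_b = name_a.split(), name_b.split()
    if not tokens_a or not tokens_b:
        return 0.0
    best = (max(dice(x, y) for y in tokens_b) for x in tokens_a)
    return sum(best) / len(tokens_a)


if __name__ == "__main__":
    # Hypothetical names, used only to exercise the functions.
    print(monge_elkan("Ana Garcia", "Garcia Ana"))
    print(monge_elkan("Ana Garcia", "Ana Garzia"))
    print(monge_elkan("Ana Garcia", "Luis Perez"))
```

Because each token is matched against its best counterpart, the score tolerates reordered name parts and small spelling variations, which is the kind of noise that misidentified taxpayer records typically contain.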

Cite

CITATION STYLE

APA

González, I., & Mateos, A. (2019). Multilayer Identification: Combining N-Grams, TF-IDF and Monge-Elkan in Massive Real Time Processing. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11676 LNAI, pp. 213–223). Springer Verlag. https://doi.org/10.1007/978-3-030-26773-5_19
