Media companies today struggle to manage the large volume of news arriving from agencies together with articles written in-house. Journalists and documentalists face categorization tasks every day. A further difficulty is the size of the thesaurus, the typical tool used to tag news in the media, whose list of terms is usually very large. In this paper, we present a new method for information extraction over a set of texts in which the annotation must be composed of thesaurus elements. The method applies lemmatization, extracts keywords, and finally uses a combination of Support Vector Machines (SVM), ontologies, and heuristics to deduce appropriate tags for the annotation. We evaluated it on a real, changing set of news items and compared our tagging with the annotation performed by a real documentation department, obtaining very good results. © 2012 Springer-Verlag.
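The three stages named in the abstract (normalization, keyword extraction, tag deduction) can be illustrated with a toy sketch. This is not the authors' system: the plural-stripping `normalize` is only a crude stand-in for lemmatization, and a simple thesaurus-overlap heuristic replaces the SVM and ontology components. The `THESAURUS` contents, the stopword list, and all function names are hypothetical and chosen for illustration only.

```python
import re
from collections import Counter

# Hypothetical mini-thesaurus: tag -> normalized trigger terms (illustrative only).
THESAURUS = {
    "Economy": {"bank", "market", "inflation"},
    "Sports": {"match", "goal", "team"},
}

# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "as", "by", "on", "is"}


def normalize(token):
    """Crude stand-in for lemmatization: strip a trailing plural 's'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def keywords(text, top_n=10):
    """Return the top_n most frequent normalized, non-stopword tokens."""
    tokens = [normalize(t) for t in re.findall(r"[a-z]+", text.lower())]
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


def suggest_tags(text, min_hits=2):
    """Heuristic stand-in for the SVM/ontology step: keep every thesaurus
    tag whose trigger terms overlap the keywords at least min_hits times."""
    kws = set(keywords(text))
    scores = {tag: len(kws & terms) for tag, terms in THESAURUS.items()}
    return sorted(tag for tag, hits in scores.items() if hits >= min_hits)


if __name__ == "__main__":
    print(suggest_tags("Banks and markets fell as inflation rose."))
```

Restricting the output to keys of `THESAURUS` mirrors the constraint stated in the abstract that annotations must be composed of thesaurus elements; a trained classifier would take the place of the overlap count in `suggest_tags`.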
CITATION STYLE
Garrido, A. L., Gómez, O., Ilarri, S., & Mena, E. (2012). An experience developing a semantic annotation system in a media group. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7337 LNCS, pp. 333–338). https://doi.org/10.1007/978-3-642-31178-9_43