In this paper, we present models for mining text relations between named entities, which can deal with data highly affected by linguistic noise. Our models are made robust by: (a) the exploitation of state-of-the-art statistical algorithms such as support vector machines (SVMs) along with effective and versatile pattern-mining methods, e.g. word sequence kernels; (b) the design of specific features capable of capturing long-distance relationships; and (c) the use of domain prior knowledge in the form of ontological constraints, e.g. bounds on the type of relation arguments given by the semantic categories of the involved entities. This property keeps the training data required by SVMs small and consequently lowers system design costs. We empirically tested our hybrid model in the very complex domain of business intelligence, where the textual data consist of reports on investigations into criminal enterprises, based on police interrogation reports, electronic eavesdropping and wiretaps. The target relations are typically established between entities as they are mentioned in these information sources. The experiments on mining such relations show that our approach, with small training data, is robust to non-conventional language such as dialects, jargon expressions or coded words typically contained in such text.
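The word sequence kernels mentioned in (a) can be illustrated with the classic gap-weighted subsequence kernel (in the style of Lodhi et al., applied over word tokens rather than characters). The sketch below is an illustrative reimplementation, not the authors' exact formulation: it scores two token sequences by summing, over all common word subsequences of length `n`, a decay factor `lam` raised to the combined span of the match, so matches interrupted by long gaps contribute less. The function name and parameter defaults are assumptions for the example.

```python
from functools import lru_cache

def make_word_sequence_kernel(n=2, lam=0.5):
    """Gap-weighted subsequence kernel of length n over word tokens.

    lam (0 < lam <= 1) penalises gaps: each common subsequence of n
    words contributes lam ** (its span in s + its span in t).
    """

    @lru_cache(maxsize=None)
    def k_prime(s, t, i):
        # Auxiliary K'_i from the standard recursive formulation:
        # length-i matches whose gap penalty is charged up to the
        # end of both sequences.
        if i == 0:
            return 1.0
        if len(s) < i or len(t) < i:
            return 0.0
        head, last = s[:-1], s[-1]
        total = lam * k_prime(head, t, i)
        for j, w in enumerate(t):
            if w == last:
                total += k_prime(head, t[:j], i - 1) * lam ** (len(t) - j + 1)
        return total

    @lru_cache(maxsize=None)
    def k(s, t):
        # Full kernel K_n: sum over all common word subsequences of length n.
        if len(s) < n or len(t) < n:
            return 0.0
        head, last = s[:-1], s[-1]
        total = k(head, t)
        for j, w in enumerate(t):
            if w == last:
                total += k_prime(head, t[:j], n - 1) * lam ** 2
        return total

    return lambda u, v: k(tuple(u), tuple(v))


# Example: the only common length-2 word subsequence of the two token
# lists below is ("a", "c"), with spans 3 and 2, so K = lam ** 5.
kernel = make_word_sequence_kernel(n=2, lam=0.5)
```

In practice such a kernel is normalised (K(s,t) / sqrt(K(s,s) K(t,t))) and the resulting Gram matrix is passed to an SVM via a precomputed kernel, e.g. scikit-learn's `SVC(kernel="precomputed")`, which matches the SVM-plus-sequence-kernel setup the abstract describes.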