On searching in the real world

Abstract

For many, searching is considered a solved problem. Indeed, for text processing, this belief is factually based. The problem is that most real-world search applications involve complex documents, and such applications are far from solved. Complex documents, or less formally, real-world documents, comprise a mixture of images, text, signatures, tables, etc., and are often available only in scanned hardcopy formats. Search systems for such document collections are currently unavailable. We describe our complex document information-processing prototype. This prototype integrates mature point-solution technologies, such as optical character recognition, signature matching and handwritten word spotting techniques, logo detection and recognition, and search and mining approaches, to yield a system capable of searching real-world documents. The described prototype validates the adage that the whole is greater than the sum of its parts. Our complex document benchmark development efforts are likewise presented. Having discussed the core approach, we describe some additional point solutions developed at the Illinois Institute of Technology (IIT) Information Retrieval (IR) Laboratory. These include an Arabic stemmer and a natural language source integration fabric called the IIT Intranet Mediator. In terms of stemming, we developed and licensed an Arabic stemmer and search system. Our approach was evaluated using the Arabic TREC collection and compared favorably against the state of the art. We also focused on source integration and ease of user interaction. By integrating structured, semi-structured, and unstructured sources, we developed and licensed our mediator technology, which provides a single, natural language interface for querying distributed sources. Rather than providing a set of links as possible answers, the described approach actually answers the posed questions. © 2009 Springer-Verlag Berlin Heidelberg.

Cite

Frieder, O. (2009). On searching in the real world. In Computational Methods for Counterterrorism (pp. 3–16). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-01141-2_1
