Evolving an algorithm to generate sparse inverted index using hadoop and pig

Abstract

Nowadays, with the explosion of information, users mostly prefer keyword search to access data. Inverted indexing plays a very important role in efficient search over large data sets. Two problems exist in current keyword-based search techniques. First, large data sets are mostly unstructured and do not fit well into existing database systems. Second, the storage required by an inverted index is usually very large, and the compression techniques used so far are not very efficient because they increase processing time. To overcome these problems, Hadoop, a distributed framework for large data sets, is needed, so that the required resources can be shared and accessed easily. In our proposed work, we merge runs of consecutive document IDs in the inverted index into intervals to save memory space. For this, we have developed UDFs (User Defined Functions) in Pig Latin for stemming and stop-word removal for the sparse inverted index. The results show that our proposed method is more efficient than existing techniques.
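The interval idea described in the abstract can be illustrated with a minimal single-machine sketch in Python. This is not the authors' Pig Latin UDF or their Hadoop pipeline: the stop-word list, the `crude_stem` helper, and the function names `to_intervals` and `build_sparse_index` are all hypothetical stand-ins chosen for illustration. The sketch only shows how a posting list of document IDs might be collapsed into (start, end) intervals after stop-word removal and stemming.

```python
from collections import defaultdict

# Illustrative stop-word subset (an assumption, not the paper's list).
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}


def crude_stem(word):
    """Toy suffix stripper; a hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def to_intervals(doc_ids):
    """Collapse document IDs into inclusive (start, end) intervals."""
    intervals = []
    for doc_id in sorted(set(doc_ids)):
        if intervals and doc_id == intervals[-1][1] + 1:
            # Extends the current run of consecutive IDs.
            intervals[-1] = (intervals[-1][0], doc_id)
        else:
            # Starts a new run.
            intervals.append((doc_id, doc_id))
    return intervals


def build_sparse_index(docs):
    """Map each stemmed, non-stop-word term to its interval posting list."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            if token in STOP_WORDS:
                continue
            postings[crude_stem(token)].add(doc_id)
    return {term: to_intervals(ids) for term, ids in postings.items()}


docs = {
    1: "indexing the data",
    2: "index of data",
    3: "data search",
    5: "search indexes",
}
index = build_sparse_index(docs)
```

In this toy corpus, the term `data` appears in documents 1, 2, and 3, so its posting list is stored as the single interval `(1, 3)` rather than three separate IDs. The saving grows with the length of consecutive runs; terms whose documents are scattered gain nothing from this representation.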

Cite

Sharma, S., & Singh, S. (2016). Evolving an algorithm to generate sparse inverted index using hadoop and pig. In Smart Innovation, Systems and Technologies (Vol. 51, pp. 499–508). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-319-30927-9_49
