Enabling Real Time Analytics over Raw XML Data

Abstract

The data generated by many applications is in a semi-structured format, such as XML. This data can be used for analytics only after it has been shredded and stored in a structured format, a process known as Extract-Transform-Load (ETL). However, ETL is often time-consuming, so crucial time-sensitive insights can be lost or may become unactionable. Hence, this paper poses the following question: How do we expose analytical insights in the raw XML data? We address this novel problem by discovering additional information, called complementary information (CI), from the raw semi-structured data repository for a given user query. Experiments with real as well as synthetic data show that the discovered CI is relevant in the context of the given user query, nontrivial, and has high precision. The recall is also found to be high for most queries. Crowd-sourced feedback on the discovered CI corroborates these findings, showing that our system is able to discover highly relevant and potentially useful CI in real-world XML data repositories. The concepts behind our technique are generic and can be used for other semi-structured data formats as well.
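
To make the "shredding" step in the abstract concrete, here is a minimal Python sketch of flattening a small XML document into structured rows. It only illustrates the conventional ETL step that the paper aims to bypass; it is not the authors' CI discovery technique. The `orders` document, its element names, and the `shred_orders` helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical raw XML input; the schema is assumed for illustration only.
RAW_XML = """
<orders>
  <order id="1"><customer>Ann</customer><total>20.5</total></order>
  <order id="2"><customer>Bob</customer><total>7.0</total></order>
</orders>
"""


def shred_orders(xml_text):
    """Flatten each <order> element into one dict (one relational-style row)."""
    root = ET.fromstring(xml_text)
    rows = []
    for order in root.findall("order"):
        rows.append({
            "id": int(order.get("id")),                # attribute -> column
            "customer": order.findtext("customer"),    # child text -> column
            "total": float(order.findtext("total")),
        })
    return rows


rows = shred_orders(RAW_XML)
for row in rows:
    print(row)
```

Even in this toy case, every document must be parsed and mapped to a fixed schema before any query can run, which is the latency the abstract attributes to ETL.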

Cite

Agarwal, M. K., Ramamritham, K., & Agarwal, P. (2019). Enabling Real Time Analytics over Raw XML Data. In Lecture Notes in Business Information Processing (Vol. 337, pp. 113–132). Springer. https://doi.org/10.1007/978-3-030-24124-7_8
