A Dockerized String Analysis Workflow for Big Data

Maria Th Kotouza; Fotis E. Psomopoulos; Pericles A. Mitkas

Conference Proceedings

Communications in Computer and Information Science (2019) 1064 564-569

DOI: 10.1007/978-3-030-30278-8_55

Citations: 1

Readers: 3

Abstract

A wide range of sciences are moving into the Big Data era, producing large volumes of data that must be processed to extract new knowledge. Scientific workflows are often the key tools for solving problems characterized by computational complexity and data diversity, while cloud computing can facilitate their efficient execution. In this paper, we present a generative big data analysis workflow that provides analytics, clustering, prediction and visualization services for datasets from various scientific fields by transforming the input data into strings. The workflow consists of novel algorithms for data processing and relationship discovery that are scalable and suitable for cloud infrastructures. Domain experts can interact with the workflow components, set their parameters, run personalized pipelines and obtain support for decision-making. Two datasets are used as case studies, consisting of (i) documents and (ii) gene sequence data, and show promising results in terms of efficiency and performance.

Citation (APA)

Kotouza, M. T., Psomopoulos, F. E., & Mitkas, P. A. (2019). A Dockerized String Analysis Workflow for Big Data. In Communications in Computer and Information Science (Vol. 1064, pp. 564–569). Springer Verlag. https://doi.org/10.1007/978-3-030-30278-8_55
