Parallel Algorithms to Align Multiple Strings in the Context of Web Data Extraction

  • Gfrerer C
  • Vajteršic M
  • Kutil R

Abstract

The alignment of multiple strings generated from web pages is a crucial problem in processing the ever-growing amount of data on the Internet. The complexity of this problem grows exponentially with the number of strings. Since practically acceptable results cannot be achieved on serial computers, even with efficient heuristic approaches, parallel processing appears to be an unavoidable option. Parallel solutions for the alignment of multiple strings have already emerged in areas such as bioinformatics and genome applications. However, to our knowledge, no parallel solution has yet been published for the variant of this problem that arises in web data extraction. In this work, we present two parallel algorithms for this problem, in which the input web data records are represented as a two-dimensional array of symbols. The algorithms differ in how the array data are assigned to the parallel processes: the first distributes the data according to symbols, whereas the second partitions the array by columns. In both cases, processes communicate via message passing. The algorithms are analyzed with respect to time and space complexity. We implemented both algorithms and studied their properties by running them on a multiprocessor system. For the version with distributed columns, we observed that the speedup suffers significantly from communication overhead. The results for the version with data distributed by symbols, however, are convincing: in this case, reasonable performance was obtained.
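The abstract does not say how either data distribution is realized. The Python sketch below only illustrates the difference between the two assignment schemes, under assumptions that are ours rather than the authors': each row of the array is one record, each column a position, the array is rectangular, the column scheme hands out contiguous column blocks, and the symbol scheme deals distinct symbols round-robin to processes. The function names are hypothetical, and both the message passing and the alignment itself are omitted.

```python
def partition_by_columns(array, p):
    """Hypothetical column scheme: split the column indices of a rectangular
    array into p contiguous blocks whose sizes differ by at most one."""
    n_cols = len(array[0])
    base, extra = divmod(n_cols, p)
    parts, start = [], 0
    for rank in range(p):
        # The first `extra` processes receive one additional column.
        size = base + (1 if rank < extra else 0)
        parts.append(list(range(start, start + size)))
        start += size
    return parts


def partition_by_symbols(array, p):
    """Hypothetical symbol scheme: give each distinct symbol to one process
    (round-robin over the sorted symbols); a process then owns every
    (row, column) cell that holds one of its symbols."""
    symbols = sorted({s for row in array for s in row})
    owner = {s: i % p for i, s in enumerate(symbols)}
    parts = [[] for _ in range(p)]
    for r, row in enumerate(array):
        for c, s in enumerate(row):
            parts[owner[s]].append((r, c))
    return parts
```

For two records `[["div", "a", "span"], ["div", "span", "img"]]` and two processes, the column scheme yields the contiguous blocks `[[0, 1], [2]]`, while the symbol scheme gives process 0 the cells holding `a` and `img` and process 1 the cells holding `div` and `span`. In this sketch a process's cells under the symbol scheme are scattered across the whole array rather than confined to one slice; how the actual algorithms exploit either layout is described in the full text, not here.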

Citation (APA)

Gfrerer, C., Vajteršic, M., & Kutil, R. (2017). Parallel Algorithms to Align Multiple Strings in the Context of Web Data Extraction (pp. 525–578). https://doi.org/10.1007/978-3-319-46376-6_25
