DNA (deoxyribonucleic acid) is a chain of nucleotides found in the cells of an organism, and it carries the organism's hereditary information. Sequence alignment is one of the most important methods for finding matches between DNA sequences. It is also used in vaccine development, where it accounts for a large portion of the processing time. Parallel and distributed systems address this problem. One of them is the Hadoop platform, which is currently used for processing biological data. The development of the COVID-19 vaccine is one example of applying a distributed and parallel system model so that the process could be completed in a fairly short time. In this study, we use MSA (Multiple Sequence Alignment), for which one of the algorithms with high accuracy is T-COFFEE (Tree-based Consistency Objective Function for Alignment Evaluation). T-COFFEE is a multiple sequence alignment algorithm well suited to finding similarities in DNA data, with a focus on very high accuracy. Despite its high accuracy, T-COFFEE requires a very long processing time. This research therefore implements T-COFFEE on a Hadoop cluster parallelized with Spark, which was shown to reduce the execution time.
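To illustrate why this workload parallelizes well, the following is a minimal sketch and not the authors' implementation. T-COFFEE begins by aligning every pair of input sequences, and each pair can be processed independently of the others. The sketch scores each pair with a Needleman-Wunsch global alignment and hands the pairs to a local thread pool, which stands in here for the workers of a cluster. The function names `nw_score` and `pairwise_scores`, the scoring values (match 1, mismatch -1, gap -2), and the sequence identifiers are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap cost.

    The scoring values are illustrative assumptions, not taken from the paper.
    """
    # prev[j] holds the best score for aligning the current prefix of a
    # (initially the empty prefix) with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against the empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def pairwise_scores(seqs, workers=4):
    """Score every unordered pair of sequences in the dict `seqs`.

    Each pair is an independent task, so the map step below is the part
    that a distributed framework could spread across cluster nodes.
    """
    pairs = list(combinations(sorted(seqs), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = pool.map(lambda p: nw_score(seqs[p[0]], seqs[p[1]]), pairs)
        return dict(zip(pairs, scores))


if __name__ == "__main__":
    # Hypothetical toy input; real DNA sequences are far longer.
    toy = {"s1": "ACGT", "s2": "ACGT", "s3": "ACG"}
    for pair, score in pairwise_scores(toy).items():
        print(pair, score)
```

The number of pairs grows quadratically with the number of sequences, which is one reason this stage dominates the running time on large inputs. In CPython, threads do not speed up CPU-bound work like this, so the pool only demonstrates the task structure; on a cluster the same map over pairs would be executed by separate worker processes.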
CITATION
Prihatiningrum, V., Setyorini, & Karimah, S. A. (2021). T-COFFEE Multiple Sequence Aligner on Hadoop Spark Cluster. In 2021 9th International Conference on Information and Communication Technology, ICoICT 2021 (pp. 259–263). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICoICT52021.2021.9527471