With advances in genomics and new techniques and automated processes for DNA sequencing, the amount of data produced has increased exponentially. Analyzing this data to identify interesting biological features is an enormous challenge, especially if done manually. Imagine trying to find a specific word in a book, say Don Quixote, by searching word by word. How long would it take? Bioinformatics has played an important role in helping specialists analyze the data of a specific genome. The application of information technology, combined with techniques from applied mathematics, informatics, statistics, and computer science, has enabled the discovery of interesting and important characteristics in genomes. This makes it possible to understand and solve several biological problems, or to gain further knowledge and insight about a problem and the biological processes involved, which in turn can bring advances in the techniques used.

In computing, for example, text processing is a common task. Several problems involve strings, such as finding a specific word in a text (one could say “aligning words”) or a similar one (matching a particular pattern of characters). When processing genomic data, if one wants to search for a specific pattern (and its approximations) in DNA sequences, the natural approach is to use solutions that are already implemented. Thus, for exact or approximate pattern search and similar problems, bioinformaticians have developed computational tools that apply well-known computing techniques and algorithms to these important genomic problems. Sometimes the algorithms must be adapted to account for specific features of the biological problem. Two good examples are sequence alignment and sequence assembly, which result from adapting algorithms to account for insertions, deletions, and substitutions of nucleotides in DNA sequences.
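As an illustration of approximate pattern search in a DNA sequence, the following is a minimal sketch of a classic dynamic-programming approach (edit distance in which the match may start anywhere in the text). It is not taken from the chapter; the function name `approximate_matches` and the parameter `max_edits` are hypothetical names chosen for this example. It reports every position in the text where the pattern ends with at most a given number of insertions, deletions, or substitutions.

```python
from typing import List, Tuple


def approximate_matches(pattern: str, text: str, max_edits: int) -> List[Tuple[int, int]]:
    """Return (end_index, edits) for each text position where `pattern`
    ends with at most `max_edits` insertions, deletions, or substitutions.

    Illustrative sketch only; the names are assumptions, not from the source.
    """
    m = len(pattern)
    # prev[i]: fewest edits aligning pattern[:i] with a substring of the
    # text ending just before the current character. Before any text is
    # read, pattern[:i] can only be aligned by dropping all i characters.
    prev = list(range(m + 1))
    hits = []
    for j, ch in enumerate(text):
        # curr[0] stays 0 because a match may begin at any text position.
        curr = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            curr[i] = min(
                prev[i - 1] + cost,  # match or substitution
                prev[i] + 1,         # extra character in the text
                curr[i - 1] + 1,     # pattern character missing from the text
            )
        if curr[m] <= max_edits:
            hits.append((j, curr[m]))
        prev = curr
    return hits


# Exact search is the special case max_edits = 0.
print(approximate_matches("GATTACA", "CCGATTACAGG", 0))  # [(8, 0)]
# Allowing one edit also finds the pattern with a single substituted base.
print(approximate_matches("GATTACA", "GATCACA", 1))
```

The running time is proportional to the product of the pattern and text lengths, which is acceptable for short patterns. Tools built for genome-scale data typically rely on indexing or heuristics instead.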
Statistical and computational techniques such as Hidden Markov Models (HMMs), Stochastic Grammars, and Conditional Random Fields (CRFs) have been successfully applied to the modeling, analysis, discovery, classification, and alignment of biological sequences (Yoon & Vaidyanathan, 2004, 2005). HMMs (Rabiner, 1989) and Stochastic Grammars (Sakakibara et al., 1994) are generative models for labeling sequences: they assign a joint probability distribution to, for example, the hidden gene structure y and the …
Norberto, C., & Souza Serapio, A. B. de. (2010). Bioinformatics: Strategies, Trends, and Perspectives. In New Advanced Technologies. InTech. https://doi.org/10.5772/9441