MEGA5: Molecular Evolutionary Genetics Analysis using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods

Article (Revised)
April 12, 2011

Koichiro Tamura1,2, Daniel Peterson2, Nicholas Peterson2, Glen Stecher2, Masatoshi Nei3 and Sudhir Kumar2,4*

1Department of Biological Sciences, Tokyo Metropolitan University, 1-1 Minami-ohsawa, Hachioji, Tokyo 192-0397, Japan
2Center for Evolutionary Medicine and Informatics, The Biodesign Institute, Arizona State University, Tempe, AZ 85287-5301, USA
3Department of Biology and the Institute of Molecular Evolutionary Genetics, The Pennsylvania State University, University Park, PA 16802, USA
4School of Life Sciences, Arizona State University, Tempe, AZ 85287-4501, USA

*Address for Correspondence:
Sudhir Kumar
Biodesign Institute Building A240
Arizona State University
1001 S. McAllister Avenue
Tempe, AZ 85287-5301
Tel: 480-727-6949
E-mail: s.kumar@asu.edu

© The Author 2011. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com

MBE Advance Access published May 4, 2011
Abstract

Comparative analysis of molecular sequence data is essential for reconstructing the evolutionary histories of species and inferring the nature and extent of selective forces shaping the evolution of genes and species. Here, we announce the release of MEGA5 (Molecular Evolutionary Genetics Analysis version 5), a user-friendly software package for mining online databases, building sequence alignments and phylogenetic trees, and using methods of evolutionary bioinformatics in basic biology, biomedicine, and evolution. The newest addition to MEGA5 is a collection of Maximum Likelihood (ML) analyses for inferring evolutionary trees, selecting best-fit substitution models (nucleotide or amino acid), inferring ancestral states and sequences (along with probabilities), and estimating evolutionary rates site-by-site. In computer simulation analyses, ML tree inference algorithms in MEGA5 compared favorably with other software packages in terms of computational efficiency and the accuracy of the estimates of phylogenetic trees, substitution parameters, and rate variation among sites. The MEGA user interface has now been enhanced to be activity-driven, making it easier to use for both beginners and experienced scientists. This version of MEGA is intended for the Windows platform, and it has been configured for effective use on Mac OS X and Linux desktops. It is available free of charge from www.megasoftware.net.
The Molecular Evolutionary Genetics Analysis (MEGA) software was developed with the goal of providing a biologist-centric, integrated suite of tools for statistical analyses of DNA and protein sequence data from an evolutionary standpoint. Over the years, it has grown to include tools for sequence alignment, phylogenetic reconstruction and phylogeny visualization, testing an array of evolutionary hypotheses, estimating sequence divergences, web-based acquisition of sequence data, and expert systems to generate natural language descriptions of the analysis methods and data chosen by the user (Kumar, Tamura and Nei 1994; Kumar and Dudley 2007; Kumar et al. 2008). With the fifth major release, the collection of analysis tools in MEGA has broadened to include Maximum Likelihood (ML) methods for molecular evolutionary analysis. Table 1 contains a summary of all statistical methods and models in MEGA5, with new features marked with an asterisk (*). In the following, we provide a brief description of methodological advancements, along with relevant research results, and technical enhancements in MEGA5.

Model Selection for Nucleotide and Amino Acid Sequences

MEGA5 now contains facilities to evaluate the fit of major models of nucleotide and amino acid substitutions, which are frequently desired by researchers (Posada and Crandall 1998; Nei and Kumar 2000; Yang 2006) (Fig. 1A). For nucleotide substitutions, the General Time Reversible (GTR) model and five nested models are available, whereas six models, each with and without empirical frequencies, have been programmed for amino acid substitutions (Table 1). MEGA5 provides the goodness-of-fit (see below) of the substitution models with and without assuming the existence of evolutionary rate variation among sites, which is modeled by a discrete Gamma distribution (+G) (Yang 1994) and/or an allowance for the presence of invariant sites (+I) (Fitch and Margoliash 1967; Fitch 1986; Shoemaker and Fitch 1989).
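As a rough sketch of how such a candidate set can be assembled, each base substitution model may be crossed with the four treatments of rate variation among sites (none, +G, +I, and +G+I). The model names used here are illustrative assumptions, since this passage does not name the five nested nucleotide models or the six amino acid models (Table 1 is the authoritative list), and the helper `candidate_models` is hypothetical rather than part of MEGA5.

```python
from itertools import product

# Illustrative labels only (assumption): the text says "GTR and five
# nested models" without naming the nested ones in this passage.
NUCLEOTIDE_MODELS = ["GTR", "TN93", "HKY", "T92", "K2P", "JC"]

# Six amino acid models (names assumed), each considered with and
# without empirical frequencies (written here as "+F").
AMINO_ACID_MODELS = ["JTT", "WAG", "Dayhoff", "rtREV", "cpREV", "mtREV"]
FREQUENCY_OPTIONS = ["", "+F"]

# Rate variation among sites: uniform, +G, +I, or both.
RATE_OPTIONS = ["", "+G", "+I", "+G+I"]


def candidate_models(base_models, rate_options=RATE_OPTIONS):
    """Cross every base model with every rate-variation treatment."""
    return [base + rate for base, rate in product(base_models, rate_options)]


nucleotide_candidates = candidate_models(NUCLEOTIDE_MODELS)

# Amino acid base models: each matrix with and without "+F".
amino_acid_bases = [
    model + freq for model, freq in product(AMINO_ACID_MODELS, FREQUENCY_OPTIONS)
]
amino_acid_candidates = candidate_models(amino_acid_bases)

print(len(nucleotide_candidates))  # 6 base models x 4 rate options
print(len(amino_acid_candidates))  # 12 base models x 4 rate options
```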
This results in an evaluation of 24 and 48 models for nucleotide and amino acid substitutions, respectively. For each of these models, MEGA5 provides the estimated value of the shape parameter of the Gamma distribution (α), the proportion of invariant sites, and the substitution rates between bases or residues, as applicable. Depending on the model, the assumed or observed values of the base or amino acid frequencies used in the analysis are also provided. This information enables researchers to quickly examine the robustness of the estimates of evolutionary parameters under different substitution models and assumptions about the distribution of evolutionary rates among sites