The Neighbor-joining Method: A New Method for Reconstructing Phylogenetic Trees¹

Naruya Saitou² and Masatoshi Nei
Center for Demographic and Population Genetics, The University of Texas Health Science Center at Houston

A new method called the neighbor-joining method is proposed for reconstructing phylogenetic trees from evolutionary distance data. The principle of this method is to find pairs of operational taxonomic units (OTUs [= neighbors]) that minimize the total branch length at each stage of clustering of OTUs starting with a starlike tree. The branch lengths as well as the topology of a parsimonious tree can quickly be obtained by using this method. Using computer simulation, we studied the efficiency of this method in obtaining the correct unrooted tree in comparison with that of five other tree-making methods: the unweighted pair group method of analysis, Farris's method, Sattath and Tversky's method, Li's method, and Tateno et al.'s modified Farris method. The new, neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods.

Introduction

In the construction of phylogenetic trees, the principle of minimum evolution or maximum parsimony is often used. The standard algorithm of the tree-making methods based on this principle is to examine all possible topologies (branching patterns) or a certain number of topologies that are likely to be close to the true tree and to choose one that shows the smallest amount of total evolutionary change as the final tree. This method is quite time consuming, and, when the number of operational taxonomic units (OTUs) is large, only a small proportion of all possible topologies is examined. However, there are methods in which the process of searching for the minimum evolution tree is built into the algorithm, so that a unique final topology is obtained automatically.
Some examples are the distance Wagner (DW) method (Farris 1972), modified Farris (MF) methods (Tateno et al. 1982; Faith 1985), and the neighborliness methods of Sattath and Tversky (ST method; 1977) and Fitch (1981). These methods are not guaranteed to produce the minimum-evolution tree, but their efficiency in obtaining the correct tree is often better than that of the standard maximum-parsimony algorithm (Saitou and Nei 1986). In the following we would like to present a new method (the neighbor-joining [NJ] method) that produces a unique final tree under the principle of minimum evolution. This method also does not necessarily produce the minimum-evolution tree, but computer simulations have shown that it is quite efficient in obtaining the correct tree topology. It is applicable to any type of evolutionary distance data.

1. Key words: phylogenetic tree, neighbor-joining method, minimum-evolution tree, parsimonious tree.
2. Current address: Department of Anthropology, University of Tokyo, Tokyo 113, Japan.

Address for correspondence and reprints: Dr. Masatoshi Nei, Center for Demographic and Population Genetics, Graduate School of Biomedical Sciences, The University of Texas Health Science Center at Houston, P.O. Box 20334, Houston, Texas 77225.

Mol. Biol. Evol. 4(4):406-425. 1987. © 1987 by The University of Chicago. All rights reserved. 0737-4038/87/0404-0007$02.00

Algorithm

The algorithm of the NJ method is similar to that of the ST method, whose objective is to construct the topology of a tree. Unlike the ST method, however, the NJ method provides not only the topology but also the branch lengths of the final tree. Before discussing the algorithm of the present method, let us first define the term "neighbors." A pair of neighbors is a pair of OTUs connected through a single interior node in an unrooted, bifurcating tree. Thus, OTUs 1 and 2 in figure 1 are a pair of neighbors because they are connected through one interior node, A. There are two other pairs of neighbors in this tree (viz., [5, 6] and [7, 8]). The number of pairs of neighbors in a tree depends on the tree topology. For a tree with N (≥ 4) OTUs, the minimum number is always two, whereas the maximum number is N/2 when N is an even number and (N - 1)/2 when N is an odd number.

If we combine OTUs 1 and 2 in figure 1, this combined OTU (1-2) and OTU 3 become a new pair of neighbors. It is possible to define the topology of a tree by successively joining pairs of neighbors and producing new pairs of neighbors. For example, the topology of the tree in figure 1 can be described by the following pairs of neighbors: [1, 2], [5, 6], [7, 8], [1-2, 3], and [1-2-3, 4]. Note that there is another pair of neighbors, [5-6, 7-8], that is complementary to [1-2-3, 4] in defining the topology. In general, N - 2 pairs of neighbors can be produced from a bifurcating tree of N OTUs. By finding these pairs of neighbors successively, we can obtain the tree topology.

FIG. 1.-An unrooted tree of eight OTUs, 1-8. A-F are interior nodes, and italic numbers are branch lengths.

Our method of constructing a tree starts with a starlike tree, as given in figure 2(a), which is produced under the assumption that there is no clustering of OTUs. In
practice, some pairs of OTUs are more closely related to each other than other pairs are. Consider a tree that is of the form given in figure 2(b). In this tree there is only one interior branch, XY, which connects the paired OTUs (1 and 2) and the others (3, 4, ..., N) that are connected by a single node, Y. Any pair of OTUs can take the positions of 1 and 2 in the tree, and there are N(N - 1)/2 ways of choosing them. Among these possible pairs of OTUs, we choose the one that gives the smallest sum of branch lengths. This pair of OTUs is then regarded as a single OTU, and the next pair of OTUs that gives the smallest sum of branch lengths is again chosen. This procedure is continued until all N - 3 interior branches are found.

The sum of the branch lengths is computed as follows: Let us define $D_{ij}$ and $L_{ab}$ as the distance between OTUs i and j and the branch length between nodes a and b, respectively. The sum of the branch lengths for the tree of figure 2(a) is then given by

$$S_0 = \sum_{i=1}^{N} L_{iX} = \frac{1}{N-1} \sum_{i<j} D_{ij}, \tag{1}$$

since each branch is counted N - 1 times when all distances are added. On the other hand, the branch length between nodes X and Y ($L_{XY}$) in the tree of figure 2(b) is given by

$$L_{XY} = \frac{1}{2(N-2)} \left[ \sum_{k=3}^{N} (D_{1k} + D_{2k}) - (N-2)(L_{1X} + L_{2X}) - 2 \sum_{i=3}^{N} L_{iY} \right]. \tag{2}$$

The first term within the brackets of equation (2) is the sum of all distances that include $L_{XY}$, and the other two terms are for excluding irrelevant branch lengths. If we eliminate the interior branch (XY) from figure 2(b), two starlike topologies (one for OTUs 1 and 2 and the other for the remaining N - 2 OTUs) appear. Thus, $L_{1X} + L_{2X}$ and $\sum_{i=3}^{N} L_{iY}$ can be obtained by applying equation (1):

$$L_{1X} + L_{2X} = D_{12}, \tag{3a}$$

$$\sum_{i=3}^{N} L_{iY} = \frac{1}{N-3} \sum_{3 \le i<j} D_{ij}. \tag{3b}$$

FIG. 2.-(a), A starlike tree with no hierarchical structure and (b), a tree in which OTUs 1 and 2 are clustered.
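
As a rough illustration of equations (1), (2), (3a), and (3b), the following Python sketch evaluates the star-tree length and, for a chosen pair of OTUs, the interior branch length and the sum of branch lengths of the tree in figure 2(b). The function names, the 0-based OTU indices, the list-of-lists distance matrix, and the example distances are assumptions made for this illustration and are not from the paper. The total length is obtained here simply by adding the three groups of branches in figure 2(b).

```python
from itertools import combinations


def star_tree_length(D):
    """Equation (1): sum of branch lengths of the starlike tree of figure 2(a)."""
    n = len(D)
    # Each branch is counted N - 1 times when all pairwise distances are added.
    return sum(D[i][j] for i, j in combinations(range(n), 2)) / (n - 1)


def paired_tree_lengths(D, a, b):
    """Return (L_XY, total length) for figure 2(b) with OTUs a and b paired."""
    n = len(D)
    if n < 4:
        raise ValueError("equation (3b) divides by N - 3, so N must be at least 4")
    others = [k for k in range(n) if k != a and k != b]
    # First term in the brackets of equation (2).
    cross = sum(D[a][k] + D[b][k] for k in others)
    # Equation (3a): the two branches leading to the paired OTUs.
    pair_sum = D[a][b]
    # Equation (3b): equation (1) applied to the remaining N - 2 OTUs.
    rest_sum = sum(D[i][j] for i, j in combinations(others, 2)) / (n - 3)
    # Equation (2): the interior branch XY.
    l_xy = (cross - (n - 2) * pair_sum - 2 * rest_sum) / (2 * (n - 2))
    return l_xy, l_xy + pair_sum + rest_sum


def best_pair(D):
    """Pair of OTUs giving the smallest sum of branch lengths (first one on ties)."""
    pairs = combinations(range(len(D)), 2)
    return min(pairs, key=lambda p: paired_tree_lengths(D, p[0], p[1])[1])


if __name__ == "__main__":
    # Hypothetical additive distances for the tree ((0,1),(2,3)) with
    # exterior branches 1, 2, 3, 4 and an interior branch of length 5.
    D = [
        [0, 3, 9, 10],
        [3, 0, 10, 11],
        [9, 10, 0, 7],
        [10, 11, 7, 0],
    ]
    print(star_tree_length(D))
    print(paired_tree_lengths(D, 0, 1))
    print(best_pair(D))
```

With these hypothetical distances, pairing OTUs 0 and 1 recovers the interior branch of length 5 and a total length of 15, which is shorter than the star tree. Pairing OTUs 2 and 3 gives the same total, as expected for four OTUs, where the two pairs of neighbors define the same topology.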