MicroRNAs: genomics, biogenesis, ...
Cell, Vol. 116, 281–297, January 23, 2004, Copyright ©2004 by Cell Press

Review

MicroRNAs: Genomics, Biogenesis, Mechanism, and Function

David P. Bartel 1,2,*
1 Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, Massachusetts 02142
2 Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
*Correspondence: email@example.com

MicroRNAs (miRNAs) are endogenous ~22 nt RNAs that can play important regulatory roles in animals and plants by targeting mRNAs for cleavage or translational repression. Although they escaped notice until relatively recently, miRNAs comprise one of the more abundant classes of gene regulatory molecules in multicellular organisms and likely influence the output of many protein-coding genes.

In an investigation inspiring for both its perseverance and its scientific insight, Victor Ambros and colleagues, Rosalind Lee and Rhonda Feinbaum, discovered that lin-4, a gene known to control the timing of C. elegans larval development, does not code for a protein but instead produces a pair of small RNAs (Lee et al., 1993). One RNA is approximately 22 nt in length, and the other is approximately 61 nt; the longer one was predicted to fold into a stem loop proposed to be the precursor of the shorter one. The Ambros and Ruvkun labs then noticed that these lin-4 RNAs had antisense complementarity to multiple sites in the 3′ UTR of the lin-14 gene (Lee et al., 1993; Wightman et al., 1993). This complementarity fell in a region of the 3′ UTR previously proposed to mediate the repression of lin-14 by the lin-4 gene product (Wightman et al., 1991). The Ruvkun lab went on to demonstrate the importance of these complementary sites for regulation of lin-14 by lin-4, showing also that this regulation substantially reduces the amount of LIN-14 protein without noticeable change in levels of lin-14 mRNA. Together, these discoveries supported a model in which the lin-4 RNAs pair to the lin-14 3′ UTR to specify translational repression of the lin-14 message as part of the regulatory pathway that triggers the transition from cell divisions of the first larval stage to those of the second (Lee et al., 1993; Wightman et al., 1993).

The shorter lin-4 RNA is now recognized as the founding member of an abundant class of tiny regulatory RNAs called microRNAs or miRNAs (Lagos-Quintana et al., 2001; Lau et al., 2001; Lee and Ambros, 2001). The breadth and importance of miRNA-directed gene regulation are coming into focus as more miRNAs and their regulatory targets and functions are discovered. Recently discovered miRNA functions include control of cell proliferation, cell death, and fat metabolism in flies (Brennecke et al., 2003; Xu et al., 2003), neuronal patterning in nematodes (Johnston and Hobert, 2003), modulation of hematopoietic lineage differentiation in mammals (Chen et al., 2004), and control of leaf and flower development in plants (Aukerman and Sakai, 2003; Chen, 2003; Emery et al., 2003; Palatnik et al., 2003). Computational approaches for finding messages controlled by miRNAs indicate that these examples represent a very small fraction of the total (Rhoades et al., 2002; Enright et al., 2003; Lewis et al., 2003; Stark et al., 2003). This review highlights what has been learned about miRNAs in the decade since the report of the lin-4 RNA and its regulation of lin-14. The major topics discussed are miRNA genomics, miRNA biogenesis, miRNA regulatory mechanisms, and the roles of miRNAs in gene regulatory pathways.

Genomics: The miRNA Genes

For seven years after the discovery of the lin-4 RNA, the genomics of this type of tiny regulatory RNA appeared simple: there was no evidence for lin-4-like RNAs beyond nematodes and no sign of any similar noncoding RNAs within nematodes. This all changed upon the discovery that let-7, another gene in the C. elegans heterochronic pathway, encoded a second ~22 nt regulatory RNA. The let-7 RNA acts to promote the transition from late-larval to adult cell fates in the same way that the lin-4 RNA acts earlier in development to promote the progression from the first larval stage to the second (Reinhart et al., 2000; Slack et al., 2000). Furthermore, homologs of the let-7 gene were soon identified in the human and fly genomes, and let-7 RNA itself was detected in human, Drosophila, and eleven other bilateral animals (Pasquinelli et al., 2000).

Because of their common roles in controlling the timing of developmental transitions, the lin-4 and let-7 RNAs were dubbed small temporal RNAs (stRNAs), with anticipation that additional regulatory RNAs of this type would be discovered (Pasquinelli et al., 2000). Indeed, less than one year later, three labs cloning small RNAs from flies, worms, and human cells reported a total of over one hundred additional genes for tiny noncoding RNAs, approximately 20 new genes in Drosophila, approximately 30 in human, and approximately 60 in worms (Lagos-Quintana et al., 2001; Lau et al., 2001; Lee and Ambros, 2001). The RNA products of these genes resembled the lin-4 and let-7 stRNAs in that they were ~22 nt endogenously expressed RNAs, potentially processed from one arm of a stem loop precursor (Figure 1), and they were generally conserved in evolution, some quite broadly and others only in more closely related species such as C. elegans and C. briggsae. But unlike lin-4 and let-7 RNAs, many of the newly identified ~22 nt RNAs were not expressed in distinct stages of development and instead were more likely to be expressed in particular cell types. Thus the term microRNA was used to refer to the stRNAs and all the other tiny RNAs with similar features but unknown functions (Lagos-Quintana et al., 2001; Lau et al., 2001; Lee and Ambros, 2001). Intensified cloning efforts have revealed numerous additional miRNA genes in mammals, fish, worms, and flies (Lagos-Quintana et al., 2002, 2003; Mourelatos et al., 2002; Ambros et al., 2003b; Aravin et al., 2003; Dostie et al., 2003; Houbaviy et al., 2003; Kim et al., 2003; Lim et al., 2003a, 2003b; Michael et al., 2003). A registry has been set up to catalog the miRNAs and facilitate the naming of newly identified genes (Griffiths-Jones, 2004).

Figure 1. Examples of Metazoan miRNAs
Shown are predicted stem loops involving the mature miRNAs (red) and flanking sequence. The miRNAs* (blue) are also shown in cases where they have been experimentally identified (Lim et al., 2003a).
(A) Predicted stem loops of the founding miRNAs, lin-4 and let-7 RNAs (Lee et al., 1993; Reinhart et al., 2000). The precise sequences of the mature miRNAs were defined by cloning (Lau et al., 2001). Shown are the C. elegans stem loops, but close homologs of both have been found in flies and mammals (Pasquinelli et al., 2000; Lagos-Quintana et al., 2001, 2002).
(B) Examples of miRNAs from other metazoan genes, mir-1, mir-34, and mir-124. Shown are the C. elegans stem loops, but close homologs of these miRNAs have been found in flies and mammals (Lagos-Quintana et al., 2001, 2002; Lau et al., 2001; Lee and Ambros, 2001).
(C) Examples of miRNAs from plant genes, MIR165a, MIR172a2, and JAW. Shown are Arabidopsis stem loops, but close homologs of these miRNAs have been found in rice and other plants (Park et al., 2002; Reinhart et al., 2002; Palatnik et al., 2003).

Like C. elegans lin-4 and let-7, most miRNA genes come from regions of the genome quite distant from previously annotated genes, implying that they derive from independent transcription units (Lagos-Quintana et al., 2001; Lau et al., 2001; Lee and Ambros, 2001). Nonetheless, a sizable minority (e.g., about a quarter of the human miRNA genes) are in the introns of pre-mRNAs. These are preferentially in the same orientation as the predicted mRNAs, suggesting that most of these miRNAs are not transcribed from their own promoters but are instead processed from the introns, as seen also for many snoRNAs (Aravin et al., 2003; Lagos-Quintana et al., 2003; Lai et al., 2003; Lim et al., 2003a). This arrangement provides a convenient mechanism for the coordinated expression of a miRNA and a protein. Regulatory scenarios are easy to imagine in which such coordinate expression could be useful, which would explain the conserved relationships between miRNAs and host mRNAs. A striking example of this conservation involves mir-7, found in the intron of hnRNP K in both insects and mammals (Aravin et al., 2003).

Other miRNA genes are clustered in the genome with an arrangement and expression pattern implying transcription as a multi-cistronic primary transcript (Lagos-Quintana et al., 2001; Lau et al., 2001). Although the majority of worm and human miRNA genes are isolated and not clustered (Lim et al., 2003a, 2003b), over half of the known Drosophila miRNAs are clustered (Aravin et al., 2003). The miRNAs within a genomic cluster are often, though not always, related to each other, and related miRNAs are sometimes but not always clustered (Lagos-Quintana et al., 2001; Lau et al., 2001). Orthologs of C. elegans lin-4 and let-7 are clustered in the fly and human genomes and are coexpressed, sometimes from the same primary transcript, leading to the idea that the genomic separation of lin-4 from let-7 in nematodes might be unique to the worm lineage (Aravin et al., 2003; Bashirullah et al., 2003; Sempere et al., 2003). This example illustrates the possibility that even in cases where clustered genes have no apparent sequence homology, they may share functional relationships.

Some of the more interesting genomic locations of miRNA genes include those in the Hox clusters. The mir-10 gene lies in the Antennapedia complex of insects and in the orthologous locations in two Hox clusters of mammals, whereas the mir-iab-4 gene is within the insect Bithorax cluster (Aravin et al., 2003; Lagos-Quintana et al., 2003). In light of the roles of other genes of the Hox clusters, the Hox miRNAs are especially good candidates for having interesting functions in animal development. Other interesting loci include the mir-15a-mir-16 cluster, which falls within a region of human chromosome 13 thought to harbor a tumor suppressor gene because it is the site of the most common structural aberrations in both mantle cell lymphoma and B cell chronic lymphocytic leukemia (Lagos-Quintana et al., 2001; Calin et al., 2002).

Nearly all of the cloned miRNAs are conserved in closely related animals, such as human and mouse, or C. elegans and C. briggsae (Lagos-Quintana et al., 2003; Lim et al., 2003a, 2003b). This statement remains true even when ignoring evolutionary conservation as a criterion for classifying clones as miRNAs. Many are also conserved more broadly among the animal lineages (Ambros et al., 2003b; Aravin et al., 2003; Lagos-Quintana et al., 2003; Lim et al., 2003a). For instance, more than a third of the C. elegans miRNAs have easily recognized homologs among the human miRNAs (Lim et al., 2003a). When comparing distant lineages, considerable expansion or contraction of gene families is apparent, the most striking example being the let-7 family, which has four identified members in C. elegans and at least 15 in human, but only one in Drosophila (Pasquinelli et al., 2000; Aravin et al., 2003; Lai et al., 2003; Lim et al., 2003a).

Genomics: miRNA Expression

Many miRNAs have intriguing expression patterns. For example, paralogs and orthologs of the C. elegans lin-4 and let-7 RNAs have stage-specific expression in development as if they, too, function as stRNAs (Pasquinelli et al., 2000; Lau et al., 2001; Lagos-Quintana et al., 2002; Bashirullah et al., 2003; Lim et al., 2003a). Other interesting examples include miR-1, which is primarily found in the mammalian heart (Lee and Ambros, 2001; Lagos-Quintana et al., 2002); miR-122, which is primarily in the liver (Lagos-Quintana et al., 2002); miR-223, which is primarily in the granulocytes and macrophages of mouse bone marrow (Chen et al., 2004); miRNAs of the mir-35–mir-42 cluster, which are preferentially in the C. elegans embryo (Lau et al., 2001); and those of the mir-290–mir-295 cluster, which are expressed in mouse embryonic stem cells but not in differentiated cells (Houbaviy et al., 2003). Expression array technology has been adapted to examine miRNAs and has revealed distinct expression patterns in different developmental stages or regions of the mammalian brain (Krichevsky et al., 2003). With all the different genes and expression patterns, it is reasonable to propose that every metazoan cell type at each developmental stage might have a distinct miRNA expression profile, providing ample opportunity for "micromanaging" the output of the transcriptome.

Another remarkable aspect of miRNA expression is the sheer abundance of certain miRNAs in the cells. For example, miR-2, miR-52, and miR-58 are each present on average at more than 50,000 molecules per adult worm cell, a greater abundance than the U6 snRNA of the spliceosome (Lim et al., 2003a). Whether this high expression is attributable to very robust transcription or to slow decay is not yet known. Some miRNAs are expressed at much lower levels. For instance, miR-124 is present in the adult worm on average at 800 molecules per cell (Lim et al., 2003a). This lower average level (though still higher than that of the typical mRNA) might be due to low expression in many cells or high expression in just a few cells. The finding that the mouse ortholog of miR-124 is nearly exclusively expressed in the brain supports the latter explanation (Lagos-Quintana et al., 2002).

Genomics: Computational Approaches and Gene Number

There has been some speculation as to why miRNAs were not discovered earlier; the answer is clearly not that they are rare. MicroRNAs and their associated proteins appear to be one of the more abundant ribonucleoprotein complexes in the cell. Nonetheless, miRNAs whose expression is restricted to nonabundant cell types or specific environmental conditions could still be missed in cloning efforts. Thus, computational approaches have been developed to complement experimental approaches to miRNA gene identification. From early on, homology searches have revealed orthologs and paralogs of known miRNA genes (Pasquinelli et al., 2000; Lagos-Quintana et al., 2001; Lau et al., 2001; Lee and Ambros, 2001). Another simple approach has been to search the vicinity of known miRNA genes for other stem loops that might represent additional genes of a genomic cluster (Lau et al., 2001; Aravin et al., 2003; Seitz et al., 2003; Ohler et al., 2004). This strategy is important because some of the most rapidly evolving miRNA genes are present as tandem arrays within operon-like clusters, and the divergent sequences of these genes make them relatively difficult to spot using the more general approaches.

Gene-finding approaches that do not depend on homology or proximity to known genes have also been developed and applied to entire genomes (Ambros et al., 2003b; Grad et al., 2003; Lai et al., 2003; Lim et al., 2003a). They typically start by identifying conserved genomic segments that both fall outside of predicted protein-coding regions and potentially could form stem loops and then score these candidate miRNA stem loops for the patterns of conservation and pairing that characterize known miRNA genes. So far, the two most sensitive computational scoring tools are MiRscan, which has been systematically applied to nematode and vertebrate candidates (Lim et al., 2003a, 2003b), and miRseeker, which has been systematically applied to insect candidates (Lai et al., 2003). Both MiRscan and miRseeker have identified dozens of genes that were subsequently (or concurrently) verified experimentally. Because of their relatively high sensitivity, MiRscan and miRseeker