REVIEW PAPER

Revisiting Mendelian disorders through exome sequencing

Chee-Seng Ku · Nasheen Naidoo · Yudi Pawitan

Received: 29 November 2010 / Accepted: 3 February 2011 / Published online: 18 February 2011
© Springer-Verlag 2011
Hum Genet (2011) 129:351-370
DOI 10.1007/s00439-011-0964-2

C.-S. Ku (corresponding author), N. Naidoo
Department of Epidemiology and Public Health, Centre for Molecular Epidemiology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
e-mail: email@example.com

Y. Pawitan (corresponding author)
Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
e-mail: firstname.lastname@example.org

Abstract

Over the past several years, more focus has been placed on dissecting the genetic basis of complex diseases and traits through genome-wide association studies. In contrast, Mendelian disorders have received little attention, mainly due to the lack of newer and more powerful methods to study these disorders. Linkage studies have previously been the main tool to elucidate the genetics of Mendelian disorders; however, extremely rare disorders or sporadic cases caused by de novo variants are not amenable to this study design. Exome sequencing has now become technically feasible and more cost-effective due to recent advances in high-throughput sequence capture methods and next-generation sequencing technologies, which have offered new opportunities for Mendelian disorder research. Exome sequencing has been swiftly applied to the discovery of new causal variants and candidate genes for a number of Mendelian disorders such as Kabuki syndrome, Miller syndrome and Fowler syndrome. In addition, de novo variants were also identified for sporadic cases, which would not have been possible without exome sequencing. Although exome sequencing has proven to be a promising approach to study Mendelian disorders, several shortcomings of this method must be noted, such as the inability to capture regulatory or evolutionarily conserved sequences in non-coding regions and the incomplete capture of all exons.

Introduction

Over the past two decades, much progress has been made in identifying the causal variants or mutations and candidate genes for Mendelian (single gene or monogenic) disorders, mainly through traditional linkage studies (Botstein and Risch 2003). The terms "variant" and "mutation" have been used interchangeably throughout the literature; however, "variant" will be used consistently throughout this article. Mendelian or monogenic disorders encompass "classical" disorders such as Freeman-Sheldon syndrome (Ng et al. 2009) and Fowler syndrome (Lalonde et al. 2010), as well as the monogenic forms of complex diseases such as autosomal-dominant amyotrophic lateral sclerosis (Johnson et al. 2010b) and hypercholesterolemia (Rios et al. 2010). Currently, causal variants for approximately 3,000 Mendelian disorders have been identified (Online Mendelian Inheritance in Man, http://www.ncbi.nlm.nih.gov/omim).

Genome-wide linkage studies followed by positional cloning have been very successful in identifying causal variants for Mendelian disorders because the causal variant segregates perfectly with the disorder according to Mendelian inheritance patterns (e.g. autosomal dominant, autosomal recessive and X-linked). This perfect segregation pattern is due to complete or almost complete penetrance of the causal variant. In genome-wide linkage studies no prior hypothesis is needed, as evenly distributed genetic markers, for example several hundred microsatellites or several thousand single nucleotide polymorphisms
(SNPs), are sufficient to cover the whole genome. This is because there are only a limited number of recombination events within a family or pedigree. The genetic markers reveal genomic regions that co-segregate with the disorder in affected individuals. This can then be followed up by positional cloning to identify the causal variants and candidate genes within the genomic regions, which can span up to tens of centimorgans (cM). In contrast, candidate-gene based linkage studies require a prior hypothesis and are not designed to reveal novel genomic regions for Mendelian disorders (Botstein and Risch 2003).

Classical linkage studies are the main tool for elucidating the genetics of Mendelian disorders; however, not all of these disorders are amenable to this study design. Homozygosity mapping, on the other hand, is a more powerful and effective approach to study recessive disorders in consanguineous families (Harville et al. 2010; Pang et al. 2010; Iseri et al. 2010; Collin et al. 2010). For those disorders that are not amenable to these two conventional approaches, the causal variants remain elusive. These disorders include (a) "extremely rare" Mendelian disorders where only a small number of cases are available, (b) unrelated cases from different families and (c) sporadic cases due to de novo variants. For some Mendelian disorders, cases can occur sporadically through a de novo (new) variant that arises during meiosis and is undetected in the parents (Table 1). We use the term "extremely rare" to distinguish those Mendelian disorders which cannot be investigated by linkage studies due to their low incidence in the population from "rare" disorders where an adequate sample size can still be collected for linkage studies. For extremely rare disorders, usually only several affected siblings in one family or several unrelated cases from different families are available for investigation.
However, sequencing of the exome (the collection of all exons in the human genome) now offers new opportunities to study extremely rare disorders and sporadic cases (Table 1) as well as complex diseases (Li et al. 2010b).

Two recent review papers on exome sequencing of Mendelian disorders focused on variant filtering strategies (Ng et al. 2010c) and novel genomic techniques (Kuhlenbäumer et al. 2011). Here, we review this area in a broader context and focus on several topics which have not been comprehensively discussed previously. We start by discussing the need for exome sequencing of Mendelian disorders and the technological developments that made this approach feasible. We also recall the importance and value of interrogating the genetics of Mendelian disorders, which has tended to receive less emphasis in the era of genome-wide association studies (GWAS), and then elaborate on the application of exome sequencing in elucidating the genetics of Mendelian disorders and the recent advances achieved in the field. The pros and cons of currently employed variant filtering strategies are also discussed. We then examine the advantages and challenges of exome sequencing in identifying causal variants for Mendelian disorders. Finally, as most of the known causal variants were found in exons (protein-coding regions), we share our views on whether whole-genome sequencing is needed for Mendelian disorder research.

Why exome sequencing is needed

The linkage study design is unsuitable for extremely rare Mendelian disorders because of the difficulty of collecting an adequate number of affected individuals (from multigenerational pedigrees) and families for a statistically powerful study.
This approach is also not applicable to sporadic cases. One example is Kabuki syndrome, an extremely rare autosomal-dominant Mendelian disorder with an estimated incidence of 1 in 32,000, where the majority of reported cases are sporadic (Ng et al. 2010a). As a result, the causal variant and candidate gene for Kabuki syndrome remained unknown until recently. A total of 33 different causal variants in MLL2 were identified by Ng et al. (2010a) in 35 of 53 individuals affected with Kabuki syndrome. Additionally, in the 12 of these individuals whose parental samples were available, the variants in MLL2 were found to have occurred de novo. Only ten of the 53 individuals were investigated in the discovery study using exome sequencing to identify the causal variants in MLL2; the exons of this gene were then screened in an additional 43 cases using Sanger sequencing (Ng et al. 2010a).

Similarly, most cases of Schinzel-Giedion syndrome have occurred sporadically, suggesting that heterozygous de novo variants may cause the disorder. This has now been further supported by the identification, through exome sequencing, of de novo causal variants in SETBP1 in four individuals affected with this disorder (Hoischen et al. 2010). These de novo causal variants would not have been identified without exome sequencing. In contrast, although none of the causal variants in DHODH for Miller syndrome appeared to have occurred de novo, it is still an extremely rare disorder (Ng et al. 2010b). Therefore, these disorders are intractable to the linkage study design. Collectively, these studies have demonstrated the advantages of exome sequencing over the linkage study design in situations where only a small number of unrelated samples or sporadic cases are available. Up to ten samples have been interrogated by exome sequencing in these discovery studies (Table 1).
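The filtering logic that Table 1 summarises for these discovery studies can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the pipeline used by any of the cited groups: the `(gene, variant_id)` representation, the function names, the gene labels and the variant identifiers are all hypothetical, and the restriction to nonsynonymous, splice-site and coding indel variants is assumed to have been applied upstream.

```python
from collections import Counter


def novel_variant_genes(variants, known_variants):
    """Return genes with at least one variant absent from the reference catalogs.

    variants: iterable of (gene, variant_id) pairs for one exome (hypothetical format).
    known_variants: set of variant ids already catalogued (e.g. a dbSNP-like list).
    """
    return {gene for gene, variant_id in variants if variant_id not in known_variants}


def genes_shared_by_at_least(exomes, known_variants, min_individuals):
    """Return genes carrying a novel variant in at least `min_individuals` exomes.

    Setting min_individuals to the number of exomes gives the strict analysis
    (the gene must be hit in every affected individual); lower values give the
    less stringent subset analysis that tolerates heterogeneity or missing data.
    """
    counts = Counter()
    for variants in exomes.values():
        # Each gene is counted once per individual, however many variants it has.
        counts.update(novel_variant_genes(variants, known_variants))
    return {gene for gene, n in counts.items() if n >= min_individuals}


# Toy data: all identifiers below are made up for illustration.
known = {"rs_a", "rs_b"}
exomes = {
    "case1": [("GENE_A", "v1"), ("GENE_B", "rs_a"), ("GENE_C", "v2")],
    "case2": [("GENE_A", "v3"), ("GENE_C", "v6")],
    "case3": [("GENE_A", "v4"), ("GENE_B", "v5")],
}

strict = genes_shared_by_at_least(exomes, known, min_individuals=3)
relaxed = genes_shared_by_at_least(exomes, known, min_individuals=2)
print(sorted(strict))   # genes hit in all three cases
print(sorted(relaxed))  # genes hit in at least two cases
```

Counting each gene once per individual, rather than once per variant, lets unrelated cases with different variants in the same gene still converge on one candidate. Lowering `min_individuals` corresponds to the subset analysis of x out of 10 exomes described for the Kabuki syndrome study.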
Furthermore, the linkage study design is also not robust enough for Mendelian disorders with genetic heterogeneity
Table 1 Summaries of exome and whole-genome sequencing studies of Mendelian disorders

Column headings: Study, Mendelian disorder and sample | Variant filtering methodology and analysis strategy | Major results

(A) Exome sequencing of unrelated individuals

Ng et al. 2009
Freeman-Sheldon syndrome
Four unrelated individuals
• First investigated how many genes had one or more non-synonymous cSNPs, splice site disruptions or coding indels in one or several exomes
• Then applied filters to remove presumably common variants; removing dbSNP-catalogued variants from consideration reduced the number of candidates considerably
• The eight HapMap exomes provided a filter nearly equivalent to dbSNP
• Combining the two catalogs had a synergistic effect, such that the candidate list could be narrowed to a single gene
• MYH3 is the only gene where (1) at least one non-synonymous cSNP, splice-site disruption or coding indel is observed in all four individuals and (2) the mutations are neither in dbSNP nor in the eight HapMap exomes

Ng et al. 2010a
Kabuki syndrome
Ten unrelated individuals
• Focused primarily on nonsynonymous variants, splice acceptor and donor site mutations and coding indels
• Defined variants as previously unidentified if they were absent from all datasets used for comparison, including dbSNP129, the 1000 Genomes Project, exome data from 16 individuals and 10 exomes sequenced as part of the Environmental Genome Project (EGP)
• To allow for a modest degree of genetic heterogeneity and/or missing data, conducted a less stringent analysis by looking for candidate genes shared among subsets of affected individuals
• Searched for subsets of x out of 10 exomes having ≥1 previously unidentified variant in the same gene, with x = 1 to x = 10
• Genotypic and/or phenotypic stratification would facilitate the prioritization of candidate genes identified by subset analysis
• Assigned a categorical rank to each individual with Kabuki syndrome based on a subjective assessment of the presence of, or similarity to, the canonical facial characteristics of Kabuki syndrome and the presence of developmental delay and/or major birth defects
• Identified a nonsense substitution or frameshift indel in MLL2 in seven of the ten individuals with Kabuki syndrome
• Further analyzed the three cases in which a loss-of-function variant in MLL2 was not initially found
• Sanger sequencing did identify frameshift indels in two of these three cases
• Screened all 54 exons of MLL2 in 43 additional cases by Sanger sequencing
• Previously unidentified nonsynonymous, nonsense or frameshift mutations in MLL2 were found in 26 of these 43 cases

Hoischen et al. 2010
Schinzel-Giedion syndrome
Four unrelated individuals
• On average, 21,800 genetic variants were identified per individual, including 5,351 nonsynonymous changes
• A comparison with the NCBI dbSNP build130 as well as with recently released SNP data from other groups and in-house SNP data showed that >95% of all variants investigated here were previously reported SNPs
• Focused on the 12 genes for which all four individuals studied carried variants and found that only two genes showed variants at different genomic positions
• One of these two candidate genes, CTBP2, was excluded from further analysis because it contained numerous variants found during different in-house exome sequencing experiments
• The second candidate was SETBP1
• Validation of all four variants in this gene by Sanger sequencing confirmed that these variants were indeed present in a heterozygous state in all four affected individuals
• Tested the DNA of the parents of the affected individuals, which showed that all mutations occurred de novo
• Using Sanger sequencing, identified SETBP1 mutations in eight out of nine additional individuals
• For six of the eight follow-up cases, parental DNA was available, and the mutations present in the affected individuals were again shown to have occurred de novo