Human gene mutation database-a biomedical information and research resource.
- PubMed: 10612821
Abstract
Although 20 years have elapsed since the first single basepair substitution underlying an inherited disease in humans was characterised at the DNA level, the initiative has only recently been taken to establish central database resources for pathological genetic variants. Disease-associated gene lesions are currently collected and publicised by the Human Gene Mutation Database (HGMD) in Cardiff, locus-specific mutation databases, and to some extent also by the Genome Database (GDB) and Online Mendelian Inheritance in Man (OMIM). To date, HGMD represents the only comprehensive and publicly available database of gene lesions underlying human inherited disease. By July 1999, HGMD contained over 18,000 different mutations from some 900 human genes, the majority being single basepair substitutions. In addition to its potential as an information resource for clinicians and genetic counsellors, HGMD has allowed molecular geneticists to address a variety of biological questions through meta-analysis of the collated data. HGMD also promises to assist research workers in optimising mutation search strategies for a given gene. A questionnaire sent out to, and answered by, the editors of 20 key journals revealed that human genetics journals are increasingly reluctant to publish mutation reports. Electronic data submission and publication facilities are therefore urgently required. The World Wide Web (WWW) provides an excellent medium within which to combine the centralised management of basic mutation data, including rigorous quality control, with the possibility of publishing additional mutation-related information. In response to these needs, HGMD has both instituted a collaboration with Springer-Verlag GmbH, Heidelberg, to potentiate free online submission and electronic publication of human gene mutation data and developed links with the curators of locus-specific mutation databases.
© 2000 WILEY-LISS, INC.
MDI SPECIAL ARTICLE
Human Gene Mutation Database - A Biomedical Information and Research Resource
Michael Krawczak,* Edward V. Ball, Iain Fenton, Peter D. Stenson, Shaun Abeysinghe,
Nick Thomas, and David N. Cooper
Institute of Medical Genetics, University of Wales College of Medicine, Heath Park, UK
Hum Mutat 15:45–51, 2000. © 2000 Wiley-Liss, Inc.
KEY WORDS: mutation database; MDI; HGMD; inherited disease; electronic publication; polymorphism; association
Received 19 July 1999; accepted revised manuscript 7 October 1999.
*Correspondence to: Prof. Michael Krawczak, Institute of Medical Genetics, University of Wales College of Medicine, Heath Park CF14 4XN, UK. E-mail: krawczak@cardiff.ac.uk
Contract grant sponsors: SmithKline Beecham; Pfizer; Deutsche Forschungsgemeinschaft Heisenberg; Contract grant number: Kr 1093/5-1.
HISTORY
The Cardiff-based Human Gene Mutation Database (HGMD) represents the only comprehensive and publicly available collation of germline mutations underlying human inherited disease [Cooper and Krawczak, 1996; Krawczak and Cooper, 1997]. Its conceptual origin probably dates back to the 9th International Human Gene Mapping Conference, held in Paris in 1987. On that occasion, two of us (DNC and MK) were staying at the same hotel, which offered a lovely garden and a location many miles away from the actual conference site. This provided an idyllic setting for informal discussions and exchange of ideas and, over a few glasses of wine on a pleasant September evening, we started to explore how a biologist (DNC) and a mathematician (MK) could cooperate effectively in the field of human molecular genetics.
In 1979, the first single basepair substitution in a human gene underlying a genetic disorder had been reported: an AAG to TAG nonsense mutation in codon 17 of the β-globin gene resulting in β-thalassemia [Chang and Kan, 1979]. Following this breakthrough, further germline mutations underlying human inherited disease were characterised at the molecular level. However, even by the mid-1980s results were still being published in the medical and biological scientific literature in comparatively small quantities. This notwithstanding, it was clearly apparent that, with time, human gene mutation data would steadily accumulate and come to represent an invaluable resource for use in both diagnostics and research. When analysed with the appropriate statistical techniques, human pathological germline mutations could provide valuable information about the underlying molecular mechanisms of mutagenesis. The development and subsequent application of the methodology necessary for such meta-analyses was therefore deemed to be eminently feasible. However, no comprehensive compendium of the approximately 200 human gene mutations published by that time was readily available. In order to potentiate the envisaged meta-analyses, we therefore had to both initiate and maintain a collection of mutation data ourselves.
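The AAG to TAG lesion cited above is called a nonsense mutation because the substitution turns a lysine codon into a stop codon. The following minimal Python sketch illustrates that classification using the standard genetic code. The function name `classify_substitution` and the category labels it returns are illustrative assumptions, not part of HGMD or of the analyses described here.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; amino acids are listed in TCAG order for each
# codon position, with "*" marking the three stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(wild_type: str, mutant: str) -> str:
    """Classify a single basepair substitution within one DNA codon.

    Hypothetical helper for illustration only; the labels are assumptions.
    """
    wild_type, mutant = wild_type.upper(), mutant.upper()
    if len(wild_type) != 3 or len(mutant) != 3:
        raise ValueError("codons must be three bases long")
    if sum(a != b for a, b in zip(wild_type, mutant)) != 1:
        raise ValueError("expected exactly one changed base")
    before, after = CODON_TABLE[wild_type], CODON_TABLE[mutant]
    if before == after:
        return "synonymous"   # same amino acid (or stop) encoded
    if after == "*":
        return "nonsense"     # amino acid codon becomes a stop codon
    if before == "*":
        return "stop-loss"    # stop codon becomes an amino acid codon
    return "missense"         # one amino acid replaced by another


print(classify_substitution("AAG", "TAG"))  # nonsense
```

Applied to codon 17 of the β-globin gene, the call reports the AAG to TAG change as a nonsense mutation, in line with the description by Chang and Kan.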
Initially, the data being entered into the database came almost exclusively from regular manual surveys of a handful of the most important scientific journals. Ten years ago, such a procedure was still sufficient to ensure virtually comprehensive coverage of human gene mutation reports. Data retrieval was confined to those mutation types for which reports were accumulating at rates high enough to warrant early scientific attention. To this end, the categories collated comprised single basepair substitutions, as well as microdeletions and microinsertions in coding regions and splice-relevant regions of genes. A series of articles [Cooper and Krawczak, 1990, 1991; Krawczak and Cooper, 1991; Krawczak et al., 1992] and a book [Cooper and Krawczak, 1993] summarising the results of these scientific analyses were published in the early 1990s at a time when the datasets used were approximately 10% of their current sizes. The major outcome of these studies was the recognition that human gene mutation is a highly sequence-specific process, a notion which has important implications for the nature, prevalence, and therefore diagnosis of human genetic disease. Since certain DNA sequences were found to be hypermutable, clues could in principle be obtained as to the endogenous mutational mechanisms involved in different types of lesions. Further, knowledge of these mechanisms could be important both for basic research and for molecular diagnostic medicine insofar as it could facilitate improvements in the design and efficacy of mutation search procedures.
After a more or less accidental professional reunion in South Wales in 1996, the two founding curators of HGMD decided not only to continue maintenance of the database but also to expand its scope so as to include mutation types not previously covered. In addition, the collation of other mutation-related information such as the reference cDNA sequences of the genes affected by mutation (see below) was initiated to facilitate the quality control of the mutation data and to provide a basis for subsequent scientific studies.
Although originally established for the study of mutational mechanisms in human genes, the database soon acquired a much broader utility by providing information of practical importance to researchers in human molecular genetics as well as genetic counsellors and physicians interested in a specific inherited condition. Thus, an overview of the mutational spectra as hitherto determined could be easily obtained for a large number of genes. The types of lesions as well as their spatial distribution were readily discernible in concise format; references to the first literature reports allowed additional information to be obtained without a need for extensive literature searches. Finally, even in 1996, the database turned out to represent the only fully comprehensive repository of human gene mutation data in existence worldwide. Although literature reports of lesions in human genes had been referred to before in Online Mendelian Inheritance in Man (OMIM) [see Hamosh et al., 2000], that coverage had been far from exhaustive and was largely phenotype-driven. In view of this general utility, the database was made publicly available (under the name HGMD) in April 1996. It is now accessible, free of charge, through the World Wide Web (HGMD: http://www.uwcm.ac.uk/uwcm/mg/hgmd0.html) and is accessed over 4,000 times per week. HGMD was picked as a “Cool Site” by Netscape in July 1996 and a “Top Site” by BioMedNet in June 1998.
CURRENT STATUS
HGMD comprises published single basepair substitutions in coding, regulatory, and splicing-relevant regions of human nuclear genes, as well as deletions, duplications, insertions, repeat expansions, combined microinsertions and -deletions (“indels”), and complex rearrangements (but not somatic lesions or mitochondrial genome mutations). Data are accessible via the Internet on a gene-wise basis; every gene has been allocated one page per mutation type, if data of that type are present. Mutations are entered only once to avoid confusion between recurrent and identical-by-descent lesions. This is because reliable discrimination between these two alternatives requires


