Nitrogen-fixing microbes affect biogeochemical cycling, agricultural productivity, and microbial ecosystems, and their distributions have been investigated intensively using genomic and metagenomic sequencing. Currently, insights into nitrogen fixers in the environment have been acquired by homology searches against nitrogenase genes, particularly the nifH gene, in public databases. Nitrogen fixation, a distinct process incorporating the inactive atmospheric nitrogen into the active biological processes, has been a major topic in biological and geochemical studies. Currently, insights into diversity and distribution of nitrogen-fixing microbes are dependent upon homology-based analyses of nitrogenase genes, especially the nifH gene, which are broadly conserved in nitrogen-fixing microbes. Here, we report the pitfall of using nifH as a marker of microbial nitrogen fixation. We exhaustively analyzed genomes in RefSeq (231,908 genomes) and KEGG (6,509 genomes) and cooccurrence and gene order patterns of nitrogenase genes (including nifH ) therein. Up to 20% of nifH -harboring genomes lacked nifD and nifK , which encode essential subunits of nitrogenase, within 10 coding sequences upstream or downstream of nifH or on the same genome. According to a phenotypic database of prokaryotes, no species and strains harboring only nifH possess nitrogen-fixing activities, which shows that these nifH genes are “pseudo”- nifH genes. Pseudo- nifH sequences mainly belong to anaerobic microbes, including members of the class Clostridia and methanogens. We also detected many pseudo- nifH reads from metagenomic sequences of anaerobic environments such as animal guts, wastewater, paddy soils, and sediments. In some samples, pseudo- nifH overwhelmed the number of “true” nifH reads by 50% or 10 times. Because of the high sequence similarity between pseudo- and true- nifH , pronounced amounts of nifH -like reads were not confidently classified. Overall, our results encourage reconsideration of the conventional use of nifH for detecting nitrogen-fixing microbes, while suggesting that nifD or nifK would be a more reliable marker. IMPORTANCE Nitrogen-fixing microbes affect biogeochemical cycling, agricultural productivity, and microbial ecosystems, and their distributions have been investigated intensively using genomic and metagenomic sequencing. Currently, insights into nitrogen fixers in the environment have been acquired by homology searches against nitrogenase genes, particularly the nifH gene, in public databases. Here, we report that public databases include a significant amount of incorrectly annotated nifH sequences (pseudo- nifH ). We exhaustively investigated the genomic structures of nifH -harboring genomes and found hundreds of pseudo- nifH sequences in RefSeq and KEGG. Over half of these pseudo- nifH sequences belonged to members of the class Clostridia , which is supposed to be a prominent nitrogen-fixing clade. We also found that the abundance of nitrogen fixers in metagenomes could be overestimated by 1.5 to >10 times due to pseudo- nifH recorded in public databases. Our results encourage reconsideration of the prevalent use of nifH as a marker of nitrogen-fixing microbes.
CITATION STYLE
Mise, K., Masuda, Y., Senoo, K., & Itoh, H. (2021). Undervalued Pseudo- nifH Sequences in Public Databases Distort Metagenomic Insights into Biological Nitrogen Fixers. MSphere, 6(6). https://doi.org/10.1128/msphere.00785-21
Mendeley helps you to discover research relevant for your work.