This article contains data related to the research article ‘Application of bi-clustering of gene expression data and gene set enrichment analysis methods to identify potentially disease causing nanomaterials’ (Williams and Halappanavar, 2015). The presence of diverse types of nanomaterials (NMs) in commerce has grown significantly in the past decade and, as a result, human exposure to these materials in the environment is inevitable. Traditional toxicity testing approaches that rely on animals are both time- and cost-intensive, so they cannot complete the safety assessment of the NMs currently on the market in a timely manner. Thus, there is an urgent need for a comprehensive understanding of the biological behavior of NMs and for efficient toxicity screening tools that will enable the development of predictive toxicology paradigms suited to rapidly assessing the human health impacts of exposure to NMs. In an effort to predict the long-term health impacts of acute exposure to NMs, in Williams and Halappanavar (2015) we applied bi-clustering and gene set enrichment analysis methods to derive essential features of the altered lung transcriptome following exposure to NMs that are associated with lung-specific diseases. Several datasets from public microarray repositories describing pulmonary diseases in mouse models following exposure to a variety of substances were examined, and functionally related bi-clusters showing similar gene expression profiles were identified. The identified bi-clusters were then used to conduct a gene set enrichment analysis on lung gene expression profiles derived from mice exposed to nano-titanium dioxide (nano-TiO2), carbon black (CB) or carbon nanotubes (CNTs) to determine the disease significance of these data-driven gene sets. The results of the analysis correctly identified all NMs to be inflammogenic, and only CB and CNTs as potentially fibrogenic.
Here, we elaborate on the details of the statistical methods and algorithms used to derive the disease-relevant gene signatures. These details will enable other investigators to use the gene signatures in future gene set enrichment analysis studies involving NMs, or as features for clustering and classifying NMs of diverse properties.
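To illustrate how a data-driven gene set such as a bi-cluster might be scored against a ranked expression profile, the following is a minimal sketch of an unweighted, Kolmogorov-Smirnov-style running-sum enrichment score. It is an assumption made for illustration only and not necessarily the statistic used in Williams and Halappanavar (2015); the function name `enrichment_score` and the gene labels are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (illustrative sketch).

    ranked_genes: gene identifiers ordered from most up-regulated to
                  most down-regulated in the exposed samples.
    gene_set:     identifiers in the signature (e.g. one bi-cluster).
    Returns the running-sum value with the largest absolute deviation
    from zero; positive values mean the set sits near the top of the list.
    """
    gene_set = set(gene_set)
    n_total = len(ranked_genes)
    n_hits = sum(1 for gene in ranked_genes if gene in gene_set)
    if n_hits == 0 or n_hits == n_total:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    step_hit = 1.0 / n_hits                # increment when a set member is met
    step_miss = 1.0 / (n_total - n_hits)   # decrement for every other gene

    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        running += step_hit if gene in gene_set else -step_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked profile and signature, for demonstration only.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
print(enrichment_score(ranked, {"g1", "g2"}))  # 1.0: set is at the very top
```

In practice the significance of such a score would be judged against a null distribution, for example by permuting sample labels or gene ranks, which this sketch omits.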
Williams, A., & Halappanavar, S. (2017). Application of bi-clustering of gene expression data and gene set enrichment analysis methods to identify potentially disease causing nanomaterials. Data in Brief, 15, 933–940. https://doi.org/10.1016/j.dib.2017.10.060