Big Data Analytics (BDA) is important for utilizing hospital system data to reduce healthcare costs. BDA enables large volumes of patient data to be queried interactively and dynamically for healthcare. The objective of this study was to establish a high-performance, interactive BDA platform for a hospital system. A Hadoop/MapReduce framework was established at the University of Victoria (UVic) with Compute Canada/WestGrid to form a Healthcare BDA (HBDA) platform with HBase (a NoSQL database) using hospital-specific metadata and file ingestion. Patient data profiles and clinical workflows were derived from the Vancouver Island Health Authority (VIHA), Victoria, BC, Canada. The proof-of-concept implementation tested patient data representative of the entire provincial hospital system. We cross-referenced all data profiles and metadata with real patient data used in clinical reporting. Query performance was tested with Apache tools in the Hadoop ecosystem. At the optimized iteration, Hadoop Distributed File System (HDFS) ingestion required three seconds, but HBase required four to twelve hours to complete the Reducer phase of MapReduce. HBase bulkloads took a week for one billion records (10 TB) and over two months for three billion records (30 TB). Simple and complex queries each returned results in about two seconds at both one billion and three billion records. Apache Drill outperformed Apache Spark; however, Drill was restricted to running simpler queries and offered poor usability for healthcare. Jupyter on Spark offered high performance, customization to run all queries simultaneously, and high usability. The BDA platform, with HBase distributed over Hadoop, was established successfully; however, some inconsistencies in MapReduce limited operational efficiency. The importance of Hadoop/MapReduce to the representation of platform performance is discussed.
Chrimes, D., Moa, B., Kuo, M.-H. (Alex), & Kushniruk, A. (2017). Operational Efficiencies and Simulated Performance of Big Data Analytics Platform over Billions of Patient Records of a Hospital System. Advances in Science, Technology and Engineering Systems Journal, 2(1), 23–41. https://doi.org/10.25046/aj020104