Extracting knowledge from Big Data is the process of transforming this data into actionable information. The exponential growth of data has initiated a myriad of new opportunities, and made data become the most valuable raw material of production for many organizations. Mining Big Data is coupled with some challenges, known as the 3V's of Big Data: Volume, Variety and Velocity. However, a major challenge that needs to be addressed, and often is ignored in the literature, concerns reliability. Actually, data is agglomerated from multiple disparate sources, and each of Knowledge Discovery (KDD) process steps may be carried out by different organizations. These considerations lead us to ask a critical question that is weather the information we have at each step is reliable enough to proceed to the next one? This paper therefore aims to provide a framework that automatically assesses reliability of the knowledge discovery process. We focus on Linked Open Data (LOD) as a source of data, as it constitutes a relevant data provider in many Big Data applications. However, our framework can also be adapted for unstructured data. This framework will assist scientists to automatically and efficiently measure the reliability of each KDD process stage as well as detect unreliable steps that should be revised. Following this methodology, KDD process will be optimized and therefore produce knowledge with higher quality.
Safhi, H. M., Frikh, B., & Ouhbi, B. (2019). Assessing reliability of Big Data Knowledge Discovery process. In Procedia Computer Science (Vol. 148, pp. 30–36). Elsevier B.V. https://doi.org/10.1016/j.procs.2019.01.005