Objective: This paper applies natural language processing (NLP) to text mining of traditional Chinese medicine (TCM) literature, analyzing and processing the text of 'Treatise on Febrile Diseases (TFDs)' to extract important information. Materials and Methods: The experiment was implemented in Python using NLP toolkits such as Jieba, nltk, gensim, and sklearn, supplemented by Excel and Word. The text of 'TFDs' was cleaned, segmented, and stripped of stop words in sequence; word frequency statistics and analysis, keyword extraction, named entity recognition (NER), and other operations were then performed, and text similarity was finally calculated. Results: Jieba accurately identified the herbal names in 'TFDs.' Word frequency statistics based on the word segmentation showed that 'warm therapy' is an important treatment in 'TFDs' and that Guizhi decoction is the main prescription; five core decoctions were identified. Keyword extraction based on the term frequency-inverse document frequency (TF-IDF) algorithm performed well. The accuracy of NER on 'TFDs' was about 86%. When similarity was calculated with a latent semantic indexing model, 'Understanding of Synopsis of Golden Chamber (SGC)' was much more similar to 'SGC' than to 'TFDs.' The results met expectations. Conclusions: This work lays a research foundation for applying NLP to text mining of unstructured TCM literature. Combined with deep learning, NLP, as an important branch of artificial intelligence, will have broader application prospects in text mining of TCM literature, the construction of TCM knowledge graphs, and TCM knowledge services.
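The stop-word removal, word frequency, and TF-IDF steps named in the Methods can be illustrated with a minimal, standard-library-only sketch. This is not the authors' code and does not use Jieba or gensim: the documents are hypothetical, already-segmented toy token lists, the stop-word set is an assumption, and the TF-IDF variant shown (relative term frequency times log(N/df)) is one common formulation that may differ from the one used in the paper.

```python
import math
from collections import Counter

# Hypothetical, already-segmented toy documents (not data from the paper).
docs = [
    ["guizhi", "decoction", "warm", "the", "therapy"],
    ["mahuang", "decoction", "the", "sweat"],
    ["guizhi", "warm", "the", "pulse"],
]
stopwords = {"the"}  # assumed stop-word list


def remove_stopwords(tokens, stopwords):
    """Drop every token that appears in the stop-word set."""
    return [t for t in tokens if t not in stopwords]


def word_frequencies(docs):
    """Count how often each token occurs across all documents."""
    counts = Counter()
    for tokens in docs:
        counts.update(tokens)
    return counts


def tf_idf(docs):
    """Return one {token: score} dict per document.

    tf  = occurrences of the token in the document / document length
    idf = log(number of documents / number of documents containing the token)
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each token once per document
    scores = []
    for tokens in docs:
        tf = Counter(tokens)
        total = len(tokens)
        scores.append(
            {t: (c / total) * math.log(n_docs / doc_freq[t]) for t, c in tf.items()}
        )
    return scores


cleaned = [remove_stopwords(d, stopwords) for d in docs]
freq = word_frequencies(cleaned)
scores = tf_idf(cleaned)

# Highest-scoring token in each toy document acts as its "keyword".
keywords = [max(s, key=s.get) for s in scores]
```

In this toy setup, a token that occurs in only one document receives a higher score than one shared across documents, which is the property that makes TF-IDF useful for keyword extraction.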
Zhao, K., Shi, N., Sa, Z., Wang, H. X., Lu, C. H., & Xu, X. Y. (2020). Text mining and analysis of Treatise on Febrile Diseases based on natural language processing. World Journal of Traditional Chinese Medicine, 6(1), 67–73. https://doi.org/10.4103/wjtcm.wjtcm_28_19