Cryptocurrencies are highly anonymous and poorly regulated in many countries, and tokens can be issued at near-zero cost on existing platforms. As a result, there is no shortage of fraudulent cryptocurrencies that raise large sums of money through hype and then disappear after doing little actual project development. The prevalence of fraudulent cryptocurrencies not only harms investors but can also prevent sound companies from raising funds. To remedy this situation, it would be useful to have a method for determining whether a particular cryptocurrency is fraudulent. The information in cryptocurrency whitepapers could help detect fraudulent cryptocurrencies, but there are no clear criteria for evaluating the reliability and feasibility of their content. Moreover, most studies that analyze whitepapers focus on the success or failure of ICO "fundraising" and do not adequately consider the ongoing development and operation of the project. Only a few studies have attempted to detect fraudulent cryptocurrencies directly from whitepapers, although their results suggest that fraud may be identifiable with high accuracy. The objective of this paper is to resolve the problems of these previous studies, build a model that detects fraudulent cryptocurrencies from whitepapers using natural language processing and machine learning techniques, and verify whether the model detects fraud with sufficient predictive accuracy. We collected 250 cryptocurrency whitepapers (150 frauds and 100 controls), extracted features from them, and applied multiple machine learning methods to classify frauds and controls. We then analyzed the feature differences between the fraud and control groups and examined the tendencies of fraudulent cryptocurrency whitepapers. The best prediction model achieved an F1 score of 0.841, outperforming previous studies.
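The F1 score is the harmonic mean of precision and recall on the positive class. The sketch below shows how it can be computed, assuming binary labels with 1 = fraud and 0 = control; the function name and the toy labels are hypothetical illustrations, not code or data from the study.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for the given positive class (here assumed to be 1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels, not data from the study.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1]
print(round(f1_score(y_true, y_pred), 3))  # → 0.667

# Trivial baseline: label every whitepaper "fraud" on a 150/100 split.
baseline = f1_score([1] * 150 + [0] * 100, [1] * 250)
print(round(baseline, 3))  # → 0.75
```

Because fraud is the majority class in a 150/100 collection, a classifier that always predicts "fraud" already reaches an F1 of 0.75 under this setup (precision 0.6, recall 1.0). This assumes F1 is computed on the fraud class over that split; the evaluation protocol is not detailed in this abstract, but such a baseline is a useful reference point when reading any reported F1 score.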
Furthermore, the performance of K-Means, an unsupervised learning method, was not significantly lower than that of the other machine learning methods and retained a certain level of accuracy. K-Means may therefore be usable in cases where the criteria for fraud cannot be clearly defined. We also found that fraudulent cryptocurrency whitepapers used relatively more business- and finance-related words, whereas whitepapers in the control group tended to use more blockchain-related technical terms.
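To illustrate how K-Means can separate whitepapers without fraud labels, here is a minimal pure-Python sketch of Lloyd's algorithm with k = 2. Several parts are assumptions rather than the study's method: the two features per whitepaper (share of business/finance words and share of blockchain technical words, chosen only to echo the finding above), the toy values, and the deterministic farthest-point seeding, used in place of random or k-means++ initialization so the result is reproducible. K-Means alone does not say which cluster is "fraud"; the majority-label mapping at the end uses known labels purely to score the clustering.

```python
import math
from collections import Counter


def farthest_point_init(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from every centroid chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, iters=100):
    """Lloyd's algorithm; returns (centroids, cluster index per point)."""
    centroids = farthest_point_init(points, k)
    assignments = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return centroids, assignments


# Hypothetical features: (business/finance word share, blockchain term share).
points = [(0.30, 0.05), (0.28, 0.07), (0.32, 0.04),
          (0.08, 0.25), (0.06, 0.28), (0.09, 0.22)]
labels = [1, 1, 1, 0, 0, 0]  # 1 = fraud, 0 = control (toy values)

_, clusters = kmeans(points, k=2)
# Evaluation only: name each cluster after its members' majority true label.
majority = {
    c: Counter(l for a, l in zip(clusters, labels) if a == c).most_common(1)[0][0]
    for c in set(clusters)
}
predicted = [majority[c] for c in clusters]
print(clusters, predicted)  # → [0, 0, 0, 1, 1, 1] [1, 1, 1, 0, 0, 0]
```

The functions work for feature vectors of any dimension, so the same sketch applies to richer whitepaper features; on real data the clusters will not align this cleanly with the fraud labels.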
Ueno, M., Sano, T., Honda, H., & Nakamura, S. (2023). Detecting Fraudulent Cryptocurrencies Using Natural Language Processing Techniques. Transactions of the Japanese Society for Artificial Intelligence, 38(5). https://doi.org/10.1527/tjsai.38-5_E-N34