Financial forecasting using character n-gram analysis and readability scores of annual reports

Matthew Butler; Vlado Kešelj

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2009) 5549 LNAI 39-51

DOI: 10.1007/978-3-642-01818-3_7

25 Citations

64 Readers

Abstract

Two novel Natural Language Processing (NLP) classification techniques are applied to the analysis of corporate annual reports in the task of financial forecasting. The hypothesis is that the textual content of annual reports contains vital information for assessing the performance of the stock over the next year. The first method is based on character n-gram profiles, which are generated for each annual report and then labeled using the CNG classification. The second method draws on a more traditional approach, where readability scores are combined with performance inputs and then supplied to a support vector machine (SVM) for classification. Both methods consistently outperformed a benchmark portfolio, and their combination proved to be even more effective and efficient, as the combined models yielded the highest returns with the fewest trades. © 2009 Springer Berlin Heidelberg.
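As a rough illustration of the first method, the sketch below builds a character n-gram profile for a text and assigns it to the class whose profile is nearest. It assumes the relative-difference dissimilarity commonly used with Common N-Gram (CNG) classification. The abstract does not give the n-gram length, profile size, or class labels, so `n`, `size`, and the labels "over" and "under" are hypothetical, and the function names are illustrative rather than the authors' implementation.

```python
from collections import Counter


def ngram_profile(text, n=3, size=1000):
    """Return the `size` most frequent character n-grams with normalized frequencies.

    The defaults for `n` and `size` are assumptions, not values from the paper.
    """
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}
    counts = Counter(grams)
    return {g: c / total for g, c in counts.most_common(size)}


def cng_dissimilarity(p1, p2):
    """Sum of squared relative frequency differences over the union of both profiles.

    An n-gram missing from a profile is treated as having frequency 0,
    so it contributes the maximum per-term value of 4.
    """
    total = 0.0
    for g in set(p1) | set(p2):
        f1 = p1.get(g, 0.0)
        f2 = p2.get(g, 0.0)
        total += ((f1 - f2) / ((f1 + f2) / 2.0)) ** 2
    return total


def classify(report_text, class_profiles, n=3, size=1000):
    """Label a report with the class whose profile is least dissimilar."""
    profile = ngram_profile(report_text, n, size)
    return min(
        class_profiles,
        key=lambda label: cng_dissimilarity(profile, class_profiles[label]),
    )


# Toy usage with made-up text; real class profiles would be built from
# annual reports grouped by subsequent stock performance.
class_profiles = {
    "over": ngram_profile("profit growth strong profit growth"),
    "under": ngram_profile("loss decline weak loss decline"),
}
label = classify("strong growth in profit", class_profiles)
```

In this toy run the report shares many trigrams with the "over" text and none with the "under" text, so `label` is "over".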

Cite

Butler, M., & Kešelj, V. (2009). Financial forecasting using character n-gram analysis and readability scores of annual reports. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5549 LNAI, pp. 39–51). https://doi.org/10.1007/978-3-642-01818-3_7
