A Methodology on Converting 10-K Filings into a Machine Learning Dataset and Its Applications

Abstract

Companies listed on U.S. stock exchanges are required to file their annual reports with the U.S. Securities and Exchange Commission (SEC) within the first three months after the end of the fiscal year. These reports, known as 10-K filings, are made available to the public by the SEC through its Electronic Data Gathering, Analysis, and Retrieval (EDGAR) database. 10-K filings use standard file formats (XBRL, HTML, PDF) to publish companies' financial reports. Although these file formats propose a standard structure, the content and metadata of the financial reports (e.g., tag names) are not strictly bound to a predefined schema. This study proposes a data collection and data preprocessing method that semantifies the financial reports so that the collected data can be used for further analysis, namely machine learning. Eight different datasets were created during the study, and their analysis is presented using the proposed data transformation methods. As a use case, five different machine learning algorithms were applied to these datasets to predict whether the corresponding company is included in the S&P 500 index. The strong machine learning results indicate that the dataset generation methodology is successful and that the datasets are ready for further use.
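
The abstract does not spell out the transformation itself. As a rough illustration only, the sketch below assumes that facts have already been extracted from XBRL filings as hypothetical `(cik, year, tag, value)` tuples and shows one simple way to pivot them into a tabular machine learning dataset: tag names are normalized by dropping a namespace prefix such as `us-gaap:` and lowercasing, each filing becomes one row with one column per tag, and a binary S&P 500 label is attached from a supplied set of member identifiers. The function names `normalize_tag` and `build_dataset`, the tuple layout, the toy identifiers and values, and the labeling rule are all illustrative assumptions, not the authors' method.

```python
from collections import defaultdict


def normalize_tag(tag: str) -> str:
    """Drop a namespace prefix such as 'us-gaap:' and lowercase the local name.

    Assumption: case and prefix differences are the only variation handled here.
    """
    local = tag.split(":", 1)[-1]
    return local.strip().lower()


def build_dataset(facts, sp500_members):
    """Pivot (cik, year, tag, value) facts into one feature row per filing.

    Returns the sorted column header and a list of
    (cik, year, features, label) tuples, where label is 1 if the
    company identifier is in sp500_members and 0 otherwise.
    """
    rows = defaultdict(dict)  # (cik, year) -> {normalized tag: value}
    columns = set()
    for cik, year, tag, value in facts:
        name = normalize_tag(tag)
        # Later facts for the same (cik, year, tag) overwrite earlier ones.
        rows[(cik, year)][name] = value
        columns.add(name)

    header = sorted(columns)
    dataset = []
    for (cik, year), values in sorted(rows.items()):
        # Tags a filing does not report become None (missing values).
        features = [values.get(col) for col in header]
        label = 1 if cik in sp500_members else 0
        dataset.append((cik, year, features, label))
    return header, dataset


if __name__ == "__main__":
    # Toy, hypothetical facts; note the inconsistent prefix casing.
    facts = [
        ("CIK_A", 2021, "us-gaap:Revenues", 100.0),
        ("CIK_A", 2021, "us-gaap:NetIncomeLoss", 20.0),
        ("CIK_B", 2021, "US-GAAP:Revenues", 50.0),
    ]
    header, data = build_dataset(facts, sp500_members={"CIK_A"})
    print(header)
    for row in data:
        print(row)
```

Lowercasing tag names is a deliberately crude normalization; reconciling genuinely different tags that denote the same concept would need a mapping step that this sketch omits. Missing tags become `None`, which a downstream learner would need to impute or drop before training.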

Sami Kacar, M., Yumusak, S., & Kodaz, H. (2023). A Methodology on Converting 10-K Filings into a Machine Learning Dataset and Its Applications. IEICE Transactions on Information and Systems, E106D(4), 477–487. https://doi.org/10.1587/TRANSINF.2022IIP0001