Abstract
The first step toward making text documents machine-readable is vectorization. Vectorization allows machines to process textual content by transforming it into meaningful numerical representations. This study proposes a modified Bayesian vectorization technique that employs Laplace smoothing to reduce the dimensionality of features and improve classification accuracy. A dataset of news articles was used to build the model, which was evaluated on precision, recall, F1-score, and accuracy. To validate the effectiveness of the enhancement, the model was compared with the Term Frequency-Inverse Document Frequency (TF-IDF) method. The results show that the proposed enhancement performs considerably better, reaching 98% classification accuracy compared with 81% for the TF-IDF vectorization technique.
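The abstract does not spell out the modified technique, so the following is only a minimal sketch of one way Laplace-smoothed Bayesian vectorization could look, not the author's implementation. It assumes each document is mapped to one feature per class (the summed log of smoothed P(term | class)), which shrinks the feature space from the vocabulary size to the number of classes. The function names `fit_class_term_probs` and `vectorize`, the `alpha` parameter, and the toy documents are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def fit_class_term_probs(docs, labels, alpha=1.0):
    """Estimate P(term | class) with Laplace (add-alpha) smoothing."""
    vocab = sorted({t for d in docs for t in d.split()})
    counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        counts[label].update(doc.split())
    classes = sorted(counts)
    probs = {}
    for c in classes:
        total = sum(counts[c].values())
        # Adding alpha to every count keeps unseen terms from getting zero probability.
        denom = total + alpha * len(vocab)
        probs[c] = {t: (counts[c][t] + alpha) / denom for t in vocab}
    return classes, vocab, probs


def vectorize(doc, classes, probs):
    """Map a document to one feature per class: summed log P(term | class).

    Terms outside the training vocabulary are skipped (an assumption of this sketch).
    """
    known = probs[classes[0]]
    tokens = [t for t in doc.split() if t in known]
    return [sum(math.log(probs[c][t]) for t in tokens) for c in classes]


# Toy data, invented for illustration only.
docs = [
    "stocks fall as markets slide",
    "team wins the final match",
    "markets rally on earnings",
]
labels = ["business", "sports", "business"]

classes, vocab, probs = fit_class_term_probs(docs, labels)
vec = vectorize("markets slide", classes, probs)
```

Here `vec` has two entries (one per class) instead of one per vocabulary term, and the entry for `business` is larger than the one for `sports`. A TF-IDF representation of the same document would instead have one dimension per vocabulary term.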
Cite
Sueno, H. T. (2020). Converting Text to Numerical Representation using Modified Bayesian Vectorization Technique for Multi-Class Classification. International Journal of Advanced Trends in Computer Science and Engineering, 9(4), 5618–5623. https://doi.org/10.30534/ijatcse/2020/211942020