Tokenization is the mechanism of splitting or fragmenting sentences and words into their smallest possible morphemes, called tokens. A morpheme is the smallest meaningful unit of a word, which cannot be broken down further. Tokenization is the initial, and also a crucial, phase of Part-Of-Speech (POS) tagging in Natural Language Processing (NLP). Tokenization can be performed at the sentence level or the word level. This paper analyzes the possible tokenization methods that can be applied to tokenize words efficiently.
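To illustrate the distinction between sentence-level and word-level tokenization mentioned above, the following is a minimal rule-based sketch in Python using regular expressions. It is an illustrative example only, not one of the specific methods surveyed in the paper; the splitting rules (punctuation-based sentence boundaries, punctuation as separate word tokens) are simplifying assumptions.

```python
import re

def sentence_tokenize(text):
    # Sentence-level tokenization: split on sentence-ending
    # punctuation (. ! ?) followed by whitespace.
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

def word_tokenize(sentence):
    # Word-level tokenization: runs of word characters become tokens,
    # and each punctuation mark becomes its own token.
    return re.findall(r"\w+|[^\w\s]", sentence)

text = "Tokenization splits text. It works at two levels!"
sentences = sentence_tokenize(text)
# → ['Tokenization splits text.', 'It works at two levels!']
words = word_tokenize(sentences[0])
# → ['Tokenization', 'splits', 'text', '.']
```

Real tokenizers must additionally handle abbreviations, numbers, and language-specific conventions, which is why the paper compares multiple methods rather than relying on a single rule set.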
Rai, A., & Borah, S. (2021). Study of various methods for tokenization. In Lecture Notes in Networks and Systems (Vol. 137, pp. 193–200). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-15-6198-6_18