Tokenization is the mechanism of splitting or fragmenting sentences and words into their smallest possible morphemes, called tokens. A morpheme is the smallest meaningful unit of a word, which cannot be broken down further. Tokenization is the initial, and also a crucial, phase of Part-Of-Speech (POS) tagging in Natural Language Processing (NLP). Tokenization can be performed at the sentence level or the word level. This paper analyzes the possible tokenization methods that can be applied to tokenize words efficiently.
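To illustrate the distinction between sentence-level and word-level tokenization mentioned above, the following is a minimal rule-based sketch in Python using regular expressions. It is an illustrative example only, not one of the specific methods surveyed in the paper; the splitting rules (punctuation-based sentence boundaries, punctuation as separate word tokens) are simplifying assumptions.

```python
import re

def sentence_tokenize(text):
    # Sentence-level tokenization: split on sentence-ending
    # punctuation (. ! ?) followed by whitespace.
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

def word_tokenize(sentence):
    # Word-level tokenization: runs of word characters become tokens,
    # and each punctuation mark becomes its own token.
    return re.findall(r"\w+|[^\w\s]", sentence)

text = "Tokenization splits text. It works at two levels!"
sentences = sentence_tokenize(text)
# → ['Tokenization splits text.', 'It works at two levels!']
words = word_tokenize(sentences[0])
# → ['Tokenization', 'splits', 'text', '.']
```

Real tokenizers must additionally handle abbreviations, numbers, and language-specific conventions, which is why the paper compares multiple methods rather than relying on a single rule set.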
Rai, A., & Borah, S. (2021). Study of various methods for tokenization. In Lecture Notes in Networks and Systems (Vol. 137, pp. 193–200). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-15-6198-6_18