Abstract
The paper discusses the current state of Sindhi corpus construction in detail. Sindhi corpus development issues, including corpus acquisition, preprocessing, and tokenization, are examined. Preliminary results and observations are presented, including letter unigram, bigram, and trigram frequencies, word frequencies, and word bigram frequencies. The current state of the Sindhi corpus, its limitations, and future work are also discussed. The paper also explores the orthography and script of the Sindhi language with reference to corpus development.
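The frequency statistics named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`tokenize`, `ngrams`, `letter_ngram_counts`, `word_ngram_counts`) are hypothetical, and whitespace tokenization is an assumption made only for illustration, since the abstract itself lists tokenization among the open issues.

```python
from collections import Counter


def tokenize(text):
    """Split text into tokens on whitespace.

    Assumption: tokens are space-delimited. The paper's own
    tokenization approach is not described in the abstract.
    """
    return text.split()


def ngrams(seq, n):
    """Return consecutive n-item windows of seq as tuples."""
    return (tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


def letter_ngram_counts(words, n):
    """Count letter n-grams within each word (no cross-word n-grams)."""
    counts = Counter()
    for word in words:
        counts.update("".join(gram) for gram in ngrams(word, n))
    return counts


def word_ngram_counts(words, n):
    """Count word n-grams over the token sequence."""
    return Counter(ngrams(words, n))


# Placeholder text; a real run would read the corpus files instead.
words = tokenize("ab ab ba")
letter_unigrams = letter_ngram_counts(words, 1)
letter_bigrams = letter_ngram_counts(words, 2)
letter_trigrams = letter_ngram_counts(words, 3)
word_frequencies = Counter(words)
word_bigrams = word_ngram_counts(words, 2)
```

Counting letter n-grams per word, rather than over the raw character stream, keeps spaces out of the statistics. Whether the paper makes the same choice is not stated in the abstract.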
Cite
Rahman, M. U. (2015). Towards Sindhi Corpus Construction. Linguistics and Literature Review, 1(1), 39–47. https://doi.org/10.32350/llr/11/04