Shabd: A psycholinguistic database for Hindi

4Citations
Citations of this article
12Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

We present Shabd, a psycholinguistic database in Hindi. It is based on a corpus of 1.4 billion words from electronic newspapers and news websites. Word frequencies and part of speech information have been derived and are made available in a cleaned list of 34 thousand hand-selected words, and a list of 96 thousand words observed with a frequency of more than 100 times in the corpus. Next to the Shabd database, we also make a list with all 2.3 million word types available and a list with the 2.5 million most frequent word pairs (word bigrams). The quality of the word frequency measure was tested in two lexical decision tasks. We observed that the Shabd word frequencies outperform existing frequencies based on smaller corpora of newspapers but not the Worldlex word frequencies based on an analysis of blogs. We also observed that word frequency accounts for as much variance as contextual diversity (operationalized as the number of documents in which the words were observed). The Shabd database is freely available for research.

Cite

CITATION STYLE

APA

Verma, A., Sikarwar, V., Yadav, H., Jaganathan, R., & Kumar, P. (2022). Shabd: A psycholinguistic database for Hindi. Behavior Research Methods, 54(2), 830–844. https://doi.org/10.3758/s13428-021-01625-2

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free