Shabd: A psycholinguistic database for Hindi

Ark Verma; Vivek Sikarwar; Himanshu Yadav; Ranjith Jaganathan; Pawan Kumar

Journal Article (Open Access)

Behavior Research Methods (2022) 54(2) 830-844

DOI: 10.3758/s13428-021-01625-2

Abstract

We present Shabd, a psycholinguistic database for Hindi. It is based on a corpus of 1.4 billion words from electronic newspapers and news websites. Word frequencies and part-of-speech information were derived from the corpus and are provided in two lists: a cleaned list of 34 thousand hand-selected words, and a list of 96 thousand words that occur more than 100 times in the corpus. In addition to the Shabd database, we also provide a list of all 2.3 million word types and a list of the 2.5 million most frequent word pairs (word bigrams). The quality of the word frequency measure was tested in two lexical decision tasks. The Shabd word frequencies outperformed existing frequencies based on smaller newspaper corpora, but not the Worldlex word frequencies, which are based on an analysis of blogs. Word frequency also accounted for as much variance as contextual diversity (operationalized as the number of documents in which a word was observed). The Shabd database is freely available for research.
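
To make the three measures named in the abstract concrete, here is a minimal Python sketch of how word frequency, contextual diversity (number of documents containing a word), and word-bigram counts can be computed. It is an illustration under stated assumptions, not the authors' pipeline: the function name frequency_and_diversity, the three-document toy corpus, and the whitespace tokenization are all hypothetical, and the real database would have required proper Hindi tokenization and cleaning.

```python
from collections import Counter

def frequency_and_diversity(documents):
    """Return (word_frequency, contextual_diversity) for tokenized documents.

    Hypothetical helper for illustration; not part of the Shabd release.
    """
    word_frequency = Counter()
    contextual_diversity = Counter()
    for tokens in documents:
        word_frequency.update(tokens)             # every occurrence counts
        contextual_diversity.update(set(tokens))  # each document counts once per word
    return word_frequency, contextual_diversity

# Toy corpus: three made-up "documents", tokenized on whitespace (an assumption).
toy_docs = [
    "यह एक शब्द है".split(),
    "यह शब्द शब्द है".split(),
    "एक और वाक्य".split(),
]

freq, cd = frequency_and_diversity(toy_docs)

# Word bigrams: adjacent token pairs within each document.
bigrams = Counter(pair for tokens in toy_docs for pair in zip(tokens, tokens[1:]))
```

In this toy corpus the word "शब्द" occurs three times but in only two documents, which shows how the two measures can diverge even though the abstract reports that they explained similar amounts of variance in the lexical decision data.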

Citation (APA)

Verma, A., Sikarwar, V., Yadav, H., Jaganathan, R., & Kumar, P. (2022). Shabd: A psycholinguistic database for Hindi. Behavior Research Methods, 54(2), 830–844. https://doi.org/10.3758/s13428-021-01625-2
