This paper adds some theory to the growing literature of semantic space models. We motivate semantic space models from the perspective of distributional linguistics and show how an explicit mathematical formulation can provide a better understanding of existing models and suggest changes and improvements. In addition to pro- viding a theoretical framework for current models, we consider the implications of statistical aspects of language data that have not been addressed in the psychological modeling literature. Statistical approaches to language must deal principally with count data, and this data will typically have a highly skewed frequency distribution due to Zipf’s law. We consider the consequences of these facts for the construction of semantic space models, and present methods for removing frequency biases from se- mantic space models.
Mendeley saves you time finding and organizing research
Choose a citation style from the tabs below