This paper presents a large-scale language model, built with Hadoop in a cloud environment from the large text corpora generated daily, for improving the performance of a human-robot interaction system. Our large-scale trigram language model, consisting of 800 million trigram counts, was successfully implemented through a new approach that combines a representative cloud service (Amazon EC2) with a representative distributed processing framework (Hadoop). We extracted trigram counts with Hadoop MapReduce to adapt the language model. We estimate that extracting trigram counts from a 200-million-word corpus of Twitter texts, approximately the volume of Twitter text generated daily, takes three hours on six servers. © 2013 Springer Science+Business Media Dordrecht.
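The trigram count extraction described above follows the usual map and reduce pattern. The following is a minimal single-process sketch of that pattern in plain Python, not the authors' Hadoop job: the function names (`map_trigrams`, `reduce_counts`, `count_trigrams`), the whitespace tokenization, and the lowercasing are all assumptions made for illustration.

```python
from itertools import groupby


def map_trigrams(line):
    """Map step: emit (trigram, 1) for every three consecutive tokens.

    Tokenization by whitespace and lowercasing are illustrative assumptions.
    """
    tokens = line.lower().split()
    for i in range(len(tokens) - 2):
        yield (tuple(tokens[i:i + 3]), 1)


def reduce_counts(pairs):
    """Shuffle and reduce step: group pairs by trigram and sum their counts."""
    counts = {}
    ordered = sorted(pairs, key=lambda kv: kv[0])  # stands in for the shuffle/sort phase
    for trigram, group in groupby(ordered, key=lambda kv: kv[0]):
        counts[trigram] = sum(value for _, value in group)
    return counts


def count_trigrams(lines):
    """Run the map step over all lines, then reduce to per-trigram counts."""
    pairs = [pair for line in lines for pair in map_trigrams(line)]
    return reduce_counts(pairs)


if __name__ == "__main__":
    sample = ["the robot greets the user", "the robot greets me"]
    for trigram, count in sorted(count_trigrams(sample).items()):
        print(" ".join(trigram), count)
```

In a distributed setting the map step would run in parallel over splits of the corpus and the framework would perform the grouping, so only the two step functions would carry over.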
CITATION STYLE
Jung, D. Y., Lee, H. J., Park, S. Y., Koo, M. W., Kim, J. H., Park, J. S., … Lee, Y. K. (2013). Implementation of a large-scale language model in a cloud environment for human-robot interaction. In Lecture Notes in Electrical Engineering (Vol. 253 LNEE, pp. 957–965). Springer Verlag. https://doi.org/10.1007/978-94-007-6996-0_101