Corpus construction is of growing interest for natural language processing applications such as question answering, machine translation, and information retrieval. Likewise, given the heterogeneity of Web data and users' demand for accurate information, many studies have emphasized the role of the Web in corpus construction. Moreover, Arabic has fewer linguistic corpora than languages such as English. In this paper, we focus on building our own corpus of Arabic questions and texts. We present a method for retrieving text passages that is based on automatically querying Google in order to generate text passages and answer factual questions. The first part of this paper describes the method formally; the second part presents experiments and results that validate it.
Bakari, W., Bellot, P., & Neji, M. (2016). AQA-WebCorp: Web-based Factual Questions for Arabic. In Procedia Computer Science (Vol. 96, pp. 275–284). Elsevier B.V. https://doi.org/10.1016/j.procs.2016.08.140