In this era of information overload, text segmentation can be used effectively to locate and extract information specific to users’ needs within huge collections of documents. Text segmentation refers to the task of dividing a document into smaller labeled text fragments according to the semantic commonality of their contents. Because legal text is rich in semantic information, text segmentation is crucial for information retrieval in the legal domain. However, such supervised classification requires a large amount of training data to build an efficient classifier, and collecting and manually annotating gold standards in NLP is very expensive. In the recent past, the question of whether gold standards can be satisfactorily replaced with automatically annotated data has attracted growing interest. This work presents two approaches, based entirely on domain knowledge, for the automatic generation of training data that can then be used for the segmentation of court judgments.
Wagh, R. S., & Anand, D. (2020). A novel approach of augmenting training data for legal text segmentation by leveraging domain knowledge. In Advances in Intelligent Systems and Computing (Vol. 910, pp. 53–63). Springer Verlag. https://doi.org/10.1007/978-981-13-6095-4_4