SPAR.txt, a cheap Shallow Parsing approach for Regulatory texts

Ruben Kruiper; Ioannis Konstas; Alasdair Gray; Farhad Sadeghineko; Richard Watson; Bimal Kumar

Conference Proceedings (Open Access)

Natural Legal Language Processing, NLLP 2021 - Proceedings of the 2021 Workshop (2021) 129-143

DOI: 10.18653/v1/2021.nllp-1.14

5 Citations

47 Readers

Abstract

Automated Compliance Checking (ACC) systems aim to semantically parse building regulations into a set of rules. However, semantic parsing is known to be hard and requires large amounts of training data. The complexity of creating such training data has led to research that focuses on small sub-tasks, such as shallow parsing or the extraction of a limited subset of rules. This study introduces a shallow parsing task for which training data is relatively cheap to create, with the aim of learning a lexicon for ACC. We annotate a small domain-specific dataset of 200 sentences, SPAR.txt, and train a sequence tagger that achieves an F1-score of 79.93 on the test set. We then show through manual evaluation that the model identifies most (89.84%) defined terms in a set of building regulation documents, and that both contiguous and discontiguous Multi-Word Expressions (MWE) are discovered with reasonable accuracy (70.3%).
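The abstract reports an F1-score for a sequence tagger but does not say how it is computed. As a rough illustration only, the sketch below assumes a BIO tagging scheme and exact-match, span-level scoring, which is a common convention for shallow parsing. The label names (`obj`, `act`) and the helper names `bio_to_spans` and `span_f1` are hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple

# (start, end_exclusive, label); label names used below are hypothetical.
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect labelled spans from one sentence of BIO tags.

    Assumption: a stray "I-x" with no preceding "B-x" opens a new span
    (a lenient reading; stricter schemes would discard it).
    """
    spans: Set[Span] = set()
    start, label = None, None
    # The trailing "O" is a sentinel that closes a span ending the sentence.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((start, i, label))
        if tag.startswith("B-") or tag.startswith("I-"):
            start, label = i, tag[2:]
        else:
            start, label = None, None
    return spans


def span_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 over exactly matching spans across all sentences."""
    gold_spans, pred_spans = set(), set()
    for idx, (g, p) in enumerate(zip(gold, pred)):
        # Prefix each span with the sentence index so spans stay distinct.
        gold_spans |= {(idx, *s) for s in bio_to_spans(g)}
        pred_spans |= {(idx, *s) for s in bio_to_spans(p)}
    true_pos = len(gold_spans & pred_spans)
    precision = true_pos / len(pred_spans) if pred_spans else 0.0
    recall = true_pos / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: the prediction finds one of the two gold spans.
gold = [["B-obj", "I-obj", "O", "B-act"]]
pred = [["B-obj", "I-obj", "O", "O"]]
score = span_f1(gold, pred)
```

In this toy case precision is 1.0 and recall is 0.5, so `score` is 2/3. The figures reported in the abstract may rest on a different tagging scheme or averaging method.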

Cite

Kruiper, R., Konstas, I., Gray, A., Sadeghineko, F., Watson, R., & Kumar, B. (2021). SPAR.txt, a cheap Shallow Parsing approach for Regulatory texts. In Natural Legal Language Processing, NLLP 2021 - Proceedings of the 2021 Workshop (pp. 129–143). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.nllp-1.14
