Several efficient and powerful algorithms exist for detecting changes in tree-based textual documents, such as those encoded in XML. However, one important aspect is still underestimated in their design and implementation: the quality of the output, in terms of readability, clarity and accuracy for human users. This requirement is particularly relevant when diff-ing literary documents, such as books, articles, reviews, acts, and so on. This paper introduces the concept of 'naturalness' in diff-ing tree-based textual documents and discusses a new, extensible set of changes that can and should be detected. A naturalness-based algorithm is presented, together with its application to diff-ing XML-encoded legislative documents. The algorithm, called JNDiff, proved to be very efficient and to detect significantly better matchings, since it recognizes new operations. © 2009 Springer Berlin Heidelberg.
Di Iorio, A., Schirinzi, M., Vitali, F., & Marchetti, C. (2009). A natural and multi-layered approach to detect changes in tree-based textual documents. In Lecture Notes in Business Information Processing (Vol. 24 LNBIP, pp. 90–101). Springer Verlag. https://doi.org/10.1007/978-3-642-01347-8_8