Although deep web analysis has been studied extensively, there is no succinct formalism to describe user interactions with AJAX-enabled web applications. To this end, we introduce OXPath, a superset of XPath 1.0. Beyond XPath, OXPath can (1) fill web forms and trigger DOM events, (2) access dynamically computed CSS attributes, (3) navigate between visible form fields, and (4) mark relevant information for extraction. In this way, OXPath expressions closely simulate the human interactions relevant for navigation rather than relying exclusively on the HTML structure, which makes them quite resilient to technical changes in the underlying pages. We demonstrate the expressiveness and practical efficacy of OXPath on a group flight planning problem. Using the OXPath implementation and its visual interface, we access the popular, heavily scripted travel site Kayak and show how to formulate OXPath expressions that extract all booking information with just a few lines of code.