Taking the OXPath down the deep web

  • Sellers A
  • Furche T
  • Gottlob G
 et al. 
  • 17

    Readers

    Mendeley users who have this article in their library.
  • 2

    Citations

    Citations of this article.

Abstract

Although deep web analysis has been studied extensively, there is no succinct formalism to describe user interactions with AJAX-enabled web applications. Toward this end, we introduce OXPath as a superset of XPath 1.0. Beyond XPath, OXPath is able (1) to fill web forms and trigger DOM events, (2) to access dynamically computed CSS attributes, (3) to navigate between visible form fields, and (4) to mark relevant information for extraction. This way, OXPath expressions can closely simulate the human interaction relevant for navigation rather than rely exclusively on the HTML structure. Thus, they are quite resilient against technical changes. We demonstrate the expressiveness and practical efficacy of OXPath to tackle a group flight planning problem. We use the OXPath implementation and visual interface to access the popular, highly-scripted travel site Kayak. We show, how to formulate OXPath expressions to extract all booking information with just a few lines of code.

Get free article suggestions today

Mendeley saves you time finding and organizing research

Sign up here
Already have an account ?Sign in

Find this document

Authors

  • Andrew Sellers

  • Tim Furche

  • Georg Gottlob

  • Giovanni Grasso

  • Christian Schallhart

Cite this document

Choose a citation style from the tabs below

Save time finding and organizing research with Mendeley

Sign up for free