A flight meta-search engine with metamorph
Proceedings of the 18th international conference on World wide web WWW 09 (2009)
- ISSN: 08963207
- ISBN: 9781605584874
- DOI: 10.1145/1526709.1526860
A Flight Meta-Search Engine with Metamorph ∗
Bernhard Krüpl
Wolfgang Holzinger
Yansen Darmaputra
DBAI Group
TU Wien, Austria
{holzing,kruepl,darmap}@dbai.tuwien.ac.at
Robert Baumgartner
Lixto Software GmbH
Vienna, Austria
baumgartner@lixto.com
ABSTRACT
We demonstrate a flight meta-search engine that is based
on the Metamorph framework. Metamorph provides mechanisms
to model web forms together with the interactions
which are needed to fulfil a request, and can generate
interaction sequences that pose queries using these web forms
and collect the results. In this paper, we discuss an interesting
new feature that makes use of the forms themselves as
an information source. We show how data can be extracted
from web forms (rather than the data behind web forms) to
generate a graph of flight connections between cities.
The flight connection graph allows us to vastly reduce the
number of queries that the engine sends to airline websites in
the most interesting search scenarios: those that involve the
controversial practice of creative ticketing, in which agencies
attempt to find lower-priced fares by using more than
one airline for a journey. We describe a system that obtains
data from a number of websites to identify promising
routes and prune the search tree. It employs heuristics that
make use of geographical information and an estimation of
cost based on historical data. The results are then made
available to improve the quality of future search requests.
Categories and Subject Descriptors: H.3.4 [Information
Storage and Retrieval]: Systems and Software
General Terms: Algorithms, Design, Experimentation.
Keywords: Hidden Web, Web Data Extraction, Web Form
Mapping, Web Form Extraction.
1. INTRODUCTION
A typical flight meta-search engine forwards the route
query that it receives to other websites in its domain. The
retrieved results are aggregated and presented to the user.
We implemented a system that mimics the so-called creative
ticketing practice of travel agencies by searching for
the more complex itineraries that involve changing airlines
during the journey. Many special air fare offers are only
available through the airlines’ websites, but each request to
an airline website is a costly procedure. We thus implemented
a system that uses a flight connection graph and a
query planner to limit the number of requests by checking
only the most promising paths. Our system has been built
on top of Metamorph, a meta-search framework aiming to
generate maintainable vertical deep web search engines [1].
We use Metamorph’s ontology and rule-based form interaction
modelling to map and fill out the web search forms.
∗This research is supported in part by the Austrian
Forschungsförderungsgesellschaft FFG under project grant 812991.
Copyright is held by the author/owner(s).
WWW 2009, April 20–24, 2009, Madrid, Spain.
ACM 978-1-60558-487-4/09/04.
In related research, the WISE project [2] aims to integrate
access to web databases. The MetaQuerier [3] uses the
regularities of web forms and automatically matches interfaces.
Metamorph focusses on modelling form interactions, and our
meta-search system retrieves its initial route knowledge from
web forms themselves and concentrates on creative ticketing
heuristics via hub identification.
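To illustrate how a connection graph and simple heuristics can limit the number of requests, here is a minimal Python sketch. It is not the actual Metamorph query planner: the airports, coordinates, fares, the `max_detour` threshold, the degree-based hub test, and all function names are assumptions made for this example.

```python
import math

# Hypothetical direct connections, as they might be extracted from web forms.
CONNECTIONS = {
    "VIE": {"FRA", "LHR", "MAD"},
    "FRA": {"JFK", "LHR", "MAD"},
    "LHR": {"JFK"},
    "MAD": {"JFK"},
    "JFK": set(),
}

# Approximate airport coordinates (latitude, longitude) in degrees.
COORDS = {
    "VIE": (48.11, 16.57),
    "FRA": (50.03, 8.57),
    "LHR": (51.47, -0.45),
    "MAD": (40.47, -3.56),
    "JFK": (40.64, -73.78),
}

# Invented historical average fares per leg (illustrative numbers only).
HISTORICAL_FARE = {
    ("VIE", "FRA"): 90.0, ("FRA", "JFK"): 410.0,
    ("VIE", "LHR"): 60.0, ("LHR", "JFK"): 380.0,
    ("VIE", "MAD"): 120.0, ("MAD", "JFK"): 420.0,
}
FALLBACK_FARE_PER_KM = 0.1  # assumed price per km for legs without history


def great_circle_km(a, b):
    """Haversine distance between two airports in kilometres."""
    lat1, lon1 = map(math.radians, COORDS[a])
    lat2, lon2 = map(math.radians, COORDS[b])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def hubs(connections, min_degree):
    """Airports offering at least min_degree direct destinations."""
    return {a for a, dests in connections.items() if len(dests) >= min_degree}


def estimated_fare(leg):
    """Historical fare if known, otherwise a distance-based fallback."""
    if leg in HISTORICAL_FARE:
        return HISTORICAL_FARE[leg]
    return FALLBACK_FARE_PER_KM * great_circle_km(*leg)


def plan_queries(origin, destination, max_detour=1.05, max_queries=2):
    """Return the most promising one-stop routes, cheapest estimate first."""
    direct_km = great_circle_km(origin, destination)
    candidates = []
    for via in CONNECTIONS.get(origin, ()):
        if destination not in CONNECTIONS.get(via, ()):
            continue  # no onward connection known from the forms
        route_km = great_circle_km(origin, via) + great_circle_km(via, destination)
        if route_km > max_detour * direct_km:
            continue  # geographic pruning: detour is too long
        cost = estimated_fare((origin, via)) + estimated_fare((via, destination))
        candidates.append((cost, (origin, via, destination)))
    candidates.sort()
    return [route for _, route in candidates[:max_queries]]


if __name__ == "__main__":
    print(hubs(CONNECTIONS, 3))
    print(plan_queries("VIE", "JFK"))
```

With this sample data, the route via Madrid is dropped by the detour check before any fare is considered, so only the Frankfurt and London routes would be sent to airline websites.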
2. EXTRACTING DATA FROM FORMS
Web forms are the gateway to the hidden or deep Web.
It is quite clear that any automated search system has to
model the meaning of these forms in order to be able to
pose queries. There is a wealth of literature about mapping
forms, and a number of the suggested solutions make
use of the information on the forms themselves, such as labels.
Many modern web forms, though, employ JavaScript
and AJAX to provide instant feedback about the validity
of data, or even suggest possible data. A technique called
dynamic dependent drop-down lists populates a drop-down
list according to what a user selected in another drop-down
list. In the flight search domain, this is often used to
update the destination airport list automatically when a
user selects an origin airport.
In the process of building a model for a web search form,
Metamorph uses its ontologies to identify and tag known
concepts in that form. We can use data extraction from
forms to extract relations between such concepts; in flight
search, the most important relation is the one between origin
and destination airport. Form data extraction works like
this: The connector, which has access to a full-featured web
browser component, issues a click event on the origin airport
list and sequentially selects each of the options on that list.
After that, it adds an event hook to monitor the destination
airport list for possible changes. The changes could be made
by running client-side JavaScript code or by reloading from
the server. If changes happen, the destination airport list
is analysed for airport names or IATA codes and stored in the
local database together with the origin airport selection it
depends on. This method works on a large number of airline
websites, and is good enough to bootstrap the system with
initial knowledge about flight connections.
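The extraction loop described above can be sketched in Python. The sketch does not drive a real browser: `SimulatedForm` is a hypothetical stand-in for the browser component, and the class, the `(XXX)` label format for IATA codes, and the function names are all assumptions made for illustration. As a simplification, the change hook is registered once before the first selection.

```python
import re
from typing import Callable, Dict, List, Set

# Assumed label format: airport name followed by the IATA code in parentheses.
IATA_CODE = re.compile(r"\(([A-Z]{3})\)")


class SimulatedForm:
    """Hypothetical stand-in for a search form hosted in a browser component."""

    def __init__(self, routes: Dict[str, List[str]]):
        self._routes = routes
        self.origin_options = sorted(routes)
        self.destination_options: List[str] = []
        self._listeners: List[Callable[[List[str]], None]] = []

    def on_destination_change(self, listener: Callable[[List[str]], None]) -> None:
        """Register a hook that fires when the destination list changes."""
        self._listeners.append(listener)

    def select_origin(self, option: str) -> None:
        """Mimic client-side script repopulating the dependent list."""
        self.destination_options = list(self._routes[option])
        for listener in self._listeners:
            listener(self.destination_options)


def extract_connections(form: SimulatedForm) -> Dict[str, Set[str]]:
    """Select every origin and record the destinations offered for it."""
    connections: Dict[str, Set[str]] = {}
    current_origin = None

    def record(options: List[str]) -> None:
        # Pull IATA codes out of the updated destination labels.
        codes = {m.group(1) for label in options for m in IATA_CODE.finditer(label)}
        connections.setdefault(current_origin, set()).update(codes)

    form.on_destination_change(record)
    for label in form.origin_options:
        match = IATA_CODE.search(label)
        if match is None:
            continue  # skip placeholder entries such as "Please select"
        current_origin = match.group(1)
        form.select_origin(label)
    return connections


if __name__ == "__main__":
    form = SimulatedForm({
        "Vienna (VIE)": ["Frankfurt (FRA)", "London Heathrow (LHR)"],
        "Frankfurt (FRA)": ["New York (JFK)"],
    })
    print(extract_connections(form))
```

The resulting mapping from origin code to destination codes is the kind of relation that would be stored in the local database to bootstrap the connection graph.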
Currently, we are generalizing our form extraction approach
by automatically detecting dependencies between form
controls and integrating the results directly into the
Metamorph form ontology.
WWW 2009 MADRID! Poster Sessions: Wednesday, April 22, 2009


