
Approximate Semantic Matching of Heterogeneous Events

by Souleiman Hasan, Sean O'Riain, Edward Curry
6th ACM International Conference on Distributed Event-Based Systems (DEBS 2012)

Abstract

Event-based systems have loose coupling within space, time and synchronization, providing a scalable infrastructure for information exchange and distributed workflows. However, event-based systems are tightly coupled, via event subscriptions and patterns, to the semantics of the underlying event schema and values. The high degree of semantic heterogeneity of events in large and open deployments such as smart cities and the sensor web makes it difficult to develop and maintain event-based systems. In order to address semantic coupling within event-based systems, we propose vocabulary-free subscriptions together with the use of approximate semantic matching of events. This paper examines the requirement of event semantic decoupling and discusses approximate semantic event matching and the consequences it implies for event processing systems. We introduce a semantic event matcher and evaluate the suitability of an approximate hybrid matcher based on both thesauri-based and distributional semantics-based similarity and relatedness measures. The matcher is evaluated over a structured representation of Wikipedia and Freebase events. Initial evaluations show that the approach matches events with a maximal combined precision-recall F1 score of 75.89% on average in all experiments with a subscription set of 7 subscriptions. The evaluation shows how a hybrid approach to semantic event matching outperforms a single similarity measure approach.


Souleiman Hasan
Digital Enterprise Research Institute (DERI), National University of Ireland, Galway
souleiman.hasan@deri.org

Sean O’Riain
Digital Enterprise Research Institute (DERI), National University of Ireland, Galway
sean.oriain@deri.org

Edward Curry
Digital Enterprise Research Institute (DERI), National University of Ireland, Galway
ed.curry@deri.org
Categories and Subject Descriptors
D.2.12 [Software Engineering]: Interoperability---data mapping, interface definition languages
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval---information filtering.

General Terms
Algorithms, Experimentation, Human Factors, Languages.

Keywords
Approximate Event Matching, Semantic Decoupling, Semantic Event Matching.

1. INTRODUCTION

Event-based technology is becoming more widely needed with the rise of new applications ranging from smart homes to smart cities and the Internet-of-Things [1]. Event-based systems enable a decoupled mode of interaction between participants, making them suitable for large-scale distributed environments [10]. There are estimates that by the end of 2020 fifty billion devices will be connected to mobile networks [22], which would push event-based technology to its limits.

While event-based systems are decoupled in space, time, and synchronization [10], scaling out to include participants from diverse domains poses a challenge for the semantic interpretation of events. Current systems assume mutual agreement on event semantics, which adds explicit dependencies between interacting parties. This ties event subscriptions and processing languages to crisp and well-understood schemas and semantics of events. It can limit the scalability of an event-based system to that of the events for which the schema and semantic interpretation are known.

The requirement of an upfront understanding of event semantics creates semantic coupling that can limit scalability, especially in environments with high levels of semantic heterogeneity. It also places a barrier between event-based systems and non-technical users who do not fully understand the semantics in use, which restricts usability to IT specialists.
Thus, there is a need to recognize event semantics as a fourth dimension of coupling if event-based systems are to scale out to highly heterogeneous environments such as the Internet of Things [1]. Semantic decoupling of events and users’ subscriptions requires an appropriate method for matching and processing events. One approach to event matching is approximate semantic matching, which uses a mechanism for ranking events according to their relevance to users’ subscriptions.

In this paper we propose a model for approximate semantic matching that addresses the event semantic decoupling requirement. We instantiate our model using a hybrid matching approach based on both thesauri-based and distributional semantics-based similarity and relatedness measures. A novel evaluation that leverages heterogeneous real-world events created by humans and extracted from Wikipedia and Freebase is conducted, with promising matching results.

The rest of this paper is organized as follows: Section 2 motivates the problem of semantic coupling in an enterprise scenario and an open web scenario, while Section 3 discusses decoupling in event-based systems. Section 4 explains the proposed approach and Section 5 details an instantiation of the proposed event, subscription, and matching models. The approach is evaluated in Section 6. Section 7 analyses related work. Potential future directions are identified in Section 8, and Section 9 concludes the paper.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. DEBS’12, July 16–20, 2012, Berlin, Germany. Copyright 2012 ACM 978-1-4503-1315-5...$10.00.

2. MOTIVATIONAL SCENARIOS

2.1 Enterprise Scenario

The chief sustainability officer (CSO) is part of upper management and is responsible for the company’s social responsibility programmes. The CSO is interested in a simple metric that gives, in real time, the company’s performance from a carbon emissions perspective with regard to international standards. The CSO is not a technical person, so the task is forwarded to the IT department, which starts identifying the different potential sources that affect the company’s CO2 emissions [8].

A medium-size organization typically has multiple information systems to manage assets, human resources, orders, etc. Heating, ventilation, and air conditioning (HVAC) are managed by a building management system. Energy consumption sensors exist for lights, laptops and the data centre. The IT department instruments different emitters with sensors that publish events to an event-based infrastructure. Because energy consumption information comes from heterogeneous sources and is generated by devices from different manufacturers, it is highly likely that different schemas and values are used. They might use the terms “energy consumption” and “energy usage” to refer to the same thing. Locations of devices might be described differently as “rooms”, “spaces”, “wings”, etc. A web service from the power utility is used to determine the carbon emissions from power usage.

The IT department also creates a rule-based situation assessment (SA) agent to consume raw events, aggregate events according to the different schemas and values, and generate overall performance events, which are consumed by a dashboard that is shown to the CSO. The diversity of schemas and values results in a large number of rules to process events.
This makes the cost of maintaining the event infrastructure very high when event schemas or value semantics change, or when an event source is added or changed. For example, if the external web service starts using “wind” instead of “renewable”, the SA agent will not be able to match the web service events. The SA agent might stop working for a while until the IT specialists determine the reason and make the necessary changes. Similarly, when a new set of smart fridges is added to the building and they start publishing events with the term “kitchen” instead of “room”, they will not be accounted for automatically by the SA agent until special rules are manually added for them.

2.2 Open Web Scenario

A tourist agency is running a website that gathers real-time feeds from the web about interesting events such as sporting games, concerts, circuses, etc. The site allows users to register their interest in certain types of events with certain characteristics at their planned trip destinations. The website subscribes to RSS web feeds from thousands of sources such as museum websites, football club websites and others. Feeds contain typical RSS items such as “title” and the publication date (“pubDate”). They may also contain other information items like “namespace1#club” or “namespace2#team” that conform to the publishers’ own descriptions of football matches.

When a user subscribes to the agency’s website, she prefers using expressions with no restrictions on the vocabulary that can be used, such as the subscription in Example 1.

Example 1
event type "Football Match"
event team "Barcelona"

Since feeds use different terms such as “Soccer Match” instead of “Football Match”, “club” instead of “team” or “FCB” instead of “Barcelona”, the user misses some events that are relevant to the subscription if the website assumes conjunction between the statements.
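The vocabulary mismatch around Example 1 can be made concrete with a minimal sketch. It is not the matcher proposed in this paper: the feed events, the hand-written `RELATED` table, and every function name are assumptions made purely for illustration, with the table standing in for a real similarity or relatedness measure. The sketch contrasts exact conjunctive matching, which discards a relevant event that uses different terms, with an approximate score that can rank events by relevance.

```python
from typing import Dict, List, Tuple

# Example 1 as a list of (property, value) statements, lower-cased.
SUBSCRIPTION: List[Tuple[str, str]] = [
    ("type", "football match"),
    ("team", "barcelona"),
]

# Hypothetical feed events, each using its publisher's own vocabulary.
EVENTS: Dict[str, Dict[str, str]] = {
    "feed_a": {"type": "soccer match", "club": "fcb"},
    "feed_b": {"type": "football match", "team": "barcelona"},
    "feed_c": {"type": "basketball game", "team": "barcelona"},
}

# Hand-written relatedness values: an assumption of this sketch,
# standing in for a thesaurus or distributional measure.
RELATED: Dict[frozenset, float] = {
    frozenset({"football match", "soccer match"}): 0.9,
    frozenset({"team", "club"}): 0.8,
    frozenset({"barcelona", "fcb"}): 0.7,
    frozenset({"football match", "basketball game"}): 0.4,
}


def relatedness(a: str, b: str) -> float:
    """Return 1.0 for identical terms, a table value if known, else 0.0."""
    if a == b:
        return 1.0
    return RELATED.get(frozenset({a, b}), 0.0)


def exact_conjunctive_match(event: Dict[str, str]) -> bool:
    """Every statement must appear in the event with identical terms."""
    return all(event.get(prop) == value for prop, value in SUBSCRIPTION)


def approximate_score(event: Dict[str, str]) -> float:
    """Average, over statements, of the best property*value relatedness."""
    total = 0.0
    for prop, value in SUBSCRIPTION:
        total += max(
            (relatedness(prop, p) * relatedness(value, v)
             for p, v in event.items()),
            default=0.0,  # an empty event contributes nothing
        )
    return total / len(SUBSCRIPTION)


def rank_events() -> List[str]:
    """Order event names from most to least relevant to the subscription."""
    return sorted(EVENTS, key=lambda n: approximate_score(EVENTS[n]),
                  reverse=True)


if __name__ == "__main__":
    for name, event in EVENTS.items():
        print(name, exact_conjunctive_match(event),
              round(approximate_score(event), 2))
    print(rank_events())
```

Under these assumed relatedness values, only `feed_b` passes the exact conjunctive test, while the approximate score still places the "Soccer Match" event from `feed_a` ahead of the basketball game in `feed_c`. Multiplying property and value relatedness is one simple design choice among many; it makes a statement count only when both its property and its value are plausibly related.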
If the website assumes disjunction, the user may get many events that involve some team from Barcelona but are not football matches. The website would consider them all equally relevant, although the user may want basketball games played by Barcelona to be ordered first if no football matches are detected.

3. SEMANTIC COUPLING WITHIN EVENT SYSTEMS

The event-based interaction paradigm is based on decoupling producers and consumers of events. The main advantage of decoupling the production and consumption of events is increased scalability through “removing explicit dependencies between the interacting participants” [10]. The three common dimensions of coupling between event producers and consumers are space, time and synchronization:

• Space decoupling suggests that the interacting parties do not need to know each other. Publishers do not hold references to consumers or know how many of them are actually interacting, and vice versa.
• Time decoupling means that participants do not need to be actively involved in the interaction at the same time.
• Synchronization decoupling suggests that event producers are not blocked while producing events and consumers get notified of an event occurrence while performing some concurrent activity [10].

However, event-based systems that support the space, time and synchronization dimensions of decoupling can still be tightly coupled by the semantics of the events they exchange. If an event system assumes mutual agreement on event types, properties, and values, this agreement is an explicit dependency between parties. Semantic coupling limits the scalability of event-based systems within deployment environments with high levels of event heterogeneity. In these environments there is a large cost to define and maintain all the subscriptions and rules needed by event consumers.

Figure 1. Four dimensions of event decoupling (space, time, synchronization, and semantic: type, property, value) between event producer and event consumer.

Mutual agreement between event producers and event consumers suggests a semantic coupling that has three dimensions:
