Information fusion consists of organizing a set of data for correlation in time, association over collections, and estimation in space. Many methods exist for object tracking and classification; however, video analytics systems lack robust methods that perform well in all operating conditions (e.g., scale changes, occlusions, low signal-to-noise ratios). Challenging scenarios where context can play a role include object labeling, track correlation/stitching through dropouts, and activity recognition. In this chapter we propose a novel framework that fuses video data with text data for enhanced simultaneous tracking and identification. The need for such a methodology resides in answering user queries, linking information over different collections, and providing meaningful product reports. For example, text data can establish that a pedestrian is crossing the road in a low-resolution video and/or that the activity type is the object turning. Together, physics-derived and human-derived fusion (PHF) enhances situation awareness, provides situation understanding, and affords situation assessment. PHF is an example of hard (e.g., video) and soft (e.g., text) data fusion that links Level 5 user refinement to Level 1 object tracking and characterization. A demonstrated example of multimodal text and video sensing is shown in which context provides the means for associating the multimodal data aligned in space and time.
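To make the hard/soft association concrete, the following is a minimal sketch of gating text reports to video tracks by spatiotemporal proximity. The `Track`/`Report` classes, the threshold parameters `dt` and `dr`, and the `associate` function are illustrative assumptions, not the chapter's actual PHF algorithm.

```python
from dataclasses import dataclass

@dataclass
class Track:      # hard (video-derived) data: one track point
    track_id: int
    t: float      # timestamp in seconds
    x: float      # position (e.g., ground-plane coordinates)
    y: float

@dataclass
class Report:     # soft (human/text-derived) data: one observation
    text: str
    t: float
    x: float
    y: float

def associate(tracks, reports, dt=2.0, dr=5.0):
    """Pair each text report with tracks that fall inside a
    temporal window dt (s) and a spatial radius dr (same units as x, y)."""
    pairs = []
    for r in reports:
        for k in tracks:
            close_in_time = abs(k.t - r.t) <= dt
            close_in_space = ((k.x - r.x) ** 2 + (k.y - r.y) ** 2) ** 0.5 <= dr
            if close_in_time and close_in_space:
                pairs.append((k.track_id, r.text))
    return pairs
```

For example, a report "pedestrian crossing road" time-stamped near a track point would attach its label to that track, supporting identification through dropouts or low resolution:

```python
tracks = [Track(1, 10.0, 0.0, 0.0), Track(2, 50.0, 100.0, 100.0)]
reports = [Report("pedestrian crossing road", 10.5, 1.0, 0.0)]
associate(tracks, reports)  # -> [(1, 'pedestrian crossing road')]
```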
Citation:
Blasch, E., Hammoud, R. I., Ling, H., Shen, D., Nagy, J., & Chen, G. (2016). Context-Based Fusion of Physical and Human Data for Level 5 Information Fusion. In Advances in Computer Vision and Pattern Recognition (pp. 479–505). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-319-28971-7_18