Segmenting moving objects in a video sequence is a challenging problem that is critical to outdoor robotic navigation. While recent literature has focused on regularizing object labels over a sequence of frames, exploitation of spatio-temporal features for motion segmentation remains scarce. In real-world dynamic scenes in particular, existing approaches fail to exploit temporal consistency when segmenting moving objects under large camera motion. In this paper, we present an approach that exploits semantic information and temporal constraints in a joint framework for motion segmentation in a video. We propose a formulation for inferring per-frame joint semantic and motion labels using semantic potentials from a dilated CNN and motion potentials from depth and geometric constraints. We integrate the resulting potentials into a 3D (space-time) fully connected CRF framework with overlapping/connected blocks. We solve for a feature-space embedding in the spatio-temporal domain by enforcing temporal constraints from optical flow and long-term tracks, posed as a least-squares problem. We evaluate our approach on the outdoor driving benchmarks KITTI and Cityscapes.
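As a rough illustration of the formulation described above (the specific symbols and kernel choices here are assumptions, not taken from the paper), a space-time fully connected CRF energy over joint semantic-motion labels can be written with unaries combining the two potential sources and Gaussian pairwise terms over a spatio-temporal feature embedding:

\[
E(\mathbf{x}) \;=\; \sum_{i} \bigl( \psi_{\text{sem}}(x_i) + \psi_{\text{mot}}(x_i) \bigr)
\;+\; \sum_{i < j} \mu(x_i, x_j)\, k\!\left(\mathbf{f}_i, \mathbf{f}_j\right),
\]

where \(\psi_{\text{sem}}\) comes from the dilated CNN, \(\psi_{\text{mot}}\) from depth and geometric constraints, \(\mu\) is a label-compatibility function, and \(\mathbf{f}_i\) is the feature embedding of pixel \(i\) in the space-time volume. The temporal constraints from optical flow and long-term tracks could then enter as a least-squares objective that pulls corresponding pixels close in the embedding:

\[
\min_{\{\mathbf{f}_i\}} \;\sum_{(i,j)\,\in\,\mathcal{C}} \left\| \mathbf{f}_i - \mathbf{f}_j \right\|^2,
\]

where \(\mathcal{C}\) is the set of pixel correspondences given by flow and track links across frames. This is only a sketch consistent with the abstract's description, not the exact formulation from the paper.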
Haque, N., Reddy, N. D., & Krishna, M. (2018). Temporal semantic motion segmentation using spatio temporal optimization. In Lecture Notes in Computer Science (Vol. 10746 LNCS, pp. 93–108). Springer. https://doi.org/10.1007/978-3-319-78199-0_7