Title | : | Video Object Segmentation using Adversarial Techniques |
Speaker | : | Saptakatha Adak (IITM) |
Details | : | Thu, 16 May, 2019 4:00 PM @ A M Turing Hall |
Abstract | : | Video Object Segmentation has recently emerged as a popular semi-supervised learning problem in Computer Vision. It aims to segment objects in a video sequence under challenging conditions such as changes in object appearance, occlusions, camera view changes, background clutter and motion blur, given user annotations indicating the object(s) of interest. The popularity of this domain lies in its profound impact on bio-medical research, self-driving cars, video editing, robotics, surveillance etc. Although Convolutional Neural Networks (CNNs) have been used in the past for foreground segmentation in videos, adversarial training-based methods have not been explored thoroughly, despite their extensive use for solving many other problems in Computer Vision. This talk discusses three such approaches.

The first approach uses a GAN-based framework together with a novel Intersection-over-Union-score-based cost function (PSDL) for training the model, to solve the problem of foreground object segmentation in videos. Despite processing the video frames independently, the network maintains temporal coherency between them without using any explicit trajectory-based information. When evaluated on popular real-world video segmentation datasets, viz. DAVIS, SegTrack-v2 and YouTube-Objects, the proposed method exhibits substantial performance gains over recent state-of-the-art methods.

The second approach improves on the PSDL loss by using optical flow vectors, which help capture motion features between consecutive frames and thus enhance segmentation quality. In addition, the Inter-frame Temporal Loss (IFTL) function, along with its long-range variant, captures temporal information from the sequence of video frames. Incorporating these temporal objective functions stabilizes the training process and yields improved segmentation over other state-of-the-art methods proposed earlier in the literature.

Finally, we introduce a space-time-graph-based adversarially trained network, with intermediate features extracted from pixels (over a local region) represented as vertices and the relationships among them as edges. Moreover, a motion-orientation-based attention mechanism is incorporated which efficiently captures long-range dependencies without the need for any explicit specialized objective functions. This method elegantly handles the complex issue of modelling multiple objects moving in various directions at dissimilar speeds by learning directional relationships between pixels in the original scene. Performance analysis on the DAVIS-2017 and FBMS datasets shows the effectiveness of the proposed method for foreground segmentation of objects in a video, both qualitatively and quantitatively. |
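The abstract does not specify the exact form of PSDL; as a rough illustration of what an Intersection-over-Union-score-based objective looks like, here is a minimal sketch of a generic soft-IoU loss (the function name and exact formulation are assumptions for illustration, not the speaker's definition):

```python
import numpy as np

def soft_iou_loss(pred, target, eps=1e-7):
    """Generic IoU-style loss: 1 - soft Intersection-over-Union.

    `pred` holds foreground probabilities in [0, 1]; `target` is a
    binary ground-truth mask of the same shape. Products and sums
    replace hard set operations so the score stays differentiable.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    intersection = np.sum(pred * target)            # soft |P ∩ G|
    union = np.sum(pred) + np.sum(target) - intersection  # soft |P ∪ G|
    return 1.0 - intersection / (union + eps)
```

Minimizing such a loss directly optimizes the region-overlap metric that segmentation benchmarks like DAVIS report, rather than a per-pixel surrogate such as cross-entropy.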
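Likewise, the IFTL formulation is not given in the abstract; a common way to turn optical flow into a temporal objective is to warp the previous frame's mask along the flow and penalize disagreement with the current mask. A minimal sketch under that assumption (all names, the backward-flow convention, and the nearest-neighbor warp are illustrative choices, not the speaker's method):

```python
import numpy as np

def warp_nearest(mask, flow):
    """Warp a (H, W) mask by a (H, W, 2) backward flow field.

    For each target pixel (y, x), the flow points to its source
    location in the previous frame; nearest-neighbor sampling with
    border clipping keeps the sketch simple.
    """
    h, w = mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return mask[src_y, src_x]

def temporal_consistency_loss(mask_t, mask_prev, flow):
    """Mean L1 disagreement between the current mask and the
    flow-warped previous mask: zero when the prediction moves
    exactly with the estimated motion."""
    warped = warp_nearest(mask_prev, flow)
    return float(np.mean(np.abs(mask_t - warped)))
```

A loss of this shape only constrains consecutive frame pairs; a long-range variant, as mentioned in the abstract, would compare masks across larger temporal offsets.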