Abstract: Despite significant progress in computer vision and machine learning, personalized autonomous agents still often fail to participate robustly and safely in our environment. We think this is largely because they lack the ability to anticipate. To develop this ability, we think answers to four foundational questions are needed: (1) How can methods accurately forecast high-dimensional observations? (2) How can algorithms holistically understand objects, e.g., when reasoning about occluded parts? (3) How can accurate probabilistic models be recovered from limited amounts of labeled data and for rare events? (4) How can autonomous agents be trained effectively to collaborate?
In this talk, we present vignettes of our research addressing these four questions. Specifically, we first discuss panoptic forecasting, a new task for studying algorithms for high-dimensional forecasting. We then illustrate methods for holistic object understanding, addressing tasks like semantic amodal instance-level video object segmentation (SAIL-VOS). In the third part, we discuss methods for training with limited labeled data. Time permitting, in the fourth part we sketch recent advances in training collaborative embodied agents. For additional information and questions, please visit http://alexander-schwing.de or https://twitter.com/alexschwing.