Learning about activities, spatial relations and spatial language from video
In this talk I will present work undertaken at Leeds on building models of activity from video and other sensors, using both supervised and unsupervised techniques. The representations exploit qualitative spatio-temporal relations to provide symbolic models at a relatively high level of abstraction. I will discuss techniques for handling noise in the video data, and show how objects can be "functionally categorised" according to their spatio-temporal behaviour. Finally, I will present very recent results on learning and grounding language from video-sentence pairs.
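To give a flavour of what a qualitative spatial relation looks like, here is a toy sketch (my own illustration under assumed conventions, not the Leeds implementation) that assigns an RCC-8-style relation to a pair of axis-aligned bounding boxes, the kind of output a video object tracker typically produces:

```python
# Toy RCC-8-style classifier for two axis-aligned boxes (xmin, ymin, xmax, ymax).
# Illustrative only: real systems work over tracked regions and richer
# spatio-temporal calculi, with explicit handling of detection noise.

def rcc8(a, b):
    """Return a coarse RCC-8 relation name for boxes a and b."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    # No contact at all -> disconnected
    if ax1 < bx0 or bx1 < ax0 or ay1 < by0 or by1 < ay0:
        return "DC"
    # Boundaries touch but interiors do not overlap -> externally connected
    if ax1 == bx0 or bx1 == ax0 or ay1 == by0 or by1 == ay0:
        return "EC"
    if a == b:
        return "EQ"
    a_in_b = ax0 >= bx0 and ay0 >= by0 and ax1 <= bx1 and ay1 <= by1
    b_in_a = bx0 >= ax0 and by0 >= ay0 and bx1 <= ax1 and by1 <= ay1
    if a_in_b:  # a is a proper part of b; tangential if a boundary is shared
        touching = ax0 == bx0 or ay0 == by0 or ax1 == bx1 or ay1 == by1
        return "TPP" if touching else "NTPP"
    if b_in_a:  # inverse case: b is a proper part of a
        touching = bx0 == ax0 or by0 == ay0 or bx1 == ax1 or by1 == ay1
        return "TPPi" if touching else "NTPPi"
    return "PO"  # interiors overlap but neither contains the other

# A cup set down on a table might move through DC -> EC -> PO over time;
# sequences of such relations form the symbolic activity descriptions.
```

Tracking how such relations change over time yields the kind of symbolic, abstraction-level description from which activity models can be learned.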