Visual attention: How to guide a machine in the world around us

Miguel Ángel Fernández Torres

A world full of visual information opens up before us. Visual attention lets humans either highlight the most conspicuous areas in a given context (e.g. an airport, a highway, a hospital) or select the areas that help solve a particular task (e.g. video surveillance, driving, surgery).

In this talk, we will show how a machine can be trained to perform the visual attention task, and what advantages this brings when dealing with large amounts of information in complex, crowded scenarios. The presentation is divided into two parts.

In the first part of the talk, we will briefly introduce how to model some of the attributes (e.g. color, orientation, motion) and objects that guide attention, using both traditional computer vision techniques and recent Convolutional Neural Networks (CNNs). Then, we will present a model able to learn comprehensible representations of visual attention. Drawing on the attributes listed above and on the information provided by human eye fixations, these representations aim either to predict where people look or to help understand how visual attention works.
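As a rough illustration of the "traditional computer vision" end of this spectrum, the sketch below computes a center-surround contrast map on image intensity alone. It is not the model presented in the talk: the function names (`box_mean`, `center_surround_saliency`, `most_salient`), the window radii, and the use of a single intensity channel are all assumptions made for the example.

```python
from typing import List, Tuple

# Grayscale image as rows of intensities in [0, 1] (an assumption of this sketch).
Image = List[List[float]]


def box_mean(img: Image, r: int, y: int, x: int) -> float:
    """Mean intensity in a (2r+1)x(2r+1) window around (y, x), clipped at borders."""
    h, w = len(img), len(img[0])
    total, count = 0.0, 0
    for yy in range(max(0, y - r), min(h, y + r + 1)):
        for xx in range(max(0, x - r), min(w, x + r + 1)):
            total += img[yy][xx]
            count += 1
    return total / count


def center_surround_saliency(img: Image, center_r: int = 0, surround_r: int = 2) -> Image:
    """Saliency as |center mean - surround mean|, rescaled so the peak is 1."""
    h, w = len(img), len(img[0])
    sal = [
        [abs(box_mean(img, center_r, y, x) - box_mean(img, surround_r, y, x))
         for x in range(w)]
        for y in range(h)
    ]
    peak = max(max(row) for row in sal)
    if peak == 0:
        # Uniform image: nothing stands out, so return the all-zero map.
        return sal
    return [[v / peak for v in row] for row in sal]


def most_salient(sal: Image) -> Tuple[int, int]:
    """Return the (row, col) position of the highest saliency value."""
    best = (0, 0)
    for y, row in enumerate(sal):
        for x, v in enumerate(row):
            if v > sal[best[0]][best[1]]:
                best = (y, x)
    return best


if __name__ == "__main__":
    # A dark 7x7 image with one bright pixel in the middle.
    img = [[0.0] * 7 for _ in range(7)]
    img[3][3] = 1.0
    print(most_salient(center_surround_saliency(img)))
```

A pixel scores high when it differs from its neighborhood, which is the basic intuition behind conspicuity for a single attribute. Handling color, orientation, or motion would need one such map per attribute plus a rule for combining them, and learned approaches replace these hand-designed contrasts with features fitted to eye-fixation data.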

In the second part of the talk, we will review some of the most relevant video scenarios in which visual attention can help solve a particular application. In these contexts, modeling visual attention makes it possible to guide subsequent processing toward the spatial regions and time segments of special importance. We will place particular emphasis on the anomaly detection task performed by CCTV operators in video surveillance scenarios, which involves watching many hours of footage from large arrays of cameras.

Intermediate | Spanish | Artificial Intelligence | Big Data / Data Science | Engineering | Science / Research

Thursday 14/03/2019

10:00 - 10:50

About the speaker

Miguel Ángel Fernández Torres

Universidad Carlos III de Madrid

Miguel-Ángel Fernández-Torres received the Audiovisual Systems Engineering degree and the Master's degree in Multimedia and Communications from Universidad Carlos III de Madrid, Madrid, Spain, in 2013 and 2014, respectively. He is currently an Assistant Professor in the Signal Theory and Communications Department of the same university, while pursuing his Ph.D. degree in the Multimedia Processing Group. His doctoral research addresses spatio-temporal visual attention modeling and understanding, applying both Bayesian networks and deep learning.

In addition to his research on visual attention, he has participated in projects related to his other interests within the field of Computer Vision, which include image and video analysis and medical image classification. He also had the opportunity to study at Technische Universität Wien, Vienna, Austria, during his Bachelor's degree, in 2013, and to complete a Ph.D. research stay at the Visual Perception Laboratory of Purdue University, West Lafayette, Indiana, USA, in 2016.