Talk

Visual attention: How to guide a machine in the world around us

Miguel Ángel Fernández Torres

The world around us is full of visual information. Visual attention allows humans either to highlight the most conspicuous areas in a given context (e.g. an airport, a highway, or a hospital) or to select the areas that help solve a particular task (e.g. video surveillance, driving, or surgery).

In this talk, we will show how a machine can be trained to perform the visual attention task, and what advantages this brings when dealing with large amounts of information in complex and crowded scenarios. The presentation is divided into two parts.

In the first part of the talk, we will briefly introduce how to model some of the attributes (e.g. color, orientation, and motion) and objects that guide attention, using both traditional computer vision techniques and recent Convolutional Neural Networks (CNNs). Then, we will present a model that learns comprehensible representations of visual attention. Drawing on the attributes listed above and on the information provided by human eye fixations, these representations attempt either to predict where people look or to explain how visual attention works.
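As a rough illustration of the traditional, attribute-based approach mentioned above, the following sketch computes a center-surround contrast map on a single intensity channel. It is not the model presented in the talk: the function names `box_blur` and `center_surround_saliency`, the window radii, and the synthetic test image are all assumptions chosen for illustration, and a real system would combine several attributes (color, orientation, motion) at multiple scales.

```python
import numpy as np


def box_blur(image: np.ndarray, radius: int) -> np.ndarray:
    """Mean filter over a (2*radius+1)-wide square window, with edge padding."""
    if radius == 0:
        return image.astype(float)
    size = 2 * radius + 1
    padded = np.pad(image.astype(float), radius, mode="edge")
    # Integral image with a leading row and column of zeros, so that any
    # window sum can be read off with four lookups.
    integral = np.pad(padded, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    h, w = image.shape
    window_sum = (
        integral[size:size + h, size:size + w]
        - integral[:h, size:size + w]
        - integral[size:size + h, :w]
        + integral[:h, :w]
    )
    return window_sum / (size * size)


def center_surround_saliency(
    image: np.ndarray, center_radius: int = 1, surround_radius: int = 8
) -> np.ndarray:
    """Absolute difference between a fine and a coarse local mean, scaled to [0, 1]."""
    center = box_blur(image, center_radius)
    surround = box_blur(image, surround_radius)
    saliency = np.abs(center - surround)
    peak = saliency.max()
    return saliency / peak if peak > 0 else saliency


# Hypothetical test scene: a dark background with one small bright patch.
scene = np.zeros((64, 64))
scene[30:34, 40:44] = 1.0

saliency_map = center_surround_saliency(scene)
peak_row, peak_col = np.unravel_index(np.argmax(saliency_map), saliency_map.shape)
print("Most salient pixel:", (int(peak_row), int(peak_col)))
```

Regions that differ strongly from their neighborhood receive high values, which is the basic intuition behind "conspicuous" areas; learned models such as CNNs replace these hand-designed filters with features fitted to human eye-fixation data.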

In the second part of the talk, we will review some of the most prominent video scenarios in which visual attention can help solve a particular application. In these contexts, modeling visual attention makes it possible to guide subsequent processing toward the spatial regions and time segments of special importance. We will put special emphasis on the anomaly detection task performed by CCTV operators in video surveillance scenarios, which involves watching many hours of footage from large arrays of cameras.
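To make the idea of guiding later processing toward important time segments more concrete, here is a minimal sketch that assumes each video frame has already been given a scalar attention score by some model. The function name `salient_segments`, the threshold, the minimum segment length, and the example scores are hypothetical and are not taken from the talk.

```python
from typing import List, Sequence, Tuple


def salient_segments(
    scores: Sequence[float], threshold: float, min_length: int = 1
) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, where the score stays
    at or above the threshold for at least min_length consecutive frames."""
    segments: List[Tuple[int, int]] = []
    start = None
    for index, score in enumerate(scores):
        if score >= threshold and start is None:
            start = index  # a new candidate segment begins here
        elif score < threshold and start is not None:
            if index - start >= min_length:
                segments.append((start, index))
            start = None
    # Close a segment that runs until the last frame.
    if start is not None and len(scores) - start >= min_length:
        segments.append((start, len(scores)))
    return segments


# Hypothetical per-frame attention scores for a short clip.
frame_scores = [0.1, 0.2, 0.9, 0.8, 0.1, 0.7, 0.1, 0.95, 0.9, 0.92]
segments = salient_segments(frame_scores, threshold=0.6, min_length=2)
print(segments)
```

Only the returned frame ranges would then be passed to a more expensive stage, such as an anomaly detector or a human operator, instead of the full footage.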

Level: Intermediate. Language: Spanish. Topics: Artificial Intelligence, Big Data / Data Science, Engineering, Science / Research

Slides and additional material

Thursday, 14 March 2019

10:00 - 10:50

Track 4 (4.1.E03)

About the speaker

Miguel Ángel Fernández Torres

Universidad Carlos III de Madrid

Miguel-Ángel Fernández-Torres received his Bachelor's degree in Audiovisual Systems Engineering and his Master's degree in Multimedia and Communications from Universidad Carlos III de Madrid, Spain, in 2013 and 2014, respectively. In February 2019 he received his Ph.D. in Multimedia and Communications from the same university. During his Ph.D., his research focused on modeling and understanding spatio-temporal visual attention, applying both Bayesian models and deep learning. He currently works as an Assistant Professor in the Department of Signal Theory and Communications, while continuing his research in the Multimedia Processing Group.

In addition to his work on visual attention, he has participated in projects related to his other interests within the field of Computer Vision, which include image and video analysis and medical image classification. He also had the opportunity to study at Technische Universität Wien, Vienna, Austria, during his Bachelor's degree in 2013, and to complete a predoctoral research stay at the Visual Perception Laboratory of Purdue University, West Lafayette, Indiana, USA, in 2016.