This project is supervised by Prof. M. Cord from LIP6 and was selected in the call launched in July 2019 by the National Research Agency (ANR) for the creation of research and teaching chairs in AI.
Our main objective is to tackle complex vision-driven recognition and understanding tasks. We propose to investigate visual reasoning tasks that go beyond large-scale image classification. This requires incorporating reasoning processes into the visual analysis scheme. We intend to explore how elementary reasoning blocks can be combined into deep architectures, and to question which types of blocks, structures and rules (if any) can be included. Our main requirement on the structures we consider is that the final hybrid (explicit/implicit) architecture be end-to-end trainable. We have already observed in several contexts that fully trainable architectures improve performance. However, requiring the final DNN to be a differentiable function greatly constrains the types of combination and the nature of the reasoning that can be used. We will test our proposals in different contexts, including the visual question answering (VQA) task. We also want to measure or visualize how information is processed inside the deep networks, which is key to equipping our deep reasoning models with explanation capabilities. In particular, we will investigate different visualization processes in the context of autonomous driving, where building machines that can explain their decisions is of critical importance.