Responsible AI

A series of workshops and collaborative sessions where researchers can learn about recent developments in generative AI.

Schedule

The responsible AI workshop series will focus this spring on the multimodal capabilities of generative AI models. These sessions will be of interest to anyone working with images, video, or audio materials. We’ll investigate the new capabilities of vision-language models and connect them to the needs of academic researchers.

Visual reasoning and chain of thought

Recent models can reason about the visual contents of images, “thinking aloud” about the meaning of and relationships between the objects they depict. This capability enables more effective recognition of signs and other visual information, including their context within the image. How might this capability advance visual analysis, interpretation, and distant viewing?
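
As a concrete point of departure for the session, here is a minimal sketch of prompting a vision-language model to reason step by step about an image. It assumes the OpenAI Python client with an API key in the environment; the model name, file name, and prompt are illustrative choices, not prescriptions.

import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Read a local image and encode it for the request.
with open("painting.jpg", "rb") as f:  # placeholder file name
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",  # any vision-capable model could stand in here
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Think step by step: describe the objects in this "
                         "image, then explain how they relate to one another."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)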


Visual tool calling

Multimodal AI models can be equipped with visual tools that let them manipulate images or retrieve external information. A zoom tool can focus on a specific section of a painting, while a reverse image search tool can find similar images across the Web; such searches can retrieve metadata that improves the recognition and interpretation of visual material. We will begin with existing visual tools and consider which additional tools could aid research.
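
To ground the discussion, the sketch below shows one way a zoom tool might be exposed to a model through function calling. The zoom_image helper and its schema are hypothetical illustrations rather than an established API; the tool format follows the OpenAI-style function-calling convention.

import json
from PIL import Image

# A hypothetical "zoom" tool: crop a region of an image so the model can
# inspect it at a higher effective resolution.
def zoom_image(path, left, top, right, bottom):
    return Image.open(path).crop((left, top, right, bottom))

# Schema advertised to the model; the model decides when to request the
# tool and with which coordinates.
tools = [{
    "type": "function",
    "function": {
        "name": "zoom_image",
        "description": "Crop and return a rectangular region of the current image.",
        "parameters": {
            "type": "object",
            "properties": {
                "left": {"type": "integer"},
                "top": {"type": "integer"},
                "right": {"type": "integer"},
                "bottom": {"type": "integer"},
            },
            "required": ["left", "top", "right", "bottom"],
        },
    },
}]

# Simulate executing a tool call the model might return; in a real loop the
# arguments would come from response.choices[0].message.tool_calls.
fake_arguments = '{"left": 100, "top": 50, "right": 400, "bottom": 300}'
args = json.loads(fake_arguments)
print("model requested zoom region:", args)
# region = zoom_image("painting.jpg", **args)  # requires a local image file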


Video and temporal understanding

Vision-language models can process video and image series, and they are trained to ground their responses in time, indicating when a particular event or shift occurs in a film. This session will explore the use of vision-language models for the analysis and interpretation of moving images.
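
As a small illustration of the client-side preparation this involves, the following sketch samples timestamped frames from a video so that a model’s answer can be tied back to specific moments. It assumes the opencv-python package; the file name and sampling interval are placeholders.

import base64
import cv2  # opencv-python, assumed installed

def sample_frames(path, every_n_seconds=5.0):
    # Sample frames from a video, pairing each with its timestamp in seconds
    # so a model's answer can point back to a specific moment.
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    step = max(1, int(fps * every_n_seconds))
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            ok, buf = cv2.imencode(".jpg", frame)
            if ok:
                timestamp = index / fps if fps else 0.0
                frames.append((timestamp,
                               base64.b64encode(buf.tobytes()).decode("utf-8")))
        index += 1
    cap.release()
    return frames

# Each (timestamp, image) pair can then be sent to a vision-language model
# with a prompt such as "At which timestamp does the scene change occur?"
for ts, _img in sample_frames("film.mp4"):  # placeholder path
    print(f"sampled frame at {ts:.1f}s")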


Location

Commons Library Classroom (D112)