Schedule
The AY 2025–26 workshop series on multimodal generative AI is now complete — browse the materials in the Past Sessions archive below. We'll return in fall 2026 with a new series for AY 2026–27.
AY 2026–27 series — coming soon
New sessions will be announced in fall 2026. Topics and dates to be confirmed.
Past Sessions
Browse notebooks, slides, and other materials from previous workshops. Each session includes a read-only preview, a one-click link to run the notebook in Google Colab, and a portable version you can run in any Jupyter environment.
Multimodal AI — Video and temporal understanding
Vision-language models can process video and image sequences and ground their responses in time, indicating when particular events or shifts occur in a film. This session explored the use of vision-language models for the analysis and interpretation of moving images.
AI for humanities research?
A collaborative session exploring how large language models and vision-language tools can support humanities scholarship — from working with archival image collections via IIIF to contextual analysis of primary sources.
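For readers new to IIIF, the sketch below shows how an image request URL is assembled following the IIIF Image API pattern of identifier, region, size, rotation, quality, and format. The server address and identifier are hypothetical placeholders, and the helper name `iiif_image_url` is an assumption for illustration, not code from the session.

```python
from urllib.parse import quote


def iiif_image_url(base: str, identifier: str, region: str = "full",
                   size: str = "max", rotation: str = "0",
                   quality: str = "default", fmt: str = "jpg") -> str:
    """Build a IIIF Image API request URL.

    Pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # Slashes inside the identifier must be percent-encoded.
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Hypothetical server and identifier: request the central 50% of the image,
# scaled to 800 pixels wide.
url = iiif_image_url(
    "https://iiif.example.org/image",
    "ms-001/f1r",
    region="pct:25,25,50,50",
    size="800,",
)
print(url)
```

The default size `max` follows version 3 of the Image API; servers that only implement version 2.0 expect `full` for the size instead.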
DiScho Discovery Hours — Translating secondary sources
Three practical approaches to translating scholarly texts: quick paragraph-level translation via Google Translate / DeepL, offline reproducible translation with MarianMT, and context-aware scholarly translation with an LLM. Attendees compared outputs on the same passage.
Multimodal AI — Visual tool calling
Multimodal AI models can be equipped with visual tools that let them manipulate images or retrieve external information. For example, a zoom tool can focus on a section of a painting, and reverse image search can retrieve metadata. We also built a custom image restoration tool and covered practical document-to-text workflows.
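To make the zoom idea concrete, here is a minimal sketch of the arithmetic such a tool might perform: turning a relative region chosen by the model into a pixel crop box. The function name `zoom_box` and the 0-to-1 coordinate convention are assumptions for illustration; the tools used in the session may work differently.

```python
from typing import Tuple


def zoom_box(width: int, height: int,
             region: Tuple[float, float, float, float]) -> Tuple[int, int, int, int]:
    """Convert a relative region (left, top, right, bottom, each in 0..1)
    into a pixel box for an image of the given size."""
    # Clamp each coordinate so the box never leaves the image.
    left, top, right, bottom = (min(max(v, 0.0), 1.0) for v in region)
    if right <= left or bottom <= top:
        raise ValueError("region must have positive width and height")
    return (round(left * width), round(top * height),
            round(right * width), round(bottom * height))


# Zoom into the lower middle of a 4000 x 3000 pixel image.
box = zoom_box(4000, 3000, (0.25, 0.5, 0.75, 1.0))
print(box)
```

The resulting tuple uses the (left, upper, right, lower) order that Pillow's `Image.crop` accepts, so it could be passed to that method to produce the zoomed view.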
DiScho Discovery Hours — LLM Steering
An exploration of LLM activation steering — adding abstract concept vectors to a model's hidden state to alter its output. We tinkered with the technique using nnsight and sparse autoencoders, and discussed what it reveals about how models represent concepts internally.
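As a rough illustration of the core operation, the sketch below adds a scaled concept vector to a hidden-state vector. It is a toy in plain Python, not the nnsight workflow from the session; the function name `steer`, the scale `alpha`, and the example values are all assumptions made up for illustration.

```python
from typing import List


def steer(hidden: List[float], concept: List[float], alpha: float = 1.0) -> List[float]:
    """Return hidden + alpha * concept, computed element by element."""
    if len(hidden) != len(concept):
        raise ValueError("hidden state and concept vector must have the same length")
    return [h + alpha * c for h, c in zip(hidden, concept)]


# Toy values, not taken from a real model.
hidden_state = [0.5, -1.0, 2.0]
concept_vector = [1.0, 0.0, -1.0]

# A larger alpha pushes the hidden state further along the concept direction.
steered = steer(hidden_state, concept_vector, alpha=2.0)
print(steered)
```

In a real model the same addition would be applied to the activations of a chosen layer during the forward pass, with the concept vector having the same dimensionality as that layer's hidden state.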
Multimodal AI — Visual reasoning and chain of thought
Recent models can reason about the visual contents of images, "thinking aloud" about meaning and relationships between objects. This capability enables more effective recognition of signs and contextual information within images. We explored how this might further visual analysis, interpretation, and distant viewing.