InkSight: A Multimodal STEM Lecture Video Dataset and Data Labeling Tool

Published in Work In Progress, 2025

This ongoing research develops a multimodal STEM lecture video dataset and an annotation framework for training and evaluating vision-language models on the understanding of educational content. Target models include CLIP-style encoders and instruction-tuned VLMs (BLIP-2, Flamingo-style models, LLaVA, Qwen-VL/InternVL). The InkSight Data Labeler is an AI-assisted labeling tool that supports frame-level alignment of diagrams, handwriting, and speech, as well as accessibility analysis of lecture content.

Recommended citation: Nicole Hao. (2025). "InkSight: A Multimodal STEM Lecture Video Dataset and Data Labeling Tool." Work In Progress.