Browsing Computer Sciences by Title
Resolution Matters: An Effective Approach to Anomaly Detection
(2025) Unsupervised anomaly detection has been profoundly impacted by the advent of large-scale Vision Foundation Models (VFMs). The prevailing paradigm leverages features from a pre-trained encoder, where anomalies manifest as ...
See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models
(2025) Multimodal large language models (MLLMs) are expected to jointly interpret vision, audio, and language, yet existing video benchmarks rarely assess fine-grained reasoning about human speech. Many tasks remain visually ...
Storypair: Supporting Co-Reading in Bilingual Immigrant Families through Generative Language
(2025) Reading is an essential yet challenging skill in child education, particularly for immigrant families navigating bilingual environments. This study explores how generative language models (LLMs) can enhance parent-child ...
Unveiling Bias in Multimodal Models
(2025) Vision Language Models (VLMs) have significantly advanced multimodal understanding by effectively combining visual and textual modalities for various applications, including image captioning, visual question answering, and ...
