Browsing Computer Sciences by Author "Nguyen, Le Thien Phuc"
See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models
Nguyen, Le Thien Phuc (2025)
Multimodal large language models (MLLMs) are expected to jointly interpret vision, audio, and language, yet existing video benchmarks rarely assess fine-grained reasoning about human speech. Many tasks remain visually ...
