Do Select Python Questions MCQ

LVOmniBench: Pioneering Long Audio-Video Understanding Evaluation for Omnimodal LLMs

Recent advancements in omnimodal large language models (OmniLLMs) have significantly improved the comprehension of audio and video inputs. However, current evaluations primarily focus on short audio ...

See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models

AV-SpeakerBench is a curated benchmark of 3,212 multiple-choice questions that tests speaker-centric audiovisual reasoning in real-world videos. Unlike prior video datasets where many tasks are ...
