News
- Our paper MovieChat+: Question-aware Sparse Memory for Long Video Question Answering is accepted by IEEE TPAMI.
- Our paper Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark is accepted by ICCV 2025 Findings.
- Our paper Bringing RNNs Back to Efficient Open-Ended Video Understanding is accepted by ICCV 2025.
- We are hosting two CVPR 2025 Video Understanding Challenges @ LOVE Track 1A and LOVE Track 1B.
- We release Video-MMLU, a Massive Multi-Discipline Lecture Understanding Benchmark.
- One paper is accepted by the CVPR 2025 Workshop on Efficient Large Vision Models.
- Two papers are accepted by ICLR 2025.
Selected Publications and Manuscripts
* Equal contribution.
Also see Google Scholar.

Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark
ICCVW, 2025
Video-MMLU is a massive benchmark designed to evaluate the capabilities of LMMs in understanding multi-discipline lectures.

AuroraLong: Bringing RNNs Back to Efficient Open-Ended Video Understanding
ICCV, 2025
AuroraLong uses a linear RNN language model that handles input sequences of arbitrary length with constant-size hidden states to solve long video understanding tasks.
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
ICLR, 2025
AuroraCap is a multimodal LLM designed for image and video detailed captioning. We also release VDC, the first benchmark for detailed video captioning.

MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
CVPR, 2024
MovieChat achieves state-of-the-art performance in extra-long video (more than 10K frames) understanding by introducing a memory mechanism.
Teaching Assistant
Teaching Assistant (TA), with Prof. Gaoang Wang
Selected Honors & Awards
- National Scholarship, 2024 (Zhejiang University)
- National Scholarship, 2021 (Dalian University of Technology)