Hacker News
Apple releases adapted SlowFast-LLaVA model for long-form video analysis (9to5mac.com)
3 points by Terretta 4 months ago | 1 comment


Paper: SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding:

“We introduce SlowFast-LLaVA-1.5 (abbreviated as SF-LLaVA-1.5), a family of video large language models (LLMs) offering a token-efficient solution for long-form video understanding... Experimental results demonstrate that SF-LLaVA-1.5 achieves superior performance on a wide range of video and image tasks, with robust results at all model sizes (ranging from 1B to 7B). Notably, SF-LLaVA-1.5 achieves state-of-the-art results in long-form video understanding (e.g., LongVideoBench and MLVU) and excels at small scales across various video benchmarks.” -- https://arxiv.org/abs/2503.18943

GitHub: https://github.com/apple/ml-slowfast-llava

Hugging Face: https://huggingface.co/papers/2503.18943



