The HappyHorse model — a 40-layer single-stream Transformer with 8-step inference and no CFG — topped the Video Arena in both text-to-video and image-to-video. Generate stunning videos now.
Blind-test Elo scores from Artificial Analysis Video Arena — HappyHorse 1.0 topped both text-to-video and image-to-video leaderboards.
A radical rethink of video generation: a single 40-layer Self-Attention Transformer that jointly models text, video, and audio — with no Cross-Attention and no CFG.
One unified Self-Attention Transformer handles text, video, and audio tokens, with no Cross-Attention overhead (see the sketch after this list).
Consistency distillation compresses hundreds of diffusion steps into just 8, with no Classifier-Free Guidance penalty.
Chinese, English, Japanese, Korean, German, and French — all natively supported without translation wrappers.
Sound and picture are the same token sequence. Lip sync, speech coordination, and ambient audio are baked in.
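The features above all follow from one single-stream design choice: every modality lives in a single token sequence, and each layer runs plain self-attention over that shared sequence, so no cross-attention path between modalities is needed. HappyHorse's code has not been released, so the following PyTorch sketch only illustrates the idea; the class names, dimensions, and the token-splitting convention are assumptions, not the actual architecture.

```python
import torch
import torch.nn as nn

class SingleStreamBlock(nn.Module):
    """One Transformer block: self-attention over the full mixed-modality
    sequence followed by an MLP. No cross-attention anywhere."""
    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))

class SingleStreamTransformer(nn.Module):
    """Illustrative 40-layer single-stream stack: text, video, and audio
    tokens are concatenated into one sequence and share one attention space."""
    def __init__(self, dim: int = 1024, heads: int = 16, depth: int = 40):
        super().__init__()
        self.blocks = nn.ModuleList(SingleStreamBlock(dim, heads) for _ in range(depth))

    def forward(self, text_tok, video_tok, audio_tok):
        # One token sequence, one attention space: [text | video | audio]
        x = torch.cat([text_tok, video_tok, audio_tok], dim=1)
        for blk in self.blocks:
            x = blk(x)
        # Split the sequence back so video and audio latents can be decoded separately
        t, v = text_tok.shape[1], video_tok.shape[1]
        return x[:, t:t + v], x[:, t + v:]
```

Because audio and video tokens attend to each other directly in every layer, lip sync and ambient sound need no separate alignment module; that is the design claim behind the "same token sequence" point above.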
HappyHorse 1.0 is purpose-built for digital humans, short dramas, and avatar content, not just scenic reels. Its image-to-video Elo of 1392 is its strongest result precisely because animating a real face is the model's core strength.
HappyHorse 1.0 is a text-and-image-to-video generation model that topped the Artificial Analysis Video Arena in April 2026, achieving Elo 1392 in image-to-video and Elo 1333 in text-to-video — beating Seedance 2.0, Kling 3.0, and PixVerse V6.
It uses a 40-layer single-stream Self-Attention Transformer with no Cross-Attention. All modalities — text, video, and audio — share one token sequence and one attention space. Inference requires only 8 denoising steps and no Classifier-Free Guidance (CFG).
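Because the distilled model is trained to jump toward the clean sample directly and is not steered by classifier-free guidance, each of the 8 inference steps needs exactly one network call. The loop below is a minimal, hypothetical few-step sampler in PyTorch that illustrates this structure; the model signature, timestep schedule, and re-noising rule are assumptions for illustration, not HappyHorse's published sampler.

```python
import torch

@torch.no_grad()
def sample_8_steps(model, text_tok, audio_tok, latent_shape, device="cuda"):
    """Few-step sampling sketch: 8 denoising steps, one model call per step,
    and no classifier-free guidance (so no second, unconditional pass).

    `model` is assumed to predict the clean latent x0 from a noisy latent and
    a timestep, as a consistency-distilled student typically does.
    """
    x = torch.randn(latent_shape, device=device)             # start from pure noise
    timesteps = torch.linspace(1.0, 0.0, 9, device=device)   # 8 intervals from t=1 to t=0
    for i in range(8):
        t, t_next = timesteps[i], timesteps[i + 1]
        x0_pred = model(x, t, text_tok, audio_tok)            # single conditional pass
        if t_next > 0:
            # Re-noise the predicted clean latent down to the next (lower) noise level
            noise = torch.randn_like(x)
            x = (1.0 - t_next) * x0_pred + t_next * noise
        else:
            x = x0_pred
    return x
```

Dropping CFG matters for speed as well as quality: a guided sampler runs both a conditional and an unconditional pass at every step, so removing guidance roughly halves the network calls at any given step count.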
HappyHorse 1.0 natively supports six languages: Chinese, English, Japanese, Korean, German, and French — with no translation wrapper required.
Both V1 and V2 were removed within days of topping the Arena. The most likely explanations are either an anonymous A/B test run by the developer team, or a deliberate early-access withdrawal before an official open-source release.
The official site states that the base model, distilled model, super-resolution model, and inference code will be fully open-sourced. As of early April 2026, the GitHub and model hub pages are marked "Coming Soon."