Stable Video Diffusion is an independent AI video tool built on the open-source Stable Video Diffusion model family (SVD, SVD-XT). It is not affiliated with Stability AI in any way. All trademarks belong to their respective owners.

Turn a prompt into a Stable Video Diffusion clip

Text to Video

Text to Video pairs a text-to-image step with Stable Video Diffusion: your written prompt becomes a starting frame, then SVD animates it into a short clip — no source photo required.

How Text to Video works

1. Describe the scene

Write a prompt describing the look you want for the opening frame.

2. Generate the frame

A text-to-image step renders your prompt into a still that seeds the video.

3. Animate with SVD

Stable Video Diffusion turns that frame into a short, coherent motion clip.
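
The three steps above can be sketched as a small two-stage pipeline. This is only an illustration of the flow, not this tool's actual implementation: `text_to_video`, `render_still`, `animate`, and the `Clip` type are hypothetical names, and the two stand-in functions at the bottom replace real text-to-image and SVD models so the sketch runs without a GPU.

```python
from dataclasses import dataclass
from typing import Callable, List

# Placeholder frame type; a real pipeline would pass image objects or tensors.
Frame = bytes


@dataclass
class Clip:
    seed_frame: Frame    # still rendered from the prompt (step 2)
    frames: List[Frame]  # animated frames derived from the seed (step 3)


def text_to_video(
    prompt: str,
    render_still: Callable[[str], Frame],
    animate: Callable[[Frame, int], List[Frame]],
    num_frames: int = 14,
) -> Clip:
    """Chain a text-to-image step into an image-to-video step."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    seed = render_still(prompt)          # step 2: prompt -> starting frame
    frames = animate(seed, num_frames)   # step 3: starting frame -> clip
    return Clip(seed_frame=seed, frames=frames)


# Stand-ins for the real models (assumptions, for illustration only).
def fake_render_still(prompt: str) -> Frame:
    return prompt.encode("utf-8")


def fake_animate(seed: Frame, num_frames: int) -> List[Frame]:
    return [seed] * num_frames


clip = text_to_video("a lighthouse at dusk", fake_render_still, fake_animate)
print(len(clip.frames))
```

Passing the two stages in as callables keeps the sketch independent of any particular model library; in practice each stand-in would be replaced by a call into a real text-to-image model and a real SVD checkpoint.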

Why use our Text to Video

No source photo

Start entirely from a prompt — the model creates the frame for you.

Prompt-driven look

Your description sets the scene, style, and composition of the starting frame.

Coherent motion

SVD keeps the generated motion smooth and anchored to the seed frame.

Text to Video FAQ

Your prompt is first turned into a still image, then Stable Video Diffusion conditions on that frame and animates it into a short clip.

No. Text to Video generates the starting frame from your prompt, so no source photo is required.

Describe subject, setting, lighting, and camera feel. The clearer the prompt, the more on-target the seed frame and resulting motion.

Clips are short, around two to four seconds, matching Stable Video Diffusion's 14- and 25-frame outputs.
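
As a rough check on those numbers, clip length is simply frame count divided by playback frame rate. The sketch below assumes a playback rate of 6 frames per second; that rate is an assumption for illustration, not a setting confirmed by this page, and `clip_seconds` is a hypothetical helper.

```python
def clip_seconds(num_frames: int, fps: float = 6.0) -> float:
    """Return clip duration in seconds for a given frame count and frame rate."""
    if num_frames <= 0 or fps <= 0:
        raise ValueError("num_frames and fps must be positive")
    return num_frames / fps


# 14 and 25 are the frame counts named above; 6 fps is an assumed playback rate.
for frames in (14, 25):
    print(f"{frames} frames at 6 fps is about {clip_seconds(frames):.1f} s")
```

At a higher playback rate the same frame counts give proportionally shorter clips, so the quoted range depends on the frame rate used for export.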

Start free with included credits, then continue with a paid pack.

Ready to try Text to Video?

Start free and create your first clip in minutes.