Stable Video Diffusion is an independent AI video tool built on the open-source Stable Video Diffusion model family (SVD, SVD-XT). It is not affiliated with Stability AI in any way. All trademarks belong to their respective owners.

25-frame image-to-video for longer, smoother clips

Stable Video Diffusion XT image to video

Stable Video Diffusion XT (SVD-XT) is the extended image-to-video model from Stability AI. Built on the same latent video diffusion backbone as the base model, it is fine-tuned to generate 25 frames from a single still, producing longer and noticeably smoother motion. You steer the result with motion bucket and fps conditioning instead of a text prompt.

Stable Video Diffusion XT at a glance

Max resolution: 4K
Max duration: 60s
Commercial license: Full
Typical render time: Minutes

What Stable Video Diffusion XT can do

Everything you need to turn a still image into motion.

25-frame output

SVD-XT extends the base model to 25 frames per generation, giving you longer clips with more room for motion to develop.
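To put the frame count in perspective, clip length is simply the number of frames divided by the playback rate. The sketch below assumes playback at 7 fps, an illustrative value that this page does not specify, and compares the 25-frame SVD-XT output with the 14-frame base model.

```python
def clip_seconds(num_frames: int, fps: float) -> float:
    """Length of a clip in seconds when num_frames are played back at fps."""
    if num_frames <= 0 or fps <= 0:
        raise ValueError("num_frames and fps must be positive")
    return num_frames / fps


PLAYBACK_FPS = 7  # assumed playback rate, for illustration only

for name, frames in (("SVD (base)", 14), ("SVD-XT", 25)):
    print(f"{name}: {frames} frames -> {clip_seconds(frames, PLAYBACK_FPS):.2f} s")
```

At the same playback rate, the extra 11 frames translate directly into a longer clip.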

Latent video diffusion

A temporal latent diffusion model animates your image directly in latent space, keeping frames coherent without per-frame flicker.
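To give a rough sense of why working in latent space is cheaper than working on pixels, the sketch below computes the latent tensor shape for one clip. The 8x spatial downsampling and 4 latent channels are assumptions based on the Stable Diffusion autoencoder family, and `latent_shape` is a hypothetical helper, not part of any library.

```python
def latent_shape(num_frames, height, width, downsample=8, channels=4):
    """Shape (frames, channels, height, width) of the latent video tensor.

    downsample and channels are assumed values, not taken from this page.
    """
    if height % downsample or width % downsample:
        raise ValueError("height and width must be divisible by downsample")
    return (num_frames, channels, height // downsample, width // downsample)


frames, height, width = 25, 576, 1024  # SVD-XT frame count and resolution
shape = latent_shape(frames, height, width)
print(shape)  # (25, 4, 72, 128)

pixel_values = frames * 3 * height * width  # RGB pixels for the whole clip
latent_values = shape[0] * shape[1] * shape[2] * shape[3]
print(pixel_values // latent_values)  # 48
```

Under these assumptions the model denoises roughly 48 times fewer values than a pixel-space model would.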

Motion bucket control

Dial the motion bucket id up or down to set how much movement the model adds, from subtle drift to energetic action.
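One way to expose this control as a simple slider is to map a 0-to-1 "motion strength" onto a bucket id. The sketch below is illustrative only: the 1 to 255 range and the linear mapping are assumptions, and `motion_strength_to_bucket` is a hypothetical helper rather than part of any SVD library.

```python
MIN_BUCKET = 1    # assumed lower bound for the motion bucket id
MAX_BUCKET = 255  # assumed upper bound for the motion bucket id


def motion_strength_to_bucket(strength: float) -> int:
    """Map a 0..1 motion strength to an integer motion bucket id.

    Values outside 0..1 are clamped, so the result always stays in range.
    """
    clamped = min(max(strength, 0.0), 1.0)
    return round(MIN_BUCKET + clamped * (MAX_BUCKET - MIN_BUCKET))


for strength in (0.0, 0.5, 1.0):
    print(strength, "->", motion_strength_to_bucket(strength))
```

A linear mapping is the simplest choice; a tool could just as well offer a handful of named presets that each resolve to a fixed bucket id.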

fps conditioning

Condition generation on target frames-per-second so playback speed and motion smoothness match your shot.
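The link between fps conditioning and playback speed can be illustrated with a little arithmetic. The sketch below treats the conditioning value as the frame rate the motion was generated for, which is a simplification; `apparent_speed` is a hypothetical helper.

```python
def apparent_speed(conditioning_fps: float, playback_fps: float) -> float:
    """Speed factor when a clip generated for one frame rate is played at another.

    1.0 means motion plays at the intended speed; 2.0 means twice as fast.
    """
    if conditioning_fps <= 0 or playback_fps <= 0:
        raise ValueError("frame rates must be positive")
    return playback_fps / conditioning_fps


print(apparent_speed(7, 7))   # 1.0
print(apparent_speed(7, 14))  # 2.0
```

Keeping the conditioning value and the export frame rate equal is the straightforward way to have motion play back at the speed it was generated for.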

How Stable Video Diffusion XT works

1. Upload an image

Start from any clear still, such as a portrait, product shot, or landscape.

2. Set the motion

Pick a preset or adjust the motion bucket and fps settings to direct how the scene moves.

3. Generate & download

Render in minutes and download a ready-to-share video.
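For readers who want to run the open-source model themselves, the three steps above can be sketched with the Hugging Face `diffusers` library. This is a minimal sketch, not necessarily how this tool runs the model: it assumes `diffusers` and `torch` are installed, a CUDA GPU is available, and the public `stabilityai/stable-video-diffusion-img2vid-xt` checkpoint is used. The function name `animate_still` and its default values are illustrative.

```python
def animate_still(image_path, out_path, motion_bucket_id=127, fps=7, seed=42):
    """Animate one still image with SVD-XT and write the result as a video file."""
    # Imported lazily so this module can be loaded without a GPU stack installed.
    import torch
    from diffusers import StableVideoDiffusionPipeline
    from diffusers.utils import export_to_video, load_image

    pipe = StableVideoDiffusionPipeline.from_pretrained(
        "stabilityai/stable-video-diffusion-img2vid-xt",
        torch_dtype=torch.float16,
        variant="fp16",
    )
    pipe.to("cuda")

    # Step 1: load the still and resize it to the model's 1024x576 (width x height).
    image = load_image(image_path).resize((1024, 576))

    # Step 2: steer motion with motion_bucket_id and fps instead of a text prompt.
    frames = pipe(
        image,
        num_frames=25,
        motion_bucket_id=motion_bucket_id,
        fps=fps,
        decode_chunk_size=8,  # decode a few frames at a time to limit memory use
        generator=torch.manual_seed(seed),
    ).frames[0]

    # Step 3: write the frames out as a video file.
    export_to_video(frames, out_path, fps=fps)
    return out_path
```

A call such as `animate_still("still.png", "clip.mp4", motion_bucket_id=180)` would request stronger motion; both file names are placeholders.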

Stable Video Diffusion XT FAQ

How is SVD-XT different from the base SVD model?

SVD-XT is fine-tuned to generate 25 frames at 576x1024 from a single input image, compared with 14 frames in the base SVD model. The extra frames make the resulting clip longer and the motion smoother.

Can I use a text prompt with Stable Video Diffusion XT?

No. Stable Video Diffusion is image-to-video, so it animates an input still rather than reading a text description. You control the result with motion bucket id and fps conditioning instead of a prompt.

What is the motion bucket id?

The motion bucket id is a conditioning value that tells the model how much motion to introduce. Lower values keep the scene calm and subtle; higher values produce stronger, faster movement.

When should I choose SVD-XT?

Choose SVD-XT when you want the longest, smoothest clip Stable Video Diffusion can produce from one image. Its 25-frame output suits cinematic loops and shots where motion needs space to breathe.
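Because SVD-XT generates at 576x1024 (height x width), stills with a different aspect ratio usually need cropping before they are resized. One common approach, shown here as a sketch, is a center crop to the target aspect ratio; `center_crop_box` is a hypothetical helper, and whether this tool crops, pads, or stretches inputs is not stated on this page.

```python
def center_crop_box(src_w, src_h, target_w=1024, target_h=576):
    """Return (left, top, right, bottom) of the largest centered region
    of a src_w x src_h image that has the target aspect ratio."""
    if src_w * target_h > src_h * target_w:
        # Source is wider than the target: keep full height, trim the sides.
        crop_h = src_h
        crop_w = src_h * target_w // target_h
    else:
        # Source is taller than (or matches) the target: keep full width.
        crop_w = src_w
        crop_h = src_w * target_h // target_w
    left = (src_w - crop_w) // 2
    top = (src_h - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)


print(center_crop_box(1920, 1080))  # (0, 0, 1920, 1080)
print(center_crop_box(1000, 1000))  # (0, 219, 1000, 781)
```

The returned box uses the same (left, top, right, bottom) order that Pillow's `Image.crop` expects, so the cropped region can then be resized to 1024x576.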

Animate your first image with Stable Video Diffusion XT

Upload a photo, set the motion, and get a cinematic clip in minutes.