Stable Video Diffusion is an independent AI video tool built on the open-source Stable Video Diffusion model family (SVD, SVD-XT). It is not affiliated with Stability AI in any way. All trademarks belong to their respective owners.
Stable Video Diffusion XT image to video
Stable Video Diffusion XT (SVD-XT) is the extended image-to-video model from Stability AI. Built on the same latent video diffusion backbone as the base model, it is fine-tuned to generate 25 frames from a single still, producing longer and noticeably smoother motion. You steer the result with motion bucket and fps conditioning instead of a text prompt.
Stable Video Diffusion XT at a glance
What Stable Video Diffusion XT can do
Everything you need to turn a still image into motion.
25-frame output
SVD-XT extends the base model to 25 frames per generation, giving you longer clips with more room for motion to develop.
Latent video diffusion
A temporal latent diffusion model animates your image directly in latent space, which helps keep frames coherent and reduces per-frame flicker.
Motion bucket control
Dial the motion bucket id up or down to set how much movement the model adds, from subtle drift to energetic action.
fps conditioning
Condition generation on target frames-per-second so playback speed and motion smoothness match your shot.
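For readers who run the open-source checkpoint themselves, the controls above map onto call arguments roughly as follows. This is a minimal sketch, assuming the Hugging Face `diffusers` library's `StableVideoDiffusionPipeline` and the `stabilityai/stable-video-diffusion-img2vid-xt` checkpoint. The `SvdXtSettings` class and the `animate` helper are hypothetical names introduced for illustration and are not part of this tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SvdXtSettings:
    """Illustrative container for the conditioning controls described above."""

    num_frames: int = 25         # SVD-XT is fine-tuned for 25 frames
    motion_bucket_id: int = 127  # higher means more motion (diffusers default)
    fps: int = 7                 # fps conditioning value (diffusers default)
    height: int = 576
    width: int = 1024

    def to_pipeline_kwargs(self) -> dict:
        """Validate the settings and return keyword arguments for the pipeline call."""
        if self.num_frames < 1:
            raise ValueError("num_frames must be positive")
        if self.motion_bucket_id < 0:
            raise ValueError("motion_bucket_id must be non-negative")
        if self.fps < 1:
            raise ValueError("fps must be positive")
        return {
            "num_frames": self.num_frames,
            "motion_bucket_id": self.motion_bucket_id,
            "fps": self.fps,
            "height": self.height,
            "width": self.width,
        }


def animate(image_path: str, out_path: str,
            settings: SvdXtSettings = SvdXtSettings()) -> str:
    """Animate one still with SVD-XT. Assumes a CUDA GPU with torch and diffusers installed."""
    # Heavy imports stay local so the settings class is usable without them.
    import torch
    from diffusers import StableVideoDiffusionPipeline
    from diffusers.utils import export_to_video, load_image

    pipe = StableVideoDiffusionPipeline.from_pretrained(
        "stabilityai/stable-video-diffusion-img2vid-xt",
        torch_dtype=torch.float16,
        variant="fp16",
    ).to("cuda")
    image = load_image(image_path).resize((settings.width, settings.height))
    frames = pipe(image, decode_chunk_size=8,
                  **settings.to_pipeline_kwargs()).frames[0]
    export_to_video(frames, out_path, fps=settings.fps)
    return out_path
```

Calling `animate("still.png", "clip.mp4", SvdXtSettings(motion_bucket_id=180))` would request stronger motion than the default. The file names here are placeholders.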
How Stable Video Diffusion XT works
Upload an image
Start from any clear still — a portrait, product shot, or landscape.
Set the motion
Pick a preset or adjust the motion bucket and fps to direct how the scene moves.
Generate & download
Render in minutes and download a ready-to-share video.
Stable Video Diffusion XT FAQ
SVD-XT is fine-tuned to generate 25 frames at 576x1024 from a single input image, compared with 14 frames in the base SVD model. The extra frames make the resulting clip longer and the motion smoother.
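To make the difference concrete, clip length is simply frame count divided by playback rate. The short sketch below assumes playback at 6 fps, chosen only as an example rate. `clip_seconds` is a hypothetical helper name.

```python
def clip_seconds(num_frames: int, playback_fps: float) -> float:
    """Return clip duration in seconds for a frame count and playback rate."""
    if num_frames < 1 or playback_fps <= 0:
        raise ValueError("num_frames and playback_fps must be positive")
    return num_frames / playback_fps


# Compare the base model (14 frames) with SVD-XT (25 frames) at an assumed 6 fps.
for name, frames in (("SVD", 14), ("SVD-XT", 25)):
    print(f"{name}: {frames} frames at 6 fps = {clip_seconds(frames, 6):.2f} s")
# prints:
# SVD: 14 frames at 6 fps = 2.33 s
# SVD-XT: 25 frames at 6 fps = 4.17 s
```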
Stable Video Diffusion XT does not take a text prompt. It is an image-to-video model, so it animates an input still rather than reading a text description. You control the result with motion bucket id and fps conditioning instead.
The motion bucket id is a conditioning value that tells the model how much motion to introduce. Lower values keep the scene calm and subtle; higher values produce stronger, faster movement.
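One way to expose this value in an interface is to map friendly preset names onto numbers. The sketch below is purely illustrative: the preset names, their values, and the 1 to 255 clamping range are all assumptions made for the example, not values documented by the model or by this tool.

```python
# Hypothetical preset names and values, chosen only for illustration.
MOTION_PRESETS = {"subtle": 40, "balanced": 127, "energetic": 200}

# Assumed working range for the example; not confirmed by this page.
MIN_BUCKET, MAX_BUCKET = 1, 255


def resolve_motion_bucket(choice) -> int:
    """Turn a preset name or a raw number into a clamped motion bucket id."""
    if isinstance(choice, str):
        try:
            value = MOTION_PRESETS[choice.lower()]
        except KeyError:
            raise ValueError(f"unknown preset: {choice!r}") from None
    else:
        value = int(choice)
    # Clamp to the assumed range so out-of-range input degrades gracefully.
    return max(MIN_BUCKET, min(MAX_BUCKET, value))
```

With these assumed values, `resolve_motion_bucket("subtle")` yields a calm setting and `resolve_motion_bucket(500)` is clamped to the top of the range.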
Choose SVD-XT when you want the longest, smoothest clip Stable Video Diffusion can produce from one image. Its 25-frame output suits cinematic loops and shots where motion needs space to breathe.
Explore more models & tools
Pick the right model or creative tool for your next clip.
Stable Video Diffusion 1.1
Refined 25-frame model with consistent motion
Stable Video Diffusion 1.1 is the refined release of Stability AI's image-to-video model, fine-tuned from SVD-XT. It generates 25 frames from a single still using latent video diffusion, fine-tuned at fixed conditioning (6 fps, motion bucket 127) for more consistent, predictable motion and fewer artifacts than the original checkpoint.
Image to Video
Animate a still with Stable Video Diffusion
Image to Video uses Stable Video Diffusion to turn a single still image into a short, coherent motion clip. Upload a frame and SVD conditions on it to generate smooth movement while keeping your subject anchored.
Text to Video
Turn a prompt into a Stable Video Diffusion clip
Text to Video pairs a text-to-image step with Stable Video Diffusion: your written prompt becomes a starting frame, then SVD animates it into a short clip — no source photo required.
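The two-stage design described here can be sketched as a simple composition of two steps. This is a structural sketch only: `text_to_video` and its two callable parameters are hypothetical, and the stand-in lambdas in the usage lines replace real models so the example runs anywhere.

```python
from typing import Callable, List, TypeVar

Image = TypeVar("Image")
Frame = TypeVar("Frame")


def text_to_video(
    prompt: str,
    text_to_image: Callable[[str], Image],
    image_to_video: Callable[[Image], List[Frame]],
) -> List[Frame]:
    """Chain a text-to-image step with an image-to-video step."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    # Stage 1: the written prompt becomes a starting frame.
    first_frame = text_to_image(prompt)
    # Stage 2: the image-to-video model animates that frame; it never sees the text.
    return image_to_video(first_frame)


# Usage with stand-in functions in place of real models.
frames = text_to_video(
    "a lighthouse at dusk",
    text_to_image=lambda p: f"image<{p}>",
    image_to_video=lambda img: [f"{img}#{i}" for i in range(3)],
)
print(len(frames))  # prints 3
```

Keeping the stages as separate callables reflects the point that the video model is conditioned only on the image, so any text-to-image backend could supply the first frame.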
Motion Control
Tune motion bucket and frame count
Motion Control gives you direct access to Stable Video Diffusion's key dials — the motion bucket, frame count, and target fps — so you can fine-tune exactly how much a still image moves.
Showcase gallery
Browse real Stable Video Diffusion creations made with our models.
Animate your first image with Stable Video Diffusion XT
Upload a photo, set the motion, and get a cinematic clip in minutes.