Stable Video Diffusion#

Stable Video Diffusion (SVD) is a latent diffusion model developed by Stability AI. It’s designed to generate short video clips from a still image. Specifically, the model can create 25 frames at a resolution of 576x1024 from a context frame of the same size.

This document demonstrates how to create a video generation server with SVD and BentoML.

All the source code in this tutorial is available in the BentoSVD GitHub repository.

Prerequisites#

  • Python 3.9+ and pip installed. See the Python downloads page to learn more.

  • You have a basic understanding of key concepts in BentoML, such as Services. We recommend you read Quickstart first.

  • To run this BentoML Service locally, you need an Nvidia GPU with at least 16 GB of VRAM.

  • (Optional) We recommend you create a virtual environment for dependency isolation. See the Conda documentation or the Python documentation for details.

Install dependencies#

Clone the project repository and install all the dependencies.

git clone https://github.com/bentoml/BentoSVD.git
cd BentoSVD
pip install -r requirements.txt
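
(Optional) Before moving on, you can confirm that PyTorch detects the GPU and has enough memory. A quick sanity check:

import torch

# SVD needs roughly 16 GB of VRAM at 576x1024; check what is available.
assert torch.cuda.is_available(), "No CUDA-capable GPU detected"
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1024**3:.1f} GiB")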

Create a BentoML Service#

Create a BentoML Service in a service.py file to define the serving logic of the model. You can use this example file in the cloned project:

service.py#
from __future__ import annotations

import os
import typing as t
from pathlib import Path
from PIL.Image import Image

import bentoml

MODEL_ID = "stabilityai/stable-video-diffusion-img2vid-xt"


@bentoml.service(
    resources={
        "gpu": 1,
        "gpu_type": "nvidia-l4",
    },
    traffic={"timeout": 600},
)
class StableDiffusionVideo:

    def __init__(self) -> None:
        import torch
        import diffusers

        # Load the SVD pipeline in half precision and move it to the GPU.
        self.pipe = diffusers.StableVideoDiffusionPipeline.from_pretrained(
            MODEL_ID, torch_dtype=torch.float16, variant="fp16"
        )
        self.pipe.to("cuda")

    @bentoml.api
    def generate(
        self,
        context: bentoml.Context,
        image: Image,
        decode_chunk_size: int = 2,
        seed: t.Optional[int] = None,
    ) -> t.Annotated[Path, bentoml.validators.ContentType("video/*")]:
        import torch
        from diffusers.utils import export_to_video

        # Fix the seed when one is provided so results are reproducible.
        generator = torch.manual_seed(seed) if seed is not None else None
        # SVD expects a 1024x576 RGB context frame.
        image = image.resize((1024, 576))
        image = image.convert("RGB")
        output_path = os.path.join(context.temp_dir, "output.mp4")

        # Generate the frames and encode them as an MP4 in the request's temp dir.
        frames = self.pipe(
            image, decode_chunk_size=decode_chunk_size, generator=generator,
        ).frames[0]
        export_to_video(frames, output_path)
        return Path(output_path)

A breakdown of the Service code:

  • It defines a BentoML Service StableDiffusionVideo using the @bentoml.service decorator, with specified GPU requirements for deployment on BentoCloud, and a timeout of 600 seconds. See Configurations for details.

  • During initialization, the Service loads the model into a StableVideoDiffusionPipeline and moves it to the GPU for efficient computation.

  • It defines an endpoint for video generation using the @bentoml.api decorator, taking the following parameters:

    • image: A base image for generating video, which will be resized and converted to RGB format for the SVD model.

    • decode_chunk_size: The number of frames that are decoded at once. A lower decode_chunk_size value means reduced memory consumption but may lead to inconsistencies between frames, such as flickering. Set this value based on your GPU resources.

    • seed: The seed for the random number generator. When not specified, a random seed is used. Generating a video with the same seed and input image produces exactly the same output, which is particularly useful for reproducible results.

    • context: bentoml.Context lets you access information about the existing Service context. The temp_dir property provides a temporary directory to store the generated file.

  • export_to_video from the diffusers package converts the frames into a video file stored at output_path.

  • The method returns a Path object pointing to the generated video file. The return type is annotated with a content type validator, indicating that the endpoint returns a video file.
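
For reference, the generation logic inside the Service is ordinary diffusers code. Here is a minimal standalone sketch (not part of the repository; it assumes a local CUDA GPU and a girl-image.png in the working directory):

import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import export_to_video, load_image

# Load the same SVD checkpoint the Service uses, in half precision.
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.to("cuda")

# Prepare the 1024x576 RGB context frame and fix the seed for reproducibility.
image = load_image("girl-image.png").resize((1024, 576)).convert("RGB")
generator = torch.manual_seed(42)

frames = pipe(image, decode_chunk_size=2, generator=generator).frames[0]
export_to_video(frames, "output.mp4")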

Run bentoml serve to start the BentoML server.

$ bentoml serve service:StableDiffusionVideo

2024-02-28T01:01:17+0000 [WARNING] [cli] Converting 'StableDiffusionVideo' to lowercase: 'stablediffusionvideo'.
2024-02-28T01:01:18+0000 [INFO] [cli] Starting production HTTP BentoServer from "service:StableDiffusionVideo" listening on http://localhost:3000 (Press CTRL+C to quit)

The server is active at http://localhost:3000. You can interact with it in different ways. For example, call the generate endpoint with curl:

curl -X 'POST' \
    'http://localhost:3000/generate' \
    -H 'accept: video/*' \
    -H 'Content-Type: multipart/form-data' \
    -F 'image=@assets/girl-image.png;type=image/png' \
    -F 'decode_chunk_size=2' \
    -F 'seed=null' \
    -o generated.mp4

You can also use a BentoML client. The call below returns the generated video as a Path object, which you can use to access, read, or process the file. See Clients for details.

import bentoml
from pathlib import Path

with bentoml.SyncHTTPClient("http://localhost:3000") as client:
    result = client.generate(
        decode_chunk_size=2,
        image=Path("girl-image.png"),
        seed=0,
    )
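
Here, result is a pathlib.Path pointing at the video file the client downloaded. If you want to keep the video under a name of your choice, copying it while still inside the with block is a safe habit; a minimal sketch (the destination filename is arbitrary):

import shutil

# Inside the `with` block above: copy the downloaded video to a durable name.
shutil.copy(result, "generated.mp4")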

Alternatively, visit http://localhost:3000 in your browser, scroll down to Service APIs, click the generate endpoint, specify the parameters, and click Execute.

[Screenshot: the generate endpoint in the Swagger UI]

Expected output:

[Animation: video generated from girl-image.png]

Deploy to BentoCloud#

After the Service is ready, you can deploy the project to BentoCloud for better management and scalability. Sign up for a BentoCloud account and get $30 in free credits.

First, specify a configuration YAML file (bentofile.yaml) to define the build options for your application. It is used for packaging your application into a Bento. Here is an example file in the project:

bentofile.yaml#
service: "service:StableDiffusionVideo"
labels:
  owner: bentoml-team
  project: gallery
include:
  - "*.py"
python:
  requirements_txt: "./requirements.txt"
docker:
  distro: debian
  system_packages:
    - ffmpeg
    - git

Create an API token with Developer Operations Access and use it to log in to BentoCloud.
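
One way to log in is the bentoml cloud login command; the token below is a placeholder (flags may vary slightly by BentoML version):

bentoml cloud login --api-token <your-api-token>

Then run the following command to deploy the project.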

bentoml deploy .

Once the Deployment is up and running on BentoCloud, you can access it via the exposed URL.

[Screenshot: the Deployment running on BentoCloud]

Note

For custom deployment in your own infrastructure, use BentoML to generate an OCI-compliant image.
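
As a sketch, that workflow uses the bentoml build and bentoml containerize commands; bentoml build prints the Bento tag, and the tag below is a placeholder:

bentoml build
bentoml containerize <bento_tag>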