Stable Diffusion XL Turbo#

Stable Diffusion XL Turbo (SDXL Turbo) is a distilled version of SDXL 1.0 and is capable of creating images in a single step, with improved real-time text-to-image output quality and sampling fidelity.

This document demonstrates how to create an image generation application with SDXL Turbo and BentoML.


  • Python 3.8+ and pip installed. See the Python downloads page to learn more.

  • You have a basic understanding of key concepts in BentoML, such as Services. We recommend you read Quickstart first.

  • To run this BentoML Service locally, you need a Nvidia GPU with at least 12G VRAM.

  • (Optional) We recommend you create a virtual environment for dependency isolation. See the Conda documentation or the Python documentation for details.

Install dependencies#

Clone the project repository and install all the dependencies.

git clone
cd BentoSDXLTurbo
pip install -r requirements.txt

Create a BentoML Service#

Create a BentoML Service in a file to define the serving logic of the model. You can use this example file in the cloned project:
import bentoml
from PIL.Image import Image

MODEL_ID = "stabilityai/sdxl-turbo"

sample_prompt = "A cinematic shot of a baby racoon wearing an intricate italian priest robe."

    traffic={"timeout": 300},
        "gpu": 1,
        "gpu_type": "nvidia-l4",
        # You can also specify GPU memory requirement:
        # "memory": "16Gi",
class SDXLTurbo:
    def __init__(self) -> None:
        from diffusers import AutoPipelineForText2Image
        import torch

        self.pipe = AutoPipelineForText2Image.from_pretrained(

    def txt2img(
            prompt: str = sample_prompt,
            num_inference_steps: int = 1,
            guidance_scale: float = 0.0,
    ) -> Image:
        image = self.pipe(
        return image

In the Service code, the @bentoml.service decorator is used to define the SDXLTurbo class as a BentoML Service. It loads the pre-trained model (MODEL_ID) using the torch.float16 data type. The model pipeline (self.pipe) is moved to a CUDA-enabled GPU device for efficient computation.

The txt2img method is an API endpoint that takes a text prompt, number of inference steps, and a guidance scale as inputs. It uses the model pipeline to generate an image based on the given prompt and parameters.


SDXL Turbo is capable of performing inference with just a single step. Therefore, setting num_inference_steps to 1 is typically sufficient for generating high-quality images. Additionally, you need to set guidance_scale to 0.0 to deactivate it as the model was trained without it. See the official release notes to learn more.

Run bentoml serve to start the BentoML server.

$ bentoml serve service:SDXLTurbo

2024-01-19T07:20:29+0000 [WARNING] [cli] Converting 'SDXLTurbo' to lowercase: 'sdxlturbo'.
2024-01-19T07:20:29+0000 [INFO] [cli] Starting production HTTP BentoServer from "service:SDXLTurbo" listening on http://localhost:3000 (Press CTRL+C to quit)

The server is active at http://localhost:3000. You can interact with it in different ways.

curl -X 'POST' \
    'http://localhost:3000/txt2img' \
    -H 'accept: image/*' \
    -H 'Content-Type: application/json' \
    --output output.png \
    -d '{
    "prompt": "A cinematic shot of a baby racoon wearing an intricate italian priest robe.",
    "num_inference_steps": 1,
    "guidance_scale": 0

This client returns the image as a Path object. You can use it to access, read, or process the file. See Clients for details.

import bentoml

with bentoml.SyncHTTPClient("http://localhost:3000") as client:
        result = client.txt2img(
            prompt="A cinematic shot of a baby racoon wearing an intricate italian priest robe.",

Visit http://localhost:3000, scroll down to Service APIs, specify the parameters, and click Execute.


Expected output:


Deploy to BentoCloud#

After the Service is ready, you can deploy the project to BentoCloud for better management and scalability. Sign up for a BentoCloud account and get $30 in free credits.

First, specify a configuration YAML file (bentofile.yaml) to define the build options for your application. It is used for packaging your application into a Bento. Here is an example file in the project:

service: "service:SDXLTurbo"
  owner: bentoml-team
  project: gallery
- "*.py"
  requirements_txt: "./requirements.txt"

Create an API token with Developer Operations Access to log in to BentoCloud, then run the following command to deploy the project.

bentoml deploy .

Once the Deployment is up and running on BentoCloud, you can access it via the exposed URL.



For custom deployment in your own infrastructure, use BentoML to generate an OCI-compliant image.