BentoML

This section contains detailed API specifications for BentoML. Use them to dig deeper into the APIs and learn about all the options they provide.

  • Bento and model APIs
  • BentoML SDK
  • Bento build options
  • BentoML CLI
  • Client API
  • Framework APIs
  • Configurations
  • Batch inference
  • Exceptions
  • Container APIs
  • Types
Copyright © 2022-2025, bentoml.com