Model Serving Made Easy
BentoML is a flexible, high-performance framework for serving, managing, and deploying machine learning models.
- Supports multiple ML frameworks, including TensorFlow, PyTorch, Keras, XGBoost, and more
- Cloud-native deployment with Docker, Kubernetes, AWS, Azure, and many more
- High-performance online API serving and offline batch serving
- Web dashboards and APIs for model registry and deployment management
BentoML bridges the gap between Data Science and DevOps. By providing a standard interface for describing a prediction service, BentoML abstracts away how to run model inference efficiently and how model serving workloads integrate with cloud infrastructure. See how it works!
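To make the idea of a "standard interface for describing a prediction service" concrete, here is a minimal plain-Python sketch. It does not use BentoML's API: `PredictionService`, `serve_online`, and `serve_batch` are hypothetical names invented for illustration, and the "model" is a toy function. It only shows how one service description can back both online and offline batch serving.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class PredictionService:
    """Hypothetical stand-in for a packaged prediction service (not BentoML's API)."""

    name: str
    model: Callable[[Sequence[float]], float]  # any callable that scores one input row

    def predict(self, rows: List[Sequence[float]]) -> List[float]:
        # Single entry point: takes a batch of rows, returns one result per row.
        return [self.model(row) for row in rows]


def serve_online(service: PredictionService, request_row: Sequence[float]) -> float:
    # Online serving: wrap a single request as a batch of size one.
    return service.predict([request_row])[0]


def serve_batch(service: PredictionService, dataset: List[Sequence[float]]) -> List[float]:
    # Offline batch serving: pass the whole dataset through the same entry point.
    return service.predict(dataset)


# Toy "model" that sums its features; a real model would come from an ML framework.
toy_service = PredictionService(name="toy_sum", model=lambda row: float(sum(row)))
online_result = serve_online(toy_service, [1.0, 2.0])
batch_result = serve_batch(toy_service, [[1.0, 2.0], [3.0, 4.0]])
```

Because both serving paths go through the same `predict` method, the code that describes the model does not need to change when the deployment target does. The Getting Started guide covers the actual BentoML classes and decorators.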
👩‍💻 Star/Watch/Fork the BentoML GitHub repository.
- Getting Started
- Core Concepts
- Advanced Guides
- Deployment Guides
  - Deploying to AWS Lambda
  - Deploying to AWS SageMaker
  - Deploying to AWS EC2
  - Deploying to Azure Functions
  - Deploying to Kubernetes Cluster
  - Deploying to AWS ECS (Elastic Container Service)
  - Deploying to Heroku
  - Deploying to Google Cloud Run
  - Deploying to Azure Container Instance
  - Deploying to Knative
  - Deploying to Kubeflow
  - Deploying to KFServing
  - Deploying to Clipper Cluster
  - Deploying to SQL Server Machine Learning Services
- Example Projects
- API Reference
- CLI Reference
- Frequently Asked Questions
  - Why BentoML?
  - How does BentoML compare to TensorFlow Serving?
  - How does BentoML compare to Clipper?
  - How does BentoML compare to AWS SageMaker?
  - How does BentoML compare to MLflow?
  - Does BentoML do horizontal scaling?
  - How does BentoML compare with Cortex?
  - How does BentoML compare to Seldon?
  - Is there a plan for R support?