BentoML is an open-source platform for high-performance machine learning model serving. It makes it easy to build production API endpoints for trained ML models and supports all major machine learning frameworks, including TensorFlow, Keras, PyTorch, XGBoost, scikit-learn, fastai, and more.
BentoML comes with a high-performance API model server with adaptive micro-batching support, bringing the advantage of batch processing to online serving workloads. It also provides batch serving, model management and model deployment functionality, which gives ML teams an end-to-end model serving solution with baked-in DevOps best practices.
💻 Get started with BentoML: Quickstart Guide.
👩‍💻 Star/Watch/Fork the BentoML GitHub Repository.
👉 Join the BentoML Slack to follow the latest development updates and roadmap discussions.
What does BentoML do?
Turn a trained ML model into a production API endpoint with a few lines of code (see the sketch after this list)
Support all major machine learning training frameworks
End-to-end model serving solution with DevOps best practices baked-in
Micro-batching support, bringing the advantage of batch processing to online serving
Model management for teams, providing CLI access and Web UI dashboard
Flexible model deployment orchestration supporting Docker, Kubernetes, AWS Lambda, SageMaker, Azure ML and more
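For a sense of what "a few lines of code" looks like, here is a minimal sketch in the style of the BentoML quickstart, wrapping a scikit-learn model with the handler-based API this documentation describes. The class name `IrisClassifier` and the artifact name `model` are illustrative, not part of this page:

```python
from bentoml import env, artifacts, api, BentoService
from bentoml.handlers import DataframeHandler
from bentoml.artifact import SklearnModelArtifact

@env(pip_dependencies=["scikit-learn"])
@artifacts([SklearnModelArtifact("model")])
class IrisClassifier(BentoService):
    """A prediction service wrapping a scikit-learn model."""

    @api(DataframeHandler)
    def predict(self, df):
        # self.artifacts.model is the scikit-learn model packed into this service
        return self.artifacts.model.predict(df)
```

Each `@api`-decorated method becomes an API endpoint on the model server, with the handler translating HTTP requests into the method's input format.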
Getting Machine Learning models into production is hard. Data Scientists are not experts in building production services or applying DevOps best practices. The trained models produced by a Data Science team are hard to test and hard to deploy. This often leads to a time-consuming and error-prone workflow, where a pickled model or weights file is handed over to a software engineering team.
BentoML is an end-to-end solution for model serving, making it possible for Data Science teams to build production-ready model serving endpoints, with common DevOps best practices and performance optimizations baked in.
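As a rough sketch of that end-to-end flow, assuming the illustrative `IrisClassifier` service above, a trained model is packed into the service and saved as a versioned bundle that the BentoML CLI can serve:

```python
from sklearn import datasets, svm

# Train a model (a stand-in for the Data Science team's output)
iris = datasets.load_iris()
clf = svm.SVC(gamma="scale")
clf.fit(iris.data, iris.target)

# Pack the trained model into the service and save a versioned bundle
service = IrisClassifier()
service.pack("model", clf)
saved_path = service.save()

# The saved bundle can then be served locally, e.g.:
#   bentoml serve IrisClassifier:latest
# or deployed using the deployment guides listed below.
```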
Check out the Frequently Asked Questions page for how BentoML compares to TensorFlow Serving, Clipper, AWS SageMaker, MLflow, and other tools.
- Getting Started
- Core Concepts
- Example Projects
- Advanced Guides
- Offline Batch Serving
- Monitoring with Prometheus
- Request Logging
- Understanding BentoML adaptive micro batching
- 1. The overall architecture of BentoML’s micro-batching server
- 2. Parameter tuning best practices & recommendations
- 3. How to implement batch mode for custom handlers
- 4. Comparison
- Adding Custom Model Artifact
- Customizing BentoHandler
- Deploying Yatai server behind NGINX
- Deployment Guides
- Deploying to AWS Lambda
- Deploying to AWS SageMaker
- Deploying to Clipper Cluster
- Deploying to AWS ECS (Elastic Container Service)
- Deploying to Google Cloud Run
- Deploying to Azure Container Instance
- Deploying to Kubernetes Cluster
- Deploying to Knative
- Deploying to Kubeflow
- Deploying to KFServing
- Deploying to Heroku
- API Reference
- CLI Reference
- Frequently Asked Questions
- Why BentoML?
- How does BentoML compare to TensorFlow Serving?
- How does BentoML compare to Clipper?
- How does BentoML compare to AWS SageMaker?
- How does BentoML compare to MLflow?
- Does BentoML do horizontal scaling?
- How does BentoML compare to Cortex?
- How does BentoML compare to Seldon?
- Is there a plan for R support?