BentoML is an open-source framework for high-performance machine learning model serving. It makes it easy to build production API endpoints for trained ML models and supports all major machine learning frameworks, including TensorFlow, Keras, PyTorch, XGBoost, scikit-learn, fastai, and more.
BentoML comes with a high-performance API model server with adaptive micro-batching support, bringing the advantages of batch processing to online serving workloads. It also provides batch serving, a model registry, and cloud deployment functionality, giving ML teams an end-to-end model serving solution with DevOps best practices baked in.
💻 Get started with BentoML: Quickstart Guide.
👩‍💻 Star/Watch/Fork the BentoML GitHub Repository.
What does BentoML do?
Create API endpoints serving trained models with just a few lines of code
Support all major machine learning training frameworks
High-Performance online API serving with adaptive micro-batching support
Model Registry for teams, providing Web UI dashboard and CLI/API access
Flexible deployment orchestration with DevOps best practices baked-in, supporting Docker, Kubernetes, Kubeflow, Knative, AWS Lambda, SageMaker, Azure ML, GCP and more
Getting machine learning models into production is hard. Data Scientists are not experts in building production services or in DevOps best practices. The trained models produced by a Data Science team are hard to test and hard to deploy. This often leads to a time-consuming and error-prone workflow in which a pickled model or weights file is handed over to a software engineering team.
BentoML is an end-to-end solution for model serving, making it possible for Data Science teams to build production-ready model serving endpoints, with common DevOps best practices and performance optimizations baked in.
Check out the Frequently Asked Questions page for how BentoML compares to TensorFlow Serving, Clipper, AWS SageMaker, MLflow, and more.
- Getting Started
- Core Concepts
- Example Projects
- Advanced Guides
- Offline Batch Serving
- Monitoring with Prometheus
- Request Logging
- Understanding BentoML adaptive micro batching
- 1. The overall architecture of BentoML’s micro-batching server
- 2. Parameter tuning best practices & recommendations
- 3. How to implement batch mode for custom input adapters
- 4. Comparison
- Adding Custom Model Artifact
- Customizing InputAdapter (formerly BentoHandler)
- Deploy Yatai server behind NGINX
- Deployment Guides
- Deploying to AWS Lambda
- Deploying to AWS SageMaker
- Deploying to Azure Functions
- Deploying to Clipper Cluster
- Deploying to AWS ECS (Elastic Container Service)
- Deploying to Google Cloud Run
- Deploying to Azure Container Instance
- Deploying to Kubernetes Cluster
- Deploying to Knative
- Deploying to Kubeflow
- Deploying to KFServing
- Deploying to Heroku
- API Reference
- CLI Reference
- Frequently Asked Questions
- Why BentoML?
- How does BentoML compare to TensorFlow Serving?
- How does BentoML compare to Clipper?
- How does BentoML compare to AWS SageMaker?
- How does BentoML compare to MLflow?
- Does BentoML do horizontal scaling?
- How does BentoML compare with Cortex?
- How does BentoML compare to Seldon?
- Is there a plan for R support?