Unified Model Serving Framework
What is BentoML?
BentoML is an open-source framework for serving ML models at production scale. Data scientists and ML engineers use BentoML to:
Accelerate and standardize the process of taking ML models to production across teams
Build reliable, scalable, and high performance model serving systems
Provide a flexible MLOps platform that grows with your Data Science needs
BentoML version 1.0 is currently in beta preview. For the most recent stable release, see the 0.13-LTS documentation.
A simple example of BentoML in action. In under 10 minutes, you’ll serve your ML model over an HTTP API endpoint and build a Docker image that is ready to be deployed in production.
A step-by-step tour of BentoML’s components and an introduction to its philosophy. After reading, you will understand what drives BentoML’s design and what the terms “bento” and “runner” stand for.
Best practices and example usage, organized by the ML framework used for model training.
Example projects demonstrating BentoML usage in a variety of scenarios.
Dive into BentoML’s advanced features, internals, and architecture, including GPU support, inference graphs, monitoring, and performance optimization.
Join our Slack community, where hundreds of ML practitioners contribute to the project, help other users, and discuss all things MLOps.
For MLOps engineers:
The BentoML Blog and @bentomlai on Twitter are the official sources for updates from the BentoML team. Anything important, including major releases and announcements, will be posted there. We also frequently share tutorials, case studies, and community updates through these channels.
Why are we building BentoML?
Model deployment is one of the last and most important stages in the machine learning life cycle: only by putting a model into a production environment, where it makes predictions for end applications, can the full potential of ML be realized.
Sitting at the intersection of data science and engineering, model deployment introduces new operational challenges between these teams. Data scientists, who are typically responsible for building and training the model, often don’t have the expertise to bring it into production. At the same time, engineers, who aren’t used to working with models that require continuous iteration and improvement, find it challenging to apply their know-how and common practices (like CI/CD) to deploying them. As the two teams try to meet halfway to get the model over the finish line, the result is often a time-consuming, error-prone workflow that slows progress.
We at BentoML want to get your ML models shipped in a fast, repeatable, and scalable way. BentoML is designed to streamline the handoff to production deployment, making it easy for developers and data scientists alike to test, deploy, and integrate their models with other systems.
With BentoML, data scientists can focus primarily on creating and improving their models, while deployment engineers get peace of mind that nothing in the deployment logic is changing and that the production service is stable.
BentoML has a thriving open-source community where hundreds of ML practitioners contribute to the project, help other users, and discuss all things MLOps. 👉 Join us on Slack today!