Both bentoml.keras and bentoml.tensorflow support Keras models. bentoml.keras uses the native Keras model format and gives a better development experience to users who are more familiar with Keras. However, the native Keras format is not optimized for production inference, and memory leaks during serving had been reported at the time of the BentoML 1.0 release, so bentoml.tensorflow is recommended for production environments. See the bentoml.tensorflow documentation for more information.

You can also convert a Keras model to an ONNX model and use bentoml.onnx to serve it in production. Refer to the bentoml.onnx documentation and the tensorflow-onnx (tf2onnx) documentation for more information.
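
The conversion route can be sketched as follows. This is a minimal sketch rather than the documented workflow: it assumes tensorflow, tf2onnx, and onnx are installed, uses tf2onnx's from_keras converter to produce an ONNX ModelProto, and stores it with bentoml.onnx.save_model. The model name onnx_resnet50, the input name, the opset, and the 224x224x3 input shape are illustrative assumptions.

```python
# Assumed input shape for an ImageNet-style classifier:
# (batch, height, width, channels). Adjust for your own model.
INPUT_SHAPE = (None, 224, 224, 3)


def convert_and_save(keras_model, name="onnx_resnet50", opset=13):
    """Convert a Keras model to ONNX and save it to the BentoML model store."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    import bentoml
    import tensorflow as tf
    from tf2onnx import convert as tf2onnx_convert

    spec = (tf.TensorSpec(INPUT_SHAPE, tf.float32, name="input"),)
    # from_keras returns the ONNX ModelProto plus external tensor storage (unused here).
    model_proto, _ = tf2onnx_convert.from_keras(
        keras_model, input_signature=spec, opset=opset
    )
    return bentoml.onnx.save_model(name, model_proto)
```

Calling convert_and_save(model) with the ResNet50 instance from the next section would be one way to try this; the bentoml.onnx documentation covers how the saved model is then served.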


BentoML requires TensorFlow version 2.7.3 or higher to be installed.

Saving a Keras Model

The following example loads a pre-trained ResNet50 model.

import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import ResNet50

# Use pre-trained ResNet50 weights
model = ResNet50(weights='imagenet')

# try a sample input with created model
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions

img_path = 'ade20k.jpg'

img = image.load_img(img_path, target_size=(224, 224))

x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

preds = model.predict(x)
print('Keras Predicted:', decode_predictions(preds, top=3)[0])

# output:
# Keras Predicted: [('n04285008', 'sports_car', 0.3447785)]
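
The array handling in the example above can be made concrete with a small NumPy-only sketch. It uses a dummy array in place of ade20k.jpg and re-implements, for illustration only, the preprocessing that Keras documents for ResNet50 (RGB to BGR, then subtracting the per-channel ImageNet mean). The helper name to_resnet50_batch is hypothetical; real code should keep calling preprocess_input.

```python
import numpy as np

# Per-channel ImageNet means in BGR order, as documented for Keras
# "caffe"-style preprocessing (no scaling is applied).
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def to_resnet50_batch(rgb_image):
    """Turn one (H, W, 3) RGB image into a (1, H, W, 3) preprocessed batch."""
    x = np.asarray(rgb_image, dtype=np.float32)  # (H, W, 3)
    x = np.expand_dims(x, axis=0)                # add the batch axis: (1, H, W, 3)
    x = x[..., ::-1]                             # reorder channels RGB -> BGR
    return x - IMAGENET_MEAN_BGR                 # zero-center each channel


# A uniform gray image stands in for a real photo.
dummy = np.full((224, 224, 3), 128, dtype=np.uint8)
batch = to_resnet50_batch(dummy)
print(batch.shape)  # (1, 224, 224, 3)
```

The leading axis of size 1 is the batch dimension that model.predict expects, which is why both the example above and the service code call np.expand_dims before inference.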

After the Keras model is ready, use save_model to save the model instance to the BentoML model store.

bentoml.keras.save_model("keras_resnet50", model)
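
save_model also accepts optional keyword arguments. The sketch below is one possible configuration, not a recommendation from this guide: it assumes you want adaptive batching on the predict method along the first axis, and the label and metadata values are placeholders.

```python
# Assumed signature: expose `predict` and allow batching along axis 0.
BATCHED_SIGNATURES = {"predict": {"batchable": True, "batch_dim": 0}}


def save_resnet50(model, name="keras_resnet50"):
    """Save a Keras model with explicit signatures, labels, and metadata."""
    # Imported lazily so this module can be imported without bentoml installed.
    import bentoml

    return bentoml.keras.save_model(
        name,
        model,
        signatures=BATCHED_SIGNATURES,
        labels={"stage": "dev"},            # placeholder label
        metadata={"dataset": "imagenet"},   # placeholder metadata
    )
```

The returned object describes the saved model, including the tag that was generated for it in the model store.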

The saved Keras model can be loaded with load_model to verify that it was stored properly.

model = bentoml.keras.load_model("keras_resnet50:latest")


Building a Service using Keras

See also

See Building a Service for more information on creating a prediction service with BentoML.

The following service example creates a predict API endpoint that accepts an image as input and returns JSON data as output. Within the API function, a Keras model runner created from the previously saved ResNet50 model is used for inference.

import bentoml

import numpy as np
from bentoml.io import Image
from bentoml.io import JSON

runner = bentoml.keras.get("keras_resnet50:latest").to_runner()

svc = bentoml.Service("keras_resnet50", runners=[runner])

@svc.api(input=Image(), output=JSON())
def predict(img):

    from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions

    img = img.resize((224, 224))
    arr = np.array(img)
    arr = np.expand_dims(arr, axis=0)
    arr = preprocess_input(arr)
    preds = runner.predict.run(arr)
    return decode_predictions(preds, top=1)[0]
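
Once the service is running, the endpoint can be exercised from Python. The sketch below uses only the standard library and assumes the server listens on http://localhost:3000, that the route is /predict (taken from the API function name), and that the input file is a JPEG. build_predict_request and send_request are hypothetical helper names.

```python
import json
import urllib.request


def build_predict_request(image_bytes, base_url="http://localhost:3000",
                          content_type="image/jpeg"):
    """Build a POST request carrying raw image bytes for the predict endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=image_bytes,
        headers={"Content-Type": content_type},
        method="POST",
    )


def send_request(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

A call might look like send_request(build_predict_request(open("ade20k.jpg", "rb").read())), which requires the service to be up and reachable at the assumed address.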

When constructing a bentofile.yaml, there are two ways to include Keras as a dependency, via python (if using pip) or conda:

# Option 1: pip
python:
  packages:
    - tensorflow

# Option 2: conda
conda:
  channels:
  - conda-forge
  dependencies:
  - tensorflow

Using Runners

See also

See the Using Runners doc for a general introduction to the Runner concept and its usage.

runner.predict.run is generally a drop-in replacement for model.predict for executing the prediction in the model runner. When predict is the only prediction method exposed by the runner model, you can use runner.run instead of runner.predict.run.
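
For quick experiments outside a running service, a runner can be initialized in the current process. The following is a minimal sketch under stated assumptions: it uses the keras_resnet50:latest tag saved earlier, relies on init_local (intended for debugging and testing, not for production serving), and the helper names are hypothetical.

```python
def load_local_runner(tag="keras_resnet50:latest"):
    """Create a runner and initialize it in-process (debugging use only)."""
    # Imported lazily so this module can be imported without bentoml installed.
    import bentoml

    runner = bentoml.keras.get(tag).to_runner()
    runner.init_local()
    return runner


def predict_sync(runner, arr):
    """Synchronous call, mirroring model.predict(arr)."""
    return runner.predict.run(arr)


async def predict_async(runner, arr):
    """Asynchronous variant, for use inside an async API function."""
    return await runner.predict.async_run(arr)
```

Inside a service defined with async def, the asynchronous form avoids blocking the event loop while the runner executes the prediction.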