Detectron2 is Facebook AI Research’s (FAIR) next generation library that provides state-of-the-art detection and segmentation algorithms. It is the successor of Detectron and maskrcnn-benchmark. It supports a number of computer vision research projects and production applications in Facebook.

This guide will provide an overview of how to save and load Detectron2 models with BentoML.


Since the official way to install Detectron2, per its installation guide, is from the git repository source, the current implementation has been tested against Detectron2 as of this commit.

Saving a Detectron2 model#

The following example is excerpted from Detectron2’s official demo.

We will be using a Mask R-CNN model trained on the COCO dataset.

import detectron2.config as Cf
import detectron2.model_zoo as ModelZoo
import detectron2.modeling as Modeling
from detectron2.checkpoint import DetectionCheckpointer

# Model zoo config for Mask R-CNN trained on COCO
model_url = "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"

cfg = Cf.get_cfg()
cfg.merge_from_file(ModelZoo.get_config_file(model_url))
cfg.MODEL.WEIGHTS = ModelZoo.get_checkpoint_url(model_url)
cfg.MODEL.DEVICE = "cpu"  # Change device to run on CPU

model = Modeling.build_model(cfg)
model.eval()  # switch to inference mode
DetectionCheckpointer(model).load(cfg.MODEL.WEIGHTS)  # load the pretrained weights

To run an example input, use an image from the COCO dataset:

import requests
import torch
import PIL.Image
import numpy as np
from detectron2.data import transforms as Transforms

url = ""  # URL to a COCO validation image

# Detectron2's standard test-time preprocessing
augmentation = Transforms.ResizeShortestEdge([800, 800], 1333)

im = np.asarray(PIL.Image.open(requests.get(url, stream=True).raw).convert("RGB"))
arr = augmentation.get_transform(im).apply_image(im)

# Running prediction; Detectron2 models take a list of {"image": CHW tensor} dicts
with torch.no_grad():
    outputs = model([{"image": torch.as_tensor(arr.transpose(2, 0, 1))}])
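ResizeShortestEdge scales the image so that its shorter side matches the target size while capping the longer side. The rule can be sketched in plain Python (a simplified illustration, not Detectron2's implementation; `resize_shortest_edge` is a hypothetical helper):

```python
def resize_shortest_edge(h, w, short=800, max_size=1333):
    """Compute the output (height, width) when scaling the shorter
    side to `short`, capping the longer side at `max_size`."""
    scale = short / min(h, w)
    newh, neww = h * scale, w * scale
    if max(newh, neww) > max_size:
        cap = max_size / max(newh, neww)
        newh, neww = newh * cap, neww * cap
    return int(newh + 0.5), int(neww + 0.5)

print(resize_shortest_edge(400, 600))  # -> (800, 1200): shorter side scaled to 800
print(resize_shortest_edge(400, 800))  # -> (667, 1333): longer side clamped to 1333
```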

To save the above model, pass in both the model and the config instance to save_model:

bentoml.detectron.save_model("coco-masked-rcnn", model, config=cfg)

Detectron2 also provides a DefaultPredictor class that can be used to quickly run inference with the model.

BentoML also supports saving this predictor with bentoml.detectron.save_model. Do the following:

from detectron2.engine import DefaultPredictor

predictor = DefaultPredictor(cfg)
bentoml.detectron.save_model("coco-masked-rcnn-predictor", predictor)


Remarks: external Python classes or utility functions required by Detectron models or custom models must be referenced in <module>.<class> format, and those modules must be passed to bentoml.detectron.save_model via external_modules.

For example:

import dit

predictor = dit.get_predictor()

bentoml.detectron.save_model("dit-predictor", predictor, external_modules=[dit])

This is due to a limitation of PyTorch model serialization: PyTorch requires the model’s source code in order to restore it.

The default signatures used for creating a Runner is {"__call__": {"batchable": False}}. This means that BentoML’s Adaptive Batching is disabled by default when using save_model(). If you want to utilize adaptive batching and know your model’s dynamic batching dimension, make sure to pass in signatures as follows:

bentoml.detectron.save_model("dit-predictor", predictor, signatures={"__call__": {"batch_dim": 0, "batchable": True}})

About custom_objects

BentoML saves the Detectron CfgNode instance into the custom_objects attribute under the config key. When passing your own custom_objects, make sure not to overwrite this key.

Building a Service#

Create a BentoML service with the previously saved coco-masked-rcnn-predictor model using the bentoml.detectron framework APIs.

import bentoml
import PIL.Image
import numpy as np
from detectron2.data import MetadataCatalog
from detectron2.utils.visualizer import ColorMode
from detectron2.utils.visualizer import Visualizer

bentomodel = bentoml.detectron.get("coco-masked-rcnn-predictor")
cfg = bentomodel.custom_objects["config"]
runner = bentomodel.to_runner()

svc = bentoml.Service(name="masked-rcnn", runners=[runner])

@svc.api(input=bentoml.io.Image(), output=bentoml.io.Image())
async def predict(im: PIL.Image.Image) -> PIL.Image.Image:
    md = MetadataCatalog.get(cfg.DATASETS.TEST[0])
    tensor = np.array(im)
    output = await runner.async_run(tensor)
    instances = output["instances"]
    v = Visualizer(
        tensor[:, :, ::-1], md, scale=1.0, instance_mode=ColorMode.SEGMENTATION
    )
    res = v.draw_instance_predictions(instances.to("cpu"))
    return PIL.Image.fromarray(res.get_image()[:, :, ::-1])
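Assuming the service above lives in a file named service.py (the file name is an assumption; use your own module name), it can be served locally with the BentoML CLI:

```shell
bentoml serve service:svc --reload
```

The --reload flag restarts the server when the source file changes, which is convenient during development.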

Adaptive Batching#

Most Detectron models can accept batched data as input. If batched inference is supported, it is recommended to enable adaptive batching to improve the throughput and efficiency of the model. Enable adaptive batching by overriding the signatures argument with the method name and the batchable and batch_dim configurations when saving the model to the model store.
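To illustrate what batch_dim=0 means: inputs from separate requests are concatenated along the first axis into a single batch, run through the model once, and the batched output is split back into per-request results. A minimal sketch with NumPy arrays standing in for images:

```python
import numpy as np

# Each request carries a single image with an explicit batch axis: (1, H, W, C)
a = np.zeros((1, 480, 640, 3))
b = np.ones((1, 480, 640, 3))

# With batch_dim=0, the batcher concatenates pending inputs along axis 0,
# so one model call serves both requests
batch = np.concatenate([a, b], axis=0)
print(batch.shape)  # (2, 480, 640, 3)

# The batched output is split back into per-request slices along the same axis
out_a, out_b = np.split(batch, [1], axis=0)
```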

See also

See Adaptive Batching to learn more.


You can find more examples for Detectron in our bentoml/examples directory.