fastai is a popular deep learning library that provides high-level components for practitioners to get state-of-the-art results in standard deep learning domains, as well as low-level components for researchers to build new approaches. To learn more about fastai, visit the fastai documentation.

BentoML provides native support for fastai. This guide gives an overview of how to use BentoML with fastai.


BentoML requires fastai version 2 or higher to be installed.

BentoML does not support fastai version 1. If you are using fastai version 1, consider using a Custom Runner.

Saving a trained fastai learner

This example is based on Transfer Learning with text from fastai.

from fastai.basics import URLs
from fastai.metrics import accuracy
from fastai.data.block import DataBlock
from fastai.text.data import TextBlock
from fastai.data.external import untar_data
from fastai.data.block import CategoryBlock
from fastai.text.models import AWD_LSTM
from fastai.text.learner import text_classifier_learner
from fastai.data.transforms import parent_label
from fastai.data.transforms import get_text_files
from fastai.data.transforms import GrandparentSplitter

# Download IMDB dataset
path = untar_data(URLs.IMDB)

# Create IMDB DataBlock
imdb = DataBlock(
    blocks=(TextBlock.from_folder(path), CategoryBlock),
    get_items=get_text_files,
    get_y=parent_label,
    splitter=GrandparentSplitter(valid_name="test"),
)
dls = imdb.dataloaders(path)

# define a Learner object
learner = text_classifier_learner(
    dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy
)

# quickly fine tune the model
learner.fine_tune(4, 1e-2)

# output:
# epoch     train_loss  valid_loss  accuracy  time
# 0         0.453252    0.395130    0.822080  36:45

learner.predict("I really liked that movie!")

# output:
# ('pos', TensorText(1), TensorText([0.1216, 0.8784]))
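
The tuple returned by learner.predict holds the decoded label, its class index, and the per-class probabilities. A minimal sketch of turning that tuple into a label-to-probability mapping follows. The helper name probabilities_by_label and the label order ("neg", "pos") are assumptions for illustration, inferred from the sample output above rather than read from the learner.

```python
from typing import Any, Dict, Sequence, Tuple


def probabilities_by_label(
    prediction: Tuple[Any, Any, Sequence[Any]],
    labels: Sequence[str] = ("neg", "pos"),  # assumed order, not read from the learner
) -> Dict[str, float]:
    """Map each label to its probability from a (label, index, probs) tuple."""
    _decoded, _index, probs = prediction
    # float() works for plain numbers and for 0-dim tensors alike
    values = [float(p) for p in probs]
    if len(values) != len(labels):
        raise ValueError("expected one probability per label")
    return dict(zip(labels, values))


# Stand-in for the output shown above, with the tensors replaced by plain Python values
sample = ("pos", 1, [0.1216, 0.8784])
print(probabilities_by_label(sample))
```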

After training, use bentoml.fastai.save_model to save the Learner instance to the BentoML model store.

import bentoml

bentoml.fastai.save_model("fastai_sentiment", learner)
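
Beyond the name and the learner, save_model also takes optional keyword arguments. The sketch below attaches labels and metadata to the saved model. The helper names, label values, and metadata keys are made up for illustration, and the accuracy figure is the one from the training output above. bentoml is imported inside the function so that the pure-Python part can run without it.

```python
from typing import Any, Dict


def describe_model(accuracy: float, dataset: str) -> Dict[str, Dict[str, Any]]:
    """Build the optional keyword arguments passed to save_model."""
    return {
        # labels are string-to-string pairs used to organize models (values are illustrative)
        "labels": {"task": "sentiment", "framework": "fastai"},
        # metadata holds additional information describing the training run
        "metadata": {"accuracy": accuracy, "dataset": dataset},
    }


def save_learner(learner: Any, name: str = "fastai_sentiment") -> Any:
    """Save a trained Learner together with labels and metadata."""
    import bentoml  # imported lazily; requires bentoml and fastai to be installed

    extra = describe_model(accuracy=0.822080, dataset="IMDB")
    return bentoml.fastai.save_model(name, learner, **extra)
```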

To verify that the saved learner can be loaded properly:

learner = bentoml.fastai.load_model("fastai_sentiment:latest")

learner.predict("I really liked that movie!")

Building a Service using fastai

See also

Building a Service: more information on creating a prediction service with BentoML.

import bentoml

import numpy as np

from bentoml.io import Text
from bentoml.io import NumpyNdarray

runner = bentoml.fastai.get("fastai_sentiment:latest").to_runner()

svc = bentoml.Service("fast_sentiment", runners=[runner])

@svc.api(input=Text(), output=NumpyNdarray())
def classify_text(text: str) -> np.ndarray:
    # Return the class probabilities (sentiment scores) for the given text
    res = runner.predict.run(text)
    return np.asarray(res[-1])
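
Once the service is running, it can be called over HTTP. The sketch below uses only the standard library. It assumes the service was started locally on port 3000, that the endpoint path matches the API function name classify_text, and that the NumpyNdarray output arrives as a JSON array. The helper names are illustrative.

```python
import json
import urllib.request
from typing import List

# Assumed local address; adjust host and port to match your deployment
SERVICE_URL = "http://localhost:3000/classify_text"


def build_request(text: str, url: str = SERVICE_URL) -> urllib.request.Request:
    """Create a POST request carrying the raw text as the body."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def parse_scores(body: bytes) -> List[float]:
    """Decode the JSON array of class probabilities returned by the service."""
    return [float(x) for x in json.loads(body)]


def classify(text: str) -> List[float]:
    """Send the text to the running service and return the scores."""
    with urllib.request.urlopen(build_request(text)) as response:
        return parse_scores(response.read())
```

Keeping request construction and response parsing separate from the network call makes both easy to check without a running server.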

When constructing a bentofile.yaml, there are two ways to include fastai as a dependency, via python or conda:

python:
  packages:
    - fastai

conda:
  channels:
  - fastchan
  dependencies:
  - fastai

Using Runners

See also

See the Using Runners doc for a general introduction to the Runner concept and its usage.

runner.predict.run is generally a drop-in replacement for learner.predict, regardless of the learner type, for executing the prediction in the model runner. A fastai runner receives the same input type as the given learner.

For example, a Runner created from a Tabular learner model will accept a pandas.DataFrame as input, whereas a Runner based on a Text learner will accept a str as input.
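
Because the runner mirrors the learner's interface, code written against learner.predict can take either callable. The sketch below assumes the runner exposes the method as runner.predict.run and returns the same (label, index, probabilities) tuple as the learner. top_label and make_local_runner are hypothetical helpers, and init_local is intended for debugging outside a running service.

```python
from typing import Any, Callable, Sequence


def top_label(predict_fn: Callable[[str], Sequence[Any]], text: str) -> str:
    """Return the decoded label from a fastai-style prediction tuple."""
    decoded, _index, _probs = predict_fn(text)
    return str(decoded)


def make_local_runner(tag: str = "fastai_sentiment:latest") -> Any:
    """Create a runner and initialize it in the current process for debugging."""
    import bentoml  # imported lazily; requires bentoml and fastai to be installed

    runner = bentoml.fastai.get(tag).to_runner()
    runner.init_local()  # for local testing only, not for use inside a Service
    return runner


# Usage, assuming the model exists in the local model store:
# runner = make_local_runner()
# top_label(runner.predict.run, "I really liked that movie!")
# top_label(learner.predict, "I really liked that movie!")
```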

Using PyTorch layer

Since fastai is built on top of PyTorch, it is also possible to use PyTorch models from within a fastai learner directly for inference. Note that by using the PyTorch layer, you will not be able to use the fastai Learner’s features such as .predict(), .get_preds(), etc.

To get the PyTorch model, access it via learner.model:

import bentoml

bentoml.pytorch.save_model(
    "my_pytorch_model", learner.model, signatures={"__call__": {"batchable": True}}
)

To learn more about using PyTorch with BentoML, see the BentoML PyTorch guide.
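
Without Learner.predict, post-processing such as turning raw model outputs into probabilities is left to the caller. As a rough illustration, assuming the PyTorch model yields one unnormalized score (logit) per class, a softmax can be applied by hand. The function and the sample logits are made up for this sketch, and the actual output structure depends on the model architecture. On tensors, torch.softmax performs the same computation.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert unnormalized class scores into probabilities that sum to 1."""
    # Subtract the maximum first for numerical stability
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for the two sentiment classes
probs = softmax([-0.8, 1.2])
print(probs, "-> class index", probs.index(max(probs)))
```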

Using GPU

Since fastai doesn’t support using GPU for inference, BentoML can only support CPU inference with fastai models.

Additionally, if the model uses mixed_precision, the loaded model will also be converted to FP32. See the mixed precision documentation to learn more.

If you need to use GPU for inference, you can use the PyTorch layer.

Adaptive batching

fastai’s Learner.predict does not accept batched input for inference, so the adaptive batching feature in BentoML is not available for fastai models.

The default signature has batchable set to False.

If you need to use adaptive batching for inference, you can use the PyTorch layer.
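
To make the contrast concrete, the sketch below spells out both signature dictionaries: a fastai one with batching turned off, written explicitly although it is meant to match the default, and the PyTorch one from the section above with batching enabled. The helper name and the use of the predict key for the fastai signature are assumptions. bentoml is imported lazily inside the function.

```python
from typing import Any, Tuple

# Explicit form of the non-batchable fastai signature (assumed to be keyed by "predict")
FASTAI_SIGNATURES = {"predict": {"batchable": False}}

# Signature for the underlying PyTorch model, which can take batched input
PYTORCH_SIGNATURES = {"__call__": {"batchable": True}}


def save_both(learner: Any) -> Tuple[Any, Any]:
    """Save the Learner for fastai-style predict and its model for batched inference."""
    import bentoml  # imported lazily; requires bentoml, fastai, and torch

    fastai_model = bentoml.fastai.save_model(
        "fastai_sentiment", learner, signatures=FASTAI_SIGNATURES
    )
    torch_model = bentoml.pytorch.save_model(
        "my_pytorch_model", learner.model, signatures=PYTORCH_SIGNATURES
    )
    return fastai_model, torch_model
```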