About this page

This is an API reference for 🤗 Transformers in BentoML. Please refer to the Transformers guide for more information about how to use Hugging Face Transformers in BentoML.

bentoml.transformers.save_model(name: str, pipeline: ext.TransformersPipeline, task_name: str | None = None, task_definition: dict[str, t.Any] | None = None, *, signatures: ModelSignaturesType | None = None, labels: dict[str, str] | None = None, custom_objects: dict[str, t.Any] | None = None, external_modules: t.List[ModuleType] | None = None, metadata: dict[str, t.Any] | None = None) → bentoml.Model

Save a model instance to the BentoML model store.

  • name – Name for the given model instance. This should pass the Python identifier check.

  • pipeline –

    Instance of the Transformers pipeline to be saved.

    See module src/transformers/pipelines/ for more details.

  • task_name – Name of pipeline task. If not provided, the task name will be derived from pipeline.task.

  • task_definition –

    Task definition for the Transformers custom pipeline. The definition is a dictionary consisting of the following keys:

    • impl (str): The name of the pipeline implementation module. The name should be the same as the pipeline passed in the pipeline argument.

    • tf (tuple[AnyType]): The name of the TensorFlow auto model class. At least one of the tf and pt auto model class arguments is required.

    • pt (tuple[AnyType]): The name of the PyTorch auto model class. At least one of the tf and pt auto model class arguments is required.

    • default (Dict[str, AnyType]): The names of the default models, tokenizers, feature extractors, etc.

    • type (str): The type of the pipeline, e.g. text, audio, image, multimodal.


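    from transformers import (
        AutoModelForSeq2SeqLM,
        TFAutoModelForSeq2SeqLM,
        Text2TextGenerationPipeline,
        is_tf_available,
        is_torch_available,
    )

    task_definition = {
        "impl": Text2TextGenerationPipeline,
        "tf": (TFAutoModelForSeq2SeqLM,) if is_tf_available() else (),
        "pt": (AutoModelForSeq2SeqLM,) if is_torch_available() else (),
        "default": {"model": {"pt": "t5-base", "tf": "t5-base"}},
        "type": "text",
    }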

    See module src/transformers/pipelines/ for more details.

  • signatures – Methods to expose for running inference on the target model. Signatures are used for creating Runner instances when serving the model with a Service.

  • labels – User-defined labels for managing models, e.g. team=nlp, stage=dev.

  • custom_objects –

    Custom objects to be saved with the model. An example is {"my-normalizer": normalizer}.

    Custom objects are currently serialized with cloudpickle, but this implementation is subject to change.

  • external_modules (List[ModuleType], optional, defaults to None) – User-defined additional Python modules to be saved alongside the model or custom objects, e.g. a tokenizer module, preprocessor module, or model configuration module.

  • metadata – Custom metadata for the given model.


Both task_name and task_definition must be provided to save a custom pipeline.
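The requirements listed for task_definition can be checked before calling save_model. The following is only a sketch: check_task_definition, MyPipeline and MyAutoModel are hypothetical names introduced for illustration and are not part of BentoML or Transformers, and BentoML may apply its own, different validation.

```python
from __future__ import annotations

from typing import Any


def check_task_definition(task_definition: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a task definition (hypothetical helper)."""
    problems = []
    # "impl" names the pipeline implementation and is always expected.
    if "impl" not in task_definition:
        problems.append("missing 'impl'")
    # At least one of "tf" / "pt" must list an auto model class.
    tf_classes = tuple(task_definition.get("tf", ()))
    pt_classes = tuple(task_definition.get("pt", ()))
    if not tf_classes and not pt_classes:
        problems.append("at least one of 'tf' or 'pt' must list an auto model class")
    # "type" is documented as a string such as "text" or "audio".
    if "type" in task_definition and not isinstance(task_definition["type"], str):
        problems.append("'type' must be a string")
    return problems


# Stand-in classes so the sketch runs without Transformers installed.
class MyPipeline:
    pass


class MyAutoModel:
    pass


definition = {
    "impl": MyPipeline,
    "tf": (),
    "pt": (MyAutoModel,),
    "default": {},
    "type": "text",
}
print(check_task_definition(definition))
```

An empty list means the dictionary satisfies the documented constraints; it says nothing about whether the referenced classes actually work together.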


Returns:

A tag in the format name:version, where name is the user-defined model name and version is generated automatically.

Return type:



import bentoml

from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelForCausalLM.from_pretrained("distilgpt2")
generator = pipeline(task="text-generation", model=model, tokenizer=tokenizer)
bento_model = bentoml.transformers.save_model("text-generation-pipeline", generator)
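The optional labels and metadata arguments described in the parameter list can be passed in the same call. The sketch below assumes BentoML and Transformers are installed and that the distilgpt2 checkpoint can be downloaded; build_save_options and save_generator are hypothetical helper names, and the label and metadata values are made up for illustration.

```python
from __future__ import annotations

from typing import Any


def build_save_options() -> dict[str, Any]:
    """Keyword arguments for save_model (illustrative values only)."""
    return {
        # labels: dict[str, str], used for managing models
        "labels": {"team": "nlp", "stage": "dev"},
        # metadata: dict[str, Any], free-form information about the model
        "metadata": {"base_model": "distilgpt2"},
    }


def save_generator():
    """Save a text-generation pipeline together with labels and metadata."""
    # Imported lazily so this module can be loaded without the heavy dependencies.
    import bentoml
    from transformers import pipeline

    generator = pipeline(task="text-generation", model="distilgpt2")
    return bentoml.transformers.save_model(
        "text-generation-pipeline", generator, **build_save_options()
    )
```

Keeping the options in a separate function makes it easy to reuse the same labels across several saved models.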

bentoml.transformers.load_model(bento_model: str | Tag | Model, **kwargs: t.Any) → ext.TransformersPipeline

Load the Transformers model with the given tag from the local BentoML model store.

  • bento_model (str | Tag | Model) – Either the tag of the model to get from the store, or a BentoML bentoml.Model instance to load the model from.

  • kwargs (Any) – Additional keyword arguments to pass to the model.


Returns:

The Transformers pipeline loaded from the model store.

Return type:



import bentoml
pipeline = bentoml.transformers.load_model('my_model:latest')

bentoml.transformers.get(tag_like: str | bentoml._internal.tag.Tag) → Model

Get the BentoML model with the given tag.


tag_like – The tag of the model to retrieve from the model store.


Returns:

A BentoML Model with the matching tag.

Return type:



import bentoml
# target model must be from the BentoML model store
model = bentoml.transformers.get("my_pipeline:latest")
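
Tag strings such as "my_pipeline:latest" follow the name:version format described for save_model. As a rough illustration of that format only, the sketch below splits a tag string into its two parts; split_tag is a hypothetical helper, not a BentoML function, and BentoML's own tag handling may differ in detail.

```python
from __future__ import annotations


def split_tag(tag_like: str) -> tuple[str, str | None]:
    """Split 'name:version' into (name, version); version is None if absent."""
    name, sep, version = tag_like.partition(":")
    if not name:
        raise ValueError(f"tag {tag_like!r} has no model name")
    return name, (version if sep and version else None)


print(split_tag("my_pipeline:latest"))
print(split_tag("my_pipeline"))
```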