Transformers

Users can now use Transformers with BentoML through the following APIs: load, save, and load_runner, as follows:

import bentoml

# `import` a pretrained model and retrieve the corresponding tag:
tag = bentoml.transformers.import_from_huggingface_hub("distilbert-base-uncased-finetuned-sst-2-english")

# retrieve metadata with `bentoml.models.get`:
metadata = bentoml.models.get(tag)

# Load a given model under `Runner` abstraction with `load_runner`
runner = bentoml.transformers.load_runner(tag, tasks="text-classification")

batched_sentence = [
   "I love you and I want to spend my whole life with you",
   "I hate you, Lyon, you broke my heart.",
]
runner.run_batch(batched_sentence)
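
For serving, the runner can be wired into a Service. The following is a minimal sketch, assuming the pre-release Service API and reusing the tag imported above; the service name and endpoint are hypothetical:

import bentoml
from bentoml.io import JSON, Text

runner = bentoml.transformers.load_runner(tag, tasks="text-classification")

svc = bentoml.Service("sentiment_classifier", runners=[runner])

@svc.api(input=Text(), output=JSON())
def classify(sentence: str):
    # delegate inference to the runner
    return runner.run(sentence)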

We also offer import_from_huggingface_hub, which enables users to import models from the HuggingFace Model Hub and use them with BentoML:

import bentoml
import requests
from PIL import Image

tag = bentoml.transformers.import_from_huggingface_hub("google/vit-large-patch16-224")

runner = bentoml.transformers.load_runner(
    tag,
    tasks="image-classification",
    device=-1,
    feature_extractor="google/vit-large-patch16-224",
)
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
res = runner.run_batch(image)
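
Here res holds the pipeline's usual output. A short sketch of inspecting it; the exact fields follow the transformers image-classification pipeline, which returns label/score pairs:

# each prediction is a {"label": ..., "score": ...} dict
for pred in res:
    print(pred["label"], pred["score"])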

Note

You can find more examples for Transformers in our gallery repo.

bentoml.transformers.save(name, obj, *, tokenizer=None, feature_extractor=None, labels=None, custom_objects=None, metadata=None)

Save a model instance to BentoML modelstore.

Parameters
  • name (str) – Name for the given model instance. This should pass the Python identifier check.

  • obj (Union[transformers.PreTrainedModel, transformers.TFPreTrainedModel, transformers.FlaxPreTrainedModel]) – Model/Pipeline instance provided by transformers. This can be retrieved from their AutoModel class. You can also use any type of model/automodel provided by transformers. Refer to the Models API for more information.

  • tokenizer (Union[transformers.PreTrainedTokenizer, transformers.PreTrainedTokenizerFast], optional, defaults to None) – Tokenizer instance provided by transformers. This can be retrieved from their AutoTokenizer class. You can also use any type of Tokenizer provided by transformers, according to your use case. Refer to the Tokenizer API for more information.

  • feature_extractor (transformers.PreTrainedFeatureExtractor, optional, defaults to None) – Feature Extractor instance provided by transformers. This can be retrieved from their AutoFeatureExtractor class. You can also use any type of Feature Extractor provided by transformers, according to your use case. Refer to the Feature Extractor API for more information.

  • labels (Dict[str, str], optional, defaults to None) – User-defined labels for managing models, e.g. team=nlp, stage=dev.

  • custom_objects (Dict[str, Any], optional, defaults to None) – User-defined additional Python objects to be saved alongside the model, e.g. a tokenizer instance, a preprocessor function, or a model configuration JSON.

  • metadata (Dict[str, Any], optional, defaults to None) – Custom metadata for the given model.

  • model_store (ModelStore, defaults to BentoMLContainer.model_store) – BentoML modelstore, provided by the DI Container.

Returns

A tag with the format name:version, where name is the user-defined model name and version is generated by BentoML.

Return type

Tag

Examples:

from transformers import AutoModelForQuestionAnswering, AutoTokenizer
import bentoml

model = AutoModelForQuestionAnswering.from_pretrained("gpt2", from_flax=True)
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# transfer learning and modifications go here
...

tag = bentoml.transformers.save("flax_gpt2", model, tokenizer=tokenizer)
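
The returned tag can then be used to retrieve the model from the local store, as shown earlier with bentoml.models.get (a short sketch):

import bentoml

# look up the saved model with the tag returned above
metadata = bentoml.models.get(tag)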

bentoml.transformers.load(tag, from_tf=False, from_flax=False, *, return_config=False, model_store=<simple_di.providers.SingletonFactory object>, **kwargs)

Load a model from the BentoML local modelstore with the given tag.

Parameters
  • tag (Union[str, Tag]) – Tag of a saved model in BentoML local modelstore.

  • model_store (ModelStore, defaults to BentoMLContainer.model_store) – BentoML modelstore, provided by the DI Container.

  • from_tf (bool, optional, defaults to False) – Load the model weights from a TensorFlow checkpoint save file.

  • from_flax (bool, optional, defaults to False) – Load the model weights from a Flax checkpoint save file.

  • return_config (bool, optional, defaults to False) – Whether or not to return the configuration of the Transformers model.

  • config_kwargs (Dict[str, Any], optional) – Kwargs to pass into the Config object.

  • model_kwargs (Dict[str, Any], optional) – Kwargs to pass into the Model object.

  • tokenizer_kwargs (Dict[str, Any], optional) – Kwargs to pass into the Tokenizer object.

  • feature_extractor_kwargs (Dict[str, Any], optional) – Kwargs to pass into the FeatureExtractor object.

  • kwargs (Dict[str, Any], optional) – Other kwargs passed to transformers that are not config, model, tokenizer, or feature extractor kwargs.

Warning

Make sure to add the corresponding kwargs for your Transformers Model, Tokenizer, Config, or FeatureExtractor to the correct kwargs dict.

Warning

Currently kwargs accepts all kwargs for the corresponding Pipeline.

Returns

Either a Pipeline, or a tuple containing an optional PretrainedConfig, the Model class object defined by transformers, and an optional Tokenizer or FeatureExtractor class, for the given model saved in the BentoML modelstore.

Return type

Union[Pipeline, Tuple[Optional[PretrainedConfig], Union[PreTrainedModel, TFPreTrainedModel, FlaxPreTrainedModel], Optional[Union[PreTrainedTokenizer, PreTrainedTokenizerFast, PreTrainedFeatureExtractor]]]]

Examples:

import bentoml
model, tokenizer = bentoml.transformers.load('custom_gpt2')

If you want to return a config object:

import bentoml
config, model, tokenizer = bentoml.transformers.load('custom_gpt2', return_config=True, tokenizer_kwargs={"use_fast":True})

If the pipeline was saved with bentoml.transformers.save(), then load() will return a pipeline object:

import bentoml
pipeline = bentoml.transformers.load("roberta_text_classification", return_all_scores=True)
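
In that case the returned object behaves like a regular transformers pipeline and can be called directly. A short sketch, reusing the model above; with return_all_scores=True the pipeline yields a score for every label:

results = pipeline("BentoML makes model serving easy!")
# a list of {"label": ..., "score": ...} entries per input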

bentoml.transformers.load_runner(tag, *, tasks, framework='pt', device=-1, name=None, **pipeline_kwargs)

Runner represents a unit of serving logic that can be scaled horizontally to maximize throughput. load_runner() implements a Runner class that wraps around a transformers pipeline, optimizing it for the BentoML runtime.

Warning

load_runner() will try to load the model from the given tag. If the model does not exist, BentoML will fall back to initializing the pipeline from transformers, in which case files will be loaded from the HuggingFace cache.

Parameters
  • tag (Union[str, Tag]) – Tag of a saved model in BentoML local modelstore.

  • tasks (str) – Given task for the pipeline. Refer to the Task Summary for more information.

  • framework (str, defaults to pt) – Framework supported by transformers: PyTorch (pt) or TensorFlow (tf).

  • device (int, optional, defaults to -1) – GPU device ID to be used by the runner; the default -1 runs on CPU.

  • **pipeline_kwargs (Any) – Refer to the Pipeline Docs for more information on kwargs applicable to your specific pipeline.

Returns

A Runner instance for the bentoml.transformers model.

Return type

Runner

Examples:

import bentoml

runner = bentoml.transformers.load_runner("gpt2:latest", tasks="zero-shot-classification", framework="tf")
runner.run_batch(["In today's news, ...", "The stock market seems ..."])