Transformers
Users can now use Transformers with BentoML with the following APIs: load, save, and load_runner, as follows:
import bentoml

# Import a pretrained model and retrieve the corresponding tag:
tag = bentoml.transformers.import_from_huggingface_hub("distilbert-base-uncased-finetuned-sst-2-english")

# Retrieve metadata with `bentoml.models.get`:
metadata = bentoml.models.get(tag)

# Load the given model under the `Runner` abstraction with `load_runner`:
runner = bentoml.transformers.load_runner(tag, tasks="text-classification")

batched_sentence = [
    "I love you and I want to spend my whole life with you",
    "I hate you, Lyon, you broke my heart.",
]
runner.run_batch(batched_sentence)
We also offer import_from_huggingface_hub, which enables users to import models from the HuggingFace Model Hub and use them with BentoML:
import bentoml
import requests
from PIL import Image

tag = bentoml.transformers.import_from_huggingface_hub("google/vit-large-patch16-224")
runner = bentoml.transformers.load_runner(
    tag,
    tasks="image-classification",
    device=-1,
    feature_extractor="google/vit-large-patch16-224",
)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
res = runner.run_batch(image)
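A runner like the one above is typically wired into a service for deployment. The following is a minimal sketch, assuming the standard bentoml.Service and IO descriptor APIs; the service name and endpoint are illustrative, not part of this API reference:

import bentoml
from bentoml.io import Image, JSON

# Hypothetical service wiring; reuses the `runner` created above.
svc = bentoml.Service("vit_image_classifier", runners=[runner])

@svc.api(input=Image(), output=JSON())
def classify(img):
    # Delegate inference to the runner, which is scheduled by the BentoML runtime.
    return runner.run(img)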
Note
You can find more examples for Transformers in our gallery repo.
- bentoml.transformers.save(name, obj, *, tokenizer=None, feature_extractor=None, labels=None, custom_objects=None, metadata=None)
Save a model instance to the BentoML modelstore.
- Parameters

name (str) – Name for the given model instance. This should pass the Python identifier check.

obj (Union[transformers.PreTrainedModel, transformers.TFPreTrainedModel, transformers.FlaxPreTrainedModel]) – Model/Pipeline instance provided by transformers. This can be retrieved from their AutoModel class. You can also use any type of model/automodel provided by transformers. Refer to the Models API for more information.

tokenizer (Union[transformers.PreTrainedTokenizer, transformers.PreTrainedTokenizerFast], optional, default to None) – Tokenizer instance provided by transformers. This can be retrieved from their AutoTokenizer class. You can also use any type of tokenizer suited to your use case provided by transformers. Refer to the Tokenizer API for more information.

feature_extractor (transformers.PreTrainedFeatureExtractor, optional, default to None) – Feature extractor instance provided by transformers. This can be retrieved from their AutoFeatureExtractor class. You can also use any type of feature extractor suited to your use case provided by transformers. Refer to the Feature Extractor API for more information.

labels (Dict[str, str], optional, default to None) – User-defined labels for managing models, e.g. team=nlp, stage=dev.

custom_objects (Dict[str, Any], optional, default to None) – User-defined additional Python objects to be saved alongside the model, e.g. a tokenizer instance, a preprocessor function, or a model configuration JSON.

metadata (Dict[str, Any], optional, default to None) – Custom metadata for the given model.

model_store (ModelStore, default to BentoMLContainer.model_store) – BentoML modelstore, provided by the DI Container.
- Returns
A tag with the format name:version, where name is the user-defined model name and version is generated by BentoML.
- Return type
Tag
Examples:
from transformers import AutoModelForQuestionAnswering, AutoTokenizer
import bentoml

model = AutoModelForQuestionAnswering.from_pretrained("gpt2", from_flax=True)
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# transfer learning and modification goes here
...

tag = bentoml.transformers.save("flax_gpt2", model, tokenizer=tokenizer)
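The optional labels, custom_objects, and metadata arguments can be attached at save time. A minimal sketch, reusing the model and tokenizer from the example above; the preprocessor function and the label/metadata values are illustrative:

import bentoml

# Hypothetical preprocessor to persist alongside the model via `custom_objects`.
def normalize_text(text: str) -> str:
    return text.strip().lower()

tag = bentoml.transformers.save(
    "flax_gpt2",
    model,
    tokenizer=tokenizer,
    labels={"team": "nlp", "stage": "dev"},            # for model management
    custom_objects={"preprocessor": normalize_text},   # extra Python objects
    metadata={"notes": "illustrative metadata entry"}, # custom metadata
)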
- bentoml.transformers.load(tag, from_tf=False, from_flax=False, *, return_config=False, model_store=<simple_di.providers.SingletonFactory object>, **kwargs)
Load a model from the BentoML local modelstore with the given tag.
- Parameters

tag (Union[str, Tag]) – Tag of a saved model in the BentoML local modelstore.

model_store (ModelStore, default to BentoMLContainer.model_store) – BentoML modelstore, provided by the DI Container.

from_tf (bool, optional, default to False) – Load the model weights from a TensorFlow checkpoint save file.

from_flax (bool, optional, default to False) – Load the model weights from a Flax checkpoint save file.

return_config (bool, optional, default to False) – Whether or not to return the configuration of the Transformers model.

config_kwargs (Dict[str, Any], optional) – Kwargs to pass to the Config object.

model_kwargs (Dict[str, Any], optional) – Kwargs to pass to the Model object.

tokenizer_kwargs (Dict[str, Any], optional) – Kwargs to pass to the Tokenizer object.

feature_extractor_kwargs (Dict[str, Any], optional) – Kwargs to pass to the FeatureExtractor object.

kwargs (Dict[str, Any], optional) – Other kwargs passed to transformers that are neither config, model, tokenizer, nor feature extractor kwargs.

Warning
Make sure to add the corresponding kwargs for your Transformers Model, Tokenizer, Config, or FeatureExtractor to the correct kwargs dict.

Warning
Currently kwargs accepts all kwargs for the corresponding Pipeline.
- Returns
Either a pipeline, or a tuple containing a PretrainedConfig, a Model class object defined by transformers, and optionally a Tokenizer or FeatureExtractor class, for the given model saved in the BentoML modelstore.
- Return type
Union[Pipeline, Tuple[Optional[PretrainedConfig], Union[PreTrainedModel, TFPreTrainedModel, FlaxPreTrainedModel], Optional[Union[PreTrainedTokenizer, PreTrainedTokenizerFast, PreTrainedFeatureExtractor]]]]
Examples:
import bentoml

model, tokenizer = bentoml.transformers.load('custom_gpt2')
If you want to also return a config object:
import bentoml

config, model, tokenizer = bentoml.transformers.load('custom_gpt2', return_config=True, tokenizer_kwargs={"use_fast": True})
If the pipeline was saved with bentoml.transformers.save(), then load() will return a pipeline object:

import bentoml

pipeline = bentoml.transformers.load("roberta_text_classification", return_all_scores=True)
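The from_tf and from_flax flags documented above mirror the corresponding transformers loading flags for non-PyTorch checkpoints. A minimal sketch, assuming the saved model under 'custom_gpt2' holds a TensorFlow checkpoint:

import bentoml

# Load weights from a TensorFlow checkpoint saved under the given tag.
model, tokenizer = bentoml.transformers.load('custom_gpt2', from_tf=True)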
- bentoml.transformers.load_runner(tag, *, tasks, framework='pt', device=-1, name=None, **pipeline_kwargs)
Runner represents a unit of serving logic that can be scaled horizontally to maximize throughput. load_runner() implements a Runner class that wraps around a transformers pipeline and optimizes it for the BentoML runtime.
Warning
load_runner() will try to load the model from the given tag. If the model does not exist, BentoML will fall back to initializing the pipeline from transformers, in which case files will be loaded from the huggingface cache.
- Parameters
tag (Union[str, Tag]) – Tag of a saved model in the BentoML local modelstore.

tasks (str) – Given task for the pipeline. Refer to the Task Summary for more information.

framework (str, default to pt) – Given framework supported by transformers: PyTorch (pt) or TensorFlow (tf).

device (int, optional, default to -1) – Default GPU device to be used by the runner.

**pipeline_kwargs (Any) – Refer to the Pipeline Docs for more information on kwargs applicable to your specific pipeline.
- Returns
Runner instance for the bentoml.transformers model.
- Return type
Runner
Examples:
import bentoml

runner = bentoml.transformers.load_runner("gpt2:latest", tasks='zero-shot-classification', framework="tf")
runner.run_batch(["In today news, ...", "The stocks market seems ..."])
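The device argument follows the transformers pipeline convention: -1 (the default) runs inference on CPU, while a non-negative integer selects a CUDA device index. A minimal sketch, assuming a single-GPU host and a saved gpt2 model; the task is illustrative:

import bentoml

# Place the pipeline on the first CUDA device; device=-1 would keep it on CPU.
runner = bentoml.transformers.load_runner(
    "gpt2:latest",
    tasks="text-generation",
    device=0,
)
runner.run_batch(["BentoML is"])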