PyTorch

Users can now use PyTorch with BentoML through the following APIs: save, load, and load_runner, as follows:

import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
import bentoml

class NGramLanguageModeler(nn.Module):

    def __init__(self, vocab_size, embedding_dim, context_size):
        super(NGramLanguageModeler, self).__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.linear1 = nn.Linear(context_size * embedding_dim, 128)
        self.linear2 = nn.Linear(128, vocab_size)

    def forward(self, inputs):
        embeds = self.embeddings(inputs).view((1, -1))
        out = F.relu(self.linear1(embeds))
        out = self.linear2(out)
        log_probs = F.log_softmax(out, dim=1)
        return log_probs

# `save` a given classifier and retrieve the corresponding tag:
tag = bentoml.pytorch.save("ngrams", NGramLanguageModeler(len(vocab), EMBEDDING_DIM, CONTEXT_SIZE))

# retrieve metadata with `bentoml.models.get`:
metadata = bentoml.models.get(tag)

# `load` the model back in memory:
model = bentoml.pytorch.load("ngrams:latest")

# Run a given model under the `Runner` abstraction with `load_runner`:
runner = bentoml.pytorch.load_runner(tag)
input_data = pd.read_csv("/path/to/csv")
runner.run(input_data)
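The snippet above references `vocab`, `EMBEDDING_DIM`, and `CONTEXT_SIZE` without defining them. A minimal sketch of how they might be built, in pure Python (the names and values follow the standard PyTorch n-gram tutorial and are assumptions, not part of the BentoML API):

```python
# Hypothetical setup for the NGramLanguageModeler example above.
# EMBEDDING_DIM and CONTEXT_SIZE are arbitrary example values.
EMBEDDING_DIM = 10
CONTEXT_SIZE = 2  # a trigram model: two context words predict the next word

text = "the quick brown fox jumps over the lazy dog".split()

# Vocabulary: the set of unique words, plus an integer index for each.
vocab = set(text)
word_to_ix = {word: i for i, word in enumerate(vocab)}

# Trigram training pairs: ([w_{i-1}, w_{i-2}], w_i).
ngrams = [
    ([text[i - j - 1] for j in range(CONTEXT_SIZE)], text[i])
    for i in range(CONTEXT_SIZE, len(text))
]

# Each context is encoded as integer indices before being passed to the
# model's forward(), e.g.
# torch.tensor([word_to_ix[w] for w in context], dtype=torch.long)
context, target = ngrams[0]
```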

Note

You can find more examples for PyTorch in our gallery repo.

bentoml.pytorch.save(name, model, *, labels=None, custom_objects=None, metadata=None)

Save a model instance to BentoML modelstore.

Parameters
  • name (str) – Name for given model instance. This should pass Python identifier check.

  • model (torch.nn.Module) – Instance of model to be saved

  • labels (Dict[str, str], optional, default to None) – user-defined labels for managing models, e.g. team=nlp, stage=dev

  • custom_objects (Dict[str, Any], optional, default to None) – user-defined additional Python objects to be saved alongside the model, e.g. a tokenizer instance, a preprocessor function, or a model configuration JSON.

  • metadata (Dict[str, Any], optional, default to None) – Custom metadata for given model.

  • model_store (ModelStore, default to BentoMLContainer.model_store) – BentoML modelstore, provided by DI Container.

Returns

A tag with the format name:version, where name is the user-defined model name and version is generated by BentoML.

Return type

Tag

Examples:

import torch
import torch.nn as nn
import torch.nn.functional as F
import bentoml

class NGramLanguageModeler(nn.Module):

    def __init__(self, vocab_size, embedding_dim, context_size):
        super(NGramLanguageModeler, self).__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.linear1 = nn.Linear(context_size * embedding_dim, 128)
        self.linear2 = nn.Linear(128, vocab_size)

    def forward(self, inputs):
        embeds = self.embeddings(inputs).view((1, -1))
        out = F.relu(self.linear1(embeds))
        out = self.linear2(out)
        log_probs = F.log_softmax(out, dim=1)
        return log_probs

tag = bentoml.pytorch.save("ngrams", NGramLanguageModeler(len(vocab), EMBEDDING_DIM, CONTEXT_SIZE))
# example tag: ngrams:20201012_DE43A2
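As described under Returns, the tag follows a name:version format. A small sketch of splitting one apart (the version string here is the illustrative one from the comment above, not a real generated value):

```python
# A saved model tag has the form "name:version"; the version shown is the
# illustrative one from the example comment, not a real generated value.
tag_str = "ngrams:20201012_DE43A2"
name, version = tag_str.split(":")
# name == "ngrams", version == "20201012_DE43A2"
```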

Integration with Torch Hub and BentoML:

import torch
import bentoml

resnet50 = torch.hub.load("pytorch/vision", "resnet50", pretrained=True)
...
# trained a custom resnet50

tag = bentoml.pytorch.save("resnet50", resnet50)

bentoml.pytorch.load(tag, device_id='cpu')

Load a model from BentoML local modelstore with given name.

Parameters
  • tag (Union[str, Tag]) – Tag of a saved model in BentoML local modelstore.

  • device_id (str, optional, default to cpu) – Optional device to put the given model on; refer to PyTorch's device documentation for accepted values (e.g. "cpu", "cuda:0").

  • model_store (ModelStore, default to BentoMLContainer.model_store) – BentoML modelstore, provided by DI Container.

Returns

an instance of torch.nn.Module from BentoML modelstore.

Return type

torch.nn.Module

Examples:

import bentoml
model = bentoml.pytorch.load('lit_classifier:latest', device_id="cuda:0")

bentoml.pytorch.load_runner(tag, *, predict_fn_name='__call__', partial_kwargs=None, name=None)

Runner represents a unit of serving logic that can be scaled horizontally to maximize throughput. bentoml.pytorch.load_runner implements a Runner class that wraps around a PyTorch model instance and optimizes it for the BentoML runtime.

Parameters
  • tag (Union[str, Tag]) – Tag of a saved model in BentoML local modelstore.

  • predict_fn_name (str, default to __call__) – Name of the inference method to be used.

  • partial_kwargs (Dict[str, Any], optional, default to None) – Common kwargs passed to the model on every call for this runner.

Returns

A Runner instance for the target bentoml.pytorch model.

Return type

Runner

Examples:

import bentoml
import pandas as pd

runner = bentoml.pytorch.load_runner("ngrams:latest")
runner.run(pd.read_csv("/path/to/csv"))
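Conceptually, predict_fn_name selects which attribute of the saved model the runner invokes, and partial_kwargs are merged into every call. A pure-Python illustration of that dispatch with a dummy model (this is a sketch of the behavior the parameters describe, not BentoML's internal implementation):

```python
# Dummy stand-in for a saved model; not a real BentoML or torch object.
class DummyModel:
    def __call__(self, x, scale=1):
        return [v * scale for v in x]

    def encode(self, x, scale=1):
        return [v + scale for v in x]


def run_like_a_runner(model, inputs, predict_fn_name="__call__", partial_kwargs=None):
    """Sketch of runner dispatch: look up the inference method by name
    and apply any common kwargs on every call."""
    predict_fn = getattr(model, predict_fn_name)
    return predict_fn(inputs, **(partial_kwargs or {}))


model = DummyModel()
run_like_a_runner(model, [1, 2, 3])                           # dispatches to __call__
run_like_a_runner(model, [1, 2, 3], "encode", {"scale": 10})  # dispatches to encode
```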