PyTorch#

BentoML provides native support for serving and deploying models trained with PyTorch. For more in-depth tutorials on PyTorch itself, please visit PyTorch's official documentation.

Preface#

If you have already compiled your PyTorch model to TorchScript, you should consider using BentoML’s first-class module bentoml.torchscript instead, as it is less likely to cause compatibility issues during production.

Note

Remarks: We recommend applying model optimization techniques such as distillation or quantization. Alternatively, PyTorch models can be converted to ONNX and run on other runtimes (e.g. TensorRT, Apache TVM) for better performance.
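As an illustration of the quantization route, here is a minimal sketch using PyTorch's built-in dynamic quantization. The toy network is a stand-in for your trained model; only the `quantize_dynamic` call matters:

```python
import torch
import torch.nn as nn

# A small stand-in network; any trained nn.Module works the same way.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
model.eval()

# Dynamic quantization converts the Linear layers' weights to int8,
# shrinking the model and often speeding up CPU inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

out = quantized(torch.randn(4, 16))
```

The quantized module is a drop-in replacement for the original and can be saved with `bentoml.pytorch.save_model` like any other `nn.Module`.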

Saving a Trained Model#

For common PyTorch models with single input:

train.py#
import bentoml
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose(
[transforms.ToTensor(),
 transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

batch_size = 4

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


model = Net()

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

for epoch in range(2):  # a small number of epochs, just for demonstration purposes
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        print(f'Epoch: {epoch}, Step: {i}, Loss: {loss.item():.4f}')

bentoml.pytorch.save_model(
    "my_torch_model",
    model,
    signatures={"__call__": {"batchable": True, "batch_dim": 0}},
)

bentoml.pytorch also supports saving models that take multiple tensors as input:

train.py#
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim


class Net(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x, y):
        return x + y


model = Net()
... # training

bentoml.pytorch.save_model(
    "my_torch_model",
    model,
    signatures={"__call__": {"batchable": True, "batch_dim": 0}},
)

Note

Remarks: External Python classes or utility functions required by the model must be referenced in <module>.<class> format, and such modules must be passed to bentoml.pytorch.save_model via the external_modules argument. For example:

train.py#
import my_models

model = my_models.MyModel()
bentoml.pytorch.save_model(
    "my_torch_model",
    model,
    signatures={"__call__": {"batchable": True, "batch_dim": 0}},
    external_modules=[my_models],
)

This is due to a limitation of PyTorch model serialization: PyTorch requires the model's source code in order to restore it.

A better practice is to compile your model to TorchScript format.

Note

The signatures argument of bentoml.pytorch.save_model determines which methods will be used for inference and exposed in the Runner. The signatures dictionary is then used when creating a Runner instance.

The default signature is {"__call__": {"batchable": False}}. This means that by default, BentoML's Adaptive Batching is disabled when using save_model(). If you want to utilize adaptive batching and know your model's dynamic batching dimension, make sure to pass in signatures as follows:

bentoml.pytorch.save_model("my_model", model, signatures={"__call__": {"batch_dim": 0, "batchable": True}})

Building a Service#

Create a BentoML service with the previously saved my_torch_model using the bentoml.pytorch framework APIs.

service.py#
from typing import Any

import bentoml
from bentoml.io import JSON

runner = bentoml.pytorch.get("my_torch_model").to_runner()

svc = bentoml.Service(name="test_service", runners=[runner])

@svc.api(input=JSON(), output=JSON())
async def predict(json_obj: Any) -> Any:
    batch_ret = await runner.async_run([json_obj])
    return batch_ret[0]

Note

Follow these steps to get the best performance out of your PyTorch model:

1. Apply adaptive batching if possible.
2. Serve on GPUs if applicable.
3. See the performance guide in the PyTorch Model Optimization docs.

Adaptive Batching#

Most PyTorch models can accept batched data as input. If batched inference is supported, it is recommended to enable batching to take advantage of BentoML's adaptive batching capability, improving the throughput and efficiency of the model. Enable adaptive batching by overriding the signatures argument with the method name and providing batchable and batch_dim configurations when saving the model to the model store.

See also

See Adaptive Batching to learn more.

Note

You can find more examples for PyTorch in our bentoml/examples directory.