API IO Descriptors#

IO Descriptors describe the input and output specification of a Service API. This page lists the built-in IO Descriptors and the API for writing a custom IO Descriptor.

NumPy ndarray#

Note

The numpy package is required to use the bentoml.io.NumpyNdarray.

Install it with pip install numpy and add it to your bentofile.yaml under either the Python or the Conda packages list.

Refer to Build Options.

bentofile.yaml#
...
python:
  packages:
    - numpy
bentofile.yaml#
...
conda:
  channels:
    - conda-forge
  dependencies:
    - numpy
class bentoml.io.NumpyNdarray(dtype: str | ext.NpDTypeLike | None = None, enforce_dtype: bool = False, shape: tuple[int, ...] | None = None, enforce_shape: bool = False)[source]#

NumpyNdarray defines API specification for the inputs/outputs of a Service, where either inputs will be converted to or outputs will be converted from type numpy.ndarray as specified in your API function signature.

A sample service implementation:

service.py#
from __future__ import annotations

from typing import TYPE_CHECKING, Any

import bentoml
from bentoml.io import NumpyNdarray

if TYPE_CHECKING:
    from numpy.typing import NDArray

runner = bentoml.sklearn.get("sklearn_model_clf").to_runner()

svc = bentoml.Service("iris-classifier", runners=[runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def predict(input_arr: NDArray[Any]) -> NDArray[Any]:
    return runner.run(input_arr)

Users can then serve this service with bentoml serve:

% bentoml serve ./service.py:svc --reload

Users can then send requests to the newly started services with any client:

% curl -X POST -H "Content-Type: application/json" \
        --data '[[5,4,3,2]]' http://0.0.0.0:3000/predict

# [1]%
request.py#
 import requests

 requests.post(
     "http://0.0.0.0:3000/predict",
     headers={"content-type": "application/json"},
     data='[[5,4,3,2]]'
 ).text
Parameters:
  • dtype – Data type users wish to convert their inputs/outputs to. Refer to arrays dtypes for more information.

  • enforce_dtype – Whether to enforce a certain data type. If enforce_dtype=True, then dtype must be specified.

  • shape –

    Given shape that an array will be converted to. For example:

    service.py#
    import numpy as np
    from bentoml.io import NumpyNdarray

    @svc.api(input=NumpyNdarray(shape=(2,2), enforce_shape=False), output=NumpyNdarray())
    async def predict(input_array: np.ndarray) -> np.ndarray:
        # input_array will be reshaped to (2,2)
        result = await runner.async_run(input_array)
        return result
    

    When enforce_shape=True is provided, BentoML will raise an exception if the input array received does not match the shape provided.

    About the behaviour of shape

    If specified, then both bentoml.io.NumpyNdarray.from_http_request() and bentoml.io.NumpyNdarray.from_proto() will reshape the input array before sending it to the API function.

  • enforce_shape – Whether to enforce a certain shape. If enforce_shape=True then shape must be specified.

Returns:

IO Descriptor that represents a np.ndarray.

Return type:

IODescriptor
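
The way shape/enforce_shape and dtype/enforce_dtype interact can be sketched in plain NumPy. The helper coerce_array below is hypothetical and is not BentoML's implementation. It assumes that without enforcement the array is converted or reshaped, and that with enforcement a mismatch raises an error.

```python
from __future__ import annotations

import numpy as np


def coerce_array(
    arr: np.ndarray,
    dtype: str | None = None,
    enforce_dtype: bool = False,
    shape: tuple[int, ...] | None = None,
    enforce_shape: bool = False,
) -> np.ndarray:
    """Hypothetical helper mirroring the documented parameter semantics."""
    if dtype is not None and arr.dtype != np.dtype(dtype):
        if enforce_dtype:
            raise ValueError(f"expected dtype {dtype}, got {arr.dtype}")
        arr = arr.astype(dtype)  # convert when not enforcing
    if shape is not None and arr.shape != shape:
        if enforce_shape:
            raise ValueError(f"expected shape {shape}, got {arr.shape}")
        arr = arr.reshape(shape)  # reshape when not enforcing
    return arr


flat = np.array([5, 4, 3, 2])
reshaped = coerce_array(flat, dtype="float32", shape=(2, 2))
print(reshaped.shape, reshaped.dtype)
```

With enforce_shape=True the same call raises instead of reshaping, which is the stricter behaviour to prefer when a model only accepts one input layout.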

classmethod NumpyNdarray.from_sample(sample: IOType | t.Any, **kwargs: t.Any) Self#
async NumpyNdarray.from_proto(field: pb.NDArray | bytes) ext.NpNDArray[source]#

Process incoming protobuf request and convert it to numpy.ndarray

Parameters:
  • field – Incoming protobuf message, either a pb.NDArray or serialized bytes.

Returns:

A np.array constructed from given protobuf message.

Return type:

numpy.ndarray

Note

Currently, we support pb.NDArray and serialized_bytes as valid inputs. serialized_bytes will be prioritised over pb.NDArray if both are provided. Serialized bytes has a specialized bytes representation and should not be used by users directly.

async NumpyNdarray.from_http_request(request: Request) ext.NpNDArray[source]#

Process incoming requests and convert incoming objects to numpy.ndarray.

Parameters:

request – Incoming request.

Returns:

A numpy.ndarray object that can then be used inside user-defined logic.

async NumpyNdarray.to_proto(obj: ext.NpNDArray) pb.NDArray[source]#

Process the given object and convert it to a gRPC protobuf response.

Parameters:

obj – np.array that will be serialized to protobuf.

Returns:

Protobuf representation of given np.ndarray

Return type:

pb.NDArray

async NumpyNdarray.to_http_response(obj: ext.NpNDArray, ctx: Context | None = None)[source]#

Process the given object and convert it to an HTTP response.

Parameters:
  • obj – np.ndarray that will be serialized to JSON

  • ctx – Context object that contains information about the request.

Returns:

HTTP Response of type starlette.responses.Response. This can be accessed via cURL or any external web traffic.

Tabular Data with Pandas#

To use the IO descriptor, install bentoml with extra io-pandas dependency:

pip install "bentoml[io-pandas]"

Note

The pandas package is required to use the bentoml.io.PandasDataFrame or bentoml.io.PandasSeries.

Install it with pip install pandas and add it to your bentofile.yaml under either the Python or the Conda packages list.

Refer to Build Options.

bentofile.yaml#
...
python:
  packages:
    - pandas
bentofile.yaml#
...
conda:
  channels:
    - conda-forge
  dependencies:
    - pandas
class bentoml.io.PandasDataFrame(orient: ext.DataFrameOrient = 'records', columns: list[str] | None = None, apply_column_names: bool = False, dtype: bool | ext.PdDTypeArg | None = None, enforce_dtype: bool = False, shape: tuple[int, ...] | None = None, enforce_shape: bool = False, default_format: t.Literal['json', 'parquet', 'csv'] = 'json')[source]#

PandasDataFrame defines API specification for the inputs/outputs of a Service, where either inputs will be converted to or outputs will be converted from type pd.DataFrame as specified in your API function signature.

A sample service implementation:

service.py#
from __future__ import annotations

import bentoml
import pandas as pd
import numpy as np
from bentoml.io import PandasDataFrame

input_spec = PandasDataFrame.from_sample(pd.DataFrame(np.array([[5,4,3,2]])))

runner = bentoml.sklearn.get("sklearn_model_clf").to_runner()

svc = bentoml.Service("iris-classifier", runners=[runner])

@svc.api(input=input_spec, output=PandasDataFrame())
def predict(input_arr):
    res = runner.run(input_arr)
    return pd.DataFrame(res)

Users can then serve this service with bentoml serve:

% bentoml serve ./service.py:svc --reload

Users can then send requests to the newly started services with any client:

% curl -X POST -H "Content-Type: application/json" \
        --data '[{"0":5,"1":4,"2":3,"3":2}]' http://0.0.0.0:3000/predict

# [{"0": 1}]%
request.py#
 import requests

 requests.post(
     "http://0.0.0.0:3000/predict",
     headers={"content-type": "application/json"},
     data='[{"0":5,"1":4,"2":3,"3":2}]'
 ).text
Parameters:
  • orient –

    Indication of expected JSON string format. Compatible JSON strings can be produced by pandas.DataFrame.to_json() with a corresponding orient value. Possible orients are:

    • split - dict[str, Any] ↦ {idx ↠ [idx], columns ↠ [columns], data ↠ [values]}

    • records - list[Any] ↦ [{column ↠ value}, …, {column ↠ value}]

    • index - dict[str, Any] ↦ {idx ↠ {column ↠ value}}

    • columns - dict[str, Any] ↦ {column ↠ {index ↠ value}}

    • values - dict[str, Any] ↦ Values arrays

  • columns – List of column names that users wish to update.

  • apply_column_names – Whether to update incoming DataFrame columns. If apply_column_names=True, then columns must be specified.

  • dtype – Data type users wish to convert their inputs/outputs to. If True, pandas infers the dtypes. If it is a dictionary mapping columns to dtypes, those dtypes are applied to incoming dataframes. If False, dtypes are not inferred at all (this only applies to the data). This is not applicable for orient='table'.

  • enforce_dtype – Whether to enforce a certain data type. If enforce_dtype=True, then dtype must be specified.

  • shape –

    Optional shape check that users can specify for their incoming HTTP requests. We will only check the number of columns you specified for your given shape:

    service.py#
    import pandas as pd
    from bentoml.io import PandasDataFrame
    
    df = pd.DataFrame([[1, 2, 3]])  # shape (1,3)
    inp = PandasDataFrame.from_sample(df)
    
    @svc.api(
        input=PandasDataFrame(shape=(51, 10),
              enforce_shape=True),
        output=PandasDataFrame()
    )
    def predict(input_df: pd.DataFrame) -> pd.DataFrame:
        # if input_df has shape (40,9),
        # an error is raised
        ...
    

  • enforce_shape – Whether to enforce a certain shape. If enforce_shape=True then shape must be specified.

  • default_format –

    The default serialization format to use if the request does not specify a Content-Type header. It is also the serialization format used for the response. Possible values are:

    • json - JSON text format (inferred from content-type "application/json")

    • parquet - Parquet binary format (inferred from content-type "application/octet-stream")

    • csv - CSV text format (inferred from content-type "text/csv")

Returns:

IO Descriptor that represents a pd.DataFrame.

Return type:

PandasDataFrame
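
The orient layouts listed above are easier to compare side by side. The sketch below builds them by hand for a small two-row table, using only the standard library so that it runs without pandas. The variable names are illustrative. pandas.DataFrame.to_json(orient=...) produces the same layouts, up to key order.

```python
import json

columns = ["a", "b"]
index = [0, 1]
rows = [[5, 4], [3, 2]]

# One entry per orient, each describing the same 2x2 table.
layouts = {
    "split": {"columns": columns, "index": index, "data": rows},
    "records": [dict(zip(columns, row)) for row in rows],
    "index": {str(i): dict(zip(columns, row)) for i, row in zip(index, rows)},
    "columns": {
        col: {str(i): row[c] for i, row in zip(index, rows)}
        for c, col in enumerate(columns)
    },
    "values": rows,
}

for orient, payload in layouts.items():
    print(f"{orient:8s} {json.dumps(payload)}")
```

The records layout is the one used in the request bodies of the sample above, since it is the descriptor's default orient.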

classmethod PandasDataFrame.from_sample(sample: IOType | t.Any, **kwargs: t.Any) Self#
async PandasDataFrame.from_proto(field: pb.DataFrame | bytes) ext.PdDataFrame[source]#

Process incoming protobuf request and convert it to pandas.DataFrame

Parameters:
  • field – Incoming protobuf message, either a pb.DataFrame or serialized bytes.

Returns:

A pandas.DataFrame object that can then be used inside user-defined logic.

async PandasDataFrame.from_http_request(request: Request) ext.PdDataFrame[source]#

Process incoming requests and convert incoming objects to pd.DataFrame

Parameters:

request (starlette.requests.Request) – Incoming request.

Returns:

A pd.DataFrame object that can then be used inside user-defined logic.

Raises:

BadInput – Raised when the incoming request is badly formatted.

async PandasDataFrame.to_proto(obj: ext.PdDataFrame) pb.DataFrame[source]#

Process the given object and convert it to a gRPC protobuf response.

Parameters:

obj – pandas.DataFrame that will be serialized to protobuf

Returns:

Protobuf representation of given pandas.DataFrame

Return type:

pb.DataFrame

async PandasDataFrame.to_http_response(obj: ext.PdDataFrame, ctx: Context | None = None) Response[source]#

Process the given object and convert it to an HTTP response.

Parameters:

obj (pd.DataFrame) – pd.DataFrame that will be serialized to JSON or parquet

Returns:

HTTP Response of type starlette.responses.Response. This can be accessed via cURL or any external web traffic.
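
How a request's Content-Type maps to the json, parquet and csv formats listed under default_format can be sketched as a small lookup. The helper pick_format is hypothetical and is not BentoML's code. Falling back to default_format for an unrecognised content type is an assumption of this sketch.

```python
from __future__ import annotations

# Content types taken from the default_format parameter description.
CONTENT_TYPE_TO_FORMAT = {
    "application/json": "json",
    "application/octet-stream": "parquet",
    "text/csv": "csv",
}


def pick_format(content_type: str | None, default_format: str = "json") -> str:
    """Hypothetical helper: choose a serialization format for a request."""
    if not content_type:
        return default_format  # no header: fall back to the default
    # Drop parameters such as "; charset=utf-8" before the lookup.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return CONTENT_TYPE_TO_FORMAT.get(media_type, default_format)


print(pick_format("text/csv"))
print(pick_format(None, default_format="parquet"))
print(pick_format("application/json; charset=utf-8"))
```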

class bentoml.io.PandasSeries(orient: ext.SeriesOrient = 'records', dtype: ext.PdDTypeArg | None = None, enforce_dtype: bool = False, shape: tuple[int, ...] | None = None, enforce_shape: bool = False)[source]#

PandasSeries defines API specification for the inputs/outputs of a Service, where either inputs will be converted to or outputs will be converted from type pd.Series as specified in your API function signature.

A sample service implementation:

service.py#
 import bentoml
 import pandas as pd
 import numpy as np
 from bentoml.io import PandasSeries

 runner = bentoml.sklearn.get("sklearn_model_clf").to_runner()

 svc = bentoml.Service("iris-classifier", runners=[runner])

 @svc.api(input=PandasSeries(), output=PandasSeries())
 def predict(input_arr):
     res = runner.run(input_arr)  # type: np.ndarray
     return pd.Series(res)

Users can then serve this service with bentoml serve:

% bentoml serve ./service.py:svc --reload

Users can then send requests to the newly started services with any client:

% curl -X POST -H "Content-Type: application/json" \
        --data '[{"0":5,"1":4,"2":3,"3":2}]' http://0.0.0.0:3000/predict

# [{"0": 1}]%
request.py#
 import requests

 requests.post(
     "http://0.0.0.0:3000/predict",
     headers={"content-type": "application/json"},
     data='[{"0":5,"1":4,"2":3,"3":2}]'
 ).text
Parameters:
  • orient –

    Indication of expected JSON string format. Compatible JSON strings can be produced by pandas.Series.to_json() with a corresponding orient value. Possible orients are:

    • split - dict[str, Any] ↦ {idx ↠ [idx], columns ↠ [columns], data ↠ [values]}

    • records - list[Any] ↦ [{column ↠ value}, …, {column ↠ value}]

    • index - dict[str, Any] ↦ {idx ↠ {column ↠ value}}

    • columns - dict[str, Any] ↦ {column ↠ {index ↠ value}}

    • values - dict[str, Any] ↦ Values arrays

  • columns – List of column names that users wish to update.

  • apply_column_names – Whether to update incoming DataFrame columns. If apply_column_names=True, then columns must be specified.

  • dtype – Data type users wish to convert their inputs/outputs to. If True, pandas infers the dtypes. If it is a dictionary mapping columns to dtypes, those dtypes are applied to incoming dataframes. If False, dtypes are not inferred at all (this only applies to the data). This is not applicable for orient='table'.

  • enforce_dtype – Whether to enforce a certain data type. If enforce_dtype=True, then dtype must be specified.

  • shape –

    Optional shape check that users can specify for their incoming HTTP requests. We will only check the number of columns you specified for your given shape:

    service.py#
    import pandas as pd
    from bentoml.io import PandasSeries

    @svc.api(input=PandasSeries(shape=(51,), enforce_shape=True), output=PandasSeries())
    def infer(input_series: pd.Series) -> pd.Series:
        # if input_series has shape (40,), it will error
        ...

  • enforce_shape – Whether to enforce a certain shape. If enforce_shape=True then shape must be specified.

Returns:

IO Descriptor that represents a pd.Series.

Return type:

PandasSeries

classmethod PandasSeries.from_sample(sample: IOType | t.Any, **kwargs: t.Any) Self#
async PandasSeries.from_proto(field: pb.Series | bytes) ext.PdSeries[source]#

Process incoming protobuf request and convert it to pandas.Series

Parameters:
  • field – Incoming protobuf message, either a pb.Series or serialized bytes.

Returns:

A pandas.Series object that can then be used inside user-defined logic.

async PandasSeries.from_http_request(request: Request) ext.PdSeries[source]#

Process incoming requests and convert incoming objects to pd.Series.

Parameters:

request – Incoming request.

Returns:

A pd.Series object that can then be used inside user-defined logic.

async PandasSeries.to_proto(obj: ext.PdSeries) pb.Series[source]#

Process the given object and convert it to a gRPC protobuf response.

Parameters:

obj – pandas.Series that will be serialized to protobuf

Returns:

Protobuf representation of given pandas.Series

Return type:

pb.Series

async PandasSeries.to_http_response(obj: t.Any, ctx: Context | None = None) Response[source]#

Process the given object and convert it to an HTTP response.

Parameters:

obj – pd.Series that will be serialized to JSON

Returns:

HTTP Response of type starlette.responses.Response. This can be accessed via cURL or any external web traffic.

Structured Data with JSON#

Note

For common structured data, we recommend using the JSON descriptor, as it provides the most flexibility. Users can also define a schema of the JSON data via a Pydantic model and use it for data validation.

To use the IO descriptor with pydantic, install bentoml with extra io-json dependency:

pip install "bentoml[io-json]"

This installs Pydantic alongside BentoML.

Then add it to your bentofile.yaml under either the Python or the Conda packages list.

Refer to Build Options.

bentofile.yaml#
...
python:
  packages:
    - pydantic
bentofile.yaml#
...
conda:
  channels:
    - conda-forge
  dependencies:
    - pydantic

class bentoml.io.JSON(*, pydantic_model: Optional[Type[pydantic.BaseModel]] = None, validate_json: Optional[bool] = None, json_encoder: Type[json.JSONEncoder] = DefaultJsonEncoder)[source]#

JSON defines API specification for the inputs/outputs of a Service, where either inputs will be converted to or outputs will be converted from a JSON representation as specified in your API function signature.

A sample service implementation:

service.py#
from __future__ import annotations

from typing import TYPE_CHECKING
from typing import Any
from typing import Optional

import bentoml
from bentoml.io import NumpyNdarray
from bentoml.io import JSON

import pandas as pd
from pydantic import BaseModel

if TYPE_CHECKING:
    from numpy.typing import NDArray

iris_clf_runner = bentoml.sklearn.get("iris_clf_with_feature_names:latest").to_runner()

svc = bentoml.Service("iris_classifier_pydantic", runners=[iris_clf_runner])

class IrisFeatures(BaseModel):
    sepal_len: float
    sepal_width: float
    petal_len: float
    petal_width: float

    # Optional field
    request_id: Optional[int] = None

    # Use custom Pydantic config for additional validation options
    class Config:
        extra = 'forbid'


input_spec = JSON(pydantic_model=IrisFeatures)

@svc.api(input=input_spec, output=NumpyNdarray())
def classify(input_data: IrisFeatures) -> NDArray[Any]:
    if input_data.request_id is not None:
        print("Received request ID: ", input_data.request_id)

    input_df = pd.DataFrame([input_data.dict(exclude={"request_id"})])
    return iris_clf_runner.run(input_df)

Users can then serve this service with bentoml serve:

% bentoml serve ./service.py:svc --reload

Users can then send requests to the newly started services with any client:

% curl -X POST -H "content-type: application/json" \
    --data '{"sepal_len": 6.2, "sepal_width": 3.2, "petal_len": 5.2, "petal_width": 2.2}' \
    http://127.0.0.1:3000/classify

# [2]%
request.py#
 import requests

 requests.post(
     "http://0.0.0.0:3000/classify",
     headers={"content-type": "application/json"},
     data='{"sepal_len": 6.2, "sepal_width": 3.2, "petal_len": 5.2, "petal_width": 2.2}'
 ).text
Parameters:
  • pydantic_model – Pydantic model schema. When used, inference API callback will receive an instance of the specified pydantic_model class.

  • json_encoder – JSON encoder class. By default, BentoML uses a custom JSON encoder that adds serialization support for numpy arrays, pandas dataframes, and dataclass-like objects (attrs, dataclass, etc.). If you wish to use a custom encoder, make sure it supports the aforementioned objects.

Returns:

IO Descriptor that represents JSON format.

Return type:

JSON
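
As an illustration of the json_encoder parameter, a custom encoder is a subclass of json.JSONEncoder that overrides default(). The class ExtendedJSONEncoder below is hypothetical and deliberately small: it handles dataclasses and dates only, so it does not cover the numpy and pandas objects that the built-in encoder supports.

```python
import dataclasses
import json
from datetime import date


class ExtendedJSONEncoder(json.JSONEncoder):
    """Hypothetical encoder adding dataclass and date support."""

    def default(self, o):
        # Dataclass instances become plain dictionaries.
        if dataclasses.is_dataclass(o) and not isinstance(o, type):
            return dataclasses.asdict(o)
        # Dates become ISO 8601 strings.
        if isinstance(o, date):
            return o.isoformat()
        return super().default(o)  # raises TypeError for unknown types


@dataclasses.dataclass
class Prediction:
    label: str
    score: float
    day: date


encoded = json.dumps(
    Prediction("setosa", 0.97, date(2024, 1, 2)), cls=ExtendedJSONEncoder
)
print(encoded)
```

Such a class would be passed as JSON(json_encoder=ExtendedJSONEncoder). A production encoder should also handle the array and dataframe types mentioned in the parameter description.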

classmethod JSON.from_sample(sample: IOType | t.Any, **kwargs: t.Any) Self#
async JSON.from_proto(field: google.protobuf.struct_pb2.Value | bytes) Optional[Union[str, Dict[str, Any], BaseModel]][source]#
async JSON.from_http_request(request: Request) Optional[Union[str, Dict[str, Any], BaseModel]][source]#
async JSON.to_proto(obj: Optional[Union[str, Dict[str, Any], BaseModel]]) Value[source]#
async JSON.to_http_response(obj: JSONType | pydantic.BaseModel, ctx: Context | None = None)[source]#

Texts#

bentoml.io.Text is commonly used for NLP Applications:

class bentoml.io.Text(*args: Any, **kwargs: Any)[source]#

Text defines API specification for the inputs/outputs of a Service. Text represents strings for all incoming requests and outgoing responses as specified in your API function signature.

A sample GPT2 service implementation:

service.py#
from __future__ import annotations

import bentoml
from bentoml.io import Text

runner = bentoml.tensorflow.get('gpt2:latest').to_runner()

svc = bentoml.Service("gpt2-generation", runners=[runner])

@svc.api(input=Text(), output=Text())
def predict(text: str) -> str:
    res = runner.run(text)
    return res['generated_text']

Users can then serve this service with bentoml serve:

% bentoml serve ./service.py:svc --reload

Users can then send requests to the newly started services with any client:

% curl -X POST -H "Content-Type: text/plain" \
        --data 'Not for nothing did Orin say that people outdoors.' \
        http://0.0.0.0:3000/predict
request.py#
import requests
requests.post(
    "http://0.0.0.0:3000/predict",
    headers = {"content-type":"text/plain"},
    data = 'Not for nothing did Orin say that people outdoors.'
).text
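
If the requests package is not available, the same text/plain request can be built with the standard library. This is a minimal sketch that only constructs the request object. Sending it is left commented out because it assumes the service above is running on port 3000.

```python
from urllib.request import Request

request = Request(
    "http://0.0.0.0:3000/predict",
    data="Not for nothing did Orin say that people outdoors.".encode("utf-8"),
    headers={"Content-Type": "text/plain"},
    method="POST",
)

print(request.get_method(), request.full_url)
# urllib stores header names capitalised as "Content-type".
print(request.get_header("Content-type"))

# Sending the request requires a running service:
# from urllib.request import urlopen
# with urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```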

Note

Text is not designed to take any args or kwargs during initialization.

Returns:

IO Descriptor that represents strings type.

Return type:

Text

async Text.from_proto(field: google.protobuf.wrappers_pb2.StringValue | bytes) str[source]#
async Text.from_http_request(request: Request) str[source]#
async Text.to_proto(obj: str) StringValue[source]#
async Text.to_http_response(obj: str, ctx: Context | None = None) Response[source]#

Images#

To use the IO descriptor, install bentoml with extra io-image dependency:

pip install "bentoml[io-image]"

Note

The Pillow package is required to use the bentoml.io.Image.

Install it with pip install Pillow and add it to your bentofile.yaml under either the Python or the Conda packages list.

Refer to Build Options.

bentofile.yaml#
...
python:
  packages:
    - Pillow
bentofile.yaml#
...
conda:
  channels:
    - conda-forge
  dependencies:
    - Pillow
class bentoml.io.Image(pilmode: _Mode | None = 'RGB', mime_type: str = 'image/jpeg', *, allowed_mime_types: t.Iterable[str] | None = None)[source]#

Image defines API specification for the inputs/outputs of a Service, where either inputs will be converted to or outputs will be converted from images as specified in your API function signature.

A sample object detection service:

service.py#
from __future__ import annotations

from typing import TYPE_CHECKING
from typing import Any

import numpy as np
from PIL.Image import Image as PILImage

import bentoml
from bentoml.io import Image
from bentoml.io import NumpyNdarray

if TYPE_CHECKING:
    from numpy.typing import NDArray

runner = bentoml.tensorflow.get('image-classification:latest').to_runner()

svc = bentoml.Service("vit-object-detection", runners=[runner])

@svc.api(input=Image(), output=NumpyNdarray(dtype="float32"))
async def predict_image(f: PILImage) -> NDArray[Any]:
    assert isinstance(f, PILImage)
    arr = np.array(f) / 255.0
    assert arr.shape == (28, 28)

    # We are using a greyscale image and our model expects one
    # extra channel dimension
    arr = np.expand_dims(arr, (0, 3)).astype("float32")  # reshape to [1, 28, 28, 1]
    return await runner.async_run(arr)

Users can then serve this service with bentoml serve:

% bentoml serve ./service.py:svc --reload

Users can then send requests to the newly started services with any client:

# we will run on our input image test.jpg
# the image can be downloaded from http://images.cocodataset.org/val2017/000000039769.jpg
% curl -H "Content-Type: multipart/form-data" \
       -F 'fileobj=@test.jpg;type=image/jpeg' \
       http://0.0.0.0:3000/predict_image

# [{"score":0.8610631227493286,"label":"Egyptian cat"},
# {"score":0.08770329505205154,"label":"tabby, tabby cat"},
# {"score":0.03540956228971481,"label":"tiger cat"},
# {"score":0.004140055272728205,"label":"lynx, catamount"},
# {"score":0.0009498853469267488,"label":"Siamese cat, Siamese"}]%
request.py#
import requests

requests.post(
    "http://0.0.0.0:3000/predict_image",
    # requests sets the multipart Content-Type header (with its boundary) itself
    files = {"upload_file": open('test.jpg', 'rb')},
).text
Parameters:
  • pilmode – Color mode for PIL. Default to RGB.

  • mime_type – The MIME type of the file type that this descriptor should return. Only relevant when used as an output descriptor.

  • allowed_mime_types – A list of MIME types to restrict input to.

Returns:

IO Descriptor that represents either a PIL.Image.Image or a np.ndarray representing an image.

Return type:

Image
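
When allowed_mime_types restricts the accepted inputs, a client can check a file before uploading it. The sketch below is a hypothetical client-side helper built on the standard mimetypes module. It guesses the MIME type from the file name only and is not how BentoML validates requests.

```python
from __future__ import annotations

import mimetypes
from typing import Iterable


def is_allowed(filename: str, allowed_mime_types: Iterable[str] | None) -> bool:
    """Hypothetical client-side check before uploading a file."""
    if allowed_mime_types is None:
        return True  # no restriction configured
    # guess_type returns (type, encoding); only the type matters here.
    mime_type, _ = mimetypes.guess_type(filename)
    return mime_type in set(allowed_mime_types)


print(is_allowed("test.jpg", ["image/jpeg", "image/png"]))
print(is_allowed("test.pdf", ["image/jpeg", "image/png"]))
```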

async Image.from_proto(field: pb.File | bytes) ImageType[source]#
async Image.from_http_request(request: Request) ImageType[source]#
async Image.to_proto(obj: ImageType) pb.File[source]#
async Image.to_http_response(obj: ImageType, ctx: Context | None = None) Response[source]#

Files#

class bentoml.io.File(kind: FileKind = 'binaryio', mime_type: str | None = None, **kwargs: t.Any)[source]#

File defines API specification for the inputs/outputs of a Service, where either inputs will be converted to or outputs will be converted from file-like objects as specified in your API function signature.

A sample ViT service:

service.py#
from __future__ import annotations

import io
from typing import TYPE_CHECKING
from typing import Any

import bentoml
from bentoml.io import File
from bentoml.io import NumpyNdarray

if TYPE_CHECKING:
    from numpy.typing import NDArray

runner = bentoml.tensorflow.get('image-classification:latest').to_runner()

svc = bentoml.Service("vit-pdf-classifier", runners=[runner])

@svc.api(input=File(), output=NumpyNdarray(dtype="float32"))
async def predict(input_pdf: io.BytesIO) -> NDArray[Any]:
    return await runner.async_run(input_pdf)

Users can then serve this service with bentoml serve:

% bentoml serve ./service.py:svc --reload

Users can then send requests to the newly started services with any client:

% curl -H "Content-Type: multipart/form-data" \
       -F 'fileobj=@test.pdf;type=application/pdf' \
       http://0.0.0.0:3000/predict
request.py#
import requests

requests.post(
    "http://0.0.0.0:3000/predict",
    # requests sets the multipart Content-Type header (with its boundary) itself
    files = {"upload_file": open('test.pdf', 'rb')},
).text
Parameters:
  • kind – The kind of file-like object to be used. Currently, the only accepted value is binaryio.

  • mime_type – MIME type of the returned starlette.responses.Response. Only available when used as an output descriptor.

Returns:

IO Descriptor that represents file-like objects.

Return type:

File

async File.from_proto(field: bentoml.grpc.v1.service_pb2.File | bytes) FileLike[bytes][source]#
async File.from_http_request(request: Request) FileLike[bytes][source]#
async File.to_proto(obj: Union[IOBase, IO[bytes], FileLike[bytes]]) File[source]#
async File.to_http_response(obj: FileType, ctx: Context | None = None)[source]#

Multipart Payloads#

Note

io.Multipart makes it possible to compose a multipart payload from multiple other IO Descriptor instances. For example, you may create a Multipart input that contains an image file and additional metadata in JSON.

class bentoml.io.Multipart(**inputs: IODescriptor[Any])[source]#

Multipart defines API specification for the inputs/outputs of a Service, where the Service can receive a multipart request or send a multipart response as specified in your API function signature.

A sample service implementation:

service.py#
from __future__ import annotations

from typing import TYPE_CHECKING
from typing import Any

import bentoml
from bentoml.io import NumpyNdarray
from bentoml.io import Multipart
from bentoml.io import JSON

if TYPE_CHECKING:
    from numpy.typing import NDArray

runner = bentoml.sklearn.get("sklearn_model_clf").to_runner()

svc = bentoml.Service("iris-classifier", runners=[runner])

input_spec = Multipart(arr=NumpyNdarray(), annotations=JSON())
output_spec = Multipart(output=NumpyNdarray(), result=JSON())

@svc.api(input=input_spec, output=output_spec)
async def predict(
    arr: NDArray[Any], annotations: dict[str, Any]
) -> dict[str, NDArray[Any] | dict[str, Any]]:
    res = await runner.async_run(arr)
    return {"output": res, "result": annotations}

Users can then serve this service with bentoml serve:

% bentoml serve ./service.py:svc --reload

Users can then send requests to the newly started services with any client:

% curl -X POST -H "Content-Type: multipart/form-data" \
       -F annotations=@test.json -F arr='[5,4,3,2]' \
       http://0.0.0.0:3000/predict

# --b1d72c201a064ecd92a17a412eb9208e
# Content-Disposition: form-data; name="output"
# content-length: 1
# content-type: application/json

# 1
# --b1d72c201a064ecd92a17a412eb9208e
# Content-Disposition: form-data; name="result"
# content-length: 13
# content-type: application/json

# {"foo":"bar"}
# --b1d72c201a064ecd92a17a412eb9208e--
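
The multipart response shown above can be split back into its named parts with the standard library's email parser. The helper parse_multipart is a hypothetical sketch. It assumes every part carries JSON, as in this example, and that the boundary is already known from the response's Content-Type header.

```python
import json
from email import message_from_bytes

BOUNDARY = "b1d72c201a064ecd92a17a412eb9208e"

# The response body from the curl call above, with CRLF line endings.
RESPONSE_BODY = (
    f"--{BOUNDARY}\r\n"
    'Content-Disposition: form-data; name="output"\r\n'
    "content-length: 1\r\n"
    "content-type: application/json\r\n"
    "\r\n"
    "1\r\n"
    f"--{BOUNDARY}\r\n"
    'Content-Disposition: form-data; name="result"\r\n'
    "content-length: 13\r\n"
    "content-type: application/json\r\n"
    "\r\n"
    '{"foo":"bar"}\r\n'
    f"--{BOUNDARY}--\r\n"
).encode("utf-8")


def parse_multipart(body: bytes, boundary: str) -> dict:
    """Hypothetical helper: split a multipart body into JSON-decoded parts."""
    # The email parser needs a Content-Type header to find the boundary.
    header = f"Content-Type: multipart/form-data; boundary={boundary}\r\n\r\n"
    message = message_from_bytes(header.encode("utf-8") + body)
    parts = {}
    for part in message.get_payload():
        name = part.get_param("name", header="content-disposition")
        parts[name] = json.loads(part.get_payload(decode=True))
    return parts


print(parse_multipart(RESPONSE_BODY, BOUNDARY))
# {'output': 1, 'result': {'foo': 'bar'}}
```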

Note

The following code snippet uses requests_toolbelt. Install with pip install requests-toolbelt.

request.py#
import requests

from requests_toolbelt.multipart.encoder import MultipartEncoder

m = MultipartEncoder(
    fields={
        # field names must match the keys of the Multipart input descriptor
        "arr": "[5,4,3,2]",
        "annotations": ("test.json", open("test.json", "rb"), "application/json"),
    }
)

requests.post(
    "http://0.0.0.0:3000/predict", data=m, headers={"Content-Type": m.content_type}
)
Parameters:

inputs –

Dictionary whose keys name the inputs of a Multipart request/response and whose values are IODescriptor instances supported by BentoML. Currently, Multipart supports Image, NumpyNdarray, PandasDataFrame, PandasSeries, Text, and File.

Make sure to match the input parameters in function signatures in an API function to the keys defined under Multipart:

+----------------------------------------------------------------+
|                                                                |
|   +--------------------------------------------------------+   |
|   |                                                        |   |
|   |    Multipart(arr=NumpyNdarray(), annotations=JSON())   |   |
|   |               |                       |                |   |
|   +---------------+-----------------------+----------------+   |
|                   |                       |                    |
|                   |                       |                    |
|                   |                       |                    |
|                   +-----+        +--------+                    |
|                         |        |                             |
|         +---------------v--------v---------+                   |
|         |  def predict(arr, annotations):  |                   |
|         +----------------------------------+                   |
|                                                                |
+----------------------------------------------------------------+
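
The rule in the diagram, that every Multipart key needs a parameter of the same name, can be checked mechanically. The sketch below uses inspect.signature from the standard library. The helper missing_parameters is hypothetical and not part of BentoML.

```python
import inspect


def missing_parameters(func, multipart_keys):
    """Hypothetical check: which Multipart keys lack a matching parameter?"""
    params = set(inspect.signature(func).parameters)
    return sorted(set(multipart_keys) - params)


def predict(arr, annotations):
    return {"output": arr, "result": annotations}


# Keys as they would be passed to Multipart(arr=..., annotations=...).
print(missing_parameters(predict, ["arr", "annotations"]))
# A misspelled or extra key is reported by name.
print(missing_parameters(predict, ["arr", "metadata"]))
```

Running such a check in a unit test catches a renamed parameter before the service is deployed.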

Returns:

IO Descriptor that represents a Multipart request/response.

Return type:

Multipart

async Multipart.from_proto(field: Multipart) dict[str, Any][source]#
async Multipart.from_http_request(request: Request) dict[str, Any][source]#
async Multipart.to_proto(obj: dict[str, Any]) Multipart[source]#
async Multipart.to_http_response(obj: dict[str, t.Any], ctx: Context | None = None) Response[source]#

Custom IODescriptor#

Note

The IODescriptor base class can be extended to support custom data formats for your APIs if the built-in descriptors do not fit your needs.

class bentoml.io.IODescriptor[source]#

IODescriptor describes the input/output data format of an InferenceAPI defined in a bentoml.Service. This is an abstract base class for extending new HTTP endpoint IO descriptor types in BentoServer.