API IO Descriptors#

IO Descriptors describe the input and output specification of a Service API. This page lists the built-in IO Descriptors and the APIs for implementing custom IO Descriptors.

NumPy `ndarray`#

Note

The numpy package is required to use the bentoml.io.NumpyNdarray.

Install it with pip install numpy and add it to your bentofile.yaml under either the Python or Conda packages list.

Refer to Build Options.

pip

bentofile.yaml#

...
python:
  packages:
    - numpy

conda

bentofile.yaml#

...
conda:
  channels:
    - conda-forge
  dependencies:
    - numpy

class bentoml.io.NumpyNdarray(dtype: str | ext.NpDTypeLike | None = None, enforce_dtype: bool = False, shape: tuple[int, ...] | None = None, enforce_shape: bool = False)[source]#

NumpyNdarray defines API specification for the inputs/outputs of a Service, where either inputs will be converted to or outputs will be converted from type numpy.ndarray as specified in your API function signature.

A sample service implementation:

service.py#

from __future__ import annotations

from typing import TYPE_CHECKING, Any

import bentoml
from bentoml.io import NumpyNdarray

if TYPE_CHECKING:
    from numpy.typing import NDArray

runner = bentoml.sklearn.get("sklearn_model_clf").to_runner()

svc = bentoml.Service("iris-classifier", runners=[runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def predict(input_arr: NDArray[Any]) -> NDArray[Any]:
    return runner.run(input_arr)

Users can then serve this service with bentoml serve:

% bentoml serve ./service.py:svc --reload

Users can then send requests to the newly started service with any client:

Bash

% curl -X POST -H "Content-Type: application/json" \
        --data '[[5,4,3,2]]' http://0.0.0.0:3000/predict

# [1]%

Python

request.py#

 import requests

 requests.post(
     "http://0.0.0.0:3000/predict",
     headers={"content-type": "application/json"},
     data='[[5,4,3,2]]'
 ).text
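
The request body for a NumpyNdarray endpoint is plain JSON, so a nested Python list can be serialized with the standard library before it is sent. The following is a minimal sketch that only builds the request and does not send it. The URL is the one used in the examples above, and `build_predict_request` is a hypothetical helper, not part of BentoML.

```python
import json
import urllib.request


def build_predict_request(rows, url="http://0.0.0.0:3000/predict"):
    """Build (but do not send) a JSON POST request for a NumpyNdarray API."""
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_predict_request([[5, 4, 3, 2]])
print(req.data)  # the serialized array, ready to be sent

# With the service running, the request could be sent with:
# urllib.request.urlopen(req).read()
```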

Parameters:

dtype – Data type users wish to convert their inputs/outputs to. Refer to arrays dtypes for more information.
enforce_dtype – Whether to enforce a certain data type. If enforce_dtype=True then dtype must be specified.
shape –
Given shape that an array will be converted to. For example:
service.py#
```
import numpy as np
from bentoml.io import NumpyNdarray

@svc.api(input=NumpyNdarray(shape=(2,2), enforce_shape=False), output=NumpyNdarray())
async def predict(input_array: np.ndarray) -> np.ndarray:
    # input_array will be reshaped to (2,2)
    result = await runner.async_run(input_array)
    return result
```
When enforce_shape=True is provided, BentoML will raise an exception if the input array received does not match the shape provided.

About the behaviour of shape

If specified, then both bentoml.io.NumpyNdarray.from_http_request() and bentoml.io.NumpyNdarray.from_proto() will reshape the input array before sending it to the API function.
enforce_shape – Whether to enforce a certain shape. If enforce_shape=True then shape must be specified.
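
The difference between shape alone (reshape the input) and shape with enforce_shape=True (reject a mismatch) can be sketched without any dependencies. This is an illustration of the behaviour described above using nested lists, not BentoML's implementation; `infer_shape`, `flatten`, `reshape` and `apply_shape` are hypothetical helpers, and raising ValueError is an assumption made for the sketch.

```python
from math import prod


def infer_shape(data):
    """Return the shape of a regular nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    shape = []
    while isinstance(data, list):
        shape.append(len(data))
        data = data[0] if data else None
    return tuple(shape)


def flatten(data):
    """Flatten a nested list into a flat list of scalars."""
    if not isinstance(data, list):
        return [data]
    return [item for sub in data for item in flatten(sub)]


def reshape(flat, shape):
    """Rebuild a nested list of the given shape from a flat list."""
    if len(shape) == 1:
        return list(flat)
    step = prod(shape[1:])
    return [reshape(flat[i * step:(i + 1) * step], shape[1:]) for i in range(shape[0])]


def apply_shape(data, shape, enforce_shape=False):
    """Reject a mismatching input if enforced, otherwise reshape it."""
    actual = infer_shape(data)
    if enforce_shape and actual != shape:
        raise ValueError(f"expected shape {shape}, got {actual}")
    flat = flatten(data)
    if len(flat) != prod(shape):
        raise ValueError(f"cannot reshape {len(flat)} values into {shape}")
    return reshape(flat, shape)


# Without enforcement, a flat input of four values is reshaped to (2, 2).
print(apply_shape([1, 2, 3, 4], (2, 2)))

# With enforcement, the same input is rejected because its shape is (4,).
try:
    apply_shape([1, 2, 3, 4], (2, 2), enforce_shape=True)
except ValueError as exc:
    print(exc)
```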

Returns:

IO Descriptor that represents a np.ndarray.

Return type:

IODescriptor

classmethod NumpyNdarray.from_sample(sample: ext.NpNDArray | t.Sequence[t.Any]) → ext.NpNDArray#

Create a NumpyNdarray IO Descriptor from given inputs.

Parameters:

sample – Given sample np.ndarray data. It also accepts a sequence-like data type that can be converted to np.ndarray.
enforce_dtype – Enforce a certain data type. dtype must be specified at function signature. If you don’t want to enforce a specific dtype then change enforce_dtype=False.
enforce_shape – Enforce a certain shape. shape must be specified at function signature. If you don’t want to enforce a specific shape then change enforce_shape=False.

Returns:

IODescriptor from given users inputs.

Return type:

NumpyNdarray

Example:

service.py#

from __future__ import annotations

from typing import TYPE_CHECKING, Any

import bentoml
from bentoml.io import NumpyNdarray

import numpy as np

if TYPE_CHECKING:
    from numpy.typing import NDArray

input_spec = NumpyNdarray.from_sample(np.array([[1,2,3]]))

@svc.api(input=input_spec, output=NumpyNdarray())
async def predict(input: NDArray[np.int16]) -> NDArray[Any]:
    return await runner.async_run(input)

Raises: BadInput – If the given sample is of type numpy.generic. This exception is also raised if a np.ndarray cannot be created from the given sample.

async NumpyNdarray.from_proto(field: pb.NDArray | bytes) → ext.NpNDArray[source]#

Process incoming protobuf request and convert it to numpy.ndarray

Parameters:

request – Incoming RPC request message.
context – grpc.ServicerContext

Returns:

A np.array constructed from given protobuf message.

Return type:

numpy.ndarray

Tabular Data with Pandas#

To use the IO descriptor, install bentoml with extra io-pandas dependency:

pip install "bentoml[io-pandas]"

Note

The pandas package is required to use the bentoml.io.PandasDataFrame or bentoml.io.PandasSeries.

Install it with pip install pandas and add it to your bentofile.yaml under either the Python or Conda packages list.

Refer to Build Options.

pip

bentofile.yaml#

...
python:
  packages:
    - pandas

conda

bentofile.yaml#

...
conda:
  channels:
    - conda-forge
  dependencies:
    - pandas

class bentoml.io.PandasDataFrame(orient: ext.DataFrameOrient = 'records', columns: list[str] | None = None, apply_column_names: bool = False, dtype: bool | ext.PdDTypeArg | None = None, enforce_dtype: bool = False, shape: tuple[int, ...] | None = None, enforce_shape: bool = False, default_format: t.Literal['json', 'parquet', 'csv'] = 'json')[source]#

PandasDataFrame defines API specification for the inputs/outputs of a Service, where either inputs will be converted to or outputs will be converted from type pd.DataFrame as specified in your API function signature.

A sample service implementation:

service.py#

from __future__ import annotations

import bentoml
import pandas as pd
import numpy as np
from bentoml.io import PandasDataFrame

input_spec = PandasDataFrame.from_sample(pd.DataFrame(np.array([[5,4,3,2]])))

runner = bentoml.sklearn.get("sklearn_model_clf").to_runner()

svc = bentoml.Service("iris-classifier", runners=[runner])

@svc.api(input=input_spec, output=PandasDataFrame())
def predict(input_arr):
    res = runner.run(input_arr)
    return pd.DataFrame(res)

Users can then serve this service with bentoml serve:

% bentoml serve ./service.py:svc --reload

Users can then send requests to the newly started service with any client:

Bash

% curl -X POST -H "Content-Type: application/json" \
        --data '[{"0":5,"1":4,"2":3,"3":2}]' http://0.0.0.0:3000/predict

# [{"0": 1}]%

Python

request.py#

 import requests

 requests.post(
     "http://0.0.0.0:3000/predict",
     headers={"content-type": "application/json"},
     data='[{"0":5,"1":4,"2":3,"3":2}]'
 ).text

Parameters:

orient –
Indication of expected JSON string format. Compatible JSON strings can be produced by pandas.io.json.to_json() with a corresponding orient value. Possible orients are:
- split - dict[str, Any] ↦ {idx ↠ [idx], columns ↠ [columns], data ↠ [values]}
- records - list[Any] ↦ [{column ↠ value}, …, {column ↠ value}]
- index - dict[str, Any] ↦ {idx ↠ {column ↠ value}}
- columns - dict[str, Any] ↦ {column ↠ {index ↠ value}}
- values - dict[str, Any] ↦ Values arrays
columns – List of columns name that users wish to update.
apply_column_names – Whether to update incoming DataFrame columns. If apply_column_names=True, then columns must be specified.
dtype – Data type users wish to convert their inputs/outputs to. If it is a boolean, then pandas will infer dtypes. Else if it is a dictionary of column to dtype, then applies those to incoming dataframes. If False, then don’t infer dtypes at all (only applies to the data). This is not applicable for orient='table'.
enforce_dtype – Whether to enforce a certain data type. if enforce_dtype=True then dtype must be specified.

shape –

Optional shape check that users can specify for their incoming HTTP requests. We will only check the number of columns you specified for your given shape:

service.py#

import pandas as pd
from bentoml.io import PandasDataFrame

df = pd.DataFrame([[1, 2, 3]])  # shape (1,3)
inp = PandasDataFrame.from_sample(df)

@svc.api(
    input=PandasDataFrame(shape=(51, 10),
          enforce_shape=True),
    output=PandasDataFrame()
)
def predict(input_df: pd.DataFrame) -> pd.DataFrame:
    # if input_df has shape (40,9),
    # an error will be raised
    ...

enforce_shape – Whether to enforce a certain shape. If enforce_shape=True then shape must be specified.
default_format –
The default serialization format to use if the request does not specify a Content-Type header. It is also the serialization format used for the response. Possible values are:
- json - JSON text format (inferred from content-type "application/json")
- parquet - Parquet binary format (inferred from content-type "application/octet-stream")
- csv - CSV text format (inferred from content-type "text/csv")
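
To make the orient values listed above concrete, the sketch below builds the JSON layout of each one for a small two-row table using only the standard library. The table contents and variable names are illustrative. Note that pandas itself names the key `index` in the split layout and emits a plain list of rows for values.

```python
import json

# A tiny table: two rows (index 0 and 1) and two columns ("a" and "b").
columns = ["a", "b"]
index = [0, 1]
values = [[1, 2], [3, 4]]

payloads = {
    "split": {"index": index, "columns": columns, "data": values},
    "records": [dict(zip(columns, row)) for row in values],
    "index": {str(i): dict(zip(columns, row)) for i, row in zip(index, values)},
    "columns": {
        c: {str(i): row[j] for i, row in zip(index, values)}
        for j, c in enumerate(columns)
    },
    "values": values,
}

for orient, payload in payloads.items():
    print(f"{orient:8s} {json.dumps(payload)}")
```

With the default orient='records', the request body is therefore a list with one object per row, which matches the cURL example above.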

Returns:

IO Descriptor that represents a pd.DataFrame.

Return type:

PandasDataFrame

classmethod PandasDataFrame.from_sample(sample: ext.PdDataFrame) → ext.PdDataFrame#

Create a PandasDataFrame IO Descriptor from given inputs.

Parameters:

sample – Given sample pd.DataFrame data
orient –
Indication of expected JSON string format. Compatible JSON strings can be produced by pandas.io.json.to_json() with a corresponding orient value. Possible orients are:
- split - dict[str, Any] ↦ {idx ↠ [idx], columns ↠ [columns], data ↠ [values]}
- records - list[Any] ↦ [{column ↠ value}, …, {column ↠ value}]
- index - dict[str, Any] ↦ {idx ↠ {column ↠ value}}
- columns - dict[str, Any] ↦ {column ↠ {index ↠ value}}
- values - dict[str, Any] ↦ Values arrays
- table - dict[str, Any] ↦ {schema: { schema }, data: { data }}
apply_column_names – Update incoming DataFrame columns. columns must be specified at function signature. If you don’t want to enforce a specific columns name then change apply_column_names=False.
enforce_dtype – Enforce a certain data type. dtype must be specified at function signature. If you don’t want to enforce a specific dtype then change enforce_dtype=False.
enforce_shape – Enforce a certain shape. shape must be specified at function signature. If you don’t want to enforce a specific shape then change enforce_shape=False.
default_format –
The default serialization format to use if the request does not specify a Content-Type header. It is also the serialization format used for the response. Possible values are:
- json - JSON text format (inferred from content-type "application/json")
- parquet - Parquet binary format (inferred from content-type "application/octet-stream")
- csv - CSV text format (inferred from content-type "text/csv")
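
The mapping between content types and serialization formats listed above can be written as a small lookup. This is a sketch only: `pick_format` is a hypothetical helper, and falling back to default_format for an unrecognized content type is an assumption of this sketch rather than documented behaviour.

```python
# Mapping taken from the list above; the helper itself is hypothetical.
CONTENT_TYPE_TO_FORMAT = {
    "application/json": "json",
    "application/octet-stream": "parquet",
    "text/csv": "csv",
}


def pick_format(content_type, default_format="json"):
    """Return the serialization format for a request's Content-Type header."""
    if not content_type:
        # No header on the request: use the descriptor's default_format.
        return default_format
    # Drop parameters such as "; charset=utf-8" before the lookup.
    media_type = content_type.split(";", 1)[0].strip().lower()
    # Assumption: unknown content types also fall back to the default.
    return CONTENT_TYPE_TO_FORMAT.get(media_type, default_format)


print(pick_format("text/csv"))
print(pick_format(None, default_format="parquet"))
```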

Returns:

IODescriptor from given users inputs.

Return type:

PandasDataFrame

Example:

service.py#

import pandas as pd
from bentoml.io import PandasDataFrame
arr = [[1,2,3]]
input_spec = PandasDataFrame.from_sample(pd.DataFrame(arr))

@svc.api(input=input_spec, output=PandasDataFrame())
def predict(inputs: pd.DataFrame) -> pd.DataFrame: ...

async PandasDataFrame.from_proto(field: pb.DataFrame | bytes) → ext.PdDataFrame[source]#

Process incoming protobuf request and convert it to pandas.DataFrame

Parameters:

request – Incoming RPC request message.
context – grpc.ServicerContext

Returns:

A pandas.DataFrame object that can then be used inside user-defined logic.

async PandasDataFrame.from_http_request(request: Request) → ext.PdDataFrame[source]#

Process incoming requests and convert incoming objects to pd.DataFrame

Parameters:

request (starlette.requests.Request) – Incoming request

Returns:

A pd.DataFrame object that can then be used inside user-defined logic.

Raises:

BadInput – Raised when the incoming request is badly formatted.

async PandasDataFrame.to_proto(obj: ext.PdDataFrame) → pb.DataFrame[source]#

async PandasDataFrame.to_http_response(obj: ext.PdDataFrame, ctx: Context | None = None) → Response[source]#

Process the given object and convert it to an HTTP response.

Parameters:

obj (pd.DataFrame) – pd.DataFrame that will be serialized to JSON or parquet

Returns:

HTTP Response of type starlette.responses.Response. This can be accessed via cURL or any external web traffic.

class bentoml.io.PandasSeries(orient: ext.SeriesOrient = 'records', dtype: ext.PdDTypeArg | None = None, enforce_dtype: bool = False, shape: tuple[int, ...] | None = None, enforce_shape: bool = False)[source]#

PandasSeries defines API specification for the inputs/outputs of a Service, where either inputs will be converted to or outputs will be converted from type pd.Series as specified in your API function signature.

A sample service implementation:

service.py#

 import bentoml
 import pandas as pd
 import numpy as np
 from bentoml.io import PandasSeries

 runner = bentoml.sklearn.get("sklearn_model_clf").to_runner()

 svc = bentoml.Service("iris-classifier", runners=[runner])

 @svc.api(input=PandasSeries(), output=PandasSeries())
 def predict(input_arr):
     res = runner.run(input_arr)  # type: np.ndarray
     return pd.Series(res)

Users can then serve this service with bentoml serve:

% bentoml serve ./service.py:svc --reload

Users can then send requests to the newly started service with any client:

Bash

% curl -X POST -H "Content-Type: application/json" \
        --data '[{"0":5,"1":4,"2":3,"3":2}]' http://0.0.0.0:3000/predict

# [{"0": 1}]%

Python

request.py#

 import requests

 requests.post(
     "http://0.0.0.0:3000/predict",
     headers={"content-type": "application/json"},
     data='[{"0":5,"1":4,"2":3,"3":2}]'
 ).text

Parameters:

orient –
Indication of expected JSON string format. Compatible JSON strings can be produced by pandas.io.json.to_json() with a corresponding orient value. Possible orients are:
- split - dict[str, Any] ↦ {idx ↠ [idx], columns ↠ [columns], data ↠ [values]}
- records - list[Any] ↦ [{column ↠ value}, …, {column ↠ value}]
- index - dict[str, Any] ↦ {idx ↠ {column ↠ value}}
- columns - dict[str, Any] ↦ {column ↠ {index ↠ value}}
- values - dict[str, Any] ↦ Values arrays
columns – List of columns name that users wish to update.
apply_column_names – Whether to update incoming DataFrame columns. If apply_column_names=True, then columns must be specified.
dtype – Data type users wish to convert their inputs/outputs to. If it is a boolean, then pandas will infer dtypes. Else if it is a dictionary of column to dtype, then applies those to incoming dataframes. If False, then don’t infer dtypes at all (only applies to the data). This is not applicable for orient='table'.
enforce_dtype – Whether to enforce a certain data type. if enforce_dtype=True then dtype must be specified.

shape –

Optional shape check that users can specify for their incoming HTTP requests. We will only check the number of columns you specified for your given shape:

service.py#

 import pandas as pd
 from bentoml.io import PandasSeries

 @svc.api(input=PandasSeries(shape=(51,), enforce_shape=True), output=PandasSeries())
 def infer(input_series: pd.Series) -> pd.Series:
      # if input_series has shape (40,), it will error
      ...

enforce_shape – Whether to enforce a certain shape. If enforce_shape=True then shape must be specified.

Returns:

IO Descriptor that represents a pd.Series.

Return type:

PandasSeries

classmethod PandasSeries.from_sample(sample: ext.PdSeries | t.Sequence[t.Any]) → ext.PdSeries#

Create a PandasSeries IO Descriptor from given inputs.

Parameters:

sample – Given sample pd.Series data
orient –
Indication of expected JSON string format. Compatible JSON strings can be produced by pandas.io.json.to_json() with a corresponding orient value. Possible orients are:
- split - dict[str, Any] ↦ {idx ↠ [idx], columns ↠ [columns], data ↠ [values]}
- records - list[Any] ↦ [{column ↠ value}, …, {column ↠ value}]
- index - dict[str, Any] ↦ {idx ↠ {column ↠ value}}
- table - dict[str, Any] ↦ {schema: { schema }, data: { data }}
enforce_dtype – Enforce a certain data type. dtype must be specified at function signature. If you don’t want to enforce a specific dtype then change enforce_dtype=False.
enforce_shape – Enforce a certain shape. shape must be specified at function signature. If you don’t want to enforce a specific shape then change enforce_shape=False.

Returns:

IODescriptor from given users inputs.

Return type:

PandasSeries

Example:

service.py#

import pandas as pd
from bentoml.io import PandasSeries

arr = [1,2,3]
input_spec = PandasSeries.from_sample(pd.Series(arr))

@svc.api(input=input_spec, output=PandasSeries())
def predict(inputs: pd.Series) -> pd.Series: ...

async PandasSeries.from_proto(field: pb.Series | bytes) → ext.PdSeries[source]#

Process incoming protobuf request and convert it to pandas.Series

Parameters:

request – Incoming RPC request message.
context – grpc.ServicerContext

Returns:

A pandas.Series object that can then be used inside user-defined logic.

async PandasSeries.from_http_request(request: Request) → ext.PdSeries[source]#

Process incoming requests and convert incoming objects to pd.Series.

Parameters: request – Incoming request
Returns: a pd.Series object that can then be used inside user-defined logic.

async PandasSeries.to_proto(obj: ext.PdSeries) → pb.Series[source]#

async PandasSeries.to_http_response(obj: t.Any, ctx: Context | None = None) → Response[source]#

Process the given object and convert it to an HTTP response.

Parameters: obj – pd.Series that will be serialized to JSON
Returns: HTTP Response of type starlette.responses.Response. This can be accessed via cURL or any external web traffic.

Structured Data with JSON#

Note

For common structured data, we recommend using the JSON descriptor, as it provides the most flexibility. Users can also define a schema for the JSON data via a Pydantic model and use it for data validation.

To use the IO descriptor with pydantic, install bentoml with extra io-json dependency:

pip install "bentoml[io-json]"

This installs Pydantic alongside BentoML.

Then add it to your bentofile.yaml under either the Python or Conda packages list.

Refer to Build Options. We also provide an example project using Pydantic for request validation.

pip

bentofile.yaml#

...
python:
  packages:
    - pydantic

conda

bentofile.yaml#

...
conda:
  channels:
    - conda-forge
  dependencies:
    - pydantic

class bentoml.io.JSON(*, pydantic_model: type[pydantic.main.BaseModel] | None = None, validate_json: bool | None = None, json_encoder: type[json.encoder.JSONEncoder] = <class 'bentoml._internal.io_descriptors.json.DefaultJsonEncoder'>)[source]#

JSON defines API specification for the inputs/outputs of a Service, where either inputs will be converted to or outputs will be converted from a JSON representation as specified in your API function signature.

A sample service implementation:

service.py#

from __future__ import annotations

import typing
from typing import TYPE_CHECKING
from typing import Any
from typing import Optional

import bentoml
from bentoml.io import NumpyNdarray
from bentoml.io import JSON

import numpy as np
import pandas as pd
from pydantic import BaseModel

if TYPE_CHECKING:
    from numpy.typing import NDArray

iris_clf_runner = bentoml.sklearn.get("iris_clf_with_feature_names:latest").to_runner()

svc = bentoml.Service("iris_classifier_pydantic", runners=[iris_clf_runner])

class IrisFeatures(BaseModel):
    sepal_len: float
    sepal_width: float
    petal_len: float
    petal_width: float

    # Optional field
    request_id: Optional[int] = None

    # Use custom Pydantic config for additional validation options
    class Config:
        extra = 'forbid'


input_spec = JSON(pydantic_model=IrisFeatures)

@svc.api(input=input_spec, output=NumpyNdarray())
def classify(input_data: IrisFeatures) -> NDArray[Any]:
    if input_data.request_id is not None:
        print("Received request ID: ", input_data.request_id)

    input_df = pd.DataFrame([input_data.dict(exclude={"request_id"})])
    return iris_clf_runner.run(input_df)

Users can then serve this service with bentoml serve:

% bentoml serve ./service.py:svc --reload

Users can then send requests to the newly started service with any client:

Bash

% curl -X POST -H "content-type: application/json" \
    --data '{"sepal_len": 6.2, "sepal_width": 3.2, "petal_len": 5.2, "petal_width": 2.2}' \
    http://127.0.0.1:3000/classify

# [2]%

Python

request.py#

 import requests

 requests.post(
     "http://0.0.0.0:3000/classify",
     headers={"content-type": "application/json"},
     data='{"sepal_len": 6.2, "sepal_width": 3.2, "petal_len": 5.2, "petal_width": 2.2}'
 ).text

Parameters:

pydantic_model – Pydantic model schema. When used, inference API callback will receive an instance of the specified pydantic_model class.
json_encoder – JSON encoder class. By default BentoML implements a custom JSON encoder that provides additional serialization support for numpy arrays, pandas dataframes, and dataclass-like objects (attrs, dataclass, etc.). If you wish to use a custom encoder, make sure it supports the aforementioned objects.
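
As an illustration of what a custom encoder has to take care of, here is a minimal sketch built on the standard library's json.JSONEncoder. `SketchEncoder` and `Prediction` are hypothetical names. The encoder covers dataclasses and, through duck typing, objects that expose tolist() such as numpy arrays; it does not cover pandas dataframes or attrs classes, which a full replacement for the default encoder would also need to handle.

```python
import dataclasses
import json


class SketchEncoder(json.JSONEncoder):
    """Hypothetical encoder: handles dataclasses and array-like objects."""

    def default(self, o):
        # Dataclass instances become plain dictionaries.
        if dataclasses.is_dataclass(o) and not isinstance(o, type):
            return dataclasses.asdict(o)
        # numpy arrays expose tolist(); duck typing keeps this sketch
        # runnable without numpy installed.
        if hasattr(o, "tolist"):
            return o.tolist()
        # Anything else raises TypeError, as in the base class.
        return super().default(o)


@dataclasses.dataclass
class Prediction:
    label: str
    score: float


print(json.dumps({"result": Prediction("setosa", 0.98)}, cls=SketchEncoder))
```

Going by the signature above, an encoder class like this would be passed as JSON(json_encoder=SketchEncoder).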

Returns:

IO Descriptor that represents JSON format.

Return type:

JSON

Create a JSON IO Descriptor from given inputs.

Parameters:

sample –

A JSON-like datatype, which can be either dict, str, or list. sample also accepts a Pydantic model.

from pydantic import BaseModel

class IrisFeatures(BaseModel):
    sepal_len: float
    sepal_width: float
    petal_len: float
    petal_width: float

input_spec = JSON.from_sample(
    IrisFeatures(sepal_len=1.0, sepal_width=2.0, petal_len=3.0, petal_width=4.0)
)

@svc.api(input=input_spec, output=NumpyNdarray())
async def predict(input: IrisFeatures) -> NDArray[Any]:
    return await runner.async_run(input)

json_encoder – Optional JSON encoder.

Returns:

IODescriptor from given users inputs.

Return type:

JSON

Example:

service.py#

from __future__ import annotations

import bentoml
from typing import Any
from bentoml.io import JSON

input_spec = JSON.from_sample({"Hello": "World", "foo": "bar"})
@svc.api(input=input_spec, output=JSON())
async def predict(input: dict[str, Any]) -> dict[str, Any]:
    return await runner.async_run(input)

Raises: BadInput – Given sample is not a valid JSON string, bytes, or supported nested type.

async JSON.from_proto(field: Value | bytes) → str | Dict[str, Any] | BaseModel | None[source]#

async JSON.from_http_request(request: Request) → str | Dict[str, Any] | BaseModel | None[source]#

async JSON.to_proto(obj: str | Dict[str, Any] | BaseModel | None) → Value[source]#

async JSON.to_http_response(obj: JSONType | pydantic.BaseModel, ctx: Context | None = None)[source]#

Texts#

bentoml.io.Text is commonly used for NLP Applications:

class bentoml.io.Text(content_type: Literal['text/plain', 'text/event-stream'] = 'text/plain')[source]#

Text defines API specification for the inputs/outputs of a Service. Text represents strings for all incoming requests/outgoing responses as specified in your API function signature.

A sample GPT2 service implementation:

service.py#

from __future__ import annotations

import bentoml
from bentoml.io import Text

runner = bentoml.tensorflow.get('gpt2:latest').to_runner()

svc = bentoml.Service("gpt2-generation", runners=[runner])

@svc.api(input=Text(), output=Text())
def predict(text: str) -> str:
    res = runner.run(text)
    return res['generated_text']

Users can then serve this service with bentoml serve:

% bentoml serve ./service.py:svc --reload

Users can then send requests to the newly started service with any client:

Bash

% curl -X POST -H "Content-Type: text/plain" \
        --data 'Not for nothing did Orin say that people outdoors.' \
        http://0.0.0.0:3000/predict

Python

request.py#

import requests
requests.post(
    "http://0.0.0.0:3000/predict",
    headers = {"content-type":"text/plain"},
    data = 'Not for nothing did Orin say that people outdoors.'
).text

Note

Text is not designed to take any args or kwargs during initialization.

Returns: IO Descriptor that represents the string type.
Return type: Text

classmethod Text.from_sample(sample: str | bytes) → str#

Create a Text IO Descriptor from given inputs.

Parameters: sample – Given sample text.
Returns: IODescriptor from the given user inputs.
Return type: Text

Example:

service.py#

@svc.api(input=bentoml.io.Text.from_sample('Bento box is'), output=bentoml.io.Text())
def predict(inputs: str) -> str: ...

async Text.from_proto(field: StringValue | bytes) → str[source]#

async Text.from_http_request(request: Request) → str[source]#

async Text.to_proto(obj: str) → StringValue[source]#

async Text.to_http_response(obj: str | t.AsyncGenerator[str, None], ctx: Context | None = None) → StreamingResponse[source]#
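
The signature above shows that to_http_response accepts either a str or an AsyncGenerator[str, None]. The sketch below illustrates only the shape of such a generator and how its chunks are consumed, using the standard library. `generate_chunks` and `collect` are hypothetical names, and the generated text is made up. How BentoML connects a generator to the text/event-stream content type is not covered in this section.

```python
import asyncio
from typing import AsyncGenerator


async def generate_chunks(prompt: str) -> AsyncGenerator[str, None]:
    """Yield a reply piece by piece, as a streaming text endpoint might."""
    for word in f"{prompt} a convenient lunch".split():
        await asyncio.sleep(0)  # stand-in for waiting on a model
        yield word + " "


async def collect(prompt: str) -> str:
    """Consume the generator chunk by chunk, the way a client reads a stream."""
    return "".join([chunk async for chunk in generate_chunks(prompt)])


print(asyncio.run(collect("Bento box is")))
```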

Images#

To use the IO descriptor, install bentoml with extra io-image dependency:

pip install "bentoml[io-image]"

Note

The Pillow package is required to use the bentoml.io.Image.

Install it with pip install Pillow and add it to your bentofile.yaml under either the Python or Conda packages list.

Refer to Build Options.

pip

bentofile.yaml#

...
python:
  packages:
    - Pillow

conda

bentofile.yaml#

...
conda:
  channels:
    - conda-forge
  dependencies:
    - Pillow

class bentoml.io.Image(pilmode: _Mode | None = 'RGB', mime_type: str = 'image/jpeg', *, allowed_mime_types: t.Iterable[str] | None = None)[source]#

Image defines API specification for the inputs/outputs of a Service, where either inputs will be converted to or outputs will be converted from images as specified in your API function signature.

A sample object detection service:

service.py#

from __future__ import annotations

from typing import TYPE_CHECKING
from typing import Any

import numpy as np
# Alias the PIL class so it does not shadow the bentoml.io.Image descriptor
from PIL.Image import Image as PILImage

import bentoml
from bentoml.io import Image
from bentoml.io import NumpyNdarray

if TYPE_CHECKING:
    from numpy.typing import NDArray

runner = bentoml.tensorflow.get('image-classification:latest').to_runner()

svc = bentoml.Service("vit-object-detection", runners=[runner])

@svc.api(input=Image(), output=NumpyNdarray(dtype="float32"))
async def predict_image(f: PILImage) -> NDArray[Any]:
    assert isinstance(f, PILImage)
    arr = np.array(f) / 255.0
    assert arr.shape == (28, 28)

    # We are using a greyscale image and our model expects one
    # extra channel dimension
    arr = np.expand_dims(arr, (0, 3)).astype("float32")  # reshape to [1, 28, 28, 1]
    return await runner.async_run(arr)

Users can then serve this service with bentoml serve:

% bentoml serve ./service.py:svc --reload

Users can then send requests to the newly started service with any client:

Bash

# we will run on our input image test.jpg
# the image can be downloaded from http://images.cocodataset.org/val2017/000000039769.jpg
% curl -H "Content-Type: multipart/form-data" \
       -F 'fileobj=@test.jpg;type=image/jpeg' \
       http://0.0.0.0:3000/predict_image

# [{"score":0.8610631227493286,"label":"Egyptian cat"},
# {"score":0.08770329505205154,"label":"tabby, tabby cat"},
# {"score":0.03540956228971481,"label":"tiger cat"},
# {"score":0.004140055272728205,"label":"lynx, catamount"},
# {"score":0.0009498853469267488,"label":"Siamese cat, Siamese"}]%

Python

request.py#

import requests

requests.post(
    "http://0.0.0.0:3000/predict_image",
    # requests sets the multipart content-type header (with its boundary)
    # itself when files is passed, so it is not overridden here
    files = {"upload_file": open('test.jpg', 'rb')},
).text

Parameters:

pilmode – Color mode for PIL. Default to RGB.
mime_type – The MIME type of the file type that this descriptor should return. Only relevant when used as an output descriptor.
allowed_mime_types – A list of MIME types to restrict input to.

Returns:

IO Descriptor that represents either a PIL.Image.Image or a np.ndarray representing an image.

Return type:

Image

classmethod Image.from_sample(sample: ImageType | str) → ImageType#

Create an Image IO Descriptor from given inputs.

Parameters:

sample – Given File-like object, or a path to a file.
pilmode – Optional color mode for PIL. Default to RGB.
mime_type – The MIME type of the file type that this descriptor should return. If not specified, then from_sample will try to infer the MIME type from file extension.
allowed_mime_types – An optional list of MIME types to restrict input to.
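
Inferring a MIME type from a file extension and checking it against an allow-list can be sketched with the standard library's mimetypes module. This only illustrates the idea behind mime_type and allowed_mime_types: `infer_mime_type` is a hypothetical helper, and BentoML's own inference may use a different mechanism (the Raises section mentions the filetype package).

```python
import mimetypes


def infer_mime_type(path, allowed_mime_types=None):
    """Guess a MIME type from the file extension and check it against an allow-list."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        raise ValueError(f"cannot infer MIME type for {path!r}")
    if allowed_mime_types is not None and mime_type not in allowed_mime_types:
        raise ValueError(f"{mime_type} is not one of {sorted(allowed_mime_types)}")
    return mime_type


print(infer_mime_type("/path/to/image.jpg"))

# A PNG is rejected when only JPEG input is allowed.
try:
    infer_mime_type("/path/to/image.png", allowed_mime_types={"image/jpeg"})
except ValueError as exc:
    print(exc)
```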

Returns:

IODescriptor from given users inputs.

Return type:

Image

Example:

service.py#

from __future__ import annotations

import typing as t

import bentoml
from bentoml.io import Image

input_spec = Image.from_sample("/path/to/image.jpg")
@svc.api(input=input_spec, output=Image())
async def predict(input: t.IO[t.Any]) -> t.IO[t.Any]:
    return await runner.async_run(input)

Raises:

InvalidArgument – If the given sample is not a valid image type.
MissingDependencyException – If filetype is not installed.
BadInput – Given sample from file can’t be parsed with Pillow.

async Image.from_proto(field: pb.File | pb_v1alpha1.File | bytes) → ImageType[source]#

async Image.from_http_request(request: Request) → ImageType[source]#

async Image.to_proto(obj: ImageType) → pb.File[source]#

async Image.to_http_response(obj: ImageType, ctx: Context | None = None) → Response[source]#

Files#

class bentoml.io.File(kind: FileKind = 'binaryio', mime_type: str | None = None, **kwargs: t.Any)[source]#

File defines API specification for the inputs/outputs of a Service, where either inputs will be converted to or outputs will be converted from file-like objects as specified in your API function signature.

A sample ViT service:

service.py#

from __future__ import annotations

import io
from typing import TYPE_CHECKING
from typing import Any

import bentoml
from bentoml.io import File
from bentoml.io import NumpyNdarray

if TYPE_CHECKING:
    from numpy.typing import NDArray

runner = bentoml.tensorflow.get('image-classification:latest').to_runner()

svc = bentoml.Service("vit-pdf-classifier", runners=[runner])

@svc.api(input=File(), output=NumpyNdarray(dtype="float32"))
async def predict(input_pdf: io.BytesIO) -> NDArray[Any]:
    return await runner.async_run(input_pdf)

Users can then serve this service with bentoml serve:

% bentoml serve ./service.py:svc --reload

Users can then send requests to the newly started service with any client:

Bash

% curl -H "Content-Type: multipart/form-data" \
       -F 'fileobj=@test.pdf;type=application/pdf' \
       http://0.0.0.0:3000/predict

Python

request.py#

import requests

requests.post(
    "http://0.0.0.0:3000/predict",
    # requests sets the multipart content-type header (with its boundary)
    # itself when files is passed, so it is not overridden here
    files = {"upload_file": open('test.pdf', 'rb')},
).text

Parameters:

kind – The kind of file-like object to be used. Currently, the only accepted value is binaryio.
mime_type – MIME type of the returned starlette.responses.Response. Only available when used as an output descriptor.
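
With kind='binaryio', the API function works with a binary file-like object. The sketch below shows one way such an object might be inspected before it is handed to a model, using only the standard library. `looks_like_pdf` is a hypothetical helper, and it assumes the object is seekable, as io.BytesIO is.

```python
import io
from typing import IO


def looks_like_pdf(stream: IO[bytes]) -> bool:
    """Check the magic bytes of a binary file-like object without consuming it."""
    start = stream.tell()
    header = stream.read(5)
    stream.seek(start)  # rewind so later readers still see the whole file
    return header == b"%PDF-"


uploaded = io.BytesIO(b"%PDF-1.7\n...rest of the document...")
print(looks_like_pdf(uploaded))
print(uploaded.read(8))  # the stream position was restored by the check
```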

Returns:

IO Descriptor that represents file-like objects.

Return type:

File

Create a File IO Descriptor from given inputs.

Parameters:

sample – Given File-like object, or a path to a file.
kind – The kind of file-like object to be used. Currently, the only accepted value is binaryio.
mime_type – Optional MIME type for the descriptor. If not provided, from_sample will try to infer the MIME type from the file extension.

Returns:

IODescriptor from given users inputs.

Return type:

File

Example:

service.py#

from __future__ import annotations

import typing as t

import bentoml
from bentoml.io import File

input_spec = File.from_sample("/path/to/file.pdf")
@svc.api(input=input_spec, output=File())
async def predict(input: t.IO[t.Any]) -> t.IO[t.Any]:
    return await runner.async_run(input)

Raises:

MissingDependencyException – If filetype is not installed.
ValueError – If the MIME type cannot be inferred automatically.
BadInput – Any other error that occurs while inferring the MIME type.
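
The extension-based inference described for mime_type above can be pictured with a small standard-library sketch. This is only an illustration under stated assumptions: the helper name infer_mime_type is hypothetical, it uses mimetypes rather than the filetype package that from_sample depends on, and it does not reproduce BentoML's actual logic.

```python
import mimetypes


def infer_mime_type(path: str) -> str:
    """Hypothetical helper: guess a MIME type from the file extension."""
    mime_type, _encoding = mimetypes.guess_type(path)
    if mime_type is None:
        # Mirrors the documented behaviour of raising ValueError when the
        # MIME type cannot be inferred automatically.
        raise ValueError(f"Unable to infer MIME type for {path!r}")
    return mime_type


print(infer_mime_type("/path/to/file.pdf"))
```

When the extension is ambiguous or missing, passing mime_type explicitly avoids relying on inference at all.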

async File.from_proto(field: File | bytes) → FileLike[bytes][source]#

async File.from_http_request(request: Request) → FileLike[bytes][source]#

async File.to_proto(obj: IOBase | IO[bytes] | FileLike[bytes]) → File[source]#

async File.to_http_response(obj: FileType, ctx: Context | None = None)[source]#

Multipart Payloads#

Note

Multipart makes it possible to compose a multipart payload from multiple other IO Descriptor instances. For example, you may create a Multipart input that contains an image file and additional metadata in JSON.

class bentoml.io.Multipart(**inputs: IODescriptor[Any])[source]#

Multipart defines the API specification for the inputs/outputs of a Service, where the Service receives a multipart request or sends a multipart response, as specified in your API function signature.

A sample service implementation:

service.py#

from __future__ import annotations

from typing import TYPE_CHECKING
from typing import Any

import bentoml
from bentoml.io import NumpyNdarray
from bentoml.io import Multipart
from bentoml.io import JSON

if TYPE_CHECKING:
    from numpy.typing import NDArray

runner = bentoml.sklearn.get("sklearn_model_clf").to_runner()

svc = bentoml.Service("iris-classifier", runners=[runner])

input_spec = Multipart(arr=NumpyNdarray(), annotations=JSON())
output_spec = Multipart(output=NumpyNdarray(), result=JSON())

@svc.api(input=input_spec, output=output_spec)
async def predict(
    arr: NDArray[Any], annotations: dict[str, Any]
) -> dict[str, NDArray[Any] | dict[str, Any]]:
    res = await runner.async_run(arr)
    return {"output": res, "result": annotations}

Users can then serve this service with bentoml serve:

% bentoml serve ./service.py:svc --reload

Users can then send requests to the newly started service with any client:

Bash

% curl -X POST -H "Content-Type: multipart/form-data" \
       -F annotations=@test.json -F arr='[5,4,3,2]' \
       http://0.0.0.0:3000/predict

# --b1d72c201a064ecd92a17a412eb9208e
# Content-Disposition: form-data; name="output"
# content-length: 1
# content-type: application/json

# 1
# --b1d72c201a064ecd92a17a412eb9208e
# Content-Disposition: form-data; name="result"
# content-length: 13
# content-type: application/json

# {"foo":"bar"}
# --b1d72c201a064ecd92a17a412eb9208e--

Python

Note

The following code snippet uses requests_toolbelt. Install with pip install requests-toolbelt.

request.py#

import requests

from requests_toolbelt.multipart.encoder import MultipartEncoder

# Field names must match the keys passed to Multipart in service.py;
# the values mirror the curl example above.
with open("test.json", "rb") as f:
    m = MultipartEncoder(
        fields={
            "arr": "[5,4,3,2]",
            "annotations": ("test.json", f, "application/json"),
        }
    )

    requests.post(
        "http://0.0.0.0:3000/predict", data=m, headers={"Content-Type": m.content_type}
    )
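
On the client side, a multipart response such as the one printed in the Bash example has to be split back into its named parts. The sketch below does this with the standard-library email parser. The helper name split_multipart is hypothetical, and the body is a hand-written copy of the sample response rather than real server output.

```python
from __future__ import annotations

import json
from email import message_from_bytes


def split_multipart(body: bytes, content_type: str) -> dict[str, bytes]:
    """Hypothetical helper: map each form-data field name to its raw payload."""
    # The email parser needs the Content-Type header (with its boundary) up front.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = message_from_bytes(raw)
    parts: dict[str, bytes] = {}
    for part in message.get_payload():
        name = part.get_param("name", header="content-disposition")
        parts[name] = part.get_payload(decode=True)
    return parts


# Hand-written copy of the sample response shown in the Bash example.
boundary = "b1d72c201a064ecd92a17a412eb9208e"
body = (
    f"--{boundary}\r\n"
    'Content-Disposition: form-data; name="output"\r\n'
    "content-type: application/json\r\n\r\n"
    "1\r\n"
    f"--{boundary}\r\n"
    'Content-Disposition: form-data; name="result"\r\n'
    "content-type: application/json\r\n\r\n"
    '{"foo":"bar"}\r\n'
    f"--{boundary}--\r\n"
).encode("ascii")

parts = split_multipart(body, f"multipart/form-data; boundary={boundary}")
decoded = {name: json.loads(value) for name, value in parts.items()}
print(decoded)
```

With a live service, the body and content type would come from response.content and response.headers["Content-Type"] of the requests.post call.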

Parameters:

inputs –

Keyword arguments that map each field name of a Multipart request/response to an IODescriptor supported by BentoML. Currently, Multipart supports Image, NumpyNdarray, PandasDataFrame, PandasSeries, Text, and File.

Make sure the parameter names in the API function signature match the keys defined in Multipart:

+----------------------------------------------------------------+
|                                                                |
|   +--------------------------------------------------------+   |
|   |                                                        |   |
|   |    Multipart(arr=NumpyNdarray(), annotations=JSON())   |   |
|   |               |                       |                |   |
|   +---------------+-----------------------+----------------+   |
|                   |                       |                    |
|                   |                       |                    |
|                   |                       |                    |
|                   +-----+        +--------+                    |
|                         |        |                             |
|         +---------------v--------v---------+                   |
|         |  def predict(arr, annotations):  |                   |
|         +----------------------------------+                   |
|                                                                |
+----------------------------------------------------------------+
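
The key-to-parameter correspondence in the diagram can be checked mechanically. The following sketch uses inspect.signature to list Multipart keys that have no same-named parameter in the API function. The helper unmatched_keys is hypothetical and is not part of BentoML; the keys are passed as plain strings so the sketch runs without any descriptor objects.

```python
import inspect


def unmatched_keys(func, multipart_keys):
    """Hypothetical helper: return keys with no same-named parameter in func."""
    params = inspect.signature(func).parameters
    return sorted(key for key in multipart_keys if key not in params)


def predict(arr, annotations):
    return {"output": arr, "result": annotations}


# Keys as they would be passed to Multipart(arr=..., annotations=...).
print(unmatched_keys(predict, ["arr", "annotations"]))  # []
print(unmatched_keys(predict, ["arr", "metadata"]))     # ['metadata']
```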

Returns:

IO Descriptor that represents a Multipart request/response.

Return type:

Multipart

async Multipart.from_proto(field: Multipart) → dict[str, Any][source]#

async Multipart.from_http_request(request: Request) → dict[str, Any][source]#

async Multipart.to_proto(obj: dict[str, Any]) → Multipart[source]#

async Multipart.to_http_response(obj: dict[str, t.Any], ctx: Context | None = None) → Response[source]#

Custom IODescriptor#

Note

The IODescriptor base class can be extended to support custom data formats for your APIs if the built-in descriptors do not fit your needs.

class bentoml.io.IODescriptor[source]#

IODescriptor describes the input/output data format of an InferenceAPI defined in a bentoml.Service. This is an abstract base class for extending new HTTP endpoint IO descriptor types in BentoServer.
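
Conceptually, a descriptor is a pair of conversions between wire data and Python objects. The sketch below shows that shape for a CSV format using a stand-in abstract base class. Everything in it is hypothetical: SketchDescriptor, from_body and to_body are illustrative names, and the real bentoml.io.IODescriptor has a larger, asynchronous interface (from_http_request, to_http_response, from_proto, to_proto, among others) that this sketch does not reproduce.

```python
from __future__ import annotations

import csv
import io
from abc import ABC, abstractmethod
from typing import Any


class SketchDescriptor(ABC):
    """Stand-in base class for illustration; NOT bentoml.io.IODescriptor."""

    @abstractmethod
    def from_body(self, body: bytes) -> Any:
        """Convert a raw request body into a Python object."""

    @abstractmethod
    def to_body(self, obj: Any) -> bytes:
        """Convert a Python object into a raw response body."""


class CSVRows(SketchDescriptor):
    """Hypothetical descriptor that maps CSV text to a list of row dicts."""

    mime_type = "text/csv"

    def from_body(self, body: bytes) -> list[dict[str, str]]:
        reader = csv.DictReader(io.StringIO(body.decode("utf-8")))
        return [dict(row) for row in reader]

    def to_body(self, obj: list[dict[str, str]]) -> bytes:
        if not obj:
            return b""
        buffer = io.StringIO()
        # Column order is taken from the first row.
        writer = csv.DictWriter(buffer, fieldnames=list(obj[0]))
        writer.writeheader()
        writer.writerows(obj)
        return buffer.getvalue().encode("utf-8")


descriptor = CSVRows()
rows = descriptor.from_body(b"sepal_len,sepal_width\r\n5,4\r\n")
print(rows)
print(descriptor.to_body(rows))
```

A real subclass would implement the abstract methods of bentoml.io.IODescriptor instead, working with Request and Response objects rather than raw bytes.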
