API and IO Descriptors¶
APIs are functions defined in the service definition that are exposed as an HTTP or gRPC endpoint. A function is a part of the APIs if it is decorated with the @svc.api decorator. APIs can be defined either as a synchronous function or asynchronous coroutine in Python. APIs fulfill requests by invoking the pre- and post-processing logic in the function and model runners created in the service definition. Let’s look into each of these parts in details.
Sync vs Async APIs¶
APIs can be defined as either synchronous function or asynchronous coroutines in Python. The API we created in the Getting Started guide was a synchronous API. BentoML will intelligently create an optimally sized pool of workers to execute the synchronous logic. Synchronous APIs are simple and capable of getting the job done for many common model serving scenarios.
# Create API function with pre- and post- processing logic
@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def predict(input_array: np.ndarray) -> np.ndarray:
# Define pre-processing logic
result = runner.run(input_array)
# Define post-processing logic
return result
Synchronous APIs fall short when we want to maximize the performance and throughput of the service. Asynchronous APIs are preferred if the processing logic is IO-bound or invokes multiple runners simultaneously. The following async API example calls a remote feature store asynchronously, invokes two runners simultaneously, and returns the better result.
import aiohttp
import asyncio
# Load two runners for two different versions of the ScikitLearn
# Iris Classifier models we saved before
runner1 = bentoml.sklearn.load_runner("iris_classifier_model:yftvuwkbbbi6zcphca6rzl235")
runner2 = bentoml.sklearn.load_runner("iris_classifier_model:edq3adsfhzi6zgr6vtpeqaare")
# Create async API coroutine with pre-rocessing logic calling a feature store
@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
async def predict(input_array: np.ndarray) -> np.ndarray:
# Call a remote feature store to pre-process the request
async with aiohttp.ClientSession() as session:
params = [("key", v) for v in a]
async with session.get('https://features/get', params=input_array[0]) as resp:
features = get_features(await resp.text())
# Invoke both model runners simultaneously and return the better result
results = await asyncio.gather(
runner1.async_run(input_array, features),
runner2.async_run(input_array, features),
)
return compare_results(results)
The asynchronous API implementation is more efficient because when an asynchronous method is invoked, the event loop is released to service other requests while this request awaits the results of the method. In addition, BentoML will automatically configure the ideal amount of parallelism based on the available number of CPU cores. Further tuning of event loop configuration is not needed under common use cases.
IO Descriptors¶
The input and output descriptors define the API specifications and validate the arguments and return values of the API at runtime. They are specified through the input and output arguments in the @svc.api decorator. Recall the API we created in the Getting Started guide. The predict API both accepts arguments and returns results in the type of bentoml.io.NumpyNdarray. NumpyNdarray describes the argument of return value of type numpy.ndarray, as specified in the Python function signature.
import numpy as np
from bentoml.io import NumpyNdarray
# Create API function with pre- and post- processing logic
@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def predict(input_array: np.ndarray) -> np.ndarray:
# Define pre-processing logic
result = await runner.run(input_array)
# Define post-processing logic
return result
The IO descriptors help automatically generate an OpenAPI specifications of the service based on the types of IO descriptors selected. We can further customize the IO descriptors by providing the dtype of the numpy.ndarray object. The provided dtype will be automatically translated in the generated OpenAPI specification. The IO descriptors will validate the arguments and return values against the provided dtype. Requests that fail the validation will result in errors. We can choose to optionally disable validation through the validate argument.
import numpy as np
from bentoml.io import NumpyNdarray
# Create API function with pre- and post- processing logic
@svc.api(
input=NumpyNdarray(schema=np.dtype(int, 4), validate=True),
output=NumpyNdarray(schema=np.dtype(int), validate=True),
)
def predict(input_array: np.ndarray) -> np.ndarray:
# Define pre-processing logic
result = await runner.run(input_array)
# Define post-processing logic
return result
Todo
insert Swagger screenshot
Built-in Types¶
Beside NumpyNdarray, BentoML supports a variety of other built-in IO descriptor types under the bentoml.io package. Each type comes with support of type validation and OpenAPI specification generation.
IO Descriptor |
Type |
Arguments |
Schema Type |
---|---|---|---|
NumpyNdarray |
numpy.ndarray |
validate, schema |
numpy.dtype |
PandasDataFrame |
pandas.DataFrame |
validate, schema |
pandas.DataFrame.dtypes |
Json |
Python native types |
validate, schema |
Pydantic.BaseModel |
Composite Types¶
Multiple IO descriptors can be specified as tuples in the input and output arguments the API decorator. Composite IO descriptors allow the API to accept multiple arguments and return multiple values. Each IO descriptor can be customized with independent schema and validation logic.
import typing as t
import numpy as np
from pydantic import BaseModel
from bentoml.io import NumpyNdarray, Json
class FooModel(BaseModel):
"""Foo model documentation"""
field1: int
field2: float
field3: str
my_np_input = NumpyNdarray.from_sample(np.ndarray(...))
# Create API function with pre- and post- processing logic
@svc.api(
input=Multipart(
arr=NumpyNdarray(schema=np.dtype(int, 4), validate=True),
json=Json(pydantic_model=FooModel),
)
output=NumpyNdarray(schema=np.dtype(int), validate=True),
)
def predict(arr: np.ndarray, json: t.Dict[str, t.Any]) -> np.ndarray:
...
Further Reading¶
API Reference for IO descriptors