ONNX-mlir

Users can now use onnx-mlir with BentoML through the following APIs: load, save, and load_runner, as shown below:

import sys
import os
import subprocess

import bentoml
import numpy as np
import tensorflow as tf

sys.path.append("/workdir/onnx-mlir/build/Debug/lib/")

from PyRuntime import ExecutionSession

class NativeModel(tf.Module):
    def __init__(self):
        super().__init__()
        self.weights = np.asfarray([[1.0], [1.0], [1.0], [1.0], [1.0]])
        self.dense = lambda inputs: tf.matmul(inputs, self.weights)

    @tf.function(
        input_signature=[tf.TensorSpec(shape=[1, 5], dtype=tf.float64, name="inputs")]
    )
    def __call__(self, inputs):
        return self.dense(inputs)

directory = "/tmp/model"
model = NativeModel()
tf.saved_model.save(model, directory)

model_path = os.path.join(directory, "model.onnx")
command = [
    "python",
    "-m",
    "tf2onnx.convert",
    "--saved-model",
    directory,
    "--output",
    model_path,
]
# Convert the SavedModel to ONNX with tf2onnx
convert_proc = subprocess.Popen(
    command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, cwd=directory, text=True
)
stdout, stderr = convert_proc.communicate()

# Compile the ONNX model to a shared library with onnx-mlir
onnx_mlir_loc = "/workdir/onnx-mlir/build/Debug/bin"
command = ["./onnx-mlir", "--EmitLib", model_path]

compile_proc = subprocess.Popen(
    command,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True,
    cwd=onnx_mlir_loc,
)
stdout, stderr = compile_proc.communicate()

# Path to the shared library emitted by onnx-mlir
model_path = os.path.join(directory, "model.so")

# `save` the compiled ONNX model to the BentoML modelstore:
tag = bentoml.onnxmlir.save("compiled_model", model_path)

# retrieve metadata with `bentoml.models.get`:
metadata = bentoml.models.get(tag)

# `load` the given model back:
loaded = bentoml.onnxmlir.load("compiled_model")

# Run the model under the `Runner` abstraction with `load_runner`
runner = bentoml.onnxmlir.load_runner("compiled_model:latest")
res = runner.run_batch(np.array([[1.0, 2.0, 3.0, 4.0, 5.0]]).astype(np.float64))
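
The `from PyRuntime import ExecutionSession` import above is only needed if you want to sanity-check the compiled library directly, outside of BentoML. A minimal sketch, continuing the script above; note that the ExecutionSession constructor arguments can differ between onnx-mlir releases, so treat this as illustrative:

# Hedged sketch: run the compiled shared library directly with onnx-mlir's PyRuntime.
# Some onnx-mlir releases also expect the entry-point name, e.g.
# ExecutionSession(model_path, "run_main_graph").
session = ExecutionSession(model_path)
outputs = session.run([np.ones((1, 5), dtype=np.float64)])
print(outputs[0])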

Note

You can find more examples for ONNX in our gallery repo.

bentoml.onnxmlir.save(name, model, *, labels=None, custom_objects=None, metadata=None)

Save a model instance to BentoML modelstore.

Parameters
  • name (str) – Name for given model instance. This should pass Python identifier check.

  • model (str) – Path to the model compiled by onnx-mlir (the emitted shared library).

  • labels (Dict[str, str], optional, default to None) – user-defined labels for managing models, e.g. team=nlp, stage=dev

  • custom_objects (Dict[str, Any], optional, default to None) – user-defined additional python objects to be saved alongside the model, e.g. a tokenizer instance, preprocessor function, model configuration json

  • metadata (Dict[str, Any], optional, default to None) – Custom metadata for given model.

  • model_store (ModelStore, default to BentoMLContainer.model_store) – BentoML modelstore, provided by DI Container.

Returns

A tag with the format name:version, where name is the user-defined model’s name and version is generated by BentoML.

Return type

Tag

Examples:

import sys
import os
import subprocess

import bentoml
import numpy as np
import tensorflow as tf

sys.path.append("/workdir/onnx-mlir/build/Debug/lib/")

from PyRuntime import ExecutionSession

class NativeModel(tf.Module):
    def __init__(self):
        super().__init__()
        self.weights = np.asfarray([[1.0], [1.0], [1.0], [1.0], [1.0]])
        self.dense = lambda inputs: tf.matmul(inputs, self.weights)

    @tf.function(
        input_signature=[tf.TensorSpec(shape=[1, 5], dtype=tf.float64, name="inputs")]
    )
    def __call__(self, inputs):
        return self.dense(inputs)

directory = "/tmp/model"
model = NativeModel()
tf.saved_model.save(model, directory)

model_path = os.path.join(directory, "model.onnx")
command = [
    "python",
    "-m",
    "tf2onnx.convert",
    "--saved-model",
    directory,
    "--output",
    model_path,
]
# Convert the SavedModel to ONNX with tf2onnx
convert_proc = subprocess.Popen(
    command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, cwd=directory, text=True
)
stdout, stderr = convert_proc.communicate()

# Compile the ONNX model to a shared library with onnx-mlir
onnx_mlir_loc = "/workdir/onnx-mlir/build/Debug/bin"
command = ["./onnx-mlir", "--EmitLib", model_path]

compile_proc = subprocess.Popen(
    command,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True,
    cwd=onnx_mlir_loc,
)
stdout, stderr = compile_proc.communicate()

# Path to the shared library emitted by onnx-mlir
model_path = os.path.join(directory, "model.so")
tag = bentoml.onnxmlir.save("compiled_model", model_path)
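
A hedged sketch of the same save call with the optional labels, custom_objects, and metadata arguments documented above (the preprocess helper and metadata keys are illustrative only, not part of the API):

tag = bentoml.onnxmlir.save(
    "compiled_model",
    model_path,
    labels={"team": "nlp", "stage": "dev"},
    custom_objects={"preprocess": lambda x: x.astype(np.float64)},  # illustrative helper
    metadata={"source_framework": "tensorflow"},
)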

bentoml.onnxmlir.load(tag, model_store=<BentoMLContainer.model_store>)

Load a model from BentoML local modelstore with given name.

onnx-mlir is a compiler technology that takes an ONNX model and lowers it (using LLVM) to an optimized inference library with few external dependencies.

The PyRuntime interface is created during the onnx-mlir build using pybind. See the onnx-mlir supporting documentation for details.

Parameters
  • tag (Union[str, Tag]) – Tag of a saved model in BentoML local modelstore.

  • model_store (ModelStore, default to BentoMLContainer.model_store) – BentoML modelstore, provided by DI Container.

Returns

An instance of the ONNX-MLIR compiled model from the BentoML modelstore.

Return type

ExecutionSession

Examples:

import bentoml

session = bentoml.onnxmlir.load(tag)
session.run(data)
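
Since load returns onnx-mlir's ExecutionSession, data is typically a list of numpy arrays, and the outputs come back as a list as well. A minimal sketch, assuming the (1, 5) float64 model compiled earlier on this page:

import numpy as np
import bentoml

# Hedged sketch: pass inputs as a list of numpy arrays; the exact calling
# convention may vary slightly between onnx-mlir releases.
session = bentoml.onnxmlir.load("compiled_model:latest")
outputs = session.run([np.ones((1, 5), dtype=np.float64)])
print(outputs[0])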

bentoml.onnxmlir.load_runner(tag, *, name=None)

Runner represents a unit of serving logic that can be scaled horizontally to maximize throughput. bentoml.onnxmlir.load_runner() implements a Runner class that wraps around an ONNX-MLIR compiled model, optimizing it for the BentoML runtime.

Parameters

tag (Union[str, Tag]) – Tag of a saved model in BentoML local modelstore.

Returns

Runner instance for the bentoml.onnxmlir model

Return type

Runner

Examples:

import bentoml
import numpy as np

runner = bentoml.onnxmlir.load_runner(tag)
res = runner.run_batch(pd_dataframe.to_numpy().astype(np.float64))
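
As a hedged sketch of how the runner is typically used for serving (assuming the bentoml.Service and bentoml.io.NumpyNdarray APIs from the same release; the service and endpoint names are illustrative):

import bentoml
import numpy as np
from bentoml.io import NumpyNdarray

# Hedged sketch: wire the runner into a BentoML Service (names are illustrative).
runner = bentoml.onnxmlir.load_runner("compiled_model:latest")
svc = bentoml.Service("onnx_mlir_service", runners=[runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def predict(input_arr: np.ndarray) -> np.ndarray:
    # mirrors the run_batch usage shown in the example above
    return runner.run_batch(input_arr.astype(np.float64))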