TensorFlow#

TensorFlow is an open source machine learning library with a focus on deep neural networks. BentoML provides native support for serving and deploying models trained with TensorFlow.

Preface#

Even though bentoml.tensorflow supports Keras models, we recommend using bentoml.keras for a better development experience.

If you must use bentoml.tensorflow for your Keras model, make sure that your Keras model's inference callback (such as predict) is decorated with tf.function.
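For example, a minimal sketch of a subclassed Keras model whose inference callback is decorated with tf.function (the layer and input shape are illustrative; fuller examples appear under "Saving a Trained Model" below):

import tensorflow as tf
from tensorflow import keras

class MyModel(keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = keras.layers.Dense(1)

    # decorating the inference callback gives the exported model a
    # concrete, traceable signature
    @tf.function(input_signature=[tf.TensorSpec(shape=[None, 5], dtype=tf.float32)])
    def call(self, inputs):
        return self.dense(inputs)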

Note

  • Keras is not optimized for production inference. There were known reports of memory leaks during serving at the time of the BentoML 1.0 release. The same issue applies to bentoml.keras, as it relies heavily on the Keras APIs.

  • Running inference with bentoml.tensorflow usually takes half the time compared with bentoml.keras.

  • bentoml.keras performs input casting that resembles the original Keras model's input signatures.

Note

We recommend applying model optimization techniques such as distillation or quantization. Alternatively, Keras models can be converted to ONNX and served with a different runtime.
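As an illustration, the conversion can be done with the third-party tf2onnx package. This is a minimal sketch; model is assumed to be a built Keras model, and the input signature must match its inputs:

import tensorflow as tf
import tf2onnx

# convert a Keras model to ONNX and write it to disk
spec = (tf.TensorSpec((None, 5), tf.float32, name="inputs"),)
onnx_model, _ = tf2onnx.convert.from_keras(model, input_signature=spec, output_path="model.onnx")

# model.onnx can now be served with bentoml.onnx or another ONNX runtime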

Compatibility#

BentoML requires TensorFlow version 2.0 or higher. For TensorFlow 1.x models, consider using a Custom Runner.
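A Custom Runner wraps arbitrary inference code in BentoML's Runnable interface. A minimal sketch of the shape such a Runner takes (the TF1 loading and inference logic is elided, and the names are illustrative):

import bentoml

class TF1Runnable(bentoml.Runnable):
    SUPPORTED_RESOURCES = ("cpu",)
    SUPPORTS_CPU_MULTI_THREADING = True

    def __init__(self):
        # load the TensorFlow 1.x graph and session here,
        # e.g. via tf.compat.v1.Session
        ...

    @bentoml.Runnable.method(batchable=True, batch_dim=0)
    def predict(self, input_arr):
        # feed input_arr to the session and return the result
        ...

runner = bentoml.Runner(TF1Runnable, name="tf1_runner")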

Saving a Trained Model#

bentoml.tensorflow supports saving tf.Module, keras.models.Sequential, and keras.Model objects.

train.py#
import bentoml
import tensorflow as tf

# model created from the TF native API

class NativeModel(tf.Module):
    def __init__(self):
        super().__init__()
        # use a tf.Variable so the weights appear in model.trainable_variables
        self.weights = tf.Variable([[1.0], [1.0], [1.0], [1.0], [1.0]], dtype=tf.float64)
        self.dense = lambda inputs: tf.matmul(inputs, self.weights)

    @tf.function(
        input_signature=[
            tf.TensorSpec(shape=[None, 5], dtype=tf.float64, name="inputs")
        ]
    )
    def __call__(self, inputs):
        return self.dense(inputs)

model = NativeModel()

loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()

# train_x and train_y are assumed to be prepared beforehand
EPOCHS = 10
for epoch in range(EPOCHS):
    with tf.GradientTape() as tape:
        predictions = model(train_x)
        loss = loss_object(train_y, predictions)

    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

bentoml.tensorflow.save_model(
    model,
    "my_tf_model",
    signatures={"__call__": {"batchable": True, "batch_dim": 0}}
)
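
The saved model can be loaded back from the model store for a quick sanity check. A short sketch, assuming the model was saved as above:

# verify the saved model outside of a Runner
loaded = bentoml.tensorflow.load_model("my_tf_model:latest")
print(loaded(tf.constant([[1.0, 2.0, 3.0, 4.0, 5.0]], dtype=tf.float64)))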
train.py#
import bentoml
import tensorflow as tf
from tensorflow import keras

class Model(keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = keras.layers.Dense(1)

    @tf.function(
        input_signature=[
            tf.TensorSpec(shape=[None, 5], dtype=tf.float64, name="inputs")
        ]
    )
    def call(self, inputs):
        return self.dense(inputs)

model = Model()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(train_x, train_y, epochs=10)

bentoml.tensorflow.save_model(
    model,
    "my_keras_model",
    signatures={"__call__": {"batchable": True, "batch_dim": 0}}
)
train.py#
import bentoml
import tensorflow as tf
from tensorflow import keras

model = keras.models.Sequential(
    (
        keras.layers.Dense(
            units=1,
            input_shape=(5,),
            dtype=tf.float64,
            use_bias=False,
            kernel_initializer=keras.initializers.Ones(),
        ),
    )
)
opt = keras.optimizers.Adam(0.002, 0.5)
model.compile(optimizer=opt, loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_x, train_y, epochs=10)

bentoml.tensorflow.save_model(
    model,
    "my_keras_model",
    signatures={"__call__": {"batchable": True, "batch_dim": 0}}
)
train.py#
import bentoml
import tensorflow as tf
from tensorflow import keras

x = keras.layers.Input((5,), dtype=tf.float64, name="x")
y = keras.layers.Dense(
    6,
    name="out",
    kernel_initializer=keras.initializers.Ones(),
)(x)
model = keras.Model(inputs=x, outputs=y)
opt = keras.optimizers.Adam(0.002, 0.5)
model.compile(optimizer=opt, loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_x, train_y, epochs=10)

bentoml.tensorflow.save_model(
    model,
    "my_keras_model",
    signatures={"__call__": {"batchable": True, "batch_dim": 0}}
)

bentoml.tensorflow also supports saving models that take multiple tensors as input:
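For example, a minimal sketch (the two-input module and its logic are illustrative):

train.py#
import bentoml
import tensorflow as tf

class MultiInputModel(tf.Module):
    @tf.function(
        input_signature=[
            tf.TensorSpec(shape=[None, 5], dtype=tf.float64, name="x1"),
            tf.TensorSpec(shape=[None, 5], dtype=tf.float64, name="x2"),
        ]
    )
    def __call__(self, x1, x2):
        return x1 + x2

model = MultiInputModel()
bentoml.tensorflow.save_model(
    model,
    "my_multi_input_model",
    signatures={"__call__": {"batchable": True, "batch_dim": 0}}
)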

Note

save_model has two signature-related parameters: tf_signatures and signatures. Use them to define the model signatures and ensure consistent model behavior both in a Python session and when serving from the BentoML model store.

  • tf_signatures is an alias for the signatures field of tf.saved_model.save. This optional argument controls which methods of a given object will be available to programs that consume the SavedModel, for example serving APIs. Read more about TensorFlow's signatures behavior in their API documentation.

  • signatures refers to the general Model Signatures concept, which dictates which methods can be used for inference in the Runner context. This signatures dictionary is used when creating a Runner instance.

The default signatures used for creating a Runner is {"__call__": {"batchable": False}}. This means that BentoML's Adaptive Batching is disabled by default when using save_model(). If you want to utilize adaptive batching and know your model's dynamic batching dimension, pass in signatures as follows:

bentoml.tensorflow.save_model(model, "my_model", signatures={"__call__": {"batch_dim": 0, "batchable": True}})

Building a Service#

Create a BentoML service from the previously saved my_tf_model using the tensorflow framework APIs.

service.py#
import bentoml
from bentoml.io import JSON
from bentoml._internal.types import JSONSerializable  # type alias for JSON-compatible values

runner = bentoml.tensorflow.get("my_tf_model").to_runner()

svc = bentoml.Service(name="test_service", runners=[runner])

@svc.api(input=JSON(), output=JSON())
async def predict(json_obj: JSONSerializable) -> JSONSerializable:
    batch_ret = await runner.async_run([json_obj])
    return batch_ret[0]
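
Once the service is running (for example via `bentoml serve service.py:svc`), the predict endpoint can be called over HTTP. A sketch using the third-party requests package; the payload becomes a [1, 5] batch inside the service, matching the model's [None, 5] input signature:

import requests

# assumes the service is listening on the default port 3000
res = requests.post(
    "http://127.0.0.1:3000/predict",
    json=[1, 2, 3, 4, 5],
)
print(res.json())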

Note

Follow these steps to get the best performance out of your TensorFlow model:

  1. Save the model with a well-defined tf.function decorator.
  2. Apply adaptive batching if possible.
  3. Serve on GPUs if applicable.
  4. See the performance guide in the TensorFlow documentation.

Adaptive Batching#

Most TensorFlow models can accept batched data as input. If batch inference is supported, it is recommended to enable batching to take advantage of the adaptive batching capability to improve the throughput and efficiency of the model. Enable adaptive batching by overriding the signatures argument with the method name and providing batchable and batch_dim configurations when saving the model to the model store.

We may modify our code from

train.py#
class NativeModel(tf.Module):
    @tf.function(
        input_signature=[
            tf.TensorSpec(shape=[1, 5], dtype=tf.int64, name="inputs")
        ]
    )
    def __call__(self, inputs):
        ...

model = NativeModel()
bentoml.tensorflow.save_model(model, "test_model")  # the default signature is `{"__call__": {"batchable": False}}`

runner.run([[1,2,3,4,5]])  # -> bentoml will always call `model([[1,2,3,4,5]])`

to

train.py#
class NativeModel(tf.Module):
    @tf.function(
        input_signature=[
            tf.TensorSpec(shape=[None, 5], dtype=tf.float64, name="inputs")
        ]
    )
    def __call__(self, inputs):
        ...

model = NativeModel()
bentoml.tensorflow.save_model(
    model,
    "test_model",
    signatures={"__call__": {"batchable": True, "batch_dim": 0}},
)

# client 1
runner.run([[1,2,3,4,5]])

# client 2
runner.run([[6,7,8,9,0]])

# if multiple requests from different clients arrived at the same time,
# bentoml will automatically merge them and call model([[1,2,3,4,5], [6,7,8,9,0]])

See also

See Adaptive Batching to learn more.

Note

You can find more examples for TensorFlow in our bentoml/examples (https://github.com/bentoml/BentoML/tree/main/examples) directory.