TensorFlow#
TensorFlow is an open source machine learning library focusing on deep neural networks. BentoML provides native support for serving and deploying models trained with TensorFlow.
Preface#
Even though bentoml.tensorflow supports Keras models, we recommend using bentoml.keras for a better development experience.
If you must use bentoml.tensorflow for your Keras model, make sure that your Keras model's inference callback (such as predict) is decorated with tf.function.
Note
Keras is not optimized for production inference. There were known reports of memory leaks during serving at the time of the BentoML 1.0 release. The same issue applies to bentoml.keras, as it relies heavily on the Keras APIs.
Running inference with bentoml.tensorflow usually halves the time compared with using bentoml.keras.
bentoml.keras performs input casting that resembles the original Keras model input signatures.
Note
We recommend applying model optimization techniques such as distillation or quantization. Alternatively, Keras models can be converted to ONNX models to leverage different runtimes.
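As an illustration of the ONNX route, here is a minimal sketch assuming the third-party tf2onnx package and a compiled Keras model named model (the package choice and the input signature below are assumptions, not part of BentoML's API):

import bentoml
import tensorflow as tf
import tf2onnx  # third-party converter, assumed installed

# convert the Keras model to an ONNX ModelProto
onnx_model, _ = tf2onnx.convert.from_keras(
    model,
    input_signature=[tf.TensorSpec([None, 5], tf.float64, name="inputs")],
)

# save it with BentoML's ONNX framework support instead of bentoml.tensorflow
bentoml.onnx.save_model("my_onnx_model", onnx_model)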
Compatibility#
BentoML requires TensorFlow version 2.0 or higher. For TensorFlow 1.x models, consider writing a Custom Runner.
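A minimal sketch of such a Custom Runner, assuming a TensorFlow 1.x graph restored inside a bentoml.Runnable (the TF1Runnable class and its predict method are illustrative, not a BentoML-provided implementation):

import bentoml

class TF1Runnable(bentoml.Runnable):
    SUPPORTED_RESOURCES = ("cpu",)
    SUPPORTS_CPU_MULTI_THREADING = True

    def __init__(self):
        # hypothetical: restore a TF1 graph and session here,
        # e.g. via tf.compat.v1.Session and a frozen graph file
        ...

    @bentoml.Runnable.method(batchable=False)
    def predict(self, input_arr):
        # hypothetical: run the restored session on input_arr
        ...

runner = bentoml.Runner(TF1Runnable, name="tf1_runner")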
Saving a Trained Model#
bentoml.tensorflow supports saving tf.Module, keras.models.Sequential, and keras.Model instances.
# a model created from the TensorFlow native API
import bentoml
import tensorflow as tf

class NativeModel(tf.Module):
    def __init__(self):
        super().__init__()
        # use a tf.Variable so the weights are trainable and tracked by the module
        self.weights = tf.Variable([[1.0], [1.0], [1.0], [1.0], [1.0]], dtype=tf.float64)
        self.dense = lambda inputs: tf.matmul(inputs, self.weights)

    @tf.function(
        input_signature=[
            tf.TensorSpec(shape=[None, 5], dtype=tf.float64, name="inputs")
        ]
    )
    def __call__(self, inputs):
        return self.dense(inputs)

model = NativeModel()

loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()

# train_x and train_y are assumed to be prepared elsewhere
EPOCHS = 10
for epoch in range(EPOCHS):
    with tf.GradientTape() as tape:
        predictions = model(train_x)
        loss = loss_object(train_y, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

bentoml.tensorflow.save_model(
    "my_tf_model",
    model,
    signatures={"__call__": {"batchable": True, "batch_dim": 0}},
)
# a model created by subclassing keras.Model
from tensorflow import keras

class Model(keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = keras.layers.Dense(1)

    @tf.function(
        input_signature=[
            tf.TensorSpec(shape=[None, 5], dtype=tf.float64, name="inputs")
        ]
    )
    def call(self, inputs):
        return self.dense(inputs)

model = Model()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(train_x, train_y, epochs=10)

bentoml.tensorflow.save_model(
    "my_keras_model",
    model,
    signatures={"__call__": {"batchable": True, "batch_dim": 0}},
)
# a model created with the keras Sequential API
model = keras.models.Sequential(
    (
        keras.layers.Dense(
            units=1,
            input_shape=(5,),
            dtype=tf.float64,
            use_bias=False,
            kernel_initializer=keras.initializers.Ones(),
        ),
    )
)
opt = keras.optimizers.Adam(0.002, 0.5)
model.compile(optimizer=opt, loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_x, train_y, epochs=10)

bentoml.tensorflow.save_model(
    "my_keras_model",
    model,
    signatures={"__call__": {"batchable": True, "batch_dim": 0}},
)
# a model created with the keras functional API
x = keras.layers.Input((5,), dtype=tf.float64, name="x")
y = keras.layers.Dense(
    6,
    name="out",
    kernel_initializer=keras.initializers.Ones(),
)(x)
model = keras.Model(inputs=x, outputs=y)
opt = keras.optimizers.Adam(0.002, 0.5)
model.compile(optimizer=opt, loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_x, train_y, epochs=10)

bentoml.tensorflow.save_model(
    "my_keras_model",
    model,
    signatures={"__call__": {"batchable": True, "batch_dim": 0}},
)
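Any of the models saved above can be loaded back from the local model store for a quick sanity check; a minimal sketch (the input tensor matches the [None, 5] signatures used above):

loaded_model = bentoml.tensorflow.load_model("my_tf_model")
print(loaded_model(tf.constant([[1.0, 2.0, 3.0, 4.0, 5.0]], dtype=tf.float64)))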
bentoml.tensorflow also supports saving models that take multiple tensors as input:
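A minimal sketch, assuming a simple two-input tf.Module (MultiInputModel and its tensor names are illustrative):

import bentoml
import tensorflow as tf

class MultiInputModel(tf.Module):
    @tf.function(
        input_signature=[
            tf.TensorSpec(shape=[None, 5], dtype=tf.float64, name="x1"),
            tf.TensorSpec(shape=[None, 5], dtype=tf.float64, name="x2"),
        ]
    )
    def __call__(self, x1, x2):
        # toy computation over both inputs
        return x1 + x2

bentoml.tensorflow.save_model(
    "my_multi_input_model",
    MultiInputModel(),
    signatures={"__call__": {"batchable": True, "batch_dim": 0}},
)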
Note
save_model has two signature-related parameters: tf_signatures and signatures.
Use the following arguments to define the model signatures to ensure consistent model behavior both in a Python session and when served from the BentoML model store.
- tf_signatures is an alias for the signatures field of tf.saved_model.save. This optional argument controls which methods of a given obj will be available to programs that consume the SavedModel, for example serving APIs. Read more about TensorFlow's signatures behavior in its API documentation.
- signatures refers to the general Model Signatures that dictate which methods can be used for inference in the Runner context. This signatures dictionary is used when creating a Runner instance.
The default signatures used for creating a Runner is {"__call__": {"batchable": False}}. This means that, by default, BentoML's Adaptive Batching is disabled when using save_model(). If you want to utilize adaptive batching and know your model's dynamic batching dimension, make sure to pass in signatures as follows:
bentoml.tensorflow.save_model("my_model", model, signatures={"__call__": {"batch_dim": 0, "batchable": True}})
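For completeness, a hedged sketch passing tf_signatures alongside signatures, assuming a tf.function-decorated __call__ like the models above (the concrete-function form shown here is just one value accepted by tf.saved_model.save's signatures field):

bentoml.tensorflow.save_model(
    "my_model",
    model,
    tf_signatures=model.__call__.get_concrete_function(
        tf.TensorSpec(shape=[None, 5], dtype=tf.float64, name="inputs")
    ),
    signatures={"__call__": {"batchable": True, "batch_dim": 0}},
)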
Building a Service#
Create a BentoML service with the previously saved my_tf_model model using the bentoml.tensorflow framework APIs.
import bentoml
from bentoml.io import JSON
from bentoml._internal.types import JSONSerializable

runner = bentoml.tensorflow.get("my_tf_model").to_runner()

svc = bentoml.Service(name="test_service", runners=[runner])

@svc.api(input=JSON(), output=JSON())
async def predict(json_obj: JSONSerializable) -> JSONSerializable:
    batch_ret = await runner.async_run([json_obj])
    return batch_ret[0]
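Once the service is running (for example via bentoml serve service:svc, assuming the code above lives in service.py), the endpoint can be exercised with any HTTP client; a sketch using the requests package:

import requests

# POST a JSON payload to the predict endpoint (BentoML's default port is 3000)
response = requests.post(
    "http://127.0.0.1:3000/predict",
    json=[[1.0, 2.0, 3.0, 4.0, 5.0]],
)
print(response.json())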
Note
Follow these steps to get the best performance out of your TensorFlow model:
1. Save the model with a well-defined tf.function decorator.
2. Apply adaptive batching if possible.
3. Serve on GPUs if applicable.
4. See the performance guide in the TensorFlow documentation.
Adaptive Batching#
Most TensorFlow models can accept batched data as input. If batch inference is supported, it is recommended to enable batching to take advantage of
the adaptive batching capability to improve the throughput and efficiency of the model. Enable adaptive batching by overriding the signatures
argument with the method name and providing batchable
and batch_dim
configurations when saving the model to the model store.
We may modify our code from
class NativeModel(tf.Module):
    @tf.function(
        input_signature=[
            tf.TensorSpec(shape=[1, 5], dtype=tf.int64, name="inputs")
        ]
    )
    def __call__(self, inputs):
        ...

model = NativeModel()
# the default signature is `{"__call__": {"batchable": False}}`
bentoml.tensorflow.save_model("test_model", model)

runner.run([[1, 2, 3, 4, 5]])  # -> bentoml will always call `model([[1, 2, 3, 4, 5]])`
to
class NativeModel(tf.Module):
    @tf.function(
        input_signature=[
            tf.TensorSpec(shape=[None, 5], dtype=tf.float64, name="inputs")
        ]
    )
    def __call__(self, inputs):
        ...

model = NativeModel()
bentoml.tensorflow.save_model(
    "test_model",
    model,
    signatures={"__call__": {"batchable": True, "batch_dim": 0}},
)

# client 1
runner.run([[1, 2, 3, 4, 5]])

# client 2
runner.run([[6, 7, 8, 9, 0]])

# if multiple requests from different clients arrive at the same time,
# bentoml will automatically merge them and call
# model([[1, 2, 3, 4, 5], [6, 7, 8, 9, 0]])
See also
See Adaptive Batching to learn more.
Note
You can find more examples for TensorFlow in the bentoml/examples directory: https://github.com/bentoml/BentoML/tree/main/examples