Deploying to Google Cloud AI Platform Unified

Google Cloud AI Platform offers a solution for performing ML inference through containers. Like Cloud Run, it is a fully managed compute platform that scales automatically. Using it instead of Cloud Run has two benefits:

  1. Inference can run on GPUs.

  2. It integrates with the managed solutions available in AI Platform Predictions, such as batch prediction, continuous evaluation, monitoring, and explainability.

AI Platform Unified Prediction has strict requirements for the input and output format of a request. See the AI Platform Unified documentation on custom container requirements for details.

This guide demonstrates how to deploy a scikit-learn based iris classifier model with BentoML to Google Cloud AI Platform Unified. The same deployment steps also apply to models trained with other machine learning frameworks; see the BentoML examples for more.


Create a Google Cloud project

You can create a GCP project to test this deployment or use an existing project. Set the following environment variable to the ID of the project you use:

$ export GCP_PROJECT=<your-gcp-project-id>

Alternatively, you can create a dummy project by following the next steps.

$ gcloud components update

All components are up to date.
$ gcloud projects create irisclassifier-gcloud-aiplat

# Sample output

Create in progress for [].
Waiting for [operations/cp.6403723248945195918] to finish...done.
Enabling service [] on project [irisclassifier-gcloud-aiplat]...
Operation "operations/acf.15917ed1-662a-484b-b66a-03259041bf43" finished successfully.
$ gcloud config set project irisclassifier-gcloud-aiplat

Updated property [core/project]

Build and push BentoML model service image to GCP repository

Run the example project from the quick start guide to create the BentoML saved bundle for deployment:

Use the following bash command to create a file that trains an iris scikit-learn model and packages it in a BentoService:

AI Platform Unified expects a JSON input with the following structure:

{
   "instances": INSTANCES,
   "parameters": PARAMETERS
}

The BentoML predict function needs to return a JSON dict with the following structure:

{"predictions": PREDICTIONS}
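Both envelopes can be checked in plain Python before anything is deployed. This is a minimal sketch with hypothetical helper names (build_request, validate_response). The one-prediction-per-instance check reflects how AI Platform pairs predictions with instances, but treat it as an assumption to confirm against the AI Platform documentation.

```python
import json
from typing import Any, Dict, List, Optional


def build_request(instances: List[Any], parameters: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Wrap a list of instances in the {"instances": ..., "parameters": ...} envelope."""
    if not isinstance(instances, list):
        raise TypeError("instances must be a list")
    body: Dict[str, Any] = {"instances": instances}
    # "parameters" is optional, so only include it when the caller provides one.
    if parameters is not None:
        body["parameters"] = parameters
    return body


def validate_response(body: Dict[str, Any], expected_count: int) -> List[Any]:
    """Check that a response follows the {"predictions": [...]} shape."""
    if "predictions" not in body:
        raise ValueError('response is missing the "predictions" key')
    predictions = body["predictions"]
    if not isinstance(predictions, list):
        raise ValueError('"predictions" must be a JSON array')
    # Assumption: one prediction is returned per input instance, in order.
    if len(predictions) != expected_count:
        raise ValueError(f"expected {expected_count} predictions, got {len(predictions)}")
    return predictions


if __name__ == "__main__":
    request = build_request([[5.1, 3.5, 1.4, 0.2]])
    print(json.dumps(request))  # {"instances": [[5.1, 3.5, 1.4, 0.2]]}
    print(validate_response({"predictions": [0]}, expected_count=1))  # [0]
```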

The following bash command writes the BentoML service definition to a file:

cat > <<EOF
from bentoml import env, artifacts, api, BentoService
from bentoml.adapters import JsonInput
from bentoml.frameworks.sklearn import SklearnModelArtifact


# Declare the pip dependency and the packed model artifact used by predict()
@env(pip_packages=['scikit-learn'])
@artifacts([SklearnModelArtifact('model')])
class IrisClassifier(BentoService):
    """
    A minimum prediction service exposing a Scikit-learn model
    """

    @api(input=JsonInput(), batch=False)
    def predict(self, input: dict):
        """
        AI Platform Unified expects a JSON input with the following structure:
          {"instances": INSTANCES, "parameters": PARAMETERS}
        and expects a JSON dict with the following structure in return:
          {"predictions": PREDICTIONS}
        """
        # .tolist() turns the NumPy array returned by scikit-learn into a JSON-serializable list
        return {'predictions': self.artifacts.model.predict(input['instances']).tolist()}
EOF
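To exercise the request/response contract without BentoML or a trained model, the body of predict can be mirrored in a plain function. In this sketch, FakeModel and handle_request are hypothetical names, and the toy rule in FakeModel only stands in for the real scikit-learn classifier; it is not a meaningful iris model.

```python
from typing import Any, Dict, List


class FakeModel:
    """Hypothetical stand-in for the packed scikit-learn model (not part of BentoML)."""

    def predict(self, rows: List[List[float]]) -> List[int]:
        # Toy rule for illustration only: class 0 if petal length < 2.5, else class 1.
        return [0 if row[2] < 2.5 else 1 for row in rows]


def handle_request(model: Any, parsed_json: Dict[str, Any]) -> Dict[str, Any]:
    """Mirror the body of IrisClassifier.predict outside of BentoML."""
    # "parameters", if present, is ignored by this toy model.
    predictions = model.predict(parsed_json["instances"])
    # scikit-learn returns a NumPy array; convert it so the response is plain JSON.
    if hasattr(predictions, "tolist"):
        predictions = predictions.tolist()
    return {"predictions": list(predictions)}


if __name__ == "__main__":
    body = {"instances": [[5.1, 3.5, 1.4, 0.2], [6.7, 3.0, 5.2, 2.3]]}
    print(handle_request(FakeModel(), body))  # {'predictions': [0, 1]}
```

Keeping the envelope logic in a separate function like this makes it easy to unit test the AI Platform contract independently of the model.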

Build BentoML bundle:

$ python

Verify that the saved bundle was created:

$ bentoml get IrisClassifier:latest

# Sample output
{
  "name": "IrisClassifier",
  "version": "20210325170627_3F9592",
  "uri": {
    "type": "LOCAL",
    "uri": "/Users/eliasecchi/bentoml/repository/IrisClassifier/20210325170627_3F9592"
  },
  "bentoServiceMetadata": {
    "name": "IrisClassifier",
    "version": "20210325170627_3F9592",
    "createdAt": "2021-03-25T17:06:28.274128Z",
    "env": {
      "condaEnv": "name: bentoml-default-conda-env\nchannels:\n- conda-forge\n- defaults\ndependencies:\n- pip\n",
      "pythonVersion": "3.7.9",
      "dockerBaseImage": "bentoml/model-server:0.10.1-py37",
      "pipPackages": [...]
    },
    "artifacts": [
      {
        "name": "model",
        "artifactType": "SklearnModelArtifact",
        "metadata": {}
      }
    ],
    "apis": [
      {
        "name": "predict",
        "inputType": "DataframeInput",
        "docs": "\n        An inference API named `predict` with Dataframe input adapter, which codifies\n        how HTTP requests or CSV files are converted to a pandas Dataframe object as the\n        inference API function input\n        ",
        "inputConfig": {
          "orient": null,
          "typ": "frame",
          "dtype": null
        },
        "outputConfig": {
          "cors": "*"
        },
        "outputType": "DefaultOutput",
        "mbMaxLatency": 10000,
        "mbMaxBatchSize": 2000,
        "batch": true
      }
    ]
  }
}

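Because AI Platform sends JSON requests, you may want to confirm that the bundle's predict API uses the JsonInput adapter before building the image. The sample output above reports DataframeInput, so it appears to come from a different version of the service; a bundle built from the IrisClassifier defined earlier would be expected to report JsonInput. A minimal sketch, assuming bentoml get prints a JSON document shaped like the sample above (api_input_types and check_predict_uses_json are hypothetical helpers):

```python
import json
from typing import Any, Dict


def api_input_types(metadata: Dict[str, Any]) -> Dict[str, str]:
    """Map each API name in the bundle metadata to its input adapter type."""
    apis = metadata["bentoServiceMetadata"]["apis"]
    return {api["name"]: api["inputType"] for api in apis}


def check_predict_uses_json(metadata_text: str) -> bool:
    """Return True if the bundle's predict API accepts JsonInput."""
    metadata = json.loads(metadata_text)
    return api_input_types(metadata).get("predict") == "JsonInput"


if __name__ == "__main__":
    # Trimmed example document with the same nesting as the sample output above.
    sample = '{"bentoServiceMetadata": {"apis": [{"name": "predict", "inputType": "JsonInput"}]}}'
    print(check_predict_uses_json(sample))  # True
```

For example, you could save the command's output to a file (bundle.json is a hypothetical name) and pass its contents to check_predict_uses_json.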
The saved bundle can now be used to start a REST API server that hosts the BentoService and accepts test requests:

# Start BentoML API server:
bentoml serve IrisClassifier:latest
# Send test request:
curl -i \
  --header "Content-Type: application/json" \
  --request POST \
  --data '{"instances":[[5.1, 3.5, 1.4, 0.2]]}' \
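The same local test request can be sent from Python using only the standard library. This sketch assumes BentoML 0.x defaults (API server on port 5000, the predict API served at /predict); LOCAL_URL is an assumption, so adjust it to match your server. build_prediction_request and send are hypothetical helpers.

```python
import json
import urllib.request
from typing import Any, List

# Assumption: BentoML 0.x default port and route for an API named "predict".
LOCAL_URL = "http://localhost:5000/predict"


def build_prediction_request(url: str, instances: List[Any]) -> urllib.request.Request:
    """Build a POST request carrying an AI Platform style {"instances": [...]} body."""
    payload = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> Any:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_prediction_request(LOCAL_URL, [[5.1, 3.5, 1.4, 0.2]])
    print(req.get_method(), req.full_url)
    # Uncomment while `bentoml serve IrisClassifier:latest` is running:
    # print(send(req))
```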

Use the gcloud CLI to build the Docker image

# Find the local path of the latest version IrisClassifier saved bundle
$ saved_path=$(bentoml get IrisClassifier:latest --print-location --quiet)
$ cd $saved_path
$ gcloud builds submit --tag$GCP_PROJECT/iris-classifier

# Sample output

Creating temporary tarball archive of 15 file(s) totalling 15.8 MiB before compression.
Uploading tarball of [.] to [gs://irisclassifier-gcloud-aiplat_cloudbuild/source/1587430763.39-03422068242448efbcfc45f2aed218d3.tgz]
Created [].
Logs are available at []
----------------------------- REMOTE BUILD OUTPUT ------------------------------

ID                                    CREATE_TIME                DURATION  SOURCE                                                                                               IMAGES                                                      STATUS
9c0f3ef4-11c0-4089-9406-1c7fb9c7e8e8  2020-04-21T00:59:38+00:00  5M22S     gs://irisclassifier-gcloud-aiplat_cloudbuild/source/1587430763.39-03422068242448efbcfc45f2aed218d3.tgz (+1 more)  SUCCESS

Deploy the image to Google Cloud AI Platform Unified

  1. In your browser, open the Google Cloud Console, select the irisclassifier-gcloud-aiplat project (or the project you are using for this deployment), and navigate to the AI Platform Unified page.

  2. Click Models at the bottom of the navigation bar.

  3. Click IMPORT.

  4. On the Create Model service page, enter a name for the model and select a region. Click Continue.

  5. Select Import an existing container, then select the image you previously pushed to GCR.

  6. Set up the routes and port that AI Platform uses to reach the BentoML API server.

  7. Click Import.

  8. Click the model you just created. You now need to create an endpoint for it: click Deploy Endpoint.

  9. Give the endpoint a name and allocate resources to it. You can use the default values for resources and traffic split.

  10. Click DEPLOY.
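For the route and port settings, here is a minimal sketch assuming BentoML 0.x defaults: the API server listens on port 5000, serves the predict API at /predict, and answers health checks at /healthz. The dictionary mirrors the containerSpec fields (imageUri, predictRoute, healthRoute, ports) of the AI Platform Unified REST API; in the console, the same values go into the corresponding prediction route, health route, and port fields. container_spec is a hypothetical helper, and IMAGE_URI stands for the tag you passed to gcloud builds submit.

```python
import json
from typing import Any, Dict


def container_spec(image_uri: str, port: int = 5000) -> Dict[str, Any]:
    """Route and port settings for a BentoML 0.x server, shaped like a REST containerSpec."""
    return {
        "imageUri": image_uri,
        # Route that prediction requests are forwarded to (the BentoML API name).
        "predictRoute": "/predict",
        # Route polled to decide whether the container is healthy.
        "healthRoute": "/healthz",
        # Port the BentoML API server listens on inside the container.
        "ports": [{"containerPort": port}],
    }


if __name__ == "__main__":
    print(json.dumps(container_spec("IMAGE_URI"), indent=2))
```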

Validate Google Cloud AI Platform Unified deployment with sample data

Copy the ENDPOINT_ID of the deployed endpoint:

$ gcloud ai endpoints list

# Sample output
887508193754741784   test

Store it in an environment variable:

$ ENDPOINT_ID=887508193754741784
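If you prefer to script this step, a small parser can pick the ID out of the listing. This sketch assumes the output consists of whitespace-separated columns with the numeric ID first and the display name second, as in the sample above (the header line in the example is an assumption). Display names containing spaces are not handled, so gcloud's --format flag is likely a sturdier option for real scripts. find_endpoint_id is a hypothetical helper.

```python
from typing import Optional


def find_endpoint_id(listing: str, display_name: str) -> Optional[str]:
    """Return the endpoint ID whose display name matches, or None if there is no match."""
    for line in listing.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID; this skips header and blank lines.
        # Assumes display names contain no spaces.
        if len(fields) >= 2 and fields[0].isdigit() and fields[1] == display_name:
            return fields[0]
    return None


if __name__ == "__main__":
    sample = "ENDPOINT_ID          DISPLAY_NAME\n887508193754741784   test\n"
    print(find_endpoint_id(sample, "test"))  # 887508193754741784
```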

Send a request:

$ curl \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \${GCP_PROJECT}/locations/us-central1/endpoints/${ENDPOINT_ID}:predict \
-d '{ "instances":[[0, 1, 0, 1]] }'

# Sample output
{
  "predictions": [...],
  "deployedModelId": "3013629430338682880"
}

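The curl request can also be built in Python. This sketch assumes the regional Vertex AI v1 REST URL (AI Platform Unified was later renamed Vertex AI); the host and API version in use when this guide was written may differ, so treat PREDICT_URL as an assumption. The access token is passed in as a string, for example the output of gcloud auth print-access-token. build_endpoint_request is a hypothetical helper.

```python
import json
import urllib.request
from typing import Any, List

# Assumption: regional Vertex AI v1 predict URL; check the current API reference.
PREDICT_URL = (
    "https://{region}-aiplatform.googleapis.com/v1/"
    "projects/{project}/locations/{region}/endpoints/{endpoint_id}:predict"
)


def build_endpoint_request(project: str, region: str, endpoint_id: str,
                           token: str, instances: List[Any]) -> urllib.request.Request:
    """Build an authenticated predict request for a deployed endpoint."""
    url = PREDICT_URL.format(project=project, region=region, endpoint_id=endpoint_id)
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            # Same bearer token the curl example obtains from gcloud.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_endpoint_request(
        project="irisclassifier-gcloud-aiplat",
        region="us-central1",
        endpoint_id="887508193754741784",
        token="ACCESS_TOKEN",  # placeholder: paste the output of `gcloud auth print-access-token`
        instances=[[0, 1, 0, 1]],
    )
    print(req.full_url)
    # with urllib.request.urlopen(req, timeout=30) as resp:
    #     print(json.loads(resp.read()))
```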
Clean up deployed service on Google Cloud AI Platform Unified

  1. Navigate to the Manage resources page in the Google Cloud Console.

  2. In the project list, select the project you want to delete and click the delete icon.

  3. In the dialog, type the project ID irisclassifier-gcloud-aiplat and then click Shut down to delete the project.