Serving with gRPC#

This guide demonstrates the features BentoML offers for serving with gRPC.

This guide will also walk you through the tradeoffs of serving with gRPC, as well as recommendations on scenarios where gRPC might be a good fit.

Requirements: This guide assumes that you have basic knowledge of gRPC and protobuf. If you aren’t familiar with gRPC, you can start with the gRPC quick start guide.

See also

For a quick introduction to serving with gRPC, see Intro to BentoML

Get started with gRPC in BentoML#

We will be using the example from the quickstart to demonstrate BentoML’s gRPC capabilities.

Requirements#

Support for gRPC was introduced in BentoML 1.0.6, so make sure you are running version 1.0.6 or above.

Install BentoML with gRPC support via pip:

» pip install -U "bentoml[grpc]"

That’s it! You can now serve your Bento with gRPC via bentoml serve-grpc without having to modify your current service definition 😃.

» bentoml serve-grpc iris_classifier:latest --production

Using your gRPC BentoService#

There are two ways to interact with your gRPC BentoService:

  1. Use tools such as fullstorydev/grpcurl and fullstorydev/grpcui: these tools require server reflection to be enabled. Pass in --enable-reflection to enable it:

    » bentoml serve-grpc iris_classifier:latest --production --enable-reflection
    

    We will use fullstorydev/grpcurl to send a cURL-like request to the gRPC BentoServer.

    Note that we will run the grpcurl command via docker. On macOS and Windows, reach the server through host.docker.internal; on Linux, use --network=host and the address 0.0.0.0:3000 instead:

    » docker run -i --rm \
                   fullstorydev/grpcurl -d @ -plaintext host.docker.internal:3000 \
                   bentoml.grpc.v1.BentoService/Call <<EOT
    {
       "apiName": "classify",
       "ndarray": {
          "shape": [1, 4],
          "floatValues": [5.9, 3, 5.1, 1.8]
       }
    }
    EOT
    
    » docker run -i --rm \
                   --network=host \
                   fullstorydev/grpcurl -d @ -plaintext 0.0.0.0:3000 \
                   bentoml.grpc.v1.BentoService/Call <<EOT
    {
       "apiName": "classify",
       "ndarray": {
          "shape": [1, 4],
          "floatValues": [5.9, 3, 5.1, 1.8]
       }
    }
    EOT
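The JSON body piped to grpcurl above maps onto the bentoml.grpc.v1.Request message. As a quick sanity check, here is a small Python sketch (a plain dict plus the json module, no gRPC dependency) that builds the same payload programmatically:

```python
import json

# Build the same request body that is piped to grpcurl above.
# "apiName" selects the service API to call; "ndarray" carries the
# input tensor as a flat list of floats plus its shape.
payload = {
    "apiName": "classify",
    "ndarray": {
        "shape": [1, 4],
        "floatValues": [5.9, 3, 5.1, 1.8],
    },
}

body = json.dumps(payload, indent=2)
print(body)
```

Piping body into grpcurl -d @ is equivalent to the heredoc used above.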
    

    We will use fullstorydev/grpcui to send requests from a web browser.

    Note that we will run the grpcui command via docker. On macOS and Windows, reach the server through host.docker.internal; on Linux, use --network=host and the address 0.0.0.0:3000 instead:

    » docker run --init --rm \
                   -p 8080:8080 fullstorydev/grpcui -plaintext host.docker.internal:3000
    
    » docker run --init --rm \
                   -p 8080:8080 \
                   --network=host fullstorydev/grpcui -plaintext 0.0.0.0:3000
    

    Proceed to http://127.0.0.1:8080 in your browser and send a test request from the web UI.


  2. Use one of the client implementations below to send test requests to your BentoService.

Client Implementation#

Note

All of the following client implementations are available on GitHub.


From another terminal, use one of the following client implementations to send a request to the gRPC server:

Note

gRPC comes with support for multiple languages. In the upcoming sections we will demonstrate two workflows for generating stubs and implementing clients:

  • Using bazel to manage and isolate dependencies (recommended)

  • A manual approach using protoc and its language-specific plugins

We will create our Python client in the directory ~/workspace/iris_python_client/:

» mkdir -p ~/workspace/iris_python_client
» cd ~/workspace/iris_python_client

Create a client.py file with the following content:

client.py#
import asyncio

import grpc

from bentoml.grpc.utils import import_generated_stubs

pb, services = import_generated_stubs()


async def run():
    async with grpc.aio.insecure_channel("localhost:3000") as channel:
        stub = services.BentoServiceStub(channel)
        req = await stub.Call(
            request=pb.Request(
                api_name="classify",
                ndarray=pb.NDArray(
                    dtype=pb.NDArray.DTYPE_FLOAT,
                    shape=(1, 4),
                    float_values=[5.9, 3, 5.1, 1.8],
                ),
            )
        )
    print(req)


if __name__ == "__main__":
    loop = asyncio.new_event_loop()
    try:
        loop.run_until_complete(run())
    finally:
        loop.close()
        assert loop.is_closed()
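Note how the client passes the 1×4 input as a flat float_values list alongside its shape — the NDArray message always carries the tensor flattened, with the shape sent separately. A pure-Python sketch of that convention (shape_of and flatten are hypothetical helpers for illustration, not part of BentoML):

```python
# NDArray-style payloads carry a flat value list plus a shape.
# shape_of() and flatten() are hypothetical helpers for illustration.

def shape_of(nested):
    """Return the shape of a rectangular nested list."""
    shape = []
    node = nested
    while isinstance(node, list):
        shape.append(len(node))
        node = node[0]
    return shape


def flatten(nested):
    """Flatten a rectangular nested list into a flat list of values."""
    if not isinstance(nested, list):
        return [nested]
    return [value for item in nested for value in flatten(item)]


batch = [[5.9, 3.0, 5.1, 1.8]]  # one sample with four features
shape, values = shape_of(batch), flatten(batch)
print(shape, values)  # [1, 4] [5.9, 3.0, 5.1, 1.8]
```

These correspond to the shape and float_values arguments passed to pb.NDArray in the client above.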

Requirements: Make sure to install the prerequisites before using Go.

We will create our Golang client in the directory ~/workspace/iris_go_client/:

» mkdir -p ~/workspace/iris_go_client
» cd ~/workspace/iris_go_client

Define a WORKSPACE file:

WORKSPACE
workspace(name = "iris_go_client")

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

# setup rules_proto and rules_proto_grpc
http_archive(
    name = "rules_proto",
    sha256 = "e017528fd1c91c5a33f15493e3a398181a9e821a804eb7ff5acdd1d2d6c2b18d",
    strip_prefix = "rules_proto-4.0.0-3.20.0",
    urls = [
        "https://github.com/bazelbuild/rules_proto/archive/refs/tags/4.0.0-3.20.0.tar.gz",
    ],
)

http_archive(
    name = "rules_proto_grpc",
    sha256 = "507e38c8d95c7efa4f3b1c0595a8e8f139c885cb41a76cab7e20e4e67ae87731",
    strip_prefix = "rules_proto_grpc-4.1.1",
    urls = ["https://github.com/rules-proto-grpc/rules_proto_grpc/archive/4.1.1.tar.gz"],
)

load("@rules_proto//proto:repositories.bzl", "rules_proto_dependencies", "rules_proto_toolchains")
load("@rules_proto_grpc//:repositories.bzl", "rules_proto_grpc_repos", "rules_proto_grpc_toolchains")

rules_proto_grpc_toolchains()

rules_proto_grpc_repos()

rules_proto_dependencies()

rules_proto_toolchains()

# We need to load go_grpc rules first
load("@rules_proto_grpc//:repositories.bzl", "bazel_gazelle", "io_bazel_rules_go")  # buildifier: disable=same-origin-load

io_bazel_rules_go()

bazel_gazelle()

load("@rules_proto_grpc//go:repositories.bzl", rules_proto_grpc_go_repos = "go_repos")

rules_proto_grpc_go_repos()

load("@io_bazel_rules_go//go:deps.bzl", "go_register_toolchains", "go_rules_dependencies")

go_rules_dependencies()

go_register_toolchains(version = "1.19")

http_archive(
    name = "com_github_grpc_grpc",
    strip_prefix = "grpc-v1.49.1",
    urls = [
        "https://github.com/grpc/grpc/archive/v1.49.1.tar.gz",
    ],
)

load("@com_github_grpc_grpc//bazel:grpc_deps.bzl", "grpc_deps")

grpc_deps()

load("@com_github_grpc_grpc//bazel:grpc_extra_deps.bzl", "grpc_extra_deps")

grpc_extra_deps()

load("@com_google_protobuf//:protobuf_deps.bzl", "protobuf_deps")

protobuf_deps()

Followed by defining a BUILD file:

BUILD
load("@rules_proto_grpc//go:defs.bzl", "go_grpc_library")
load("@io_bazel_rules_go//go:def.bzl", "go_binary")

proto_library(
    name = "service_v1_proto",
    srcs = ["bentoml/grpc/v1/service.proto"],
    deps = ["@com_google_protobuf//:struct_proto", "@com_google_protobuf//:wrappers_proto"],
)

go_grpc_library(
    name = "service_go",
    importpath = "github.com/bentoml/bentoml/grpc/v1",
    protos = [":service_v1_proto"],
)

go_binary(
    name = "client_go",
    srcs = ["client.go"],
    importpath = "github.com/bentoml/bentoml/grpc/v1",
    deps = [
        ":service_go",
        "@com_github_golang_protobuf//proto:go_default_library",
        "@org_golang_google_grpc//:go_default_library",
        "@org_golang_google_grpc//credentials:go_default_library",
        "@org_golang_google_grpc//credentials/insecure:go_default_library",
    ],
)

Create a Go module:

» go mod init iris_go_client && go mod tidy

Add the following lines to ~/workspace/iris_go_client/go.mod:

require github.com/bentoml/bentoml/grpc/v1 v0.0.0-unpublished

replace github.com/bentoml/bentoml/grpc/v1 v0.0.0-unpublished => ./github.com/bentoml/bentoml/grpc/v1

The replace directive tells Go where to import the generated stubs from, since we don’t host the generated gRPC stubs on pkg.go.dev 😄.

Since there is no easy way to add additional proto files, we will have to clone some repositories and copy the proto files into our project:

  1. protocolbuffers/protobuf - the official repository for Protocol Buffers. We will need the protobuf files that live under src/google/protobuf:

» mkdir -p thirdparty && cd thirdparty
» git clone --depth 1 https://github.com/protocolbuffers/protobuf.git
  2. bentoml/bentoml - we need the service.proto under bentoml/grpc to build the client, so we will perform a sparse checkout of only the bentoml/grpc directory:

» mkdir bentoml && pushd bentoml
» git init
» git remote add -f origin https://github.com/bentoml/BentoML.git
» git config core.sparseCheckout true
» cat <<EOT >|.git/info/sparse-checkout
src/bentoml/grpc
EOT
» git pull origin main && mv src/bentoml/grpc .
» popd

Here is the protoc command to generate the gRPC Go stubs:

» protoc -I. -I thirdparty/protobuf/src  \
         --go_out=. --go_opt=paths=import \
         --go-grpc_out=. --go-grpc_opt=paths=import \
         bentoml/grpc/v1/service.proto

Then run the following to make sure the generated stubs are importable:

» pushd github.com/bentoml/bentoml/grpc/v1
» go mod init v1 && go mod tidy
» popd

Create a client.go file with the following content:

client.go#
package main

import (
	"context"
	"fmt"
	"time"

	pb "github.com/bentoml/bentoml/grpc/v1"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

var opts []grpc.DialOption

const serverAddr = "localhost:3000"

func main() {
	opts = append(opts, grpc.WithTransportCredentials(insecure.NewCredentials()))
	conn, err := grpc.Dial(serverAddr, opts...)
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	client := pb.NewBentoServiceClient(conn)

	req := &pb.Request{
		ApiName: "classify",
		Content: &pb.Request_Ndarray{
			Ndarray: &pb.NDArray{
				Dtype:       *pb.NDArray_DTYPE_FLOAT.Enum(),
				Shape:       []int32{1, 4},
				FloatValues: []float32{3.5, 2.4, 7.8, 5.1},
			},
		},
	}
	resp, err := client.Call(ctx, req)
	if err != nil {
		panic(err)
	}
	fmt.Print(resp)
}

Requirements: Make sure to follow the instructions to install gRPC and Protobuf locally.

We will create our C++ client in the directory ~/workspace/iris_cc_client/:

» mkdir -p ~/workspace/iris_cc_client
» cd ~/workspace/iris_cc_client

Define a WORKSPACE file:

WORKSPACE
workspace(name = "iris_cc_client")

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "rules_proto",
    sha256 = "e017528fd1c91c5a33f15493e3a398181a9e821a804eb7ff5acdd1d2d6c2b18d",
    strip_prefix = "rules_proto-4.0.0-3.20.0",
    urls = [
        "https://github.com/bazelbuild/rules_proto/archive/refs/tags/4.0.0-3.20.0.tar.gz",
    ],
)
http_archive(
    name = "rules_proto_grpc",
    sha256 = "507e38c8d95c7efa4f3b1c0595a8e8f139c885cb41a76cab7e20e4e67ae87731",
    strip_prefix = "rules_proto_grpc-4.1.1",
    urls = ["https://github.com/rules-proto-grpc/rules_proto_grpc/archive/4.1.1.tar.gz"],
)

load("@rules_proto//proto:repositories.bzl", "rules_proto_dependencies", "rules_proto_toolchains")

rules_proto_dependencies()

rules_proto_toolchains()

load("@rules_proto_grpc//:repositories.bzl", "rules_proto_grpc_repos", "rules_proto_grpc_toolchains")

rules_proto_grpc_toolchains()

rules_proto_grpc_repos()

http_archive(
    name = "com_github_grpc_grpc",
    strip_prefix = "grpc-v1.49.1",
    urls = [
        "https://github.com/grpc/grpc/archive/v1.49.1.tar.gz",
    ],
)

load("@com_github_grpc_grpc//bazel:grpc_deps.bzl", "grpc_deps")

grpc_deps()

load("@com_github_grpc_grpc//bazel:grpc_extra_deps.bzl", "grpc_extra_deps")

grpc_extra_deps()

load("@com_google_protobuf//:protobuf_deps.bzl", "protobuf_deps")

protobuf_deps()

Followed by defining a BUILD file:

BUILD
load("@rules_proto//proto:defs.bzl", "proto_library")
load("@rules_proto_grpc//cpp:defs.bzl", "cc_grpc_library", "cc_proto_library")

proto_library(
    name = "service_v1_proto",
    srcs = ["bentoml/grpc/v1/service.proto"],
    deps = ["@com_google_protobuf//:struct_proto", "@com_google_protobuf//:wrappers_proto"],
)

cc_proto_library(
    name = "service_cc",
    protos = [":service_v1_proto"],
)

cc_grpc_library(
    name = "service_cc_grpc",
    protos = [":service_v1_proto"],
    deps = [":service_cc"],
)

cc_binary(
    name = "client_cc",
    srcs = ["client.cc"],
    deps = [
        ":service_cc_grpc",
        "@com_github_grpc_grpc//:grpc++",
    ],
)

Since there is no easy way to add additional proto files, we will have to clone some repositories and copy the proto files into our project:

  1. protocolbuffers/protobuf - the official repository for Protocol Buffers. We will need the protobuf files that live under src/google/protobuf:

» mkdir -p thirdparty && cd thirdparty
» git clone --depth 1 https://github.com/protocolbuffers/protobuf.git
  2. bentoml/bentoml - we need the service.proto under bentoml/grpc to build the client, so we will perform a sparse checkout of only the bentoml/grpc directory:

» mkdir bentoml && pushd bentoml
» git init
» git remote add -f origin https://github.com/bentoml/BentoML.git
» git config core.sparseCheckout true
» cat <<EOT >|.git/info/sparse-checkout
src/bentoml/grpc
EOT
» git pull origin main && mv src/bentoml/grpc .
» popd

Here is the protoc command to generate the gRPC C++ stubs:

» protoc -I . -I ./thirdparty/protobuf/src \
         --cpp_out=. --grpc_out=. \
         --plugin=protoc-gen-grpc=$(which grpc_cpp_plugin) \
         bentoml/grpc/v1/service.proto

Create a client.cpp file with the following content:

client.cpp#
#include <array>
#include <iostream>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

#include <grpc/grpc.h>
#include <grpcpp/channel.h>
#include <grpcpp/client_context.h>
#include <grpcpp/create_channel.h>
#include <grpcpp/grpcpp.h>
#include <grpcpp/security/credentials.h>

#include "bentoml/grpc/v1/service.grpc.pb.h"
#include "bentoml/grpc/v1/service.pb.h"

using bentoml::grpc::v1::BentoService;
using bentoml::grpc::v1::NDArray;
using bentoml::grpc::v1::Request;
using bentoml::grpc::v1::Response;
using grpc::Channel;
using grpc::ClientAsyncResponseReader;
using grpc::ClientContext;
using grpc::Status;

int main(int argc, char **argv) {
    auto stubs = BentoService::NewStub(grpc::CreateChannel(
        "localhost:3000", grpc::InsecureChannelCredentials()));
    std::vector<float> data = {3.5, 2.4, 7.8, 5.1};
    std::vector<int> shape = {1, 4};

    Request request;
    request.set_api_name("classify");

    NDArray *ndarray = request.mutable_ndarray();
    // Set the dtype to match the float values, as the other clients do.
    ndarray->set_dtype(NDArray::DTYPE_FLOAT);
    ndarray->mutable_shape()->Assign(shape.begin(), shape.end());
    ndarray->mutable_float_values()->Assign(data.begin(), data.end());

    Response resp;
    ClientContext context;

    // Storage for the status of the RPC upon completion.
    Status status = stubs->Call(&context, request, &resp);

    // Act upon the status of the actual RPC.
    if (!status.ok()) {
        std::cout << status.error_code() << ": " << status.error_message()
                  << std::endl;
        return 1;
    }
    if (!resp.has_ndarray()) {
        std::cout << "Currently only accept output as NDArray." << std::endl;
        return 1;
    }
    std::cout << "response byte size: " << resp.ndarray().ByteSizeLong()
              << std::endl;
    return 0;
}

Requirements: Make sure to have JDK 7 or higher installed.

Optional: follow the instructions to install the protoc plugin for gRPC Java if you plan to use protoc standalone.

Note

Feel free to use any Java build tool of choice (Maven, Gradle, Bazel, etc.) that you see fit to build and run the client.

In this tutorial we will be using bazel.

We will create our Java client in the directory ~/workspace/iris_java_client/:

» mkdir -p ~/workspace/iris_java_client
» cd ~/workspace/iris_java_client

Create the client Java package (com.client.BentoServiceClient):

» mkdir -p src/main/java/com/client

Define a WORKSPACE file:

WORKSPACE
workspace(name = "iris_java_client")

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "rules_proto",
    sha256 = "e017528fd1c91c5a33f15493e3a398181a9e821a804eb7ff5acdd1d2d6c2b18d",
    strip_prefix = "rules_proto-4.0.0-3.20.0",
    urls = [
        "https://github.com/bazelbuild/rules_proto/archive/refs/tags/4.0.0-3.20.0.tar.gz",
    ],
)
http_archive(
    name = "rules_proto_grpc",
    sha256 = "507e38c8d95c7efa4f3b1c0595a8e8f139c885cb41a76cab7e20e4e67ae87731",
    strip_prefix = "rules_proto_grpc-4.1.1",
    urls = ["https://github.com/rules-proto-grpc/rules_proto_grpc/archive/4.1.1.tar.gz"],
)

load("@rules_proto//proto:repositories.bzl", "rules_proto_dependencies", "rules_proto_toolchains")

rules_proto_dependencies()

rules_proto_toolchains()

load("@rules_proto_grpc//:repositories.bzl", "rules_proto_grpc_repos", "rules_proto_grpc_toolchains")

rules_proto_grpc_toolchains()

rules_proto_grpc_repos()

http_archive(
    name = "com_github_grpc_grpc",
    strip_prefix = "grpc-v1.48.1",
    urls = [
        "https://github.com/grpc/grpc/archive/v1.48.1.tar.gz",
    ],
)

load("@com_github_grpc_grpc//bazel:grpc_deps.bzl", "grpc_deps")

grpc_deps()

load("@com_github_grpc_grpc//bazel:grpc_extra_deps.bzl", "grpc_extra_deps")

grpc_extra_deps()

load("@com_google_protobuf//:protobuf_deps.bzl", "protobuf_deps")

protobuf_deps()

# We will be using 1.48.1 for grpc-java
http_archive(
    name = "io_grpc_grpc_java",
    sha256 = "88b12b2b4e0beb849eddde98d5373f2f932513229dbf9ec86cc8e4912fc75e79",
    strip_prefix = "grpc-java-1.48.1",
    urls = ["https://github.com/grpc/grpc-java/archive/v1.48.1.tar.gz"],
)

http_archive(
    name = "rules_jvm_external",
    sha256 = "c21ce8b8c4ccac87c809c317def87644cdc3a9dd650c74f41698d761c95175f3",
    strip_prefix = "rules_jvm_external-1498ac6ccd3ea9cdb84afed65aa257c57abf3e0a",
    url = "https://github.com/bazelbuild/rules_jvm_external/archive/1498ac6ccd3ea9cdb84afed65aa257c57abf3e0a.zip",
)

load("@rules_jvm_external//:defs.bzl", "maven_install")
load("@com_google_protobuf//:protobuf_deps.bzl", "PROTOBUF_MAVEN_ARTIFACTS")
load("@io_grpc_grpc_java//:repositories.bzl", "IO_GRPC_GRPC_JAVA_ARTIFACTS", "IO_GRPC_GRPC_JAVA_OVERRIDE_TARGETS", "grpc_java_repositories")

grpc_java_repositories()

maven_install(
    artifacts = IO_GRPC_GRPC_JAVA_ARTIFACTS + PROTOBUF_MAVEN_ARTIFACTS,
    generate_compat_repositories = True,
    override_targets = IO_GRPC_GRPC_JAVA_OVERRIDE_TARGETS,
    repositories = [
        "https://repo.maven.apache.org/maven2/",
    ],
)

load("@maven//:compat.bzl", "compat_repositories")

compat_repositories()

Followed by defining a BUILD file:

BUILD
load("@rules_proto//proto:defs.bzl", "proto_library")

proto_library(
    name = "service_v1_proto",
    srcs = ["bentoml/grpc/v1/service.proto"],
    deps = [
        "@com_google_protobuf//:struct_proto",
        "@com_google_protobuf//:wrappers_proto",
    ],
)

load("@io_grpc_grpc_java//:java_grpc_library.bzl", "java_grpc_library")

java_proto_library(
    name = "service_java",
    deps = [":service_v1_proto"],
)

java_grpc_library(
    name = "service_java_grpc",
    srcs = [":service_v1_proto"],
    deps = [":service_java"],
)

java_library(
    name = "java_library",
    srcs = glob(["client/java/src/main/**/*.java"]),
    runtime_deps = [
        "@io_grpc_grpc_java//netty",
    ],
    deps = [
        ":service_java",
        ":service_java_grpc",
        "@com_google_protobuf//:protobuf_java",
        "@com_google_protobuf//:protobuf_java_util",
        "@io_grpc_grpc_java//api",
        "@io_grpc_grpc_java//protobuf",
        "@io_grpc_grpc_java//stub",
        "@maven//:com_google_api_grpc_proto_google_common_protos",
        "@maven//:com_google_code_findbugs_jsr305",
        "@maven//:com_google_code_gson_gson",
        "@maven//:com_google_guava_guava",
    ],
)

java_binary(
    name = "client_java",
    main_class = "com.client.BentoServiceClient",
    runtime_deps = [
        ":java_library",
    ],
)

One simply can’t run javac manually to compile the Java class, since there are far too many dependencies to resolve.

Provided below is an example of how one can use gradle to build the Java client.

» gradle init --project-dir .

The following build.gradle should be able to help you get started:

build.gradle#
plugins {
    id 'application'
    // ASSUMES GRADLE 5.6 OR HIGHER. Use plugin version 0.8.10 with earlier gradle versions
    id 'com.google.protobuf' version '0.8.18'
    // Generate IntelliJ IDEA .idea & .iml project files
    id 'idea'
}

repositories {
    // The google mirror is less flaky than mavenCentral()
    maven {
        url "https://maven-central.storage-download.googleapis.com/maven2/" 
    }
    mavenCentral()
    mavenLocal()
}

def grpcVersion = '1.48.1'
def protobufVersion = '3.19.4'
def protocVersion = protobufVersion

dependencies {
    // This dependency is used internally, and not exposed to consumers on their own compile classpath.
    implementation 'com.google.guava:guava:30.1.1-jre'
    implementation "io.grpc:grpc-protobuf:${grpcVersion}"
    implementation "io.grpc:grpc-stub:${grpcVersion}"
    compileOnly "org.apache.tomcat:annotations-api:6.0.53"

    // examples/advanced need this for JsonFormat
    implementation "com.google.protobuf:protobuf-java-util:${protobufVersion}"

    runtimeOnly "io.grpc:grpc-netty-shaded:${grpcVersion}"
}

protobuf {
    protoc { artifact = "com.google.protobuf:protoc:${protocVersion}" }
    plugins {
        grpc { artifact = "io.grpc:protoc-gen-grpc-java:${grpcVersion}" }
    }
    generateProtoTasks {
        all()*.plugins { grpc {} }
    }
}

// Inform IDEs like IntelliJ IDEA, Eclipse or NetBeans about the generated code.
sourceSets {
    main {
        java {
            srcDirs 'build/generated/source/proto/main/grpc'
            srcDirs 'build/generated/source/proto/main/java'
        }
    }
}

task bentoServiceClient(type: CreateStartScripts) {
    mainClass = 'com.client.BentoServiceClient'
    applicationName = 'bento-service-client'
    outputDir = new File(project.buildDir, 'tmp/scripts/' + name)
    classpath = startScripts.classpath
}

applicationDistribution.into('bin') {
    from(bentoServiceClient)
    fileMode = 0755
}

To build the client, run:

» ./gradlew build

Proceed to create a src/main/java/com/client/BentoServiceClient.java file with the following content:

BentoServiceClient.java#
package com.client;

import io.grpc.Channel;
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import io.grpc.Status;
import io.grpc.StatusRuntimeException;

import java.util.*;
import java.util.concurrent.TimeUnit;
import java.util.logging.Level;
import java.util.logging.Logger;

import com.bentoml.grpc.v1.BentoServiceGrpc;
import com.bentoml.grpc.v1.BentoServiceGrpc.BentoServiceBlockingStub;
import com.bentoml.grpc.v1.BentoServiceGrpc.BentoServiceStub;
import com.bentoml.grpc.v1.NDArray;
import com.bentoml.grpc.v1.Request;
import com.bentoml.grpc.v1.RequestOrBuilder;
import com.bentoml.grpc.v1.Response;

public class BentoServiceClient {

  private static final Logger logger = Logger.getLogger(BentoServiceClient.class.getName());

  static Iterable<Integer> convert(int[] array) {
    return () -> Arrays.stream(array).iterator();
  }

  public static void main(String[] args) throws Exception {
    String apiName = "classify";
    int shape[] = { 1, 4 };
    Iterable<Integer> shapeIterable = convert(shape);
    Float array[] = { 3.5f, 2.4f, 7.8f, 5.1f };
    Iterable<Float> arrayIterable = Arrays.asList(array);
    // Access a service running on the local machine on port 50051
    String target = "localhost:3000";

    ManagedChannel channel = ManagedChannelBuilder.forTarget(target).usePlaintext().build();
    try {
      BentoServiceBlockingStub blockingStub = BentoServiceGrpc.newBlockingStub(channel);

      NDArray.Builder builder = NDArray.newBuilder().addAllShape(shapeIterable).addAllFloatValues(arrayIterable).setDtype(NDArray.DType.DTYPE_FLOAT);

      Request req = Request.newBuilder().setApiName(apiName).setNdarray(builder).build();

      try {
        Response resp = blockingStub.call(req);
        Response.ContentCase contentCase = resp.getContentCase();
        if (contentCase != Response.ContentCase.NDARRAY) {
          throw new Exception("Currently only support NDArray response");
        }
        NDArray output = resp.getNdarray();
        logger.info("Response: " + resp.toString());
      } catch (StatusRuntimeException e) {
        logger.log(Level.WARNING, "RPC failed: {0}", e.getStatus());
        return;
      }
    } finally {
      // ManagedChannels use resources like threads and TCP connections. To prevent
      // leaking these
      // resources the channel should be shut down when it will no longer be used. If
      // it may be used
      // again leave it running.
      channel.shutdownNow().awaitTermination(1, TimeUnit.SECONDS);
    }
  }
}

On running protoc standalone (optional)

Since there is no easy way to add additional proto files, we will have to clone some repositories and copy the proto files into our project:

  1. protocolbuffers/protobuf - the official repository for Protocol Buffers. We will need the protobuf files that live under src/google/protobuf:

» mkdir -p thirdparty && cd thirdparty
» git clone --depth 1 https://github.com/protocolbuffers/protobuf.git
  2. bentoml/bentoml - we need the service.proto under bentoml/grpc to build the client, so we will perform a sparse checkout of only the bentoml/grpc directory:

» mkdir bentoml && pushd bentoml
» git init
» git remote add -f origin https://github.com/bentoml/BentoML.git
» git config core.sparseCheckout true
» cat <<EOT >|.git/info/sparse-checkout
src/bentoml/grpc
EOT
» git pull origin main && mv src/bentoml/grpc .
» popd

Here is the protoc command to generate the gRPC Java stubs if you need to use protoc standalone:

» protoc -I . \
         -I ./thirdparty/protobuf/src \
         --java_out=./src/main/java \
         --grpc-java_out=./src/main/java \
         bentoml/grpc/v1/service.proto

Requirements: Make sure to have the prerequisites to get started with grpc/grpc-kotlin.

Optional: feel free to install the Kotlin gRPC codegen in order to generate gRPC stubs if you plan to use protoc standalone.

To bootstrap the Kotlin client, feel free to use either gradle or maven to build and run the following client code.

In this example, we will use bazel to build and run the client.

We will create our Kotlin client in the directory ~/workspace/iris_kotlin_client/, followed by creating the client directory structure:

» mkdir -p ~/workspace/iris_kotlin_client
» cd ~/workspace/iris_kotlin_client
» mkdir -p src/main/kotlin/com/client

Define a WORKSPACE file:

WORKSPACE
workspace(name = "iris_kotlin_client")

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "rules_proto",
    sha256 = "e017528fd1c91c5a33f15493e3a398181a9e821a804eb7ff5acdd1d2d6c2b18d",
    strip_prefix = "rules_proto-4.0.0-3.20.0",
    urls = [
        "https://github.com/bazelbuild/rules_proto/archive/refs/tags/4.0.0-3.20.0.tar.gz",
    ],
)
http_archive(
    name = "rules_proto_grpc",
    sha256 = "507e38c8d95c7efa4f3b1c0595a8e8f139c885cb41a76cab7e20e4e67ae87731",
    strip_prefix = "rules_proto_grpc-4.1.1",
    urls = ["https://github.com/rules-proto-grpc/rules_proto_grpc/archive/4.1.1.tar.gz"],
)

load("@rules_proto//proto:repositories.bzl", "rules_proto_dependencies", "rules_proto_toolchains")

rules_proto_dependencies()

rules_proto_toolchains()

load("@rules_proto_grpc//:repositories.bzl", "rules_proto_grpc_repos", "rules_proto_grpc_toolchains")

rules_proto_grpc_toolchains()

rules_proto_grpc_repos()

# We will be using 1.48.1 for grpc-java
http_archive(
    name = "io_grpc_grpc_java",
    sha256 = "88b12b2b4e0beb849eddde98d5373f2f932513229dbf9ec86cc8e4912fc75e79",
    strip_prefix = "grpc-java-1.48.1",
    urls = ["https://github.com/grpc/grpc-java/archive/v1.48.1.tar.gz"],
)

http_archive(
    name = "rules_jvm_external",
    sha256 = "c21ce8b8c4ccac87c809c317def87644cdc3a9dd650c74f41698d761c95175f3",
    strip_prefix = "rules_jvm_external-1498ac6ccd3ea9cdb84afed65aa257c57abf3e0a",
    url = "https://github.com/bazelbuild/rules_jvm_external/archive/1498ac6ccd3ea9cdb84afed65aa257c57abf3e0a.zip",
)

load("@rules_jvm_external//:defs.bzl", "maven_install")
load("@io_grpc_grpc_java//:repositories.bzl", "IO_GRPC_GRPC_JAVA_ARTIFACTS", "IO_GRPC_GRPC_JAVA_OVERRIDE_TARGETS", "grpc_java_repositories")

IO_GRPC_GRPC_KOTLIN_ARTIFACTS = [
    "com.google.guava:guava:29.0-android",
    "com.squareup:kotlinpoet:1.11.0",
    "org.jetbrains.kotlinx:kotlinx-coroutines-core:1.6.2",
    "org.jetbrains.kotlinx:kotlinx-coroutines-core-jvm:1.6.2",
    "org.jetbrains.kotlinx:kotlinx-coroutines-debug:1.6.2",
]

maven_install(
    artifacts = [
        "com.google.jimfs:jimfs:1.1",
        "com.google.truth.extensions:truth-proto-extension:1.0.1",
        "com.google.protobuf:protobuf-kotlin:3.18.0",
    ] + IO_GRPC_GRPC_KOTLIN_ARTIFACTS + IO_GRPC_GRPC_JAVA_ARTIFACTS,
    generate_compat_repositories = True,
    override_targets = IO_GRPC_GRPC_JAVA_OVERRIDE_TARGETS,
    repositories = [
        "https://repo.maven.apache.org/maven2/",
    ],
)

load("@maven//:compat.bzl", "compat_repositories")

compat_repositories()

grpc_java_repositories()

# loading kotlin rules
# first to load grpc/grpc-kotlin
http_archive(
    name = "com_github_grpc_grpc_kotlin",
    sha256 = "b1ec1caa5d81f4fa4dca0662f8112711c82d7db6ba89c928ca7baa4de50afbb2",
    strip_prefix = "grpc-kotlin-a1659c1b3fb665e01a6854224c7fdcafc8e54d56",
    urls = ["https://github.com/grpc/grpc-kotlin/archive/a1659c1b3fb665e01a6854224c7fdcafc8e54d56.tar.gz"],
)

http_archive(
    name = "io_bazel_rules_kotlin",
    sha256 = "a57591404423a52bd6b18ebba7979e8cd2243534736c5c94d35c89718ea38f94",
    urls = ["https://github.com/bazelbuild/rules_kotlin/releases/download/v1.6.0/rules_kotlin_release.tgz"],
)

load("@io_bazel_rules_kotlin//kotlin:repositories.bzl", "kotlin_repositories")

kotlin_repositories()

load("@io_bazel_rules_kotlin//kotlin:core.bzl", "kt_register_toolchains")

kt_register_toolchains()

Followed by defining a BUILD file:

BUILD
load("@io_bazel_rules_kotlin//kotlin:jvm.bzl", "kt_jvm_binary")
load("@io_grpc_grpc_java//:java_grpc_library.bzl", "java_grpc_library")
load("@com_github_grpc_grpc_kotlin//:kt_jvm_grpc.bzl", "kt_jvm_grpc_library", "kt_jvm_proto_library")

java_proto_library(
    name = "service_java",
    deps = ["//:service_v1_proto"],
)

kt_jvm_proto_library(
    name = "service_kt",
    deps = ["//:service_v1_proto"],
)

kt_jvm_grpc_library(
    name = "service_grpc_kt",
    srcs = ["//:service_v1_proto"],
    deps = [":service_java"],
)

kt_jvm_binary(
    name = "client_kt",
    srcs = ["src/main/kotlin/com/client/BentoServiceClient.kt"],
    main_class = "com.client.BentoServiceClient",
    deps = [
        ":service_grpc_kt",
        ":service_kt",
        "@com_google_protobuf//:protobuf_java_util",
        "@io_grpc_grpc_java//netty",
    ],
)

One simply can’t compile all the Kotlin files manually, since there are far too many dependencies to resolve.

Provided below is an example of how one can use gradle to build the Kotlin client.

» gradle init --project-dir .

The following build.gradle.kts should be able to help you get started:

build.gradle.kts#
plugins {
    id("com.android.application") version "7.0.4" apply false // Older for IntelliJ Support
    id("com.google.protobuf") version "0.8.18" apply false
    kotlin("jvm") version "1.7.0" apply false
    id("org.jlleitschuh.gradle.ktlint") version "10.2.0"
    `java-library`
}

ext["grpcVersion"] = "1.48.0"
ext["grpcKotlinVersion"] = "1.6.0" // CURRENT_GRPC_KOTLIN_VERSION
ext["protobufVersion"] = "3.19.4"
ext["coroutinesVersion"] = "1.6.2"

allprojects {
    repositories {
        mavenLocal()
        mavenCentral()
        google()
    }

    apply(plugin = "org.jlleitschuh.gradle.ktlint")
}

java {
    toolchain {
        languageVersion.set(JavaLanguageVersion.of(8))
    }
    sourceSets.getByName("main").resources.srcDir("src/main/proto")
}

dependencies {
    implementation(platform("org.jetbrains.kotlin:kotlin-bom"))

    implementation("org.jetbrains.kotlin:kotlin-stdlib-jdk8")

    implementation("com.google.guava:guava:30.1.1-jre")

    runtimeOnly("io.grpc:grpc-netty:${rootProject.ext["grpcVersion"]}")
    api(kotlin("stdlib-jdk8"))
    api("org.jetbrains.kotlinx:kotlinx-coroutines-core:${rootProject.ext["coroutinesVersion"]}")
    api("io.grpc:grpc-stub:${rootProject.ext["grpcVersion"]}")
    api("io.grpc:grpc-protobuf:${rootProject.ext["grpcVersion"]}")
    api("com.google.protobuf:protobuf-java-util:${rootProject.ext["protobufVersion"]}")
    api("com.google.protobuf:protobuf-kotlin:${rootProject.ext["protobufVersion"]}")
    api("io.grpc:grpc-kotlin-stub:${rootProject.ext["grpcKotlinVersion"]}")
}

tasks.register<JavaExec>("BentoServiceClient") {
    dependsOn("classes")
    classpath = sourceSets["main"].runtimeClasspath
    mainClass.set("com.client.BentoServiceClientKt")
}

val bentoServiceClientStartScripts = tasks.register<CreateStartScripts>("bentoServiceClientStartScripts") {
    mainClass.set("com.client.BentoServiceClientKt")
    applicationName = "bento-service-client"
    outputDir = tasks.named<CreateStartScripts>("startScripts").get().outputDir
    classpath = tasks.named<CreateStartScripts>("startScripts").get().classpath
}

tasks.named("startScripts") {
    dependsOn(bentoServiceClientStartScripts)
}

To build the client, run:

» ./gradlew build

Proceed to create a src/main/kotlin/com/client/BentoServiceClient.kt file with the following content:

BentoServiceClient.kt#
package com.client

import com.bentoml.grpc.v1.BentoServiceGrpc
import com.bentoml.grpc.v1.NDArray
import com.bentoml.grpc.v1.Request
import io.grpc.ManagedChannelBuilder

class BentoServiceClient {
  companion object {
    @JvmStatic
    fun main(args: Array<String>) {
      val apiName: String = "classify"
      val shape: List<Int> = listOf(1, 4)
      val data: List<Float> = listOf(3.5f, 2.4f, 7.8f, 5.1f)

      val channel = ManagedChannelBuilder.forAddress("localhost", 3000).usePlaintext().build()

      val client = BentoServiceGrpc.newBlockingStub(channel)

      val ndarray = NDArray.newBuilder().addAllShape(shape).addAllFloatValues(data).build()
      val req = Request.newBuilder().setApiName(apiName).setNdarray(ndarray).build()
      try {
        val resp = client.call(req)
        if (!resp.hasNdarray()) {
          println("Currently only NDArray responses are supported.")
        } else {
          println("Response: ${resp.ndarray}")
        }
      } catch (e: Exception) {
        println("Rpc error: ${e.message}")
      }
    }
  }
}
On running protoc standalone (optional)

Since there is no easy way to add additional proto files, we will have to clone some repositories and copy the proto files into our project:

  1. protocolbuffers/protobuf - the official repository for Protocol Buffers. We need the protobuf files that live under src/google/protobuf:

» mkdir -p thirdparty && cd thirdparty
» git clone --depth 1 https://github.com/protocolbuffers/protobuf.git
  2. bentoml/bentoml - We need the service.proto file under bentoml/grpc/ to build the client, so we will perform a sparse checkout of only the bentoml/grpc directory:

» mkdir bentoml && pushd bentoml
» git init
» git remote add -f origin https://github.com/bentoml/BentoML.git
» git config core.sparseCheckout true
» cat <<EOT >|.git/info/sparse-checkout
src/bentoml/grpc
EOT
» git pull origin main && mv src/bentoml/grpc .
» popd

Here is the protoc command to generate the gRPC Kotlin stubs if you need to use protoc standalone:

» protoc -I. -I ./thirdparty/protobuf/src \
         --kotlin_out ./kotlin/src/main/kotlin/ \
         --grpc-kotlin_out ./kotlin/src/main/kotlin \
         --plugin=protoc-gen-grpc-kotlin=$(which protoc-gen-grpc-kotlin) \
         bentoml/grpc/v1/service.proto

Requirements: Make sure to have Node.js installed on your system.

We will create our Node.js client in the directory ~/workspace/iris_node_client/:

» mkdir -p ~/workspace/iris_node_client
» cd ~/workspace/iris_node_client
Initialize the project and use the following package.json:
package.json#
{
  "name": "grpc_client_js",
  "version": "1.0.0",
  "description": "gRPC client in JavaScript",
  "main": "client.js",
  "keywords": [],
  "author": "BentoML Team",
  "license": "Apache-2.0",
  "dependencies": {
    "@grpc/grpc-js": "^1.7.1",
    "google-protobuf": "^3.21.0",
    "grpc-tools": "^1.11.2",
    "ts-protoc-gen": "^0.15.0"
  },
  "scripts": {
    "compile": "npm_config_target_arch=x64 yarn && bash -x ./hack",
    "client": "yarn compile && node client.js"
  }
}

Install the dependencies with either npm or yarn:

» yarn install --add-devs

Note

If you are using an M1 Mac, you might also have to prepend npm_config_target_arch=x64 to the yarn command:

» npm_config_target_arch=x64 yarn install --add-devs

Since there is no easy way to add additional proto files, we will have to clone some repositories and copy the proto files into our project:

  1. protocolbuffers/protobuf - the official repository for Protocol Buffers. We need the protobuf files that live under src/google/protobuf:

» mkdir -p thirdparty && cd thirdparty
» git clone --depth 1 https://github.com/protocolbuffers/protobuf.git
  2. bentoml/bentoml - We need the service.proto file under bentoml/grpc/ to build the client, so we will perform a sparse checkout of only the bentoml/grpc directory:

» mkdir bentoml && pushd bentoml
» git init
» git remote add -f origin https://github.com/bentoml/BentoML.git
» git config core.sparseCheckout true
» cat <<EOT >|.git/info/sparse-checkout
src/bentoml/grpc
EOT
» git pull origin main && mv src/bentoml/grpc .
» popd

Here is the protoc command to generate the gRPC Javascript stubs:

» $(npm bin)/grpc_tools_node_protoc \
         -I . -I ./thirdparty/protobuf/src \
         --js_out=import_style=commonjs,binary:. \
         --grpc_out=grpc_js:. \
         bentoml/grpc/v1/service.proto

Proceed to create a client.js file with the following content:

client.js#
"use strict";
const grpc = require("@grpc/grpc-js");
const pb = require("./bentoml/grpc/v1/service_pb");
const services = require("./bentoml/grpc/v1/service_grpc_pb");

function main() {
  const target = "localhost:3000";
  const client = new services.BentoServiceClient(
    target,
    grpc.credentials.createInsecure()
  );
  var ndarray = new pb.NDArray();
  ndarray
    .setDtype(pb.NDArray.DType.DTYPE_FLOAT)
    .setShapeList([1, 4])
    .setFloatValuesList([3.5, 2.4, 7.8, 5.1]);
  var req = new pb.Request();
  req.setApiName("classify").setNdarray(ndarray);

  client.call(req, function (err, resp) {
    if (err) {
      console.log(err.message);
      if (err.code === grpc.status.INVALID_ARGUMENT) {
        console.log("Invalid argument", resp);
      }
    } else {
      if (resp.getContentCase() != pb.Response.ContentCase.NDARRAY) {
        console.error("Currently only NDArray responses are supported.");
        return;
      }
      console.log("result: ", resp.getNdarray().toObject());
    }
  });
}

main();
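Note that the client above sends the tensor as a flat list of float values alongside a shape field, rather than as a nested array. The following plain-Python sketch (illustrative only, not BentoML's server implementation) shows how such a flat payload maps back to a nested array:

```python
# Illustrative only: rebuild a nested array from NDArray-style
# flat values plus a shape, as sent by the client above.
def reshape(flat, shape):
    # A 1-D shape maps directly onto the flat values.
    if len(shape) == 1:
        return list(flat[: shape[0]])
    # Otherwise split the flat list into shape[0] equal chunks and recurse.
    stride = len(flat) // shape[0]
    return [
        reshape(flat[i * stride : (i + 1) * stride], shape[1:])
        for i in range(shape[0])
    ]

print(reshape([3.5, 2.4, 7.8, 5.1], [1, 4]))  # → [[3.5, 2.4, 7.8, 5.1]]
```

With shape [1, 4], the four floats come back as a single row; the same rule generalizes to higher dimensions.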

Requirements: Make sure to have the prerequisites to get started with grpc/grpc-swift.

We will create our Swift client in the directory ~/workspace/iris_swift_client/:

» mkdir -p ~/workspace/iris_swift_client
» cd ~/workspace/iris_swift_client

We will use Swift Package Manager to build and run the client.

» swift package init --type executable
Initialize the project and use the following Package.swift:
Package.swift#
// swift-tools-version: 5.7
// The swift-tools-version declares the minimum version of Swift required to build this package.

import PackageDescription

// To declare other packages that this package depends on.
let packageDependencies: [Package.Dependency] = [
  .package(
    url: "https://github.com/grpc/grpc-swift.git",
    from: "1.10.0"
  ),
  .package(
    url: "https://github.com/apple/swift-nio.git",
    from: "2.41.1"
  ),
  .package(
    url: "https://github.com/apple/swift-protobuf.git",
    from: "1.20.1"
  ),
]

// Defines dependencies for our targets.
extension Target.Dependency {
  static let bentoServiceModel: Self = .target(name: "BentoServiceModel")

  static let grpc: Self = .product(name: "GRPC", package: "grpc-swift")
  static let nio: Self = .product(name: "NIO", package: "swift-nio")
  static let nioCore: Self = .product(name: "NIOCore", package: "swift-nio")
  static let nioPosix: Self = .product(name: "NIOPosix", package: "swift-nio")
  static let protobuf: Self = .product(name: "SwiftProtobuf", package: "swift-protobuf")
}

// Targets are the basic building blocks of a package. A target can define a module or a test suite.
// Targets can depend on other targets in this package, and on products in packages this package depends on.
extension Target {
  static let bentoServiceModel: Target = .target(
    name: "BentoServiceModel",
    dependencies: [
      .grpc,
      .nio,
      .protobuf,
    ],
    path: "Sources/bentoml/grpc/v1"
  )

  static let bentoServiceClient: Target = .executableTarget(
    name: "BentoServiceClient",
    dependencies: [
      .grpc,
      .bentoServiceModel,
      .nioCore,
      .nioPosix,
    ],
    path: "Sources/BentoServiceClient"
  )
}

let package = Package(
  name: "iris-swift-client",
  dependencies: packageDependencies,
  targets: [.bentoServiceModel, .bentoServiceClient]
)

Since there is no easy way to add additional proto files, we will have to clone some repositories and copy the proto files into our project:

  1. protocolbuffers/protobuf - the official repository for Protocol Buffers. We need the protobuf files that live under src/google/protobuf:

» mkdir -p thirdparty && cd thirdparty
» git clone --depth 1 https://github.com/protocolbuffers/protobuf.git
  2. bentoml/bentoml - We need the service.proto file under bentoml/grpc/ to build the client, so we will perform a sparse checkout of only the bentoml/grpc directory:

» mkdir bentoml && pushd bentoml
» git init
» git remote add -f origin https://github.com/bentoml/BentoML.git
» git config core.sparseCheckout true
» cat <<EOT >|.git/info/sparse-checkout
src/bentoml/grpc
EOT
» git pull origin main && mv src/bentoml/grpc .
» popd

Here is the protoc command to generate the gRPC Swift stubs:

» protoc -I. -I ./thirdparty/protobuf/src \
         --swift_out=Sources --swift_opt=Visibility=Public \
         --grpc-swift_out=Sources --grpc-swift_opt=Visibility=Public \
         --plugin=protoc-gen-grpc-swift=$(which protoc-gen-grpc-swift) \
         bentoml/grpc/v1/service.proto

Proceed to create a Sources/BentoServiceClient/main.swift file with the following content:

main.swift#
#if compiler(>=5.6)
#if BAZEL_BUILD
import swift_BentoServiceModel // internal targets
#else
import BentoServiceModel
#endif
import Foundation
import GRPC
import NIOCore
import NIOPosix
import SwiftProtobuf

// Setup an `EventLoopGroup` for the connection to run on.
//
// See: https://github.com/apple/swift-nio#eventloops-and-eventloopgroups
let group = MultiThreadedEventLoopGroup(numberOfThreads: 1)

var apiName: String = "classify"
var shape: [Int32] = [1, 4]
var data: [Float] = [3.5, 2.4, 7.8, 5.1]

// Make sure the group is shutdown when we're done with it.
defer {
  try! group.syncShutdownGracefully()
}

// Configure the channel, we're not using TLS so the connection is `insecure`.
let channel = try GRPCChannelPool.with(
  target: .host("localhost", port: 3000),
  transportSecurity: .plaintext,
  eventLoopGroup: group
)

// Close the connection when we're done with it.
defer {
  try! channel.close().wait()
}

// Provide the connection to the generated client.
let stubs = Bentoml_Grpc_v1_BentoServiceNIOClient(channel: channel)

// Form the request with the NDArray, if one was provided.
let ndarray: Bentoml_Grpc_v1_NDArray = .with {
  $0.shape = shape
  $0.floatValues = data
  $0.dtype = Bentoml_Grpc_v1_NDArray.DType.float
}

let request: Bentoml_Grpc_v1_Request = .with {
  $0.apiName = apiName
  $0.ndarray = ndarray
}

let call = stubs.call(request)
do {
  let resp = try call.response.wait()
  if let content = resp.content {
    switch content {
    case let .ndarray(ndarray):
      print("Response: \(ndarray)")
    default:
      print("Currently only NDArray responses are supported.")
    }
  }
} catch {
  print("Rpc failed: \(error)")
}
#else
@main
enum NotAvailable {
  static func main() {
    print("This example requires Swift >= 5.6")
  }
}
#endif // compiler(>=5.6)

Requirements: Make sure to follow the instructions to install grpc via either pecl or from source.

Note

You will also have to symlink the built C++ extension to the PHP extension directory for it to be loaded by PHP.

We will then use bazel and composer to build and run the client.

We will create our PHP client in the directory ~/workspace/iris_php_client/:

» mkdir -p ~/workspace/iris_php_client
» cd ~/workspace/iris_php_client

Create a new PHP package:

» composer init
An example composer.json for the client:
{
    "name": "bentoml/grpc-php-client",
    "description": "BentoML gRPC client for PHP",
    "require": {
        "grpc/grpc": "1.42.0",
        "google/protobuf": "3.19.4"
    },
    "license": "Apache-2.0",
    "autoload": {
        "psr-4": {
            "GPBMetadata\\": "GPBMetadata/",
            "Bentoml\\": "Bentoml"
        }
    },
    "authors": [
        {
            "name": "BentoML Team"
        }
    ]
}

Since there is no easy way to add additional proto files, we will have to clone some repositories and copy the proto files into our project:

  1. protocolbuffers/protobuf - the official repository for Protocol Buffers. We need the protobuf files that live under src/google/protobuf:

» mkdir -p thirdparty && cd thirdparty
» git clone --depth 1 https://github.com/protocolbuffers/protobuf.git
  2. bentoml/bentoml - We need the service.proto file under bentoml/grpc/ to build the client, so we will perform a sparse checkout of only the bentoml/grpc directory:

» mkdir bentoml && pushd bentoml
» git init
» git remote add -f origin https://github.com/bentoml/BentoML.git
» git config core.sparseCheckout true
» cat <<EOT >|.git/info/sparse-checkout
src/bentoml/grpc
EOT
» git pull origin main && mv src/bentoml/grpc .
» popd

Here is the protoc command to generate the gRPC PHP stubs:

» protoc -I . -I ./thirdparty/protobuf/src \
         --php_out=. \
         --grpc_out=. \
         --plugin=protoc-gen-grpc=$(which grpc_php_plugin) \
         bentoml/grpc/v1/service.proto

Proceed to create a BentoServiceClient.php file with the following content:

BentoServiceClient.php#
<?php

use Bentoml\Grpc\v1\BentoServiceClient;
use Bentoml\Grpc\v1\NDArray;
use Bentoml\Grpc\v1\Request;

require dirname(__FILE__) . '/vendor/autoload.php';

function call()
{
    $hostname = 'localhost:3000';
    $apiName = "classify";
    $to_parsed = array("3.5", "2.4", "7.8", "5.1");
    $data = array_map("floatval", $to_parsed);
    $shape = array(1, 4);
    $client = new BentoServiceClient($hostname, [
        'credentials' => Grpc\ChannelCredentials::createInsecure(),
    ]);
    $request = new Request();
    $request->setApiName($apiName);
    $payload = new NDArray();
    $payload->setShape($shape);
    $payload->setFloatValues($data);
    $payload->setDtype(\Bentoml\Grpc\v1\NDArray\DType::DTYPE_FLOAT);
    $request->setNdarray($payload);

    list($response, $status) = $client->Call($request)->wait();
    if ($status->code !== Grpc\STATUS_OK) {
        echo "ERROR: " . $status->code . ", " . $status->details . PHP_EOL;
        exit(1);
    }
    echo $response->serializeToJsonString() . PHP_EOL;
}

call();

Todo

Bazel instruction for swift, nodejs, python


Then you can proceed to run the client scripts:

» python -m client
» bazel run //:client_go
» go run ./client.go
» bazel run :client_cc

Refer to grpc/grpc for instructions on using CMake and other similar build tools.

Note

See the instructions on GitHub for working C++ client.

» bazel run :client_java

We will use gradlew to build the client and run it:

» ./gradlew build && \
   ./build/tmp/scripts/bentoServiceClient/bento-service-client

Note

See the instructions on GitHub for working Java client.

» bazel run :client_kt

We will use gradlew to build the client and run it:

» ./gradlew build && \
   ./build/tmp/scripts/bentoServiceClient/bento-service-client

Note

See the instructions on GitHub for working Kotlin client.

» node client.js
» swift run BentoServiceClient
» php -d extension=/path/to/grpc.so -d max_execution_time=300 BentoServiceClient.php
Additional language support for client implementation

Note: Please check out gRPC Ruby for instructions on installing from source. Check out the examples folder for a Ruby client implementation.

Note: Please check out the gRPC .NET examples folder for grpc/grpc-dotnet client implementation.

Note: Please check out the gRPC Dart examples folder for grpc/grpc-dart client implementation.

Note: Currently there is no official gRPC Rust client implementation. Please check out tikv/grpc-rs as one of the unofficial implementations.

After successfully running the client, proceed to build the bento as usual:

» bentoml build


Containerize your Bento 🍱 with gRPC support#

To containerize the Bento with gRPC features, pass --enable-features=grpc to bentoml containerize to add the additional gRPC dependencies to your Bento:

» bentoml containerize iris_classifier:latest --enable-features=grpc

--enable-features allows you to containerize any existing Bento with additional features that BentoML provides, without having to rebuild the Bento.

Note

--enable-features accepts a comma-separated list of features, or can be passed multiple times.

After containerization, your Bento container can now be used with gRPC:

» docker run -it --rm \
             -p 3000:3000 -p 3001:3001 \
             iris_classifier:6otbsmxzq6lwbgxi serve-grpc --production

Congratulations! You have successfully served, containerized and tested your BentoService with gRPC.


Using gRPC in BentoML#

We will dive into some of the details of how gRPC is implemented in BentoML.

Protobuf definition#

Let’s take a quick look at the protobuf definition of the BentoService:

service BentoService {
  rpc Call(Request) returns (Response) {}
}
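Every API is exposed through this single Call RPC: the server dispatches on the api_name field of the incoming Request and routes the payload to the matching service API. A rough Python sketch of this dispatch pattern follows (the registry and handler names here are hypothetical, not BentoML internals):

```python
# Hypothetical sketch of routing a single Call RPC by api_name.
def classify(ndarray):
    # Placeholder for real model inference; echoes its input.
    return ndarray

APIS = {"classify": classify}  # api_name -> handler

def call(request):
    """Dispatch a Request-like dict to the handler named by 'api_name'."""
    handler = APIS.get(request["api_name"])
    if handler is None:
        # A real gRPC server would return the NOT_FOUND status here.
        raise KeyError(f"no API named {request['api_name']!r}")
    return {"ndarray": handler(request["ndarray"])}

print(call({"api_name": "classify", "ndarray": [[5.9, 3.0, 5.1, 1.8]]}))
```

This single-entrypoint design is why the Request message carries both an api_name string and a oneof content payload, as shown in the full definition below.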
Expand for the full current protobuf definition.
syntax = "proto3";

package bentoml.grpc.v1;

import "google/protobuf/struct.proto";
import "google/protobuf/wrappers.proto";

// cc_enable_arenas pre-allocate memory for given message to improve speed. (C++ only)
option cc_enable_arenas = true;
option go_package = "github.com/bentoml/bentoml/grpc/v1;service";
option java_multiple_files = true;
option java_outer_classname = "ServiceProto";
option java_package = "com.bentoml.grpc.v1";
option objc_class_prefix = "SVC";
option py_generic_services = true;

// a gRPC BentoServer.
service BentoService {
  // Call handles methodcaller of given API entrypoint.
  rpc Call(Request) returns (Response) {}
}

// Request message for incoming Call.
message Request {
  // api_name defines the API entrypoint to call.
  // api_name is the name of the function defined in bentoml.Service.
  // Example:
  //
  //     @svc.api(input=NumpyNdarray(), output=File())
  //     def predict(input: NDArray[float]) -> bytes:
  //         ...
  //
  //     api_name is "predict" in this case.
  string api_name = 1;

  oneof content {
    // NDArray represents a n-dimensional array of arbitrary type.
    NDArray ndarray = 3;

    // DataFrame represents any tabular data type. We are using
    // DataFrame as a trivial representation for tabular type.
    DataFrame dataframe = 5;

    // Series portrays a series of values. This can be used for
    // representing Series types in tabular data.
    Series series = 6;

    // File represents for any arbitrary file type. This can be
    // plaintext, image, video, audio, etc.
    File file = 7;

    // Text represents a string inputs.
    google.protobuf.StringValue text = 8;

    // JSON is represented by using google.protobuf.Value.
    // see https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/struct.proto
    google.protobuf.Value json = 9;

    // Multipart represents a multipart message.
    // It comprises of a mapping from given type name to a subset of aforementioned types.
    Multipart multipart = 10;

    // serialized_bytes is for data serialized in BentoML's internal serialization format.
    bytes serialized_bytes = 2;
  }

  // Tensor is similiar to ndarray but with a name
  // We are reserving it for now for future use.
  // repeated Tensor tensors = 4;
  reserved 4, 11 to 13;
}

// Request message for incoming Call.
message Response {
  oneof content {
    // NDArray represents a n-dimensional array of arbitrary type.
    NDArray ndarray = 1;

    // DataFrame represents any tabular data type. We are using
    // DataFrame as a trivial representation for tabular type.
    DataFrame dataframe = 3;

    // Series portrays a series of values. This can be used for
    // representing Series types in tabular data.
    Series series = 5;

    // File represents for any arbitrary file type. This can be
    // plaintext, image, video, audio, etc.
    File file = 6;

    // Text represents a string inputs.
    google.protobuf.StringValue text = 7;

    // JSON is represented by using google.protobuf.Value.
    // see https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/struct.proto
    google.protobuf.Value json = 8;

    // Multipart represents a multipart message.
    // It comprises of a mapping from given type name to a subset of aforementioned types.
    Multipart multipart = 9;

    // serialized_bytes is for data serialized in BentoML's internal serialization format.
    bytes serialized_bytes = 2;
  }
  // Tensor is similiar to ndarray but with a name
  // We are reserving it for now for future use.
  // repeated Tensor tensors = 4;
  reserved 4, 10 to 13;
}

// Part represents possible value types for multipart message.
// These are the same as the types in Request message.
message Part {
  oneof representation {
    // NDArray represents a n-dimensional array of arbitrary type.
    NDArray ndarray = 1;

    // DataFrame represents any tabular data type. We are using
    // DataFrame as a trivial representation for tabular type.
    DataFrame dataframe = 3;

    // Series portrays a series of values. This can be used for
    // representing Series types in tabular data.
    Series series = 5;

    // File represents for any arbitrary file type. This can be
    // plaintext, image, video, audio, etc.
    File file = 6;

    // Text represents a string inputs.
    google.protobuf.StringValue text = 7;

    // JSON is represented by using google.protobuf.Value.
    // see https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/struct.proto
    google.protobuf.Value json = 8;

    // serialized_bytes is for data serialized in BentoML's internal serialization format.
    bytes serialized_bytes = 4;
  }

  // Tensor is similiar to ndarray but with a name
  // We are reserving it for now for future use.
  // Tensor tensors = 4;
  reserved 2, 9 to 13;
}

// Multipart represents a multipart message.
// It comprises of a mapping from given type name to a subset of aforementioned types.
message Multipart {
  map<string, Part> fields = 1;
}

// File represents for any arbitrary file type. This can be
// plaintext, image, video, audio, etc.
message File {
  // optional file type, let it be csv, text, parquet, etc.
  // v1alpha1 uses 1 as FileType enum.
  optional string kind = 3;
  // contents of file as bytes.
  bytes content = 2;
}

// DataFrame represents any tabular data type. We are using
// DataFrame as a trivial representation for tabular type.
// This message carries given implementation of tabular data based on given orientation.
// TODO: support index, records, etc.
message DataFrame {
  // columns name
  repeated string column_names = 1;

  // columns orient.
  // { column ↠ { index ↠ value } }
  repeated Series columns = 2;
}

// Series portrays a series of values. This can be used for
// representing Series types in tabular data.
message Series {
  // A bool parameter value
  repeated bool bool_values = 1 [packed = true];

  // A float parameter value
  repeated float float_values = 2 [packed = true];

  // A int32 parameter value
  repeated int32 int32_values = 3 [packed = true];

  // A int64 parameter value
  repeated int64 int64_values = 6 [packed = true];

  // A string parameter value
  repeated string string_values = 5;

  // represents a double parameter value.
  repeated double double_values = 4 [packed = true];
}

// NDArray represents a n-dimensional array of arbitrary type.
message NDArray {
  // Represents data type of a given array.
  enum DType {
    // Represents a None type.
    DTYPE_UNSPECIFIED = 0;

    // Represents an float type.
    DTYPE_FLOAT = 1;

    // Represents an double type.
    DTYPE_DOUBLE = 2;

    // Represents a bool type.
    DTYPE_BOOL = 3;

    // Represents an int32 type.
    DTYPE_INT32 = 4;

    // Represents an int64 type.
    DTYPE_INT64 = 5;

    // Represents a uint32 type.
    DTYPE_UINT32 = 6;

    // Represents a uint64 type.
    DTYPE_UINT64 = 7;

    // Represents a string type.
    DTYPE_STRING = 8;
  }

  // DTYPE is the data type of given array
  DType dtype = 1;

  // shape is the shape of given array.
  repeated int32 shape = 2;

  // represents a string parameter value.
  repeated string string_values = 5;

  // represents a float parameter value.
  repeated float float_values = 3 [packed = true];

  // represents a double parameter value.
  repeated double double_values = 4 [packed = true];

  // represents a bool parameter value.
  repeated bool bool_values = 6 [packed = true];

  // represents a int32 parameter value.
  repeated int32 int32_values = 7 [packed = true];

  // represents a int64 parameter value.
  repeated int64 int64_values = 8 [packed = true];

  // represents a uint32 parameter value.
  repeated uint32 uint32_values = 9 [packed = true];

  // represents a uint64 parameter value.
  repeated uint64 uint64_values = 10 [packed = true];
}
syntax = "proto3";

package bentoml.grpc.v1alpha1;

import "google/protobuf/struct.proto";
import "google/protobuf/wrappers.proto";

// cc_enable_arenas pre-allocate memory for given message to improve speed. (C++ only)
option cc_enable_arenas = true;
option go_package = "github.com/bentoml/bentoml/grpc/v1alpha1;service";
option java_multiple_files = true;
option java_outer_classname = "ServiceProto";
option java_package = "com.bentoml.grpc.v1alpha1";
option objc_class_prefix = "SVC";
option py_generic_services = true;

// a gRPC BentoServer.
service BentoService {
  // Call handles methodcaller of given API entrypoint.
  rpc Call(Request) returns (Response) {}
}

// Request message for incoming Call.
message Request {
  // api_name defines the API entrypoint to call.
  // api_name is the name of the function defined in bentoml.Service.
  // Example:
  //
  //     @svc.api(input=NumpyNdarray(), output=File())
  //     def predict(input: NDArray[float]) -> bytes:
  //         ...
  //
  //     api_name is "predict" in this case.
  string api_name = 1;

  oneof content {
    // NDArray represents a n-dimensional array of arbitrary type.
    NDArray ndarray = 3;

    // DataFrame represents any tabular data type. We are using
    // DataFrame as a trivial representation for tabular type.
    DataFrame dataframe = 5;

    // Series portrays a series of values. This can be used for
    // representing Series types in tabular data.
    Series series = 6;

    // File represents for any arbitrary file type. This can be
    // plaintext, image, video, audio, etc.
    File file = 7;

    // Text represents a string inputs.
    google.protobuf.StringValue text = 8;

    // JSON is represented by using google.protobuf.Value.
    // see https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/struct.proto
    google.protobuf.Value json = 9;

    // Multipart represents a multipart message.
    // It comprises of a mapping from given type name to a subset of aforementioned types.
    Multipart multipart = 10;

    // serialized_bytes is for data serialized in BentoML's internal serialization format.
    bytes serialized_bytes = 2;
  }

  // Tensor is similiar to ndarray but with a name
  // We are reserving it for now for future use.
  // repeated Tensor tensors = 4;
  reserved 4, 11 to 13;
}

// Request message for incoming Call.
message Response {
  oneof content {
    // NDArray represents a n-dimensional array of arbitrary type.
    NDArray ndarray = 1;

    // DataFrame represents any tabular data type. We are using
    // DataFrame as a trivial representation for tabular type.
    DataFrame dataframe = 3;

    // Series portrays a series of values. This can be used for
    // representing Series types in tabular data.
    Series series = 5;

    // File represents for any arbitrary file type. This can be
    // plaintext, image, video, audio, etc.
    File file = 6;

    // Text represents a string inputs.
    google.protobuf.StringValue text = 7;

    // JSON is represented by using google.protobuf.Value.
    // see https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/struct.proto
    google.protobuf.Value json = 8;

    // Multipart represents a multipart message.
    // It comprises of a mapping from given type name to a subset of aforementioned types.
    Multipart multipart = 9;

    // serialized_bytes is for data serialized in BentoML's internal serialization format.
    bytes serialized_bytes = 2;
  }
  // Tensor is similiar to ndarray but with a name
  // We are reserving it for now for future use.
  // repeated Tensor tensors = 4;
  reserved 4, 10 to 13;
}

// Part represents possible value types for multipart message.
// These are the same as the types in Request message.
message Part {
  oneof representation {
    // NDArray represents a n-dimensional array of arbitrary type.
    NDArray ndarray = 1;

    // DataFrame represents any tabular data type. We are using
    // DataFrame as a trivial representation for tabular type.
    DataFrame dataframe = 3;

    // Series portrays a series of values. This can be used for
    // representing Series types in tabular data.
    Series series = 5;

    // File represents for any arbitrary file type. This can be
    // plaintext, image, video, audio, etc.
    File file = 6;

    // Text represents a string inputs.
    google.protobuf.StringValue text = 7;

    // JSON is represented by using google.protobuf.Value.
    // see https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/struct.proto
    google.protobuf.Value json = 8;

    // serialized_bytes is for data serialized in BentoML's internal serialization format.
    bytes serialized_bytes = 4;
  }

  // Tensor is similiar to ndarray but with a name
  // We are reserving it for now for future use.
  // Tensor tensors = 4;
  reserved 2, 9 to 13;
}

// Multipart represents a multipart message.
// It comprises of a mapping from given type name to a subset of aforementioned types.
message Multipart {
  map<string, Part> fields = 1;
}

// File represents for any arbitrary file type. This can be
// plaintext, image, video, audio, etc.
message File {
  // FileType represents possible file type to be handled by BentoML.
  // Currently, we only support plaintext (Text()), image (Image()), and file (File()).
  // TODO: support audio and video streaming file types.
  enum FileType {
    FILE_TYPE_UNSPECIFIED = 0;

    // file types
    FILE_TYPE_CSV = 1;
    FILE_TYPE_PLAINTEXT = 2;
    FILE_TYPE_JSON = 3;
    FILE_TYPE_BYTES = 4;
    FILE_TYPE_PDF = 5;

    // image types
    FILE_TYPE_PNG = 6;
    FILE_TYPE_JPEG = 7;
    FILE_TYPE_GIF = 8;
    FILE_TYPE_BMP = 9;
    FILE_TYPE_TIFF = 10;
    FILE_TYPE_WEBP = 11;
    FILE_TYPE_SVG = 12;
  }

  // optional type of file, let it be csv, text, parquet, etc.
  optional FileType kind = 1;

  // contents of file as bytes.
  bytes content = 2;
}

// DataFrame represents any tabular data type. We are using
// DataFrame as a trivial representation for tabular type.
// This message carries given implementation of tabular data based on given orientation.
// TODO: support index, records, etc.
message DataFrame {
  // column names
  repeated string column_names = 1;

  // columns orient.
  // { column ↠ { index ↠ value } }
  repeated Series columns = 2;
}

// Series portrays a series of values. This can be used for
// representing Series types in tabular data.
message Series {
  // A bool parameter value
  repeated bool bool_values = 1 [packed = true];

  // A float parameter value
  repeated float float_values = 2 [packed = true];

  // A int32 parameter value
  repeated int32 int32_values = 3 [packed = true];

  // A int64 parameter value
  repeated int64 int64_values = 6 [packed = true];

  // A string parameter value
  repeated string string_values = 5;

  // represents a double parameter value.
  repeated double double_values = 4 [packed = true];
}

// NDArray represents a n-dimensional array of arbitrary type.
message NDArray {
  // Represents data type of a given array.
  enum DType {
    // Represents a None type.
    DTYPE_UNSPECIFIED = 0;

    // Represents a float type.
    DTYPE_FLOAT = 1;

    // Represents a double type.
    DTYPE_DOUBLE = 2;

    // Represents a bool type.
    DTYPE_BOOL = 3;

    // Represents an int32 type.
    DTYPE_INT32 = 4;

    // Represents an int64 type.
    DTYPE_INT64 = 5;

    // Represents a uint32 type.
    DTYPE_UINT32 = 6;

    // Represents a uint64 type.
    DTYPE_UINT64 = 7;

    // Represents a string type.
    DTYPE_STRING = 8;
  }

  // DTYPE is the data type of given array
  DType dtype = 1;

  // shape is the shape of given array.
  repeated int32 shape = 2;

  // represents a string parameter value.
  repeated string string_values = 5;

  // represents a float parameter value.
  repeated float float_values = 3 [packed = true];

  // represents a double parameter value.
  repeated double double_values = 4 [packed = true];

  // represents a bool parameter value.
  repeated bool bool_values = 6 [packed = true];

  // represents a int32 parameter value.
  repeated int32 int32_values = 7 [packed = true];

  // represents a int64 parameter value.
  repeated int64 int64_values = 8 [packed = true];

  // represents a uint32 parameter value.
  repeated uint32 uint32_values = 9 [packed = true];

  // represents a uint64 parameter value.
  repeated uint64 uint64_values = 10 [packed = true];
}

As you can see, BentoService defines a simple rpc Call that sends a Request message and returns a Response message.

A Request message takes in:

  • api_name: the name of the API function defined inside your BentoService.

  • oneof content: the field can be one of the aforementioned types, such as ndarray, dataframe, series, file, text, json, multipart, or serialized_bytes.

Note

Series is not yet supported.

The Response message will then return one of the aforementioned types as result.


Example: In the quickstart guide, we defined a classify API that takes in a bentoml.io.NumpyNdarray.

Therefore, our Request message would have the following structure:

Python:

from bentoml.grpc.v1 import service_pb2 as pb

req = pb.Request(
    api_name="classify",
    ndarray=pb.NDArray(
        dtype=pb.NDArray.DTYPE_FLOAT, shape=(1, 4), float_values=[5.9, 3, 5.1, 1.8]
    ),
)
Go:

package main

import (
	pb "github.com/bentoml/bentoml/grpc/v1"
)

var req = &pb.Request{
	ApiName: "classify",
	Content: &pb.Request_Ndarray{
		Ndarray: &pb.NDArray{
			Dtype:       *pb.NDArray_DTYPE_FLOAT.Enum(),
			Shape:       []int32{1, 4},
			FloatValues: []float32{3.5, 2.4, 7.8, 5.1},
		},
	},
}
C++:

#include "bentoml/grpc/v1/service.pb.h"

using bentoml::grpc::v1::BentoService;
using bentoml::grpc::v1::NDArray;
using bentoml::grpc::v1::Request;

std::vector<float> data = {3.5, 2.4, 7.8, 5.1};
std::vector<int> shape = {1, 4};

Request request;
request.set_api_name("classify");

NDArray *ndarray = request.mutable_ndarray();
ndarray->set_dtype(NDArray::DTYPE_FLOAT);
ndarray->mutable_shape()->Assign(shape.begin(), shape.end());
ndarray->mutable_float_values()->Assign(data.begin(), data.end());
Java:

import java.util.*;

Iterable<Integer> shapeIterable = Arrays.asList(1, 4);
Float array[] = { 3.5f, 2.4f, 7.8f, 5.1f };
Iterable<Float> arrayIterable = Arrays.asList(array);

NDArray.Builder builder = NDArray.newBuilder().addAllShape(shapeIterable).addAllFloatValues(arrayIterable).setDtype(NDArray.DType.DTYPE_FLOAT);

Request req = Request.newBuilder().setApiName(apiName).setNdarray(builder).build();
Kotlin:

val shape: List<Int> = listOf(1, 4)
val data: List<Float> = listOf(3.5f, 2.4f, 7.8f, 5.1f)

val ndarray = NDArray.newBuilder().addAllShape(shape).addAllFloatValues(data).build()
val req = Request.newBuilder().setApiName(apiName).setNdarray(ndarray).build()
Node.js:

const pb = require("./bentoml/grpc/v1/service_pb");

var ndarray = new pb.NDArray();
ndarray
  .setDtype(pb.NDArray.DType.DTYPE_FLOAT)
  .setShapeList([1, 4])
  .setFloatValuesList([3.5, 2.4, 7.8, 5.1]);
var req = new pb.Request();
req.setApiName("classify").setNdarray(ndarray);
Swift:

import BentoServiceModel

var shape: [Int32] = [1, 4]
var data: [Float] = [3.5, 2.4, 7.8, 5.1]

let ndarray: Bentoml_Grpc_v1_NDArray = .with {
  $0.shape = shape
  $0.floatValues = data
  $0.dtype = Bentoml_Grpc_v1_NDArray.DType.float
}

let request: Bentoml_Grpc_v1_Request = .with {
  $0.apiName = apiName
  $0.ndarray = ndarray
}

Array representation via NDArray#

Description: NDArray represents a flattened n-dimensional array of arbitrary type. It accepts the following fields:

  • dtype

    The data type of the given input. This is an Enum field that provides a 1-to-1 mapping between Protobuf data types and NumPy data types:

    pb.NDArray.DType    numpy.dtype   Enum value
    ------------------  ------------  ----------
    DTYPE_UNSPECIFIED   None          0
    DTYPE_FLOAT         np.float      1
    DTYPE_DOUBLE        np.double     2
    DTYPE_BOOL          np.bool_      3
    DTYPE_INT32         np.int32      4
    DTYPE_INT64         np.int64      5
    DTYPE_UINT32        np.uint32     6
    DTYPE_UINT64        np.uint64     7
    DTYPE_STRING        np.str_       8

  • shape

    A list of int32 that represents the shape of the flattened array. The bentoml.io.NumpyNdarray descriptor will then reshape the given payload into the expected shape.

    Note that this value always takes precedence over the shape field in the bentoml.io.NumpyNdarray descriptor, meaning the array will first be reshaped to this value if it is given. Refer to bentoml.io.NumpyNdarray.from_proto() for implementation details.

  • string_values, float_values, double_values, bool_values, int32_values, int64_values, uint32_values, uint64_values

    Each of these fields is a list of the corresponding data type. The list is a flattened array, and will be reconstructed alongside the shape field into the original payload.

    Per request sent, one message should only contain ONE of the aforementioned fields.

    The interaction between the above fields and dtype is as follows:

    • if dtype is not present in the message:
      • If all of the fields are empty, we return an empty array (np.empty).

      • We loop through all of the provided fields and allow only one field per message.

        If more than one field is set (e.g. both string_values and float_values), we raise an error, as we don’t know how to deserialize the data.

    • otherwise:
      • We will use the provided dtype-to-field map to get the data from the given message.

      DType          field
      -------------  -------------
      DTYPE_BOOL     bool_values
      DTYPE_DOUBLE   double_values
      DTYPE_FLOAT    float_values
      DTYPE_INT32    int32_values
      DTYPE_INT64    int64_values
      DTYPE_STRING   string_values
      DTYPE_UINT32   uint32_values
      DTYPE_UINT64   uint64_values

    For example, if dtype is DTYPE_FLOAT, then the payload is expected to have the float_values field.

Python API
NumpyNdarray.from_sample(
   np.array([[5.4, 3.4, 1.5, 0.4]])
)
pb.NDArray
ndarray {
  dtype: DTYPE_FLOAT
  shape: 1
  shape: 4
  float_values: 5.4
  float_values: 3.4
  float_values: 1.5
  float_values: 0.4
}

API reference: bentoml.io.NumpyNdarray.from_proto()
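The flatten-then-reshape behaviour described above can be sketched in plain NumPy (an illustrative sketch only; the actual logic lives in bentoml.io.NumpyNdarray.from_proto()):

```python
import numpy as np

# The message carries a flattened 1-D payload in float_values plus a
# shape field; the descriptor reshapes it back to the original array.
float_values = [5.9, 3.0, 5.1, 1.8]  # flattened float_values payload
shape = (1, 4)                       # shape field from the message

arr = np.array(float_values, dtype=np.float32).reshape(shape)
print(arr.shape)  # (1, 4)
```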


Tabular data representation via DataFrame#

Description: DataFrame represents any tabular data type. Currently we only support the columns orientation since it is best for preserving the input order.

It accepts the following fields:

  • column_names

    A list of string that represents the column names of the given tabular data.

  • columns

    A list of Series, where each Series represents a column of arbitrary data type. The allowed fields for Series are similar to the ones in NDArray:

    • one of [string_values, float_values, double_values, bool_values, int32_values, int64_values, uint32_values, uint64_values]

Python API
PandasDataFrame.from_sample(
    pd.DataFrame({
      "age": [3, 29],
      "height": [94, 170],
      "weight": [31, 115]
    }),
    orient="columns",
)
pb.DataFrame
dataframe {
  column_names: "age"
  column_names: "height"
  column_names: "weight"
  columns {
    int32_values: 3
    int32_values: 29
  }
  columns {
    int32_values: 94
    int32_values: 170
  }
  columns {
    int32_values: 31
    int32_values: 115
  }
}

API reference: bentoml.io.PandasDataFrame.from_proto()
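The columns orientation maps directly onto a pandas constructor; a minimal sketch of how the columns payload above lines up with the original frame (illustrative only, not BentoML internals):

```python
import pandas as pd

# column_names pairs up with one Series message per column;
# zipping them rebuilds the columns-oriented frame.
column_names = ["age", "height", "weight"]
columns = [[3, 29], [94, 170], [31, 115]]  # int32_values of each Series

df = pd.DataFrame(dict(zip(column_names, columns)))
print(df.shape)  # (2, 3)
```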

Series representation via Series#

Description: Series portrays a series of values. This can be used for representing Series types in tabular data.

It accepts the following fields:

  • string_values, float_values, double_values, bool_values, int32_values, int64_values

    Similar to NumpyNdarray, each of these fields is a list of the corresponding data type. The list is a 1-D array that will then be passed to pd.Series.

    Each request should only contain ONE of the aforementioned fields.

    The interaction between the above fields and the dtype from PandasSeries is as follows:

    • if dtype is not present in the descriptor:
      • If all of the fields are empty, we return an empty pd.Series.

      • We loop through all of the provided fields and allow only one field per message.

        If more than one field is set (e.g. both string_values and float_values), we raise an error, as we don’t know how to deserialize the data.

    • otherwise:
      • We will use the provided dtype-to-field map to get the data from the given message.

Python API
PandasSeries.from_sample([5.4, 3.4, 1.5, 0.4])
pb.Series
series {
  float_values: 5.4
  float_values: 3.4
  float_values: 1.5
  float_values: 0.4
}

API reference: bentoml.io.PandasSeries.from_proto()
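As a sketch, the float_values payload above maps onto a 1-D pd.Series (illustrative only; see bentoml.io.PandasSeries.from_proto() for the real implementation):

```python
import pandas as pd

# The float_values field is handed to pd.Series as a 1-D array.
float_values = [5.4, 3.4, 1.5, 0.4]
ser = pd.Series(float_values)
print(len(ser))  # 4
```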


File-like object via File#

Description: File represents any arbitrary file type. This can be used to send in any file type, including images, videos, audio, etc.

Note

Currently, both bentoml.io.File and bentoml.io.Image use pb.File.

It accepts the following fields:

  • content

    A bytes field that represents the content of the file.

  • kind

    An optional field that represents the file type. If specified, an error will be raised if it does not match the mime_type specified in bentoml.io.File.

Python API
File(mime_type="application/pdf")
pb.File
file {
  kind: "application/pdf"
  content: <bytes>
}

bentoml.io.Image also uses pb.File:

Python API
Image(mime_type="image/png")
pb.File
file {
  kind: "image/png"
  content: <bytes>
}

Complex payload via Multipart#

Description: Multipart represents a complex payload that can contain multiple different fields. It takes fields, a dictionary mapping each input name to its corresponding bentoml.io.IODescriptor.

Python API
Multipart(
   meta=Text(),
   arr=NumpyNdarray(
      dtype=np.float16,
      shape=[2,2]
   )
)
pb.Multipart
multipart {
   fields {
      key: "arr"
      value {
         ndarray {
         dtype: DTYPE_FLOAT
         shape: 2
         shape: 2
         float_values: 1.0
         float_values: 2.0
         float_values: 3.0
         float_values: 4.0
         }
      }
   }
   fields {
      key: "meta"
      value {
         text {
         value: "nlp"
         }
      }
   }
}

API reference: bentoml.io.Multipart.from_proto()
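Conceptually, a client-side Multipart payload is just a mapping from field name to exactly one single-part value; a plain-Python sketch of the example above (a hypothetical illustration, not the proto API):

```python
import numpy as np

# Each key corresponds to one `fields` entry in the Multipart message;
# each value is exactly one Part representation. The "arr" field carries
# the flattened float_values reshaped by the shape field, as with NDArray.
multipart = {
    "meta": "nlp",  # Text part
    "arr": np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32).reshape(2, 2),  # NDArray part
}
print(sorted(multipart))  # ['arr', 'meta']
```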

Compact data format via serialized_bytes#

The serialized_bytes field in both Request and Response is reserved for pre-established protocol encoding between client and server.

BentoML leverages this field to improve serialization performance between the BentoML client and server. It is therefore not recommended for direct use.

Mounting Servicer#

gRPC service multiplexing enables us to mount additional custom servicers alongside the BentoService and serve them on the same port.

service.py#
import bentoml

import route_guide_pb2
import route_guide_pb2_grpc
from servicer_impl import RouteGuideServicer

# Runner and service from the quickstart guide.
iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
svc = bentoml.Service("iris_classifier", runners=[iris_clf_runner])

services_name = [
    v.full_name for v in route_guide_pb2.DESCRIPTOR.services_by_name.values()
]
svc.mount_grpc_servicer(
    RouteGuideServicer,
    add_servicer_fn=route_guide_pb2_grpc.add_RouteGuideServicer_to_server,
    service_names=services_name,
)

Serve your service with the bentoml serve-grpc command:

» bentoml serve-grpc service.py:svc --reload --enable-reflection

Now your RouteGuide service can also be accessed through localhost:3000.

Note

service_names is REQUIRED here, as this will be used for server reflection when --enable-reflection is passed to bentoml serve-grpc.

Mounting gRPC Interceptors#

Interceptors are a component of gRPC that allows us to intercept and interact with the proto message and service context either before or after the actual RPC call is sent or received by the client/server.

Interceptors are to gRPC what middleware is to HTTP. The most common use cases for interceptors are authentication, tracing, access logs, and more.

BentoML comes with a set of built-in async interceptors to provide support for access logs, OpenTelemetry, and Prometheus.

The following diagram demonstrates the flow of a gRPC request from client to server:

Interceptor Flow

Since interceptors are executed in the order they are added, user-defined interceptors will be executed after the built-in interceptors.

User-defined interceptors shouldn’t modify the existing headers and data of the incoming Request.

BentoML currently only supports async interceptors (via grpc.aio.ServerInterceptor, as opposed to grpc.ServerInterceptor), because the BentoML gRPC server is an async implementation.

Note

If you are using grpc.ServerInterceptor, you will need to migrate it to grpc.aio.ServerInterceptor in order to use this feature.

Feel free to reach out to us at #support on Slack

A toy implementation, AppendMetadataInterceptor:
metadata_interceptor.py#
from __future__ import annotations

import typing as t
import functools
import dataclasses
from typing import TYPE_CHECKING

from grpc import aio

if TYPE_CHECKING:
    from bentoml.grpc.types import Request
    from bentoml.grpc.types import Response
    from bentoml.grpc.types import RpcMethodHandler
    from bentoml.grpc.types import AsyncHandlerMethod
    from bentoml.grpc.types import HandlerCallDetails
    from bentoml.grpc.types import BentoServicerContext


@dataclasses.dataclass
class Context:
    usage: str
    accuracy_score: float


class AppendMetadataInterceptor(aio.ServerInterceptor):
    def __init__(self, *, usage: str, accuracy_score: float) -> None:
        self.context = Context(usage=usage, accuracy_score=accuracy_score)
        self._record: set[str] = set()

    async def intercept_service(
        self,
        continuation: t.Callable[[HandlerCallDetails], t.Awaitable[RpcMethodHandler]],
        handler_call_details: HandlerCallDetails,
    ) -> RpcMethodHandler:
        from bentoml.grpc.utils import wrap_rpc_handler

        handler = await continuation(handler_call_details)

        if handler and (handler.response_streaming or handler.request_streaming):
            return handler

        def wrapper(behaviour: AsyncHandlerMethod[Response]):
            @functools.wraps(behaviour)
            async def new_behaviour(
                request: Request, context: BentoServicerContext
            ) -> Response | t.Awaitable[Response]:
                self._record.update(
                    {f"{self.context.usage}:{self.context.accuracy_score}"}
                )
                resp = await behaviour(request, context)
                context.set_trailing_metadata(
                    tuple(
                        [
                            (k, str(v).encode("utf-8"))
                            for k, v in dataclasses.asdict(self.context).items()
                        ]
                    )
                )
                return resp

            return new_behaviour

        return wrap_rpc_handler(wrapper, handler)

To add your interceptors to an existing BentoService, use svc.add_grpc_interceptor:

service.py#
from custom_interceptor import CustomInterceptor

svc.add_grpc_interceptor(CustomInterceptor)

Note

add_grpc_interceptor also supports partial classes as well as interceptors that take multiple arguments:

from metadata_interceptor import AppendMetadataInterceptor

svc.add_grpc_interceptor(AppendMetadataInterceptor, usage="NLP", accuracy_score=0.867)
from functools import partial

from metadata_interceptor import AppendMetadataInterceptor

svc.add_grpc_interceptor(partial(AppendMetadataInterceptor, usage="NLP", accuracy_score=0.867))

Recommendations#

gRPC is designed to be a high-performance framework for inter-service communication, which makes it a good fit for building microservices. The following are some recommendations we have for using gRPC for model serving:


Demystifying the misconception of gRPC vs. REST#

You might stumble upon articles comparing gRPC to REST, and you might get the impression that gRPC is a better choice than REST when building services. This is not entirely true.

gRPC is built on top of HTTP/2, which addresses some of the shortcomings of HTTP/1.1, such as head-of-line blocking and HTTP pipelining. However, gRPC is not a drop-in replacement for REST, and it is not always the right choice for model serving. gRPC comes with its own set of trade-offs, such as:

  • Limited browser support: It is impossible to call a gRPC service directly from a browser. You will end up using tools such as gRPCUI to interact with your service, or have to go through the hassle of implementing a gRPC client in your language of choice.

  • Binary protocol format: While Protobuf is efficient to send and receive over the wire, it is not human-readable. This means additional tooling is required for debugging and analyzing protobuf messages.

  • Knowledge gap: gRPC comes with its own concepts and learning curve, which requires teams to invest time in closing those knowledge gaps to use gRPC effectively. This often adds friction and can slow development agility.

  • Lack of support for additional content types: Because gRPC depends on Protobuf, its supported content types are restrictive compared to the out-of-the-box support of HTTP+REST.

See also

gRPC on HTTP/2 dives into how gRPC is built on top of HTTP/2, and this article goes into more detail on how HTTP/2 addresses the problems of HTTP/1.1.

For HTTP/2 specification, see RFC 7540.


Should I use gRPC instead of REST for model serving?#

Yes and no.

If your organization is already using gRPC for inter-service communication, serving your Bento with gRPC is a no-brainer. You will be able to seamlessly integrate your Bento with your existing gRPC services without the overhead of implementing grpc-gateway.

However, if your organization is not using gRPC, we recommend sticking with REST for model serving. REST is a well-known and well-understood protocol, meaning there is no knowledge gap for your team, which increases developer agility and enables a faster go-to-market strategy.


Performance tuning#

BentoML allows users to tune the performance of gRPC through bentoml_configuration.yaml under api_server.grpc.

A quick overview of the available configuration for gRPC:

bentoml_configuration.yaml#
api_server:
  grpc:
    host: 0.0.0.0
    port: 3000
    max_concurrent_streams: ~
    maximum_concurrent_rpcs: ~
    max_message_length: -1
    reflection:
      enabled: false
    metrics:
      host: 0.0.0.0
      port: 3001


max_concurrent_streams#

Definition: The maximum number of concurrent incoming streams to allow on an HTTP/2 connection.

By default, we don’t set a limit cap. HTTP/2 connections typically have a limit on the maximum number of concurrent streams on a connection at one time.

Some notes about fine-tuning max_concurrent_streams

Note that a gRPC channel uses a single HTTP/2 connection, and concurrent calls are multiplexed on that connection. When the number of active calls reaches the connection’s stream limit, any additional calls are queued on the client. Queued calls wait for active calls to complete before being sent. This means that applications with high load or long-running streams could see performance degradation caused by this queuing.

Setting a limit cap on the number of concurrent streams will prevent this from happening, but it also means that you need to tune the cap to the right number:

  • If the limit cap is too low, you will sooner or later run into the issue mentioned above.

  • Not setting a limit cap is also NOT RECOMMENDED. Too many streams on a single HTTP/2 connection introduce thread contention between streams trying to write to the connection, and packet loss causes all calls to be blocked.

Remarks: We recommend experimenting with the limit cap, starting at 100 and increasing as needed.
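For example, applying the suggested starting cap of 100 streams in the configuration file:

```yaml
api_server:
  grpc:
    max_concurrent_streams: 100
```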


maximum_concurrent_rpcs#

Definition: The maximum number of concurrent RPCs this server will service before returning RESOURCE_EXHAUSTED status.

By default, this is set to None to indicate no limit, letting gRPC decide the limit.


max_message_length#

Definition: The maximum message length in bytes that can be received by or sent from the server.

By default, this is set to -1 to indicate no limit. Message size limits are a way to prevent gRPC from consuming excessive resources. By default, gRPC uses per-message limits to manage inbound and outbound messages.

Some notes about fine-tuning max_message_length

This option sets two values: grpc.max_receive_message_length and grpc.max_send_message_length.

#define GRPC_ARG_MAX_RECEIVE_MESSAGE_LENGTH "grpc.max_receive_message_length"

#define GRPC_ARG_MAX_SEND_MESSAGE_LENGTH "grpc.max_send_message_length"

By default, gRPC limits incoming messages to 4 MB and does not limit outgoing messages. We recommend setting this option only if you want to limit the size of outgoing messages; otherwise, let gRPC determine the limit.
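For instance, to cap both inbound and outbound messages at 100 MB (an illustrative value):

```yaml
api_server:
  grpc:
    # Applies to both grpc.max_receive_message_length and
    # grpc.max_send_message_length.
    max_message_length: 104857600  # 100 MB in bytes
```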

We also recommend checking out the gRPC performance best practices guide.