Custom Load Balancing for gRPC applications with Envoy xDS API

Load Balancing Strategies

Before discussing the problem and the solution, a quick intro to load balancing strategies is useful.

Today, commonly used load balancing solutions rely on either proxy load balancing or client-side load balancing. Proxy load balancing is probably the more common option: the client hits a proxy that knows the addresses of every backend server, and the proxy chooses one of them according to an algorithm and routes the client's request. Think of Nginx, HAProxy or Envoy. The diagram looks similar to the following:

                                                    +------------+
                                                    | Backend    |
                                                  --| Server - I |
                                              ---/  +------------+
+------------+            +------------+  ---/
|   Client   |------------| Proxy LB   |-/
|            |            |            | -\
+------------+            +------------+   ---\
                                                --\  +------------+
                                                   --| Backend    |
                                                     | Server - II|
                                                     +------------+

Client-side load balancing is a different concept: all clients know the addresses of the backend servers, and each client chooses one itself and connects directly. This eliminates the proxy host, so it is faster (fewer hops) and safer (no single point of failure), but it is much harder to implement than proxy load balancing. The diagram for this one looks like:

+------------+            +------------+
| Client - I |------------| Backend    |
|            |         -- | Server - I |
+------------+       -/   +------------+
+------------+  -/        +------------+
| Client - II|-/          | Backend    |
|            |         -- | Server - II|
+------------+       -/   +------------+
+------------+  -/        +------------+
|Client - III|-/          |Backend     |
|            |            |Server - III|
+------------+            +------------+

There are multiple ways to implement client-side load balancing, such as a thick client, where every client has an embedded list of server addresses, or look-aside load balancing, where clients make a request to an external tool to get the list of available servers and then choose and connect to one of them.
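
To make the look-aside idea concrete, here is a minimal Python sketch; the discovery URL and the JSON shape of its response are illustrative assumptions, not part of any real service:

```python
import json
import random
import urllib.request

def fetch_backends(discovery_url):
    """Ask a (hypothetical) look-aside discovery service for the current
    backend list. The {"hosts": [...]} response shape is an assumption."""
    with urllib.request.urlopen(discovery_url) as resp:
        return json.loads(resp.read())["hosts"]

def choose_backend(servers):
    """Pick one backend locally -- the client then connects to it directly,
    with no proxy in the data path."""
    if not servers:
        raise RuntimeError("no backends registered")
    return random.choice(servers)  # trivial policy; round-robin or weights also work
```

The policy is the client's to choose, which is exactly what makes the thick-client and look-aside variants harder to manage than a central proxy.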

When using gRPC between client and server, any of these strategies is fine as long as the proxy load balancer understands that the communication is gRPC. Why is this important? gRPC runs on top of HTTP/2, and if the load balancer does not operate at Layer 7 it cannot differentiate the protocol, so services cannot fully utilise gRPC. For more info about this please check [1].

The Problem

Nearly all load balancing solutions (Nginx, Linkerd, HAProxy, Envoy, etc.) support assigning weights to backend servers, so they can route ingress traffic according to those weights. But what happens if the weight values for the backend servers change dynamically every second (depending on some internal operation in the application), and overly loaded servers want to stop receiving any traffic at all? In that case we need some kind of rate limiting based on an internal metric of the application.

Proposed Solution

The Architecture

If we use proxy load balancing as a solution to our problem, we need an external tool that is aware of all the backend servers, performs health checks on them, and updates and reloads the proxy load balancer's configuration according to the results. Implementing such a tool and making it scalable would be very difficult.

If we use client-side load balancing with a thick client, the application becomes very hard to manage, since there are static backend lists in all clients. Client-side load balancing with the look-aside strategy would be a better fit, but again we would need an external server that can perform all the complex health checks.

Instead, the proposed solution mixes these strategies. We will use Envoy as a proxy load balancer and rely on Envoy's dynamic discovery services to fetch the list of available backend servers frequently, reloading Envoy's configuration every time the list changes. We will still use an external server, but it won't be a complex one performing complicated health checks on the backend list; instead it will be a very simple REST API that returns the list of available servers and updates the list (add/remove) when requested. All backend servers register themselves with this discovery server and update their entries according to their internal status, and Envoy talks to the same discovery service, reloading its config according to the servers returned from the API. That is how overly loaded server instances remove themselves from the load balancer and add themselves back once the load drops. The overall diagram looks like the following:

+------------+        +------------+          +------------+
| Client - I |--------|  Envoy     |--------- | Discovery  |
|            |       -|            |          | Service API|
+------------+     -/ +------------+          +------------+
                 -/      |   |  \                      | | |
+------------+ -/        |   \   -\                    | | |
| Client - II|/          |    |    \                   | | |
|            |           |    \     -\                 | | |
+------------+           |     |      \                | | |
                         |     \       -\              | | |
                         |      |        |             | | |
                         |      \      +------------+  | | |
                         |       |     |Backend     |--| | |
                         |       \     |Server - III|    | |
                         |        |    +------------+    | |
                         |      +------------+           | |
                         |      | Backend    |-----------| |
                         |      | Server - II|             |
                         |      +------------+             |
                       +------------+                       |
                       | Backend    |-----------------------|
                       | Server - I |
                       +------------+

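A backend's side of this contract is just two HTTP calls against the discovery API. The sketch below is Python, and the /edsservice/register and /edsservice/deregister paths are hypothetical; the demo's actual routes live in its source code:

```python
import json
import urllib.request

def _post(url, payload):
    """POST a JSON body; a thin helper around urllib."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)

def register(eds_base, ip, port):
    """Announce this backend to the discovery service (path is hypothetical)."""
    _post(f"{eds_base}/edsservice/register", {"ip": ip, "port": port})

def deregister(eds_base, ip, port):
    """Remove this backend so Envoy stops routing traffic to it."""
    _post(f"{eds_base}/edsservice/deregister", {"ip": ip, "port": port})
```

A server calls register at startup and toggles between the two as its internal load changes; Envoy never needs to know the details.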
How Envoy Works

One of the most powerful features of Envoy is the separation of the data plane (the Envoy instances that route your traffic) from the control plane, which acts as the source of truth for the current state of your infrastructure and your desired configuration. Envoy's data plane is universal: you can replace the control plane with whatever you wish [2]. Envoy also provides an API for anyone who wants to implement a new control plane, and for a very custom service discovery (with multiple AZs, server weights, etc.) this xDS API can be used [3]. The letter x in xDS is a variable; the possible options for x are:

- CDS: Cluster Discovery Service
- EDS: Endpoint Discovery Service
- LDS: Listener Discovery Service
- RDS: Route Discovery Service
- SDS: Secret Discovery Service

For detailed explanations you can take a look at the xDS protocol documentation.

In our setup we will implement a very simple REST API for EDS. All servers will register themselves with the EDS API, and Envoy will fetch the available servers from the same place. When a server becomes overloaded (not necessarily unhealthy, but it doesn't want any more requests before finishing what is in progress), it can deregister itself, and when it becomes available again it can re-register. From the client's perspective all of these operations are transparent, because the client only talks to Envoy.
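
The registry behind such an EDS API can be tiny. Below is a Python sketch of the in-memory state the discovery service needs to track; the real demo app adds the HTTP layer and the Envoy-facing responses on top, and the exact field names here are assumptions:

```python
class EndpointRegistry:
    """In-memory view of the available backends, keyed by (ip, port).
    Registration and deregistration are both idempotent."""

    def __init__(self):
        self._hosts = {}

    def register(self, ip, port):
        self._hosts[(ip, port)] = {"ip_address": ip, "port": port}

    def deregister(self, ip, port):
        self._hosts.pop((ip, port), None)

    def as_json(self):
        # Mirrors the {"hosts": [...]} document the demo's
        # /edsservice/<service-name> endpoint returns.
        return {"hosts": list(self._hosts.values())}
```

Every refresh interval Envoy polls this list, so a deregistered server simply disappears from the set of routable endpoints.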

Seeing Everything in Action

I created a demo application with a client and a server that talk over gRPC. What it does is really simple: the client starts bi-directional gRPC streaming with the server and sends random integers, and the server returns an integer back to the client whenever it is the new maximum among those sent (this application is a modified version of pahanini/go-grpc-bidirectional-streaming-example). The modified version reads the address of the EDS server and its own IP address from the environment to register itself with the EDS service. It also accepts at most 3 client connections; once it reaches the limit it deregisters itself from the EDS service until the connection count drops below 3 again. To get the source code and create the binaries:

git clone
cd grpc-demo-app/

After completing the steps above you will have server and client binaries under the ./bin directory of the cloned repo. Next you need the service that will act as the Endpoint Discovery Service. It is a Python application that can be used directly or inside a Docker container (this application is a modified version of salrashid123/envoy_discovery). To get the source code and build the Docker image:

git clone
cd envoy_discovery/
docker build -t eds_api .

Now we need to get the Envoy binary onto our system [4] and then start by running the eds_api:

docker run -d -p8080:8080 eds_api

You can check its logs or open http://localhost:8080/ in your browser to verify it is working. Then we start Envoy using the following config:

# envoy.yaml
# The 0.0.0.0 / localhost addresses below are assumed; adjust to your environment.
admin:
  access_log_path: /dev/null
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 9000

node:
  cluster: mycluster
  id: test-id

static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address: { address: 0.0.0.0, port_value: 10000 }
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          stat_prefix: ingress_http
          codec_type: AUTO
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route: { cluster: service_backend }
          http_filters:
          - name: envoy.router

  clusters:
  - name: service_backend
    type: EDS
    connect_timeout: 0.1s
    http2_protocol_options: {}
    upstream_connection_options:
      tcp_keepalive:
        keepalive_time: 300
    eds_cluster_config:
      service_name: eds-cluster-service
      eds_config:
        api_config_source:
          api_type: REST
          cluster_names: [eds_cluster]
          refresh_delay: 0.25s
  - name: eds_cluster
    type: STRICT_DNS
    connect_timeout: 0.1s
    hosts: [{ socket_address: { address: localhost, port_value: 8080 }}]

envoy -c envoy.yaml

Then we open multiple terminals in the grpc-demo-app/ directory, where the client and server binaries were built, and start a server process with the proper environment variables:

EDS_SERVER="localhost" MY_IP="" ./bin/server

From other terminals we start 3 client processes. By default the client tries to connect to port 50005, so we change it to 10000 to make it connect to the server through Envoy:

./bin/client -p 10000

At this point all 3 clients should be running without any problems. If you make a GET request to the http://localhost:8080/edsservice/eds-cluster-service endpoint, you will see JSON representing the server component running locally; this is exactly how Envoy discovers services. In the config you may notice that the service name is the same one the application uses to register itself. If you also want details about how to register/deregister, you can check the source code in the GitHub repo. Now, if you start another client that tries to connect to the server through Envoy, it will get an error saying "can not receive rpc error: code = Unavailable desc = no healthy upstream", because after reaching 3 connections the server deregistered itself from the EDS service and Envoy stopped routing traffic to it. If you make a GET request to the same endpoint now, you should see an empty hosts array in the JSON response ({"hosts": []}).
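
The deregister-at-3-connections behaviour boils down to a small piece of bookkeeping. Here is a Python sketch of that logic (the demo server itself is written in Go; register_cb and deregister_cb stand in for the real calls to the EDS service):

```python
MAX_CONNS = 3  # the demo server's connection limit

class ConnectionGate:
    """Sketch of a server's self-throttling: deregister from the discovery
    service at the connection cap, re-register when load drops below it."""

    def __init__(self, register_cb, deregister_cb, limit=MAX_CONNS):
        self.active = 0
        self.limit = limit
        self.register_cb = register_cb
        self.deregister_cb = deregister_cb
        self.register_cb()  # announce ourselves at startup

    def on_connect(self):
        self.active += 1
        if self.active >= self.limit:
            self.deregister_cb()  # full: stop receiving new traffic

    def on_disconnect(self):
        self.active -= 1
        if self.active == self.limit - 1:
            self.register_cb()  # capacity available again
```

With this in place, the fourth client's "no healthy upstream" error is expected: the gate has already pulled the only server out of the EDS response.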


[1] -

[2] -

[3] -

[4] -