Custom Load Balancing for gRPC applications with Envoy xDS API - Part II (Rate Limiting in Kubernetes)


In this setup we will deploy the whole application to Kubernetes and show that once our server application reaches 3 client connections, it stops accepting traffic. When high load occurs on the server side, the server pods deregister themselves from the load balancer discovery service and Envoy removes them from the server list. Everything is dynamic, which means no manual intervention is needed except increasing the number of clients. You can check the server source code on GitHub to see what is going on [1].

See everything in action using Kubernetes

In the previous post we went over a strategy to implement custom load balancing for gRPC applications. There we ran the binaries directly to prove the concept; this time we will use minikube to deploy the solution to a Kubernetes environment and see it in action.

Building Docker images

git clone
cd grpc-demo-app
make client-docker
make server-docker
git clone
cd envoy_discovery
docker build -t envoy-discovery-service .

I have already built the images and pushed them to Docker Hub, but if you are building them locally please read this part of the minikube documentation.
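If you go the local route, the usual trick is to build the images directly inside minikube's Docker daemon so the cluster can use them without a registry push. A minimal sketch, assuming the `grpc-demo` profile used later in this post and the `make` targets shown above:

```shell
# Point the local Docker CLI at the Docker daemon inside the
# "grpc-demo" minikube profile, so images built here are visible
# to the cluster without pushing to a registry.
eval $(minikube -p grpc-demo docker-env)

# Rebuild the images inside the minikube daemon.
make client-docker
make server-docker
```

Remember that the deployment manifests must then use an image pull policy that does not force a registry pull (for example `imagePullPolicy: IfNotPresent`).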

Kubernetes manifests

I have also prepared example manifests for deploying the app on Kubernetes. They can be reached from here
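The key piece that wires Envoy to the discovery service lives in `envoy-proxy.yaml`. A minimal sketch of what the EDS part of the Envoy configuration could look like (cluster names, addresses, ports, and the REST `api_type` are assumptions for illustration, not the exact contents of the manifest):

```yaml
clusters:
- name: eds_cluster              # points Envoy at the EDS discovery service
  connect_timeout: 0.25s
  type: STRICT_DNS
  hosts:
  - socket_address: { address: eds-service, port_value: 8080 }
- name: grpc_server_cluster      # endpoints come from EDS, not a static list
  connect_timeout: 0.25s
  type: EDS
  http2_protocol_options: {}     # gRPC requires HTTP/2 upstreams
  eds_cluster_config:
    eds_config:
      api_config_source:
        api_type: REST           # the demo discovery service speaks REST
        cluster_names: [eds_cluster]
        refresh_delay: 5s
```

With this shape, Envoy polls the discovery service periodically, so servers that deregister disappear from the upstream list on the next refresh without any proxy restart.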


minikube start -p grpc-demo
git clone
cd grpc-rate-limiting-minikube
kubectl apply -f eds-service.yaml
kubectl apply -f server.yaml
kubectl apply -f envoy-proxy.yaml
kubectl apply -f client.yaml
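After applying the manifests, it is worth verifying that everything actually came up before looking at traffic. A simple check:

```shell
# List all pods and watch until the EDS service, server, envoy,
# and client pods all reach the Running state.
kubectl get pods -w
```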

At this point, the server pods have registered themselves with the EDS discovery service and are ready to accept traffic from the Envoy proxy. Client pods hit the Envoy proxy and are routed to one of the server pods. Since the load is acceptable on the server side, everything works fine; you can check the logs and see how the application is working.
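For example, you can tail the logs of both deployments. The client deployment name below is taken from the scale command later in this post; the server deployment name is an assumption, so adjust it to match the manifests:

```shell
# Server side: registration with EDS and incoming gRPC calls.
kubectl logs -f deployment/grpc-rate-limiting-example-server

# Client side: successful responses routed through Envoy.
kubectl logs -f deployment/grpc-rate-limiting-example-client
```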

To create high load on the servers and make them deregister themselves from the EDS discovery service, we increase the number of client pods:

kubectl scale --current-replicas=5 --replicas=20 deployment/grpc-rate-limiting-example-client

Now the servers are under high load, and once they reach 3 client connections they decide the load is too much and deregister themselves from the EDS discovery service [2]. You can check the /edsservice/eds-cluster-service REST endpoint on the EDS discovery service, or check the Envoy logs, to see that there are no upstream servers Envoy can route traffic to. Some of the client pods will fail and get restarted by Kubernetes. Once we lower the client count, the servers register themselves back with the EDS discovery service and Envoy can route traffic to them again without any problems:
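To inspect the endpoint list from outside the cluster, you can port-forward the discovery service and query it directly. The service name and port here are assumptions based on the example manifests; the REST path is the one mentioned above:

```shell
# Forward a local port to the EDS discovery service
# (adjust the service name/port to match eds-service.yaml).
kubectl port-forward service/eds-service 8080:80 &

# Under high load this should return an empty endpoint list,
# since all servers have deregistered themselves.
curl http://localhost:8080/edsservice/eds-cluster-service
```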

kubectl scale --current-replicas=20 --replicas=3 deployment/grpc-rate-limiting-example-client

A similar strategy can be used to rate limit requests from a client app to a server using any custom internal server metric. Since the servers decide for themselves whether to receive traffic, the setup is very dynamic. Envoy has a great design and architecture: very complex load balancing requirements can be implemented with very little effort.


[1] -

[2] -