Introduction
KEDA is a Kubernetes-based Event Driven Autoscaler. With KEDA, we can drive the scaling of any container in Kubernetes based on the number of events that need to be processed, which means we are no longer limited to the standard CPU and memory metrics. KEDA handles triggers that respond to events occurring in other services and scales workloads based on the real events that reach our application. I highly recommend visiting the official site for more details: https://keda.sh/.
What is KEDA?
KEDA is a single-purpose, lightweight component that can be added to any Kubernetes cluster, and it works alongside standard Kubernetes components such as the Horizontal Pod Autoscaler (HPA). With KEDA we can explicitly choose which applications to scale, while other applications continue to run untouched. This makes KEDA a flexible and safe option for any application running on a Kubernetes cluster.
In addition, KEDA offers an abstraction over the HPA, so autoscaling can easily be driven by metrics from various sources. It ships with more than 30 built-in scalers; you can check the full list here: https://keda.sh/docs/2.0/scalers/. Of course, it is also easy to build your own custom scaler if needed.

KEDA Architecture
KEDA is composed of several core components:
- Metric Adapter: behaves like a metrics server that is used by the HPA.
- KEDA Operator: interacts with a third-party solution to collect events and reports the resulting metrics to the Metric Adapter.
- KEDA CRDs: through which KEDA manages the HPA by defining an autoscaling policy.

Besides the core components, KEDA also adds some Custom Resource Definitions (CRDs) to the Kubernetes cluster:
- ScaledObjects
- ScaledJobs
- TriggerAuthentications
- ClusterTriggerAuthentications
A ScaledObject manages the autoscaling of a Kubernetes Deployment, or of any other custom resource you define.
A ScaledJob represents the mapping between an event source and a Kubernetes Job; it helps determine the number of Jobs to run based on events.
TriggerAuthentication and ClusterTriggerAuthentication contain the authentication configuration and secrets needed to collect and monitor metrics from an external data source with the help of a scaler. Each scaler has its own way of authenticating, which is why these two CRDs are required.
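To give a feel for how these CRDs fit together, here is a minimal sketch of a ScaledObject that scales a Deployment based on a Prometheus query and references a TriggerAuthentication for a bearer token. It is only an illustration: the names (my-app, prometheus-auth, prometheus-token), the Prometheus address, and the query are hypothetical and not part of the setup described later in this article:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-scaledobject
spec:
  scaleTargetRef:
    name: my-app                  # hypothetical Deployment to scale
  minReplicaCount: 0
  maxReplicaCount: 10
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring.svc:9090   # assumed Prometheus endpoint
      query: 'sum(rate(http_requests_total{app="my-app"}[1m]))'
      threshold: "10"
      authModes: bearer
    authenticationRef:
      name: prometheus-auth       # points at the TriggerAuthentication below
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: prometheus-auth
spec:
  secretTargetRef:
  - parameter: bearerToken        # parameter expected by the Prometheus scaler
    name: prometheus-token        # hypothetical Secret holding the token
    key: token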
Installing KEDA
KEDA can be deployed in a Kubernetes cluster through Helm charts, Operator Hub, or YAML declarations. In our case I will use Helm because, in my opinion, it is the simplest way. I assume you already have Helm installed; if not, install it before proceeding with the tasks below.
To add the Helm repo, execute:
$ helm repo add kedacore https://kedacore.github.io/charts
Next, update the Helm repo:
$ helm repo update
Finally, we need to create the namespace for KEDA and install it via Helm:
$ kubectl create namespace keda
$ helm install keda kedacore/keda --namespace keda
Once the installation is successful and you see the welcome message, let’s verify whether the KEDA Pods are up and running by executing the below command:
$ kubectl get deploy,crd -n keda
You should see output similar to mine; all KEDA Deployments and Pods should be up and running:

Deploying Sample NGINX Application
Let’s create a file called nginx-deployment.yaml with the content below. It defines our sample NGINX web application, exposed through a Service of type NodePort:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deploy
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx
  name: nginx-svc
spec:
  ports:
  - port: 80
    protocol: TCP
    nodePort: 30080
    targetPort: 80
  selector:
    app: nginx
  type: NodePort
Let’s create the Deployment and Service and verify that both are up and running:
$ kubectl apply -f nginx-deployment.yaml
$ kubectl get deploy,pod,svc
Now we should be able to access the nginx home page using the below command:
$ curl http://localhost:30080
To quickly sum up: I have my sample NGINX application deployed (accessible externally via the NodePort Service), and I have also deployed KEDA. Let’s now see how we can scale the application using KEDA.
Auto-Scaling with KEDA HTTP Scaler
KEDA integrates with multiple scalers (event sources) and uses Custom Resources (CRDs) to define the desired scaling behavior and parameters. Now, I want KEDA to scale my Deployment based on the HTTP traffic that reaches my application. KEDA will monitor the Service and, based on the current load, automatically scale the resource out and in accordingly.
However, before starting the implementation, you need to know that KEDA uses the following components to do the scaling:
- Scaler – KEDA uses a Scaler to detect whether a deployment should be activated or deactivated; the Scaler is fed by a specific event source.
- ScaledObject – deployed as a Kubernetes Custom Resource, it defines the relationship between an event source and a specific workload (i.e. a Deployment) to scale. I will define a ScaledObject containing the event details that KEDA will monitor in order to scale my sample NGINX application and its Service accordingly.
To implement KEDA autoscaling, we actually have two ways to follow:
- Use the Prometheus scaler to create scaling rules based on metrics around HTTP events.
- Use the KEDA HTTP Add-on.
I will show you how to work with the KEDA HTTP Add-on. The KEDA HTTP Add-on allows Kubernetes to automatically scale the application up and down (including to/from zero) based on incoming HTTP traffic. KEDA doesn’t come with an HTTP scaler by default, so I have to install it separately; execute the below command to do so:
$ helm install http-add-on kedacore/keda-add-ons-http --namespace keda
It’s time to create a new ScaledObject for the HTTP trigger. In my case I defined a keda-scaledobjects.yaml manifest file with the following content:
kind: HTTPScaledObject
apiVersion: http.keda.sh/v1alpha1
metadata:
  name: nginx-http-scaledobject
spec:
  host: myhost.com
  targetPendingRequests: 1
  scaleTargetRef:
    deployment: nginx-deploy
    service: nginx-svc
    port: 80
  replicas:
    min: 0
    max: 5
In the above YAML manifest I defined a ScaledObject of type HTTPScaledObject. It listens for requests addressed to the host myhost.com. The minimum replica count is set to 0 and the maximum replica count to 5. I have specified my Deployment name and Service name to correlate them with the ScaledObject itself.
To create the ScaledObject execute the below command:
$ kubectl apply -f keda-scaledobjects.yaml
Once the ScaledObject is created, the KEDA controller automatically syncs the configuration and starts watching the Deployment nginx-deploy and Service nginx-svc created earlier. At the beginning my application should be scaled down to 0 (so it should not be running at all), because we set the minimum replica count to 0. We can check the status of the ScaledObjects and the related objects, like the Service and Deployment, using the below command:
$ kubectl get scaledobjects,deploy,svc
Indeed, there are 0 Pods running inside the Deployment nginx-deploy, so everything is as expected so far.
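You can also inspect the HTTPScaledObject resource itself; a quick sketch, using the resource name from the manifest we applied above:
$ kubectl get httpscaledobject nginx-http-scaledobject
$ kubectl describe httpscaledobject nginx-http-scaledobject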
Let’s generate some traffic and test if my KEDA autoscaling is working end to end.
A Kubernetes Service called keda-add-ons-http-interceptor-proxy was created automatically when I installed the Helm chart for the HTTP Add-on. It is an internal Service and is not reachable from outside the Kubernetes cluster. For autoscaling to work properly, HTTP traffic must be routed through this Service first. We can use kubectl port-forward to test this quickly in our setup, but keep in mind that for any production use case you should use a Kubernetes Ingress to route traffic to the KEDA interceptor Service.
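For reference, such an Ingress might look roughly like the minimal sketch below (it assumes an Ingress controller is installed in the cluster; the resource name and the ingressClassName value are hypothetical):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx-keda-ingress            # hypothetical name
  namespace: keda                     # same namespace as the interceptor Service
spec:
  ingressClassName: nginx             # assumes an NGINX ingress controller
  rules:
  - host: myhost.com                  # must match the host in the HTTPScaledObject
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: keda-add-ons-http-interceptor-proxy
            port:
              number: 8080            # the interceptor proxy port
Back to our quick test, let’s start the port-forward: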
$ kubectl port-forward svc/keda-add-ons-http-interceptor-proxy -n keda 30080:8080
The above command forwards connections from local port 30080 to port 8080 of the keda-add-ons-http-interceptor-proxy Service running in the Kubernetes cluster. Let’s make some requests to my test NGINX application to simulate traffic. I will use a simple curl command as below:
$ curl localhost:30080 -H 'Host: myhost.com'
If I check the Pods now, I should notice that the Deployment was scaled to a single replica (a single Pod should be running). When we routed the traffic through KEDA’s Service, the interceptor kept track of the number of pending HTTP requests that had not yet received a reply. The result is as expected:

I can observe the same in the Kubernetes events:

The KEDA scaler periodically checks the size of the interceptor’s queue and stores the metrics. The KEDA controller monitors these metrics and increases or decreases the number of replicas as needed. In this case, a single request was pending, so the KEDA controller scaled the Deployment to a single replica.
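If you want to see the Deployment scale beyond a single replica, you need to keep several requests pending at the same time. A simple sketch using a shell loop (the request count of 200 is an arbitrary value chosen for illustration):
$ for i in $(seq 1 200); do curl -s -o /dev/null -H 'Host: myhost.com' localhost:30080 & done; wait
With enough concurrent pending requests relative to targetPendingRequests, the controller should raise the replica count toward the configured maximum of 5.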
After some time, if no traffic reaches my NGINX Service, the number of replicas drops back to 0.
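To observe the scale-down live, a simple watch on the Pods works; a quick sketch, assuming the app=nginx label from the Deployment above:
$ kubectl get pods -l app=nginx -w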

Conclusion
In this article, I explored the fundamentals of KEDA and delved into its concepts and architecture. I showed how to quickly install it and add an HTTP scaler. I introduced a sample NGINX application in the Kubernetes cluster, which was scaled automatically by KEDA Custom Resources based on the HTTP traffic coming to the app. I hope you can now see that with an event-driven Pod autoscaler you can scale your applications on demand based on real events and reduce infrastructure cost to a minimum.