Kubernetes DaemonSets: A Detailed Introductory Tutorial

Kubernetes is one of the most popular container orchestration systems. One reason for its popularity is that it takes many container maintenance tasks off your hands. For example, Kubernetes saves you from planning where to deploy each microservice and from checking that all the pods are spread evenly across all available nodes.

But as with everything, no single solution suits all use cases. That's why Kubernetes offers a few different types of workloads. In this post, you'll learn what DaemonSets are, what advantages they bring, and when to use them.

What Is a Kubernetes Deployment?

Before we dive into DaemonSets, let's make sure we understand the general concept of Kubernetes workloads. Kubernetes is quite a complex system with a lot of components and options. There are many choices for networking, storage, scaling, etc. But the core function of Kubernetes is to run containers. So, if you want to instruct Kubernetes to run a container (as a pod), you need to create a workload.

As with anything else on Kubernetes, there are a few configuration options for workloads. The most common type of workload is a Deployment. Creating a Deployment means telling Kubernetes, "Please run a container from this Docker image." This is, of course, a hugely simplified explanation, but you get the idea. To create a workload of the Deployment type, you set the kind field to Deployment in your typical Kubernetes YAML definition:


apiVersion: apps/v1
kind: Deployment
metadata:
  (...)
spec:
  (...)
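
To make the skeleton above concrete, here is a minimal sketch of a complete Deployment. The name web, the app: web label, the replica count, and the nginx:1.25 image are assumptions chosen for illustration. They do not come from the skeleton above.

```yaml
# Minimal Deployment sketch; all names and the image are illustrative assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3              # how many copies of the pod to keep running
  selector:
    matchLabels:
      app: web             # must match the labels in the pod template below
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25
```

Note the replicas field: with a Deployment, you tell Kubernetes how many pods you want, and Kubernetes decides where to put them.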

Deployment in Action

So what happens when you create a Deployment? Kubernetes will first find an appropriate node for each pod. One of the main criteria for being "appropriate" is how much capacity the node has left: the scheduler compares the resources a pod requests with what is still unallocated on each node. Kubernetes, by default, will try to spread the load across all nodes. So, for example, say you have five nodes, four of which are running 10 pods each, whereas the last one is running only eight. There's a good chance that Kubernetes will schedule the pods of a new Deployment on that last node. Also, when one of the nodes becomes unavailable for whatever reason, Kubernetes will replace the pods that were running on that node with new pods on the remaining nodes, and again, it will try to spread them across those nodes.
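
The scheduler judges how full a node is mainly by the resources that pods request, not by live usage. As a rough sketch, this is how a pod can declare its requests. The pod name api, the image, and the specific CPU and memory values are hypothetical.

```yaml
# Pod sketch with resource requests; the name, image, and values are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
    - name: api
      image: nginx:1.25
      resources:
        requests:
          cpu: "250m"      # a quarter of one CPU core
          memory: "128Mi"  # the scheduler only picks nodes with this much unallocated memory
```

A node that cannot fit these requests is not considered for this pod at all, no matter how few pods it is running.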

All of this decision-making on where to schedule containers is happening under the hood, and you don't need to worry about where your pods will be scheduled. This is one of the main features of Kubernetes. You just add new nodes whenever your cluster becomes saturated, and Kubernetes does all the management for you.

Different Types of Workloads

All of the above is just Kubernetes' default behavior. Of course, sometimes you may want more control over the scheduling process. You may want to schedule some microservices on specific nodes, something that's often done with multiple node pools. For example, you may want to add a few nodes with high-performance graphics cards and schedule big data or AI processing microservices specifically on these nodes. This is just one example. There are more use cases where you may want a different behavior from Kubernetes than the default "schedule my pods anywhere." One such use case is the need for scheduling a copy of a pod on every single node. Let me now introduce you to DaemonSets.
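
One common way to pin pods to a particular node pool is a nodeSelector in the pod template. The following is only a sketch: the pool: gpu node label, the ai-worker name, and the example.com/ai-worker:1.0 image are hypothetical, and the nodes in the GPU pool are assumed to already carry that label.

```yaml
# Deployment sketch restricted to labeled nodes; label, names, and image are illustrative assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-worker
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ai-worker
  template:
    metadata:
      labels:
        app: ai-worker
    spec:
      nodeSelector:
        pool: gpu          # only nodes carrying this label are eligible
      containers:
        - name: ai-worker
          image: example.com/ai-worker:1.0
```

Kubernetes still chooses the exact node, but only among the nodes that match the selector.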

Enter DaemonSet

So why would you want to schedule the same containers on every single node? There are many possible reasons. The most common one is the need for scheduling a "daemon"-type application that needs to perform some action on every node. Common examples are logs or metrics-gathering daemons. It's also possible to schedule a copy of a pod not on all nodes but on a subset of them. This can be useful for scheduling a daemon-type pod, for example, only on a specific node pool.

For instance, if you want to get metrics (like CPU or RAM usage) from each node, the best option is to schedule a container on every node that will gather these metrics from that individual node. Why not simply schedule one container instead that will gather metrics from all nodes? Well, you would run the risk that the node on which the metrics collector is running dies for whatever reason, and you'd lose metrics from the whole cluster. Of course, Kubernetes would redeploy that service on another node. But depending on how busy your cluster is, that could take a while, and therefore, you would miss some of the data. In the case of metrics, maybe it wouldn't be such a big deal, but imagine losing logs from all containers for a moment.

But besides these common use cases, you may simply want to have a copy of the same container on every node for an application-specific purpose, such as a node-local application cache.

DaemonSets in Detail

Now that we understand the need behind DaemonSets, let's talk about them in more detail. We know already that the main point of a DaemonSet is to ensure that all nodes are running a copy of a pod. Therefore, unlike with a typical Kubernetes Deployment, you don't specify how many pods you want to run. Kubernetes will automatically run as many pods as you have eligible nodes, and when a node is added to the cluster, it starts a copy of the pod there too. Another difference from a normal Deployment is that when a node is removed from the cluster, Kubernetes won't move the pod that belongs to the DaemonSet to a different node but instead will simply delete it.

So how do you create a DaemonSet workload? Very similarly to a normal Deployment. In fact, as with any other Kubernetes definition, you need to prepare a YAML definition with apiVersion, kind, and metadata fields. However, instead of Deployment, the kind value, in this case, will be DaemonSet. So an example DaemonSet YAML definition could look like this:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-daemon
spec:
  selector:
    matchLabels:
      name: fluentd-daemon
  template:
    metadata:
      labels:
        name: fluentd-daemon
    spec:
      containers:
        - image: fluent/fluentd
          name: fluentd-daemon

Following the idea of a DaemonSet, the above definition will deploy a fluentd pod on every schedulable node in the cluster. Nodes with taints that the pod doesn't tolerate, which typically includes control-plane nodes, are skipped. Kubernetes will make sure that there's exactly one pod on each of those nodes. For example, if you have five worker nodes, you'll have five fluentd pods running. If one of the nodes becomes unavailable, you'll have four fluentd pods running.
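
Two common refinements of the definition above are limiting the DaemonSet to a subset of nodes and letting it run on tainted control-plane nodes. The sketch below shows both together. The logging: enabled node label is a hypothetical label you would have to apply to the chosen nodes yourself, and whether your control-plane nodes carry the node-role.kubernetes.io/control-plane taint depends on how the cluster was set up.

```yaml
# DaemonSet sketch with a node subset and a toleration; the node label is an illustrative assumption.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-daemon
spec:
  selector:
    matchLabels:
      name: fluentd-daemon
  template:
    metadata:
      labels:
        name: fluentd-daemon
    spec:
      nodeSelector:
        logging: enabled   # run only on nodes carrying this label
      tolerations:
        # Allow scheduling on control-plane nodes that carry this taint.
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
      containers:
        - image: fluent/fluentd
          name: fluentd-daemon
```

With this variant, the number of fluentd pods equals the number of labeled nodes, not the total number of nodes in the cluster.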

Summary

Kubernetes DaemonSets can be a bit tricky to understand at first. They may seem to go against the whole point of Kubernetes. But just like with anything else, there are use cases where something that seems odd is actually useful. Kubernetes DaemonSets are quite commonly used for things like log collection or monitoring. Also, don't forget that the main advantages of Kubernetes are flexibility and the ability to adjust it to different companies and infrastructures.

Of course, no one will force you to use DaemonSets. It's totally fine to not use them if you feel like you don't need them. But on the other hand, when you do actually need daemon-like functionality, it's far better and easier to use a DaemonSet than to try to achieve the same with a normal Kubernetes Deployment. If you want to learn more about Kubernetes, check out this post about advanced concepts for Kubernetes pods.