Using Linux Capabilities in Containers & Kubernetes

👋 Hi! I’m Bibin Wilson. In each edition, I share practical tips, guides, and the latest trends in DevOps and MLOps to make your day-to-day DevOps tasks more efficient. If someone forwarded this email to you, you can subscribe here to never miss out!

In this edition, we will look at:

  1. What Are Linux Capabilities?

  2. Linux Capabilities in Containers

  3. Kubernetes and Linux Capabilities

In our container non-root edition, I mentioned the concept of Linux capabilities.

It is an often overlooked security feature that plays a key role in container security and Kubernetes configurations.

In this edition, we will dive a little deeper into Linux Capabilities to understand how they relate to containers and Kubernetes.

What Are Linux Capabilities?

In traditional Linux, a process is either root (superuser) or non-root (restricted). This is a concept you already know.

Linux capabilities were introduced in kernel 2.2. Before that, a process had either:

  • root privileges (privileged processes, UID = 0): full root access, or

  • regular user privileges (non-privileged processes, UID ≠ 0): limited permissions.

The problem with this approach was that if a non-root program needed to perform even one privileged operation, it had to run with full root access. For example, binding to a port below 1024 requires root privileges.

What if there were a mechanism to give a non-root user privileged access to only that one operation?

This is the problem Linux capabilities solve: they split root privileges into separate units that can be granted individually.

For example,

  1. CAP_NET_BIND_SERVICE: To grant the permission to bind to privileged ports.

  2. CAP_NET_ADMIN: To manage network interfaces.

  3. CAP_SYS_TIME: To modify the system clock.

With this, a process running as a non-root user can be granted only CAP_NET_BIND_SERVICE to bind to a privileged port, without receiving any of the other root privileges.

Run the following command to see the full list of capabilities, with a description of each.

man capabilities

Current Linux kernels define around 40 capabilities (41 as of kernel 5.9). The exact number depends on your kernel version.
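
Under the hood, each capability is a numbered bit, and the kernel reports a process's capability sets as hex masks (for example, the CapEff line in /proc/<pid>/status). Here is a minimal sketch of decoding such a mask in POSIX shell. The decode_caps function is a hypothetical helper, not a standard tool, and the name list assumes the bit order in linux/capability.h as of kernel 5.9.

```shell
#!/bin/sh
# Capability names in bit order (bit 0 = chown), assumed from
# linux/capability.h as of kernel 5.9; newer kernels may append more.
CAP_NAMES="chown dac_override dac_read_search fowner fsetid kill setgid setuid
setpcap linux_immutable net_bind_service net_broadcast net_admin net_raw
ipc_lock ipc_owner sys_module sys_rawio sys_chroot sys_ptrace sys_pacct
sys_admin sys_boot sys_nice sys_resource sys_time sys_tty_config mknod lease
audit_write audit_control setfcap mac_override mac_admin syslog wake_alarm
block_suspend audit_read perfmon bpf checkpoint_restore"

# decode_caps HEXMASK: print one cap_<name> line per bit set in the mask.
decode_caps() {
    mask=$((0x$1))
    bit=0
    for name in $CAP_NAMES; do
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            echo "cap_$name"
        fi
        bit=$((bit + 1))
    done
}

# A mask with only bit 10 set: the "bind to low ports" privilege alone.
decode_caps 0000000000000400    # prints cap_net_bind_service

# Effective capabilities of the current shell (Linux only).
if [ -r "/proc/$$/status" ]; then
    eff=$(awk '/^CapEff:/ {print $2}' "/proc/$$/status")
    if [ -n "$eff" ]; then
        decode_caps "$eff"
    fi
fi
```

If libcap's capsh tool is installed, capsh --decode=<mask> does the same job.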

Now that we have an understanding of Linux capabilities, let's understand how containers and Kubernetes use them.

Linux Capabilities & Containers

By default, containers run as root (unless the image or the run command specifies a non-root user).

But this doesn’t mean they have full root privileges on the host.

Docker and other container runtimes use Linux capabilities to restrict container permissions. This makes the container environment more secure, even though the user ID (UID 0) is the same inside the container and on the host.

Docker, for instance, drops many capabilities by default and keeps only the ones a typical container needs.

The containerd source code lists its default allowed capabilities.

CRI-O has its own defaults. Refer to the CRI-O documentation for the list.

Let's look at an example using Docker.

Let's start a BusyBox container and try to create a dummy network interface.

$ docker run --rm -it \
    --name test_no_cap busybox sh

/ # ip link add dummy0 type dummy
ip: RTNETLINK answers: Operation not permitted

As you can see, the container lacks CAP_NET_ADMIN, so it cannot modify network interfaces.

Now, run the same container, but add the required capability with the --cap-add=NET_ADMIN flag.

$ docker run --rm -it --cap-add=NET_ADMIN \
    --name test_with_cap busybox sh

/ # ip link add dummy0 type dummy
/ # ip link show dummy0
2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop qlen 1000
    link/ether 0a:0c:31:af:1e:0b brd ff:ff:ff:ff:ff:ff

Since the container has CAP_NET_ADMIN, it can create network interfaces.
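
You can confirm the difference from inside each container with grep CapEff /proc/1/status. The sketch below checks bit 12 (CAP_NET_ADMIN) in two masks. The mask values are assumptions based on what a stock Docker install typically reports, so yours may differ, and has_cap is a hypothetical helper.

```shell
#!/bin/sh
# CAP_NET_ADMIN is capability number 12 in linux/capability.h.
CAP_NET_ADMIN=12

# has_cap MASK BIT: succeed (exit 0) if bit number BIT is set in hex MASK.
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Assumed CapEff values (typical for a stock Docker install).
DEFAULT_MASK=00000000a80425fb      # docker run ... busybox
WITH_NET_ADMIN=00000000a80435fb    # docker run --cap-add=NET_ADMIN ... busybox

for mask in "$DEFAULT_MASK" "$WITH_NET_ADMIN"; do
    if has_cap "$mask" "$CAP_NET_ADMIN"; then
        echo "$mask: CAP_NET_ADMIN present"
    else
        echo "$mask: CAP_NET_ADMIN missing"
    fi
done
```

The only difference between the two masks is the single bit that --cap-add=NET_ADMIN turns on.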

Kubernetes and Linux Capabilities

In Kubernetes, you can add or drop Linux capabilities in a container's securityContext to reduce the attack surface.

Please refer to the SecurityContext edition to learn more about the Kubernetes securityContext.

Let's understand this with an example.

When you run a pod with a BusyBox image, by default, you will be able to use ping.

For example:

$ kubectl run ping-pod \
      --image=busybox --restart=Never \
      -it -- sh -c "ping 8.8.8.8"

64 bytes from 8.8.8.8: seq=1 ttl=61 time=13.500 ms
64 bytes from 8.8.8.8: seq=2 ttl=61 time=16.598 ms
64 bytes from 8.8.8.8: seq=3 ttl=61 time=16.262 ms

Now, let's say you don't want to allow the BusyBox pod to perform ping.

In this case, we drop the NET_RAW capability using the securityContext.

The NET_RAW capability allows a container to create and use raw network sockets. This is required for commands like ping and some network debugging tools.

By dropping NET_RAW, we prevent the container from sending raw packets.

For example:

apiVersion: v1
kind: Pod
metadata:
  name: busybox-ping
spec:
  containers:
  - name: busybox
    image: busybox:latest
    command: ["sleep", "3600"]
    securityContext:
      capabilities:
        drop:
          - NET_RAW

If you deploy this pod and try ping, you will get the following error.

$ kubectl exec -it busybox-ping -- ping 8.8.8.8

PING 8.8.8.8 (8.8.8.8): 56 data bytes
ping: permission denied (are you root?)
command terminated with exit code 1
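
The same securityContext field can also add capabilities. A common least-privilege pattern is to drop everything and add back only what the workload needs. Here is a minimal sketch that prints such a manifest from a shell function. The render_pod function and the pod name are hypothetical, and NET_BIND_SERVICE is just an example of a capability you might add back.

```shell
#!/bin/sh
# render_pod NAME: print a Pod manifest that drops every capability and
# adds back only NET_BIND_SERVICE. NAME and the image are placeholders.
render_pod() {
    cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: $1
spec:
  containers:
  - name: busybox
    image: busybox:latest
    command: ["sleep", "3600"]
    securityContext:
      capabilities:
        drop:
          - ALL
        add:
          - NET_BIND_SERVICE
EOF
}

# Print the manifest for a pod with a hypothetical name.
render_pod busybox-least-priv
```

To create the pod, pipe the output to kubectl: render_pod busybox-least-priv | kubectl apply -f -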

Wrapping Up

Linux capabilities play an important role in enforcing the principle of least privilege by allowing fine-grained control over what processes can do.

For DevOps engineers, being proactive about these best practices is important.

Security should not be an afterthought.

I’ve seen teams rushing to fix clusters after a security audit when issues could have been prevented earlier.

By taking small but smart security steps wherever necessary, you can avoid last-minute surprises and keep your infrastructure safe from the start.
