Enabling `perf` in Kubernetes with Docker’s default seccomp profile

Have you been trying to profile your Kubernetes applications with perf? Maybe you want to see what all the FlameGraphs fuss is about? If your version of Docker was upgraded within the last year, you'll likely run into permission errors.

Starting with Docker v17.06, perf_event_open (the system call perf uses to open performance counters) is blocked by the default seccomp profile. That means running perf inside your container gets you this:

perf_event_open(..., PERF_FLAG_FD_CLOEXEC) failed with unexpected error 1 (Operation not permitted)
perf_event_open(..., 0) failed unexpectedly with error 1 (Operation not permitted)
You may not have permission to collect stats.
Consider tweaking /proc/sys/kernel/perf_event_paranoid:
 -1 - Not paranoid at all
  0 - Disallow raw tracepoint access for unpriv
  1 - Disallow cpu events for unpriv
  2 - Disallow kernel profiling for unpriv
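The levels listed in that message can also be read from a script. Below is a minimal Python sketch, assuming a Linux host with a readable /proc/sys/kernel/perf_event_paranoid; the helper names are hypothetical. One caveat: seccomp filters run at syscall entry, so under Docker's default profile the call is rejected before the kernel ever consults this setting. A permissive value here does not, by itself, mean perf will work in the container.

```python
from pathlib import Path

# Level descriptions copied from perf's own error message.
PARANOID_LEVELS = {
    -1: "Not paranoid at all",
    0: "Disallow raw tracepoint access for unpriv",
    1: "Disallow cpu events for unpriv",
    2: "Disallow kernel profiling for unpriv",
}

PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")


def describe_paranoid(value: int) -> str:
    """Map a perf_event_paranoid value to the description perf prints."""
    if value in PARANOID_LEVELS:
        return PARANOID_LEVELS[value]
    # perf's message only lists -1..2; anything else is reported as unlisted.
    return f"Unlisted level {value}"


def read_paranoid(path: Path = PARANOID_PATH) -> int:
    """Read the current level; the file holds a single integer."""
    return int(path.read_text().strip())


if __name__ == "__main__":
    if PARANOID_PATH.exists():
        level = read_paranoid()
        print(f"perf_event_paranoid = {level}: {describe_paranoid(level)}")
    else:
        print("perf_event_paranoid is not available on this system")
```

Running it as a script prints the current level, or a notice if the file is missing.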

Trying to change /proc/sys/kernel/perf_event_paranoid, as suggested, from within the container gets you the expected error:

bash: /proc/sys/kernel/perf_event_paranoid: Read-only file system

What to do? You'll need to enable CAP_SYS_ADMIN. This is one of many Linux capabilities, which divide the privileges traditionally reserved for root into distinct units that can be granted individually. Each one is a scoped permission escalation that lets a thread perform a specific task, from changing file attributes to altering the system clock. CAP_SYS_ADMIN is a particularly overloaded one: a kitchen sink of administrative privileges that covers everything from mounting filesystems to, relevant here, performance monitoring with perf_event_open.
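Capabilities are tracked per thread as bitmasks, and the kernel reports the effective set on the CapEff line of /proc/self/status. Checking that mask is a quick way to confirm whether CAP_SYS_ADMIN actually reached your process. This is a minimal Python sketch, assuming a Linux /proc; the function names are hypothetical, and 21 is CAP_SYS_ADMIN's bit number in <linux/capability.h>.

```python
from pathlib import Path

# Bit number of CAP_SYS_ADMIN in <linux/capability.h>.
CAP_SYS_ADMIN = 21


def parse_cap_eff(status_text: str) -> int:
    """Return the effective capability mask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # The value is a hexadecimal bitmask, e.g. "CapEff:\t00000000a80425fb".
            return int(line.split(":", 1)[1].strip(), 16)
    raise ValueError("no CapEff line found")


def has_capability(cap_eff: int, cap: int) -> bool:
    """Check whether bit `cap` is set in the capability mask."""
    return bool(cap_eff & (1 << cap))


if __name__ == "__main__":
    status = Path("/proc/self/status")
    if status.exists():
        caps = parse_cap_eff(status.read_text())
        print(f"CAP_SYS_ADMIN effective: {has_capability(caps, CAP_SYS_ADMIN)}")
    else:
        print("/proc/self/status is not available on this system")
```

Run as root inside the container, this should print False with Docker's default capabilities and True once SYS_ADMIN has been added.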

If you’re only working with Docker, you can add --cap-add SYS_ADMIN to your docker run command, as explored here.

However, if you’re living that Kubernetes life, you’ll need to enable it using a securityContext. In the container spec of your deployment file, add:

securityContext:
    capabilities:
        add: ["SYS_ADMIN"]

And you’ll be good to go!

Note that Kubernetes expects capability names without the CAP_ prefix, so CAP_SYS_ADMIN becomes SYS_ADMIN. You can read more about container privileges in Kubernetes here.
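If you generate manifests from code rather than writing YAML by hand, the prefix rule is easy to encode. The helpers below are hypothetical (not part of any Kubernetes client library); they build the same securityContext fragment as a plain dict that you could serialize into a container spec.

```python
def to_k8s_capability(name: str) -> str:
    """Strip the CAP_ prefix, which Kubernetes does not accept."""
    prefix = "CAP_"
    return name[len(prefix):] if name.startswith(prefix) else name


def security_context(*caps: str) -> dict:
    """Build a securityContext fragment that adds the given capabilities."""
    return {"capabilities": {"add": [to_k8s_capability(c) for c in caps]}}


if __name__ == "__main__":
    print(security_context("CAP_SYS_ADMIN"))
```

Names that already lack the prefix pass through unchanged, so the helper is safe to apply to mixed input.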

Remember to remove this setting when you're done! perf_event_open is blocked by default because it gives processes privileged access to the system, and CAP_SYS_ADMIN unlocks far more than perf_event_open. Branch deploy your change, use it, then roll back.

Alice Goldfuss

Alice Goldfuss is a systems punk currently helping GitHub run their cutting-edge container platform. She loves kernel crashes, memory design, and performance hacks. :rainbow: :floppy_disk: :octocat:

Alice has consulted on some books (Docker: Up & Running, Effective DevOps, Site Reliability Engineering vol 2), presented at some conferences (SREcon, Velocity, Container Summit), and run some others (LISA17, DevOps Days Portland). You can follow her on Twitter :bird: (@alicegoldfuss), but you’ll probably regret it.
