Enabling `perf` in Kubernetes with Docker’s default seccomp profile
Have you been trying to profile your Kubernetes applications with
perf? Maybe you want to see what all the FlameGraphs fuss is about? If your version of Docker was upgraded within the last year, you’ll likely run into issues.
Starting in v17.06 of Docker,
perf_event_open is blocked by the default seccomp profile. Which means running
perf inside your container will get you this:
perf_event_open(..., PERF_FLAG_FD_CLOEXEC) failed with unexpected error 1 (Operation not permitted) perf_event_open(..., 0) failed unexpectedly with error 1 (Operation not permitted) You may not have permission to collect stats. Consider tweaking /proc/sys/kernel/perf_event_paranoid: -1 - Not paranoid at all 0 - Disallow raw tracepoint access for unpriv 1 - Disallow cpu events for unpriv 2 - Disallow kernel profiling for unpriv
Trying to alter the suggested
/proc/sys/kernel/perf_event_paranoid from within the container gets you the expected:
bash: /proc/sys/kernel/perf_event_paranoid: Read-only file system
What to do? You’ll need to enable
CAP_SYS_ADMIN. This flag is one of many Linux capabilities, so named for the extra capabilities they grant. These flags grant scoped permission escalations for threads to perform specific tasks, from changing file attributes to altering the system clock.
CAP_SYS_ADMIN is a particularly overloaded one, a kitchen sink of permissions escalations mostly geared toward profiling work.
If you’re only working with Docker, you can add
--cap-add SYS_ADMIN to your
docker run command, as explored here.
However, if you’re living that Kubernetes life, you’ll need to enable it using a
securityContext. In the container spec of your deployment file, add:
securityContext: capabilities: add: ["SYS_ADMIN"]
And you’ll be good to go!
Note that you need to strip the
CAP prefix when adding capabilities in Kubernetes. You can read more about container privileges in Kubernetes here.
Remember to remove this setting when you’re done using it!
perf_event_open is blocked by default because it grants user processes privileged access to the system. Branch deploy your change, use it, then rollback.