I recently found myself needing a better way to handle rollouts of new deployments in Kubernetes for some applications. In certain situations, an update to a deployment would stop a pod that was in the middle of processing a request. Whilst these failed requests can be retried - none of our systems should expect everything to work 100% of the time - stopping a pod when it only needs a few seconds to complete its work is a bit of a naive approach.

After Googling for a better approach, I was gleeful to find Pod Lifecycle Handlers, a way to hook into the lifecycle events of a pod. For me this meant I could control the manner in which my more delicate application pods are terminated.

Lifecycle Hooks

The postStart hook is triggered immediately after the container is created. There is no guarantee it will run before the container's ENTRYPOINT, as both are triggered at roughly the same time. If the postStart handler takes too long or hangs, the container will not reach the running state.

The preStop hook is triggered immediately before the container is terminated. It blocks termination until the preStop handler completes or the pod's grace period ends - which defaults to 30s. This allows a more graceful shutdown and reduces the number of errors seen during a rollout.
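If your shutdown routine needs longer than that, the grace period can be raised per pod. A minimal sketch - the 60 second value is only an assumption for illustration:

spec:
  terminationGracePeriodSeconds: 60  # time allowed for preStop + SIGTERM handling before SIGKILL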

The handlers can be either commands, as the following example shows, or HTTP requests.

apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo
spec:
  containers:
  - name: lifecycle-demo-container
    image: my-app
    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh", "-c", "curl -O http://example.com/config.yaml.latest > config.yaml"]
      preStop:
        exec:
          command: ["/bin/sh","-c","/shutdown.sh"]

This example shows a simple usage of these hooks: pulling some configuration from a remote source on start-up and triggering a script as part of the shutdown procedure. The pod will still live until the command in the preStop handler exits (or terminationGracePeriodSeconds is hit). If you need the pod to stop immediately, you can call kubectl delete pod --grace-period=0 --force, which terminates the pod with extreme prejudice 😈.
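The same hooks can also be expressed as HTTP requests rather than commands. A minimal sketch, assuming the application exposes a /shutdown endpoint on port 8080 (both the path and the port are hypothetical):

    lifecycle:
      preStop:
        httpGet:
          # the kubelet sends a GET to this path on the pod's IP
          path: /shutdown
          port: 8080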

Debugging Handlers

You can debug the handlers by running kubectl describe pod <pod_name>, which displays the events for the pod.

Events:
  FirstSeen  LastSeen  Count  From                                                   SubobjectPath          Type      Reason               Message
  ---------  --------  -----  ----                                                   -------------          --------  ------               -------
  1m         1m        1      {default-scheduler }                                                          Normal    Scheduled            Successfully assigned test-1730497541-cq1d2 to gke-test-cluster-default-pool-a07e5d30-siqd
  1m         1m        1      {kubelet gke-test-cluster-default-pool-a07e5d30-siqd}  spec.containers{main}  Normal    Pulling              pulling image "test:1.0"
  1m         1m        1      {kubelet gke-test-cluster-default-pool-a07e5d30-siqd}  spec.containers{main}  Normal    Created              Created container with docker id 5c6a256a2567; Security:[seccomp=unconfined]
  1m         1m        1      {kubelet gke-test-cluster-default-pool-a07e5d30-siqd}  spec.containers{main}  Normal    Pulled               Successfully pulled image "test:1.0"
  1m         1m        1      {kubelet gke-test-cluster-default-pool-a07e5d30-siqd}  spec.containers{main}  Normal    Started              Started container with docker id 5c6a256a2567
  38s        38s       1      {kubelet gke-test-cluster-default-pool-a07e5d30-siqd}  spec.containers{main}  Normal    Killing              Killing container with docker id 5c6a256a2567: PostStart handler: Error executing in Docker Container: 1
  37s        37s       1      {kubelet gke-test-cluster-default-pool-a07e5d30-siqd}  spec.containers{main}  Normal    Killing              Killing container with docker id 8df9fdfd7054: PostStart handler: Error executing in Docker Container: 1
  38s        37s       2      {kubelet gke-test-cluster-default-pool-a07e5d30-siqd}                         Warning   FailedSync           Error syncing pod, skipping: failed to "StartContainer" for "main" with RunContainerError: "PostStart handler: Error executing in Docker Container: 1"

Other notes

  • The handlers are intended to be triggered at least once, but may be triggered multiple times in certain circumstances.
  • When a pod enters the terminating state it is removed from the endpoints list for a service (see the sketch after this list).
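Because the removal from the endpoints list and the preStop/SIGTERM sequence happen independently, a short pause before shutting down gives load balancers time to stop routing new requests to the pod. A minimal sketch of that idea - the 10 second delay is an assumption, tune it to how quickly your endpoints propagate:

    lifecycle:
      preStop:
        exec:
          # hypothetical delay to let in-flight traffic drain before SIGTERM
          command: ["/bin/sh", "-c", "sleep 10"]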

As always, I appreciate any feedback. If you want to reach out, I'm @neuralsandwich on Twitter and most other places.
