Aleix Raventós

From Pending to CrashLoopBackOff: Understanding Every Pod Status

CrashLoopBackOff isn't a pod phase. Neither is ImagePullBackOff. Or Terminating. If that surprises you, you're not alone, and it's the root of most confusion around pod debugging. Kubernetes tracks pods at two levels, and kubectl mashes them together into a single status column that hides more than it reveals.

We're going to untangle that. This post walks through every pod status by deploying pods that break in every possible way, on a local cluster, so you can see exactly where each failure happens in the lifecycle and what to check first.

We'll use Kunobi to inspect the pods visually, but the concepts apply regardless of what tool you use.


Part 1: The Theory

Before we start breaking things, let's understand the model. Kubernetes tracks pod lifecycle at two levels: the pod phase and the container state. Mixing these up is the most common source of confusion.

Pod phases

Every pod is in exactly one of five phases, as defined by the Kubernetes API and documented in the official docs:

Pending: The cluster accepted the pod, but it's not ready to run yet. Could be waiting for scheduling, image pulls, or init containers.
Running: The pod is bound to a node and at least one container is running, starting, or restarting.
Succeeded: All containers exited with code 0 (success). They won't be restarted.
Failed: All containers terminated and at least one exited with a non-zero code. There's no automatic restart.
Unknown: The control plane lost contact with the node. Something is wrong at the infrastructure level and the state of the pod couldn't be obtained.

Five phases. That's it.
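You can check this for yourself: kubectl's STATUS column is computed, but the phase is a real field on the API object. A quick query (the pod name here is a placeholder):

```shell
# Print only the pod's phase, bypassing kubectl's computed STATUS column.
# Replace my-pod / demo with your own pod name and namespace.
kubectl get pod my-pod -n demo -o jsonpath='{.status.phase}{"\n"}'
```

Run this against a pod showing CrashLoopBackOff and you'll get back Running, which is the whole point of the next section.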

So what's CrashLoopBackOff?

If you've looked at the official docs, you might have noticed that CrashLoopBackOff, ImagePullBackOff, Terminating, and CreateContainerConfigError aren't in the list of phases. That's because they're not phases. They're display statuses that kubectl computes from the container states inside the pod. A pod phase and a container state are two separate concepts.

A pod in CrashLoopBackOff is actually in the Running phase. Its container just keeps crashing and restarting. The display status tells you what's happening, but the phase tells you where in the lifecycle the pod is.

This distinction matters because it tells you where to look. A pod stuck in Pending has a scheduling problem. A pod in Running that shows CrashLoopBackOff has an application problem. Both mean something is wrong, but each calls for a different investigation.

Container states

Inside a pod, each container is in one of three states:

Waiting: The container isn't running yet. It could be pulling an image, waiting for an init container, or blocked by a missing config. The reason field tells you why.
Running: The container is executing. If there's a postStart hook, it already ran.
Terminated: The container stopped. The reason field says Completed (exit code 0) or Error (non-zero). The exitCode field has the number.

A pod's display status is derived from the combination of its phase and its containers' states. When you see ImagePullBackOff, that's a container in the Waiting state with reason ImagePullBackOff, inside a pod in the Pending phase.

Three concepts, then. The pod phase is where in the lifecycle the pod is. The container state is what each individual container is doing. The display status is the human-readable summary that kubectl shows you, computed from the other two. Once this distinction is internalized, things become more intuitive.
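To poke at the other two layers for a pod (the name is a placeholder): the container state lives under containerStatuses, while the display status only exists in kubectl's output:

```shell
# The container state object nests under containerStatuses; it has
# exactly one of waiting/running/terminated set.
kubectl get pod my-pod -n demo -o jsonpath='{.status.containerStatuses[0].state}{"\n"}'

# The STATUS column here is derived from phase + container states;
# it is not stored anywhere on the object.
kubectl get pod my-pod -n demo
```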

Restart policies

What happens after a container exits depends on the pod's restartPolicy:

Always: Restart the container no matter what, even if it exited cleanly. Default for Deployments and ReplicaSets.
OnFailure: Restart only if the exit code is non-zero. Used by some Jobs.
Never: Don't restart. The container stays dead. Default for Jobs.

When restartPolicy: Always kicks in and the container keeps crashing, Kubernetes applies an exponential backoff: 10s, 20s, 40s, 80s, 160s, capped at 5 minutes. That's why CrashLoopBackOff takes longer and longer between restarts. The backoff resets after the container runs successfully for 10 minutes.
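The schedule is simple enough to reproduce. This loop is just arithmetic, not the kubelet's actual implementation, but it prints the same doubling-with-a-cap sequence:

```shell
# Simulate the crash backoff: start at 10s, double each time, cap at 300s.
delay=10
for attempt in 1 2 3 4 5 6 7; do
  echo "restart ${attempt}: wait ${delay}s"
  delay=$((delay * 2))
  if [ "$delay" -gt 300 ]; then delay=300; fi
done
```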

The event chain

When everything goes right, a pod produces five events in this order: Scheduled (the scheduler picked a node), Pulling (pulling the container image), Pulled (image pull succeeded), Created (container created), and Started (container started).

Every failure in the demo below breaks this chain at a specific point. Once you find where the chain broke, you can tell what's wrong.
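If you prefer the CLI, you can pull a single pod's events in timestamp order so the chain reads top to bottom (the pod name is a placeholder):

```shell
# List events for one pod, oldest first, to find the broken link.
kubectl get events -n demo \
  --field-selector involvedObject.name=my-pod \
  --sort-by=.lastTimestamp
```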

The lifecycle diagram

graph TD
    A[Pod Created] --> B[Pending]
    B -->|Scheduler finds a node| C[Scheduled]
    B -->|No node available| P1[Pending forever<br/>FailedScheduling]

    C -->|Missing Secret/ConfigMap| P2[CreateContainerConfigError]
    C -->|Image pull succeeds| D[Container Created]
    C -->|Image not found| P3[ImagePullBackOff]

    D --> E[Running]

    E -->|Exit code 0 + restartPolicy Never| F[Succeeded / Completed]
    E -->|Exit code != 0 + restartPolicy Never| G[Failed / Error]
    E -->|Exit code != 0 + restartPolicy Always| H[CrashLoopBackOff]
    E -->|Memory limit exceeded| I[OOMKilled]
    E -->|Liveness probe fails| J[Killed + Restarted]
    E -->|Pod deleted| K[Terminating]

    style P1 fill:#CCFF02,color:#171717
    style P2 fill:#CCFF02,color:#171717
    style P3 fill:#CCFF02,color:#171717
    style G fill:#CCFF02,color:#171717
    style H fill:#CCFF02,color:#171717
    style I fill:#CCFF02,color:#171717
    style F fill:#CCFF02,color:#171717
    style K fill:#CCFF02,color:#171717

Probes: how Kubernetes checks if your container is healthy

Kubernetes actively checks whether your container is alive, ready, and started. These checks are called probes.

Liveness: Is the container still alive? On failure, Kubernetes kills and restarts the container.
Readiness: Can the container handle traffic? On failure, the pod is removed from Service endpoints. No traffic is sent to it, but the container keeps running.
Startup: Has the app finished initializing? Until it succeeds, liveness and readiness probes are paused, which gives slow-starting apps time to boot.

Each probe can use one of four mechanisms: run a command inside the container (exec), make an HTTP GET request (httpGet), attempt a TCP connection (tcpSocket), or call a gRPC health endpoint (grpc).

The key configuration parameters are initialDelaySeconds (how long to wait before the first check), periodSeconds (how often to check, default 10s), failureThreshold (how many consecutive failures before taking action, default 3), and timeoutSeconds (how long to wait for a response, default 1s).
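Put together, a typical HTTP liveness probe looks something like this. This is a sketch, not taken from the demo: the /healthz path, port, and delay values are assumptions you'd tune for your own app:

```yaml
# Container spec fragment: probe /healthz every 10s, restart after
# 3 consecutive failures, but give the app 15s before the first check.
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 10
  failureThreshold: 3
  timeoutSeconds: 1
```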

We'll see a liveness probe failure in action in the demo.


Demo time: Break Everything

We're going to deploy pods that fail in every possible way, ordered by where in the lifecycle chain they break.

Don't do this in production. You know better.

What you need if you want to follow along

  • Docker installed and running
  • kind for a local cluster
  • kubectl
  • Kunobi

Setup

kind create cluster --name pod-lifecycle-demo
kubectl create namespace demo

After the cluster is up, we'll open Kunobi and connect to the kind-pod-lifecycle-demo context, then switch to the demo namespace.

1. The happy path

Before breaking things, let's establish what normal looks like.

kubectl run healthy-nginx -n demo --image=nginx:alpine

Opening the pod in Kunobi and going to the Events tab, we can see the five events in order: Scheduled, Pulling, Pulled, Created, Started. The pod transitions from Pending to Running.

This is the baseline. Every scenario below breaks this chain at a different point. Enjoy it while it lasts.

For comparison, here's what kubectl describe pod healthy-nginx -n demo gives us for the same information.

2. Pending forever (FailedScheduling)

Let's start with a pod that never even gets off the ground, because the scheduler can't find a node for it.

kubectl apply -n demo -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: unschedulable
spec:
  containers:
  - name: app
    image: nginx:alpine
    resources:
      requests:
        cpu: "100"
        memory: "100Gi"
EOF

The pod requests 100 CPUs and 100Gi of RAM, which no node in a kind cluster can satisfy.

As you can see, the pod stays in Pending. The Events tab shows the message "Insufficient cpu" or "Insufficient memory." The chain never starts because there's no image pull, no container creation, nothing.

The Overview tab shows the Reason:

Two options here: reduce the resource requests to something the cluster can actually provide, or add nodes with more capacity. The Events tab tells you which resource is the bottleneck.
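To see what the cluster can actually offer, compare the pod's requests against each node's allocatable resources, which is the budget the scheduler checks against:

```shell
# Show each node's allocatable CPU and memory.
kubectl get nodes \
  -o custom-columns='NODE:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory'
```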

3. CreateContainerConfigError (missing Secret)

This one's interesting because the pod actually gets scheduled to a node, but then Kubernetes can't create the container because the pod spec references a resource that doesn't exist.

kubectl apply -n demo -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: missing-secret
spec:
  containers:
  - name: app
    image: nginx:alpine
    env:
    - name: DB_PASSWORD
      valueFrom:
        secretKeyRef:
          name: database-credentials
          key: password
EOF

The container needs a Secret called database-credentials, which in this case doesn't exist.

The pod shows CreateContainerConfigError. The Events tab shows Successfully assigned ... (Scheduled OK, it found a node) and then Error with "secret 'database-credentials' not found." The image was never even pulled, because Kubernetes won't bother downloading your container if it can't wire up the config.

Create the missing Secret or ConfigMap, or fix the name if it's a typo. This is one of the most common config errors in production, easy to introduce and easy to fix once you know what to look for.
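If you want to watch the pod recover, create the Secret it's waiting for; the password value here is just a placeholder:

```shell
# Create the referenced Secret; the kubelet retries and the
# container starts on a subsequent sync.
kubectl create secret generic database-credentials -n demo \
  --from-literal=password=placeholder
```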

4. ImagePullBackOff (wrong image)

Now let's see what happens when we point a pod at an image that simply doesn't exist.

kubectl run bad-image -n demo --image=nginx:this-tag-does-not-exist

The pod moves from Pending to ErrImagePull, then to ImagePullBackOff after a few seconds. The Events tab shows Scheduled and Pulling, then Failed with "manifest unknown" or "not found." Kubernetes backs off and retries with increasing intervals.

The Events tab shows you the exact image it tried to pull. Check that the name and tag actually exist in the registry. Private registry? Check imagePullSecrets too.
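The waiting reason and the registry's full error message are also recorded on the pod object itself:

```shell
# Print the waiting reason (ImagePullBackOff) and the registry error.
kubectl get pod bad-image -n demo \
  -o jsonpath='{.status.containerStatuses[0].state.waiting.reason}{"\n"}{.status.containerStatuses[0].state.waiting.message}{"\n"}'
```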

5. CrashLoopBackOff (container keeps crashing)

Moving on, what happens when the event chain completes successfully and the container actually starts running, only to immediately exit with an error?

kubectl run crash-loop -n demo --image=busybox -- sh -c "echo 'Starting app...' && sleep 2 && echo 'Fatal error: database connection refused' && exit 1"

The pod briefly shows Running, then Error, then CrashLoopBackOff. The event chain completed successfully, the image pulled and the container started, but then it exited with code 1 and Kubernetes restarted it, so the restart count climbs and the backoff gets longer.

The Events tab looks almost normal because the container did start. The key insight here: the Events tab tells you that it crashed. The Logs tab tells you why.

As can be observed, the Restarts counter is going up:

Go straight to the Logs tab. From Kubernetes' perspective everything went fine: the container started, which is all it was asked to do. The crash reason lives in the application output, not the events.
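One flag worth knowing on the CLI side: by default, kubectl logs shows the current container instance, which in a crash loop may have just restarted and not hit the error yet. The --previous flag shows the output of the last crashed instance:

```shell
# Logs of the previous (crashed) container instance, where the error lives.
kubectl logs crash-loop -n demo --previous
```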

6. Error (failed, no restart)

What happens now if we have the exact same bug, but the pod is configured to never restart?

kubectl run failed-no-restart -n demo --image=busybox --restart=Never -- sh -c "echo 'Connecting to database...' && sleep 2 && echo 'FATAL: connection refused' && exit 1"

The pod briefly shows Running, then Error. It stays there: no restart, no backoff, no climbing restart count. The pod is in the Failed phase. It tried once, it didn't work, and it has accepted its fate.

Comparing this to the crash-loop pod from earlier:

  • crash-loop has restartPolicy: Always. It keeps restarting. You get CrashLoopBackOff.
  • failed-no-restart has restartPolicy: Never. It fails once and stays dead. You get Error.

Same bug but different behavior, because the restart policy is the only difference.

Same as CrashLoopBackOff: read the logs. The only difference is that this pod won't restart on its own, so after fixing the underlying issue you'll need to create a new one.

7. OOMKilled (out of memory)

Up until now the failures have been about things going wrong before or right after the container starts. This one is different: the container runs fine for a while, and then suddenly gets killed because it exceeded its memory limit.

kubectl apply -n demo -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: oom-killed
spec:
  containers:
  - name: app
    image: python:3.12-slim
    command: ["python3", "-c", "data = []; [data.append('x' * 10**6) for _ in range(1000)]"]
    resources:
      limits:
        memory: "50Mi"
      requests:
        memory: "50Mi"
EOF

This container allocates memory in a loop until it blows past the 50Mi limit.

The pod starts as Running, then transitions to OOMKilled. The container terminated with exit code 137. If you're wondering where that number comes from: Linux exit codes for killed processes follow the formula 128 + signal number. SIGKILL is signal 9, so 128 + 9 = 137. When you see 137 in Kubernetes, it almost always means the kernel's OOM killer stepped in because the container exceeded its memory limit. The kernel isn't nice enough to ask twice.

Before reaching for a higher limit, ask whether the limit was ever realistic for this workload. Sometimes the right answer is more memory. Sometimes it's a memory leak you didn't know about. Exit code 137 tells you that it was killed, but the logs and a memory profile tell you why.
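The OOM verdict is recorded on the container's lastState, so you can confirm it without scrolling through describe output:

```shell
# Reason and exit code of the last terminated container instance;
# for this pod, expect OOMKilled and 137.
kubectl get pod oom-killed -n demo \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason} {.status.containerStatuses[0].lastState.terminated.exitCode}{"\n"}'
```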

8. Liveness probe failure (Kubernetes kills a "stuck" container)

But wait, what if the container doesn't crash and doesn't run out of memory, but just... stops responding? That's where liveness probes come in. The container keeps running, but it stops responding to health checks, and Kubernetes decides to take matters into its own hands.

kubectl apply -n demo -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: probe-failure
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "touch /tmp/healthy && sleep 30 && rm /tmp/healthy && sleep 600"]
    livenessProbe:
      exec:
        command: ["cat", "/tmp/healthy"]
      initialDelaySeconds: 5
      periodSeconds: 5
      failureThreshold: 3
EOF

The container creates a file /tmp/healthy, waits 30 seconds, then deletes it. The liveness probe checks for that file every 5 seconds, so after the file is deleted the probe fails 3 times in a row and Kubernetes kills the container.

The pod runs normally for about 30 seconds. Then the Events tab shows Unhealthy events: "Liveness probe failed: cat: can't open '/tmp/healthy': No such file or directory." After 3 consecutive failures a Killing event appears, the container restarts, runs for another 30 seconds, and the cycle repeats. Kubernetes is doing exactly what you asked it to.

There are two possible culprits: the application (health endpoint genuinely broken) or the probe itself (timeout too short, failure threshold too low, wrong path). Check the Unhealthy event message first; it usually tells you whether the probe timed out or actively returned a failure, which points you toward one or the other.
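A useful trick when you suspect the probe itself: run the probe's command by hand inside the container and see exactly what Kubernetes sees. kubectl exec propagates the remote command's exit code, and a non-zero exit is precisely what counts as a probe failure:

```shell
# Run the liveness command manually; non-zero exit = probe failure.
kubectl exec probe-failure -n demo -- cat /tmp/healthy
echo "probe exit code: $?"
```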

9. Completed (pod finishes its work)

Not every pod is meant to run forever. Here's what a successful exit looks like.

kubectl run completed-job -n demo --image=busybox --restart=Never -- sh -c "echo 'Processing data...' && sleep 5 && echo 'Done.'"

The pod goes from Pending to Running to Completed. The event chain is identical to the healthy pod's: all five events fire normally. The difference is that the container exits with code 0 and, with restartPolicy: Never, Kubernetes doesn't restart it.

This is how Jobs work. The pod runs its task, finishes, and stops with a phase of Succeeded.

10. Terminating (graceful shutdown)

In this scenario we're ending the lifecycle on purpose.

kubectl apply -n demo -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: slow-shutdown
spec:
  terminationGracePeriodSeconds: 30
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "trap 'echo Received SIGTERM, cleaning up... && sleep 10 && echo Cleanup done.' TERM; echo 'Running...'; while true; do sleep 1; done"]
EOF

Once the pod is Running, we'll delete it. Kunobi gives us the option to delete a Pod by pressing Backspace. Alternatively, run this command on the CLI:

kubectl delete pod slow-shutdown -n demo

The pod transitions to Terminating. The Events tab shows a Killing event. In the Logs tab we can see "Received SIGTERM, cleaning up..." as the container handles the shutdown signal. After about 10 seconds the cleanup finishes and the container exits.

The termination sequence goes like this. First the preStop hook runs, if one is defined. Then Kubernetes sends SIGTERM to the container. The container gets terminationGracePeriodSeconds (default: 30s) to shut down cleanly. If it's still running after the grace period, Kubernetes sends SIGKILL.

SIGTERM is a polite request. SIGKILL is not. This is why graceful shutdown handlers matter in production. Without one, your app drops in-flight requests, leaves database connections open, and doesn't flush caches.
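A preStop hook is the usual place to buy draining time before SIGTERM even arrives. A minimal sketch; the sleep length is an assumption you'd tune to how long your load balancer needs to stop sending traffic:

```yaml
# Container spec fragment: pause before shutdown so in-flight
# requests can drain, then SIGTERM proceeds as normal.
lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 5"]
```

Note that the grace period clock covers the preStop hook plus the SIGTERM handler combined, so a long hook eats into the time your app has to shut down.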


The Full Picture

After running all the scenarios, our demo namespace has pods in every possible state:

What's nice about seeing all of these together is that the status column alone tells the whole story. You can immediately tell which pods are healthy, which ones are stuck, and which ones have given up. And if you need more detail, the Events and Logs explain what went wrong.


Wrapping up

If there's one thing to take away from all of this, it's that pod statuses aren't random error messages. Each one tells you exactly where in the lifecycle something went wrong, and once you know where to look, the Events and Logs will tell you why.

The mental model is simple: there's a chain of events that every pod goes through, from scheduling to image pull to container start. When something breaks, it breaks at a specific link in that chain. Pending means it never got scheduled. ImagePullBackOff means the image couldn't be pulled. CrashLoopBackOff means the container started but keeps crashing. Once you internalize this, debugging pods stops feeling like guesswork and starts feeling like following a trail.

We used Kunobi throughout this demo because being able to see all of these statuses at a glance and click into Events and Logs without switching between terminal commands makes the whole process a lot faster. But regardless of what tool you use, the underlying concepts are the same.

Thanks for reading. Now you can go and break some pods safely.

Try Kunobi

Manage Kubernetes clusters and GitOps workflows from a single desktop app.
