
Graceful Shutdown Patterns for Laravel Queue Workers in Production

Bekzod Erkinov


Jun 28, 2026

Queue workers are long-running PHP processes. Unlike a web request that lives for a few hundred milliseconds, a worker boots once and then loops for hours or days, pulling jobs off a queue and executing them. That longevity is exactly what makes them powerful — and exactly what makes shutting them down dangerous.

Every deploy, every container restart, every autoscaling event, and every server reboot kills your workers. If that kill lands in the middle of a job, you can corrupt data, double-charge a customer, send a half-finished email, or lose work silently. Graceful shutdown is the discipline of stopping a worker between jobs — never during one — and of designing your jobs so that even a hard kill is survivable.

This tutorial covers the whole picture: how Laravel's worker loop actually responds to OS signals, how to configure Supervisor, Docker, and Kubernetes so they cooperate with that loop, how queue:restart and Horizon fit in, and how to design jobs that tolerate interruption. By the end you'll be able to deploy in the middle of the day without holding your breath.


Table of contents

  1. Why "just kill it" is wrong
  2. How a Laravel worker actually shuts down
  3. The signals that matter
  4. Worker flags that change shutdown behavior
  5. queue:restart — the soft restart
  6. Supervisor configuration
  7. Docker: the PID 1 problem
  8. Kubernetes: termination grace periods and preStop hooks
  9. Laravel Horizon
  10. Designing jobs that survive interruption
  11. A complete production deployment flow
  12. Observability: knowing it worked
  13. Checklist

1. Why "just kill it" is wrong

Consider this job:

public function handle(): void
{
    $order = Order::find($this->orderId);

    $this->paymentGateway->charge($order);   // (A) external side effect
    $order->update(['status' => 'paid']);     // (B) local state
    Mail::to($order->user)->send(new ReceiptMail($order)); // (C) external side effect
}

If the process receives SIGKILL (signal 9, which cannot be caught) between (A) and (B), the customer has been charged but your database still says the order is unpaid. When the job retries — or when the customer clicks "pay" again — you charge them twice.

The job is not the only thing at risk. The PHP process holds:

  • An open database connection (possibly inside an uncommitted transaction).
  • An open Redis connection.
  • A reserved job on the queue (in Redis, the job is moved to a "reserved" set with a timeout).

A clean shutdown lets the worker finish the current job, release its connections, and exit on its own terms. A hard kill leaves reserved jobs dangling until their visibility timeout expires, and may leave half-applied transactions.

The goal of graceful shutdown is simple to state:

Stop the worker only at a safe point — after the current job completes and before the next one starts.

Everything below is in service of that one sentence.


2. How a Laravel worker actually shuts down

Let's look at what php artisan queue:work does internally, because the configuration only makes sense once you understand the loop.

The worker runs an effectively infinite while (true) loop. Each iteration:

  1. Checks whether it should pause, quit, or keep going.
  2. Pulls the next job off the queue (sleeping for --sleep seconds if the queue is empty).
  3. Runs the job in the same process, guarded by a pcntl_alarm timeout.
  4. Loops again.

The critical method is roughly this (simplified from Illuminate\Queue\Worker::daemon()):

public function daemon($connectionName, $queue, WorkerOptions $options)
{
    // Register POSIX signal handlers if the pcntl extension is available
    if ($this->supportsAsyncSignals()) {
        $this->listenForSignals();
    }

    $lastRestart = $this->getTimestampOfLastQueueRestart();

    // Track uptime and job count for --max-time and --max-jobs
    [$startTime, $jobsProcessed] = [hrtime(true) / 1e9, 0];

    while (true) {
        // 1. Is it safe to fetch a job right now?
        if (! $this->daemonShouldRun($options, $connectionName, $queue)) {
            $status = $this->pauseWorker($options, $lastRestart);
            if (! is_null($status)) {
                return $this->stop($status, $options);
            }
            continue;
        }

        // 2. Fetch and run the next job
        $job = $this->getNextJob(
            $this->manager->connection($connectionName), $queue
        );

        if ($job) {
            $jobsProcessed++;
            $this->runJob($job, $connectionName, $options);
        } else {
            $this->sleep($options->sleep);
        }

        // 3. After each job, decide whether to keep looping
        $status = $this->stopIfNecessary($options, $lastRestart, $startTime, $jobsProcessed, $job);

        if (! is_null($status)) {
            return $this->stop($status, $options);
        }
    }
}

Three things make graceful shutdown possible:

listenForSignals()

If the pcntl extension is loaded, the worker registers handlers for OS signals. pcntl is common on Linux production images but not guaranteed, so confirm it with php -m:

protected function listenForSignals()
{
    pcntl_async_signals(true);

    pcntl_signal(SIGQUIT, fn () => $this->shouldQuit = true);
    pcntl_signal(SIGTERM, fn () => $this->shouldQuit = true);
    pcntl_signal(SIGUSR2, fn () => $this->paused = true);
    pcntl_signal(SIGCONT, fn () => $this->paused = false);
}

Notice what the handler does: it does not stop anything immediately. It just flips a boolean, $this->shouldQuit = true. pcntl_async_signals(true) means PHP delivers the signal to this handler asynchronously between opcodes, but the handler is deliberately tiny. The actual stopping happens later, in the loop, at a safe point.

This is the heart of graceful shutdown. SIGTERM doesn't kill the worker. It sets a flag. The worker reads that flag after the current job finishes and exits cleanly.
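The flag-flipping pattern can be reproduced outside Laravel in a few lines. The sketch below is a hypothetical stand-in (TinyWorker, requestQuit() and demoQuitMidJob() are illustrative names, not framework classes) that shows a quit request arriving in the middle of a job and only taking effect once that job has finished:

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the worker loop; this is not Laravel's Worker class.
final class TinyWorker
{
    private bool $shouldQuit = false;

    /** @var list<string> names of jobs that ran to completion */
    public array $completed = [];

    public function listenForSignals(): void
    {
        // Only wire up real signals when the pcntl extension is present.
        if (! extension_loaded('pcntl')) {
            return;
        }

        pcntl_async_signals(true);
        pcntl_signal(SIGTERM, fn () => $this->requestQuit());
    }

    // The "handler": it stops nothing, it only flips a flag.
    public function requestQuit(): void
    {
        $this->shouldQuit = true;
    }

    /** @param array<string, callable(TinyWorker): void> $jobs */
    public function run(array $jobs): int
    {
        foreach ($jobs as $name => $job) {
            $job($this);                 // a started job always runs to completion
            $this->completed[] = $name;

            if ($this->shouldQuit) {     // the flag is read only between jobs
                return 0;
            }
        }

        return 0;
    }
}

// Simulates SIGTERM arriving while the first job is still running.
function demoQuitMidJob(): array
{
    $worker = new TinyWorker();

    $worker->run([
        'first'  => function (TinyWorker $w): void { $w->requestQuit(); },
        'second' => function (TinyWorker $w): void {},
    ]);

    return $worker->completed;
}
```

Calling demoQuitMidJob() yields only the first job: it completes despite the quit request, and the second job is never started.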

stopIfNecessary()

After every job, the worker asks: should I stop now? It returns a non-null status when any of these is true:

protected function stopIfNecessary($options, $lastRestart, $startTime, $jobsProcessed, $job)
{
    if ($this->shouldQuit) {
        return static::EXIT_SUCCESS;                 // SIGTERM/SIGQUIT was received
    } elseif ($this->memoryExceeded($options->memory)) {
        return static::EXIT_MEMORY_LIMIT;            // --memory exceeded
    } elseif ($this->queueShouldRestart($lastRestart)) {
        return static::EXIT_SUCCESS;                 // queue:restart was called
    } elseif ($options->stopWhenEmpty && is_null($job)) {
        return static::EXIT_SUCCESS;                 // --stop-when-empty and queue drained
    } elseif ($options->maxTime && hrtime(true) / 1e9 - $startTime >= $options->maxTime) {
        return static::EXIT_SUCCESS;                 // --max-time reached
    } elseif ($options->maxJobs && $jobsProcessed >= $options->maxJobs) {
        return static::EXIT_SUCCESS;                 // --max-jobs reached
    }

    return null; // keep looping
}

WorkerStopping event

When the worker calls stop(), it fires the Illuminate\Queue\Events\WorkerStopping event before calling exit. You can hook into this for cleanup or logging:

use Illuminate\Queue\Events\WorkerStopping;
use Illuminate\Support\Facades\Event;
use Illuminate\Support\Facades\Log;

Event::listen(function (WorkerStopping $event) {
    Log::info('Queue worker stopping', ['status' => $event->status]);
});

The one exception: a job already in progress

Here's the subtlety people miss. The signal flag is only checked between jobs. If a SIGTERM arrives while a job is actively running, the worker finishes that job first, then sees the flag, then exits. That's the desired behavior — but it means the shutdown is only as fast as your slowest job. A job that takes 10 minutes will delay shutdown by up to 10 minutes. This is the single most important fact for configuring the timeouts in the rest of this tutorial.


3. The signals that matter

| Signal  | Number | Catchable? | Worker behavior |
|---------|--------|------------|-----------------|
| SIGTERM | 15     | Yes        | Graceful: finish current job, then exit. The standard "please stop" signal. |
| SIGQUIT | 3      | Yes        | Same as SIGTERM for the worker (finish job, then exit). |
| SIGUSR2 | 12     | Yes        | Pause: stop fetching new jobs but keep the process alive. Used by Horizon. |
| SIGCONT | 18     | Yes        | Resume after a pause. |
| SIGINT  | 2      | Yes        | Sent by Ctrl+C in a terminal. The handler list above does not register it, so the default action applies: the process terminates immediately, even mid-job. |
| SIGKILL | 9      | No         | Immediate, unconditional kill. The kernel destroys the process. Job is interrupted mid-execution. |

The whole game is: make sure your process manager sends SIGTERM and then waits long enough before escalating to SIGKILL. Every orchestrator — Supervisor, Docker, Kubernetes, systemd — has a configurable grace window between the polite signal and the fatal one. Your job is to make that window longer than your longest-running job.

You can send these manually to test:

# Find the worker PID
ps aux | grep 'queue:work'

# Ask it to stop gracefully
kill -TERM <pid>      # or: kill -SIGTERM <pid>

# Watch it: it will exit AFTER the current job finishes, not immediately

If you kill -9 <pid> instead, the job is severed mid-flight. Never do this in a deploy script.


4. Worker flags that change shutdown behavior

Several queue:work options interact directly with shutdown. Understanding them prevents the classic "my job got killed at the 60-second mark" bug.

--timeout (default: 60)

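The maximum number of seconds a single job may run. This is enforced with a separate alarm using pcntl_alarm, not by the shutdown loop. When a job exceeds its timeout, the worker process itself is killed mid-job and exits with an error, and the process manager starts a fresh one. The job is retried once retry_after elapses, or marked failed if it has no attempts left.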

Crucially, --timeout must be shorter than your queue connection's retry_after (in config/queue.php), otherwise a job can be retried while the original is still running — you'll process it twice.

// config/queue.php
'redis' => [
    'driver'      => 'redis',
    'retry_after' => 90,   // must be GREATER than the worker --timeout
    // ...
],

Rule of thumb: retry_after > --timeout. For example --timeout=60 with retry_after=90.
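One way to keep this rule from drifting is a small pre-deploy check. The helper below is a hypothetical sketch, not part of Laravel; the function name and the 10-second minimum gap are assumptions you would tune for your own setup:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical pre-deploy check (not a Laravel API): lists problems with a
 * worker --timeout / connection retry_after pairing.
 *
 * @return list<string>
 */
function timeoutProblems(int $workerTimeout, int $retryAfter, int $minGap = 10): array
{
    $problems = [];

    if ($retryAfter <= $workerTimeout) {
        // The job becomes visible again while the first attempt may still be running.
        $problems[] = sprintf(
            'retry_after (%d) must be greater than --timeout (%d)',
            $retryAfter,
            $workerTimeout
        );
    } elseif ($retryAfter - $workerTimeout < $minGap) {
        // Assumed safety margin; adjust $minGap to taste.
        $problems[] = sprintf(
            'gap between retry_after and --timeout is under %d seconds',
            $minGap
        );
    }

    return $problems;
}
```

With the values from the rule of thumb, timeoutProblems(60, 90) returns an empty list, while timeoutProblems(120, 90) reports the double-processing risk.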

--max-time=3600

Tells the worker to exit (gracefully, between jobs) after it has been alive for this many seconds. Combined with Supervisor's auto-restart, this gives you a clean periodic recycle that frees leaked memory. The worker checks this in stopIfNecessary, so it never interrupts a job.

--max-jobs=1000

Exit gracefully after processing this many jobs. Same idea as --max-time, but counted in jobs rather than wall-clock time.

--memory=128

If process memory exceeds this many megabytes (checked between jobs), exit so the process manager restarts a fresh one. PHP doesn't reliably release memory back; long-lived workers leak, so this is your safety valve.
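The check itself is simple arithmetic on PHP's own memory accounting. A minimal sketch of the idea, written as a standalone function (the optional $usageBytes parameter is an assumption added here so the logic can be exercised without a real worker):

```php
<?php
declare(strict_types=1);

/**
 * Sketch of a --memory style check: compare allocated memory, in megabytes,
 * against a limit. $usageBytes is a hypothetical override for testing.
 */
function memoryExceeded(int $limitMb, ?int $usageBytes = null): bool
{
    // Default to the memory PHP has actually allocated from the system.
    $usageBytes ??= memory_get_usage(true);

    return ($usageBytes / 1024 / 1024) >= $limitMb;
}
```

Because the comparison only runs between jobs, a single job that balloons past the limit is not interrupted; the worker exits once that job is done.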

--rest=0

Seconds to sleep between jobs. Usually 0; raise it only to throttle.

--stop-when-empty

Exit once the queue is drained. Useful for ephemeral/batch workers (e.g. a Kubernetes Job that should terminate when there's nothing left to do), but not what you want for a long-running daemon.

Putting them together

A typical production daemon command:

php artisan queue:work redis \
    --queue=high,default,low \
    --tries=3 \
    --backoff=10 \
    --timeout=60 \
    --max-time=3600 \
    --max-jobs=1000 \
    --memory=256 \
    --sleep=3

This worker:

  • Processes high before default before low.
  • Retries each job up to 3 times with a 10s backoff.
  • Kills the worker mid-job if a single job exceeds 60s (Supervisor then starts a fresh one).
  • Recycles itself every hour or every 1000 jobs (whichever first), gracefully.
  • Recycles if it crosses 256 MB.

Every one of those recycles happens between jobs, so they're safe. The only unsafe stop is an external SIGKILL, which the next sections are about preventing.

queue:work vs queue:listen. queue:work boots the framework once and is the production choice. queue:listen boots a fresh process per job (slow, but picks up code changes without a restart) and is for local development only. Everything in this tutorial assumes queue:work.


5. queue:restart — the soft restart

How do you tell a fleet of workers spread across many servers to restart and pick up newly deployed code, without signaling each process individually?

php artisan queue:restart

This command writes a timestamp to the cache (illuminate:queue:restart). Every worker, in its loop, compares that timestamp against the one it cached at boot (queueShouldRestart($lastRestart)). When the cached timestamp is newer, the worker finishes its current job and exits gracefully — and Supervisor (or whatever manager) immediately starts a fresh one running the new code.

protected function queueShouldRestart($lastRestart)
{
    return $this->getTimestampOfLastQueueRestart() != $lastRestart;
}
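The comparison above is easy to simulate, which also makes the shared-cache requirement concrete. In this sketch FakeCache and RestartAwareWorker are hypothetical in-memory stand-ins, not Laravel classes; only the cache key name comes from the text above:

```php
<?php
declare(strict_types=1);

// Hypothetical in-memory stand-in for a cache store.
final class FakeCache
{
    /** @var array<string, int> */
    private array $items = [];

    public function get(string $key): ?int
    {
        return $this->items[$key] ?? null;
    }

    public function put(string $key, int $value): void
    {
        $this->items[$key] = $value;
    }
}

// Hypothetical worker that mirrors the comparison in queueShouldRestart().
final class RestartAwareWorker
{
    private const KEY = 'illuminate:queue:restart';

    private ?int $lastRestart;

    public function __construct(private FakeCache $cache)
    {
        // Remember the restart timestamp that was current at boot.
        $this->lastRestart = $cache->get(self::KEY);
    }

    public function shouldRestart(): bool
    {
        return $this->cache->get(self::KEY) !== $this->lastRestart;
    }
}

function demoRestart(): array
{
    $shared   = new FakeCache();
    $isolated = new FakeCache();   // a server with its own private cache

    $workerA = new RestartAwareWorker($shared);
    $workerB = new RestartAwareWorker($shared);
    $workerC = new RestartAwareWorker($isolated);

    // What queue:restart amounts to: write a fresh timestamp under the key.
    $shared->put('illuminate:queue:restart', 1700000000);

    return [
        $workerA->shouldRestart(),
        $workerB->shouldRestart(),
        $workerC->shouldRestart(),
    ];
}
```

Both workers on the shared cache notice the new timestamp, while the worker with the isolated cache keeps running old code.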

Key properties:

  • It's asynchronous and eventually-consistent. Workers don't all stop at the same instant; each stops after its current job. A worker mid-job on a 5-minute task won't restart for up to 5 minutes.
  • It requires a shared cache (Redis, Memcached, database) that all workers read. If each server has its own isolated array/file cache, queue:restart on one box won't reach workers on another.
  • It is the canonical way to roll out code to workers. You must run it on every deploy, because queue:work holds your application code in memory and will keep running the old code until restarted.

# In your deploy script, after pulling new code:
php artisan migrate --force
php artisan queue:restart   # <- tell workers to gracefully recycle into new code

Horizon has its own equivalent, php artisan horizon:terminate (covered below).


6. Supervisor configuration

Supervisor is the most common process manager for Laravel workers on a traditional (non-container) host. It keeps N worker processes alive, restarts them when they exit, and — importantly — controls the SIGTERM→SIGKILL grace window.

A production config at /etc/supervisor/conf.d/laravel-worker.conf:

[program:laravel-worker]
process_name=%(program_name)s_%(process_num)02d
command=php /var/www/app/artisan queue:work redis --queue=high,default --tries=3 --timeout=60 --max-time=3600 --sleep=3
directory=/var/www/app
autostart=true
autorestart=true
user=www-data
numprocs=8
redirect_stderr=true
stdout_logfile=/var/www/app/storage/logs/worker.log
stdout_logfile_maxbytes=50MB
stopwaitsecs=3600
stopsignal=TERM
killasgroup=true
stopasgroup=true

The graceful-shutdown-critical lines:

stopsignal=TERM

When you run supervisorctl stop laravel-worker (or Supervisor restarts the group), it sends SIGTERM. TERM is already Supervisor's default, but state it explicitly. Do not set it to KILL.

stopwaitsecs=3600

This is the single most important line. After sending SIGTERM, Supervisor waits up to stopwaitsecs seconds for the process to exit on its own. If the process hasn't exited by then, Supervisor escalates to SIGKILL.

stopwaitsecs must be greater than your longest-running job. If your longest job can run for 50 minutes, and stopwaitsecs is the default 10, Supervisor will SIGKILL your worker 10 seconds into a 50-minute job — exactly the interruption you were trying to avoid. Set it generously. Many teams set it to 3600 (an hour) so it effectively never escalates.

Note the relationship with --timeout: --timeout caps a single job at, say, 60s. If every job is capped at 60s, then stopwaitsecs=120 is plenty. But if you have a special long-running queue with --timeout=3600, then stopwaitsecs for that worker program must also be ≥ 3600. Match them per worker program.
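Matching the two values per program is mechanical enough to script. The sketch below is a hypothetical helper (the function name, the second program name and the 60-second headroom are assumptions) that derives a stopwaitsecs suggestion from each program's --timeout:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: suggest a stopwaitsecs value for each Supervisor program.
 *
 * @param array<string, int> $timeouts program name => that program's --timeout in seconds
 * @return array<string, int> program name => suggested stopwaitsecs
 */
function suggestStopWaitSecs(array $timeouts, int $headroom = 60): array
{
    $suggested = [];

    foreach ($timeouts as $program => $timeout) {
        // A single job runs for at most $timeout seconds; leave room for teardown.
        $suggested[$program] = $timeout + $headroom;
    }

    return $suggested;
}
```

For a default worker capped at 60s and a long-running program capped at 3600s, this gives 120 and 3660 respectively, in line with the figures discussed above.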

stopasgroup=true and killasgroup=true

PHP may spawn child processes (e.g. if a job shells out). These ensure the signal reaches the whole process group, so children don't get orphaned and the SIGKILL escalation also applies to the group.

Reloading config

sudo supervisorctl reread       # detect config changes
sudo supervisorctl update       # apply them (restarts changed programs gracefully)
sudo supervisorctl status       # check
sudo supervisorctl restart laravel-worker:*   # restart all 8 processes

Deploy ordering with Supervisor. On deploy you typically run php artisan queue:restart (soft, graceful, per-job) rather than supervisorctl restart (which also respects stopwaitsecs but is a heavier hammer). queue:restart is enough because workers self-recycle into the new code.


7. Docker: the PID 1 problem

Containers introduce a subtle trap that breaks graceful shutdown even when your Laravel config is perfect.

The trap

When Docker stops a container (docker stop, or an orchestrator scaling down), it sends SIGTERM to PID 1 inside the container, waits --time seconds (default 10), then SIGKILLs everything.

If your Dockerfile uses the shell form of CMD:

# ❌ WRONG — runs via /bin/sh -c, so PID 1 is the shell, not PHP
CMD php artisan queue:work redis --timeout=60

then PID 1 is /bin/sh, and sh does not forward SIGTERM to its child PHP process. Docker's SIGTERM hits the shell, which ignores it; PHP never sees it; 10 seconds later Docker SIGKILLs the whole container — interrupting whatever job was running. Graceful shutdown silently never happens.

Fix 1: exec form

Use the exec form of CMD so PHP becomes PID 1 directly and receives the signal:

# ✅ exec form — PHP is PID 1 and receives SIGTERM
CMD ["php", "artisan", "queue:work", "redis", "--timeout=60", "--max-time=3600"]

Fix 2: a proper init (tini)

Even as PID 1, PHP isn't a great init process — it won't reap zombie children if your jobs shell out. The robust fix is a tiny init that forwards signals and reaps zombies. Docker has one built in:

docker run --init your-image

Or in docker-compose.yml:

services:
  worker:
    image: your-app
    init: true                # use tini as PID 1; forwards signals, reaps zombies
    command: ["php", "artisan", "queue:work", "redis", "--timeout=60", "--max-time=3600"]
    stop_grace_period: 1h     # <- Docker's equivalent of stopwaitsecs
    stop_signal: SIGTERM

stop_grace_period

This is Docker Compose's grace window — the Docker analogue of Supervisor's stopwaitsecs. The default is 10 seconds, which is almost certainly too short for a worker. Set it to comfortably exceed your longest job. With raw docker stop, the flag is --time / -t:

docker stop -t 3600 worker_container

STOPSIGNAL

You can also declare the stop signal in the image:

STOPSIGNAL SIGTERM

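Summary for Docker: exec-form CMD (ideally combined with init: true), STOPSIGNAL SIGTERM, and a generous stop_grace_period. Note that init: true does not rescue a shell-form CMD: tini forwards SIGTERM to the shell, which typically exits without passing it on to PHP. Miss any of these and your carefully configured Laravel worker still gets SIGKILLed mid-job.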


8. Kubernetes: termination grace periods and preStop hooks

Kubernetes adds more moving parts, but the principle is identical: send SIGTERM, wait, then SIGKILL. The wait is terminationGracePeriodSeconds.

The pod termination sequence

When a worker pod is deleted (deploy, scale-down, node drain):

  1. The pod is marked Terminating and removed from Service endpoints.
  2. The preStop hook (if any) runs to completion.
  3. Kubernetes sends SIGTERM to PID 1 of each container.
  4. Kubernetes waits for whatever remains of terminationGracePeriodSeconds (default 30). The countdown started in step 1, so preStop time is included.
  5. If the container is still alive, Kubernetes sends SIGKILL.

The default 30 seconds is dangerous for workers — a job longer than 30s gets killed. You must raise it.

A worker Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: laravel-worker
spec:
  replicas: 4
  selector:
    matchLabels: { app: laravel-worker }
  template:
    metadata:
      labels: { app: laravel-worker }
    spec:
      terminationGracePeriodSeconds: 3700   # > longest job + preStop sleep
      containers:
        - name: worker
          image: your-app:latest
          # exec form: PHP is PID 1 and receives SIGTERM
          command: ["php", "artisan", "queue:work", "redis",
                    "--timeout=60", "--max-time=3600", "--tries=3"]
          lifecycle:
            preStop:
              exec:
                # Optional. Caution: queue:restart goes through the shared cache, so it
                # recycles every worker on that cache, not only this pod.
                command: ["php", "artisan", "queue:restart"]

Key points

  • terminationGracePeriodSeconds is the Kubernetes equivalent of stopwaitsecs / stop_grace_period. Set it to longer than your longest job (plus any preStop time). The 3700 above is sized for jobs of up to an hour; with the --timeout=60 shown in the same manifest, a value around 120 would already be enough.
  • exec-form command so PHP is PID 1 and receives SIGTERM. (Same PID 1 rule as Docker.) If you wrap your container with an entrypoint shell script, make sure it exec php ... so PHP replaces the shell.
  • preStop hook: optional. Kubernetes runs it before sending SIGTERM, and its runtime counts against the grace period. Be careful with queue:restart here: it writes to the shared cache, so every worker reading that cache recycles, not only the pod being terminated. SIGTERM alone already makes this pod's worker stop fetching new jobs, so a short sleep (or no hook at all) is often the safer choice.
  • Liveness/readiness probes: workers don't serve HTTP, so use a process-based or exec probe (e.g. check the PHP process exists) rather than an HTTP probe, or omit liveness and rely on --max-time recycling.
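Because preStop time is spent inside the grace period, it helps to budget the two together. A minimal sketch of that arithmetic, using a hypothetical helper name and treating all three inputs as values you would read from your own manifest and job timeouts:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: how many seconds short is a pod's grace period?
 * Returns 0 when the configured value covers the worst case.
 */
function gracePeriodShortfall(int $gracePeriod, int $preStopSeconds, int $longestJobSeconds): int
{
    // preStop runs inside the grace period, so it eats into the job's drain time.
    $needed = $preStopSeconds + $longestJobSeconds;

    return max(0, $needed - $gracePeriod);
}
```

With the Kubernetes default of 30 seconds, a 5-second preStop and a 60-second job leave the pod 35 seconds short; with 3700 seconds, a 10-second preStop and a one-hour job fit.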

--max-time is your friend in Kubernetes

Because pods are cattle, a worker that gracefully self-terminates every hour (--max-time=3600) and is restarted by the Deployment controller is a clean, predictable lifecycle that naturally limits memory growth and picks up new config on schedule.

Why --stop-when-empty + Jobs for batch work

For finite batch processing, model the worker as a Kubernetes Job with --stop-when-empty so the pod terminates itself when the queue drains, rather than a Deployment that runs forever.


9. Laravel Horizon

Horizon is Laravel's Redis-backed queue dashboard and supervisor. It manages worker processes for you, so you don't write Supervisor queue:work lines directly — but graceful shutdown still applies, just through Horizon's own commands.

Running Horizon under Supervisor

You run the single horizon command under Supervisor (or systemd/Docker), and Horizon spawns and manages the actual worker processes:

[program:horizon]
process_name=%(program_name)s
command=php /var/www/app/artisan horizon
directory=/var/www/app
autostart=true
autorestart=true
user=www-data
redirect_stderr=true
stdout_logfile=/var/www/app/storage/logs/horizon.log
stopwaitsecs=3600
stopsignal=TERM

The same stopwaitsecs=3600 rule applies: give Horizon time to drain.

horizon:terminate — the graceful restart

This is Horizon's queue:restart. On deploy:

php artisan horizon:terminate

Horizon's master process receives the instruction, signals each of its worker processes to finish their current job, then exits. Supervisor restarts the horizon program with the new code. Run horizon:terminate on every deploy, after migrations.

When the master process itself receives SIGTERM, Horizon handles it gracefully too: it instructs its supervisors and workers to wind down, waiting for in-flight jobs.

--wait and graceful timeouts

horizon:terminate accepts a --wait option to block until in-flight jobs are done:

php artisan horizon:terminate --wait

In config/horizon.php, each supervisor's timeout option caps how long a single job may run (the waits option is unrelated to shutdown; it sets the thresholds for long-wait notifications). The mental model is unchanged: Horizon never interrupts a running job on a graceful terminate; it waits for the current job and then recycles.

Horizon and SIGTERM directly

If your orchestrator sends SIGTERM to the Horizon master (Docker stop, K8s pod deletion), Horizon catches it and performs the same graceful drain — provided PID 1 is actually the PHP/Horizon process (the same exec-form rule from the Docker/K8s sections).

Deploy with Horizon: git pull → composer install → php artisan migrate --force → php artisan horizon:terminate. Supervisor brings Horizon back up on the new code.


10. Designing jobs that survive interruption

Configuration gets you graceful shutdown most of the time. But you will eventually hit a SIGKILL: a node dies, the OOM killer fires, someone runs kill -9, a job overruns the grace period. Robust systems assume jobs can be interrupted at any point and design for it. Graceful shutdown reduces the frequency; idempotency makes the rare hard kill harmless.

Make jobs idempotent

An idempotent job produces the same result whether it runs once or five times. This is the single most valuable property for queue reliability.

The dangerous version from the intro, made safe:

public function handle(): void
{
    $order = Order::findOrFail($this->orderId);

    // Guard: if already paid, do nothing. Safe to re-run.
    if ($order->status === 'paid') {
        return;
    }

    // Use an idempotency key so the gateway dedupes a retried charge.
    $charge = $this->paymentGateway->charge($order, idempotencyKey: "order-{$order->id}");

    DB::transaction(function () use ($order, $charge) {
        $order->update([
            'status'      => 'paid',
            'charge_id'   => $charge->id,
        ]);
    });

    // Dispatch the email as a SEPARATE job so a mail failure doesn't
    // re-trigger the charge on retry.
    SendReceiptEmail::dispatch($order->id);
}

Techniques in play:

  • Status guard / dedupe check at the top — re-running is a no-op once done.
  • Idempotency keys on external calls (Stripe, payment gateways, and most modern APIs support these) so the provider dedupes retried side effects.
  • Split side effects into separate jobs so a retry of step C doesn't repeat step A. Each job should own one externally-observable side effect where possible.
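To see why the status guard and the idempotency key complement each other, the hard-kill scenario from the intro can be replayed in memory. Everything in this sketch is hypothetical: FakeGateway imitates a provider that dedupes on an idempotency key, and the order is a plain array rather than an Eloquent model:

```php
<?php
declare(strict_types=1);

// Hypothetical gateway that dedupes on an idempotency key.
final class FakeGateway
{
    /** @var array<string, string> idempotency key => charge id */
    private array $charges = [];

    public function charge(string $idempotencyKey): string
    {
        // A repeated key returns the original charge instead of creating a new one.
        return $this->charges[$idempotencyKey] ??= 'ch_' . (count($this->charges) + 1);
    }

    public function chargeCount(): int
    {
        return count($this->charges);
    }
}

function payOrder(array &$order, FakeGateway $gateway): void
{
    if ($order['status'] === 'paid') {
        return; // status guard: re-running is a no-op
    }

    $order['charge_id'] = $gateway->charge("order-{$order['id']}");
    $order['status']    = 'paid';
}

function demoRetryAfterHardKill(): array
{
    $gateway = new FakeGateway();
    $order   = ['id' => 42, 'status' => 'unpaid', 'charge_id' => null];

    // Attempt 1: the charge goes through, then the process is "killed"
    // before the local status is saved.
    $gateway->charge("order-{$order['id']}");

    // Attempt 2 (the retry) runs the whole job again; the key dedupes the charge.
    payOrder($order, $gateway);

    // Attempt 3 (a duplicate delivery) stops at the status guard.
    payOrder($order, $gateway);

    return [$gateway->chargeCount(), $order['status'], $order['charge_id']];
}
```

The key covers the window where the charge succeeded but the status was never saved; the guard covers every run after that. The customer is charged once across all three attempts.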

Wrap local state changes in transactions

If a job mutates several rows, wrap them in DB::transaction(). A hard kill mid-transaction rolls back cleanly — the database is never left half-updated:

DB::transaction(function () {
    $this->debitAccount($this->from, $this->amount);
    $this->creditAccount($this->to, $this->amount);
});

Without the transaction, a SIGKILL between the debit and credit loses money.

Keep jobs short and chunked

A 30-minute job is a 30-minute window during which any shutdown must either wait (long drains) or kill (lost work). Break large work into many small jobs:

// Instead of one job that processes 1,000,000 rows...
User::where('needs_export', true)
    ->chunkById(500, function ($users) {
        ExportUserChunk::dispatch($users->pluck('id')->all());
    });

Each ExportUserChunk runs in seconds, so the worker reaches a safe stop point quickly. Shutdowns drain in seconds, not minutes, and a hard kill loses at most one small chunk's worth of work (which retries).

Use batches for "all-or-progress" semantics

Bus::batch() lets you dispatch many jobs, track progress, and handle partial completion — ideal when chunking, because an interrupted batch resumes from where it left off (completed chunk jobs aren't re-run).

Bus::batch($chunks->map(fn ($ids) => new ExportUserChunk($ids)))
    ->name('user-export')
    ->allowFailures()
    ->dispatch();

Set tries / backoff / retryUntil

Graceful shutdown and retries work together. If a job is interrupted (hard kill), it must be retried. Configure that explicitly rather than relying on infinite retries:

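use Illuminate\Contracts\Queue\ShouldQueue;

class ExportUserChunk implements ShouldQueue
{
    public int $tries = 3;
    public int $backoff = 30;      // wait 30s between retries
    public int $timeout = 120;     // hard cap per attempt; needs retry_after > 120 on this connection

    // Note: when retryUntil() is defined, the Laravel docs state that it takes
    // precedence over $tries, so pick whichever limit you actually want.
    public function retryUntil(): \DateTime
    {
        return now()->addMinutes(10);
    }
}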

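Give every job a finite retry limit (--tries or a per-job $tries) so that a job which keeps failing lands in the failed_jobs table instead of being lost or retried forever. An attempt that was cut short by a hard kill still counts, so a limit above 1 is what allows an interrupted job to run again. Monitor that table.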

Respect WithoutOverlapping and locks

If a job acquires a lock, a hard kill can leave the lock held until it expires. Set a sane lock expiry so an interrupted job's lock self-heals:

public function middleware(): array
{
    return [(new WithoutOverlapping($this->orderId))->expireAfter(180)];
}

Watch for the --timeout vs retry_after double-run

Reiterating because it bites everyone: if a job's runtime can exceed the connection's retry_after, Laravel will make the job visible again and a second worker will start it while the first is still going. Always keep retry_after > the worker --timeout > realistic job duration. This isn't strictly "shutdown," but it's the same family of "a job running twice" bug.


11. A complete production deployment flow

Putting it together, here's a zero-interruption deploy for a Supervisor + Redis setup. The principle generalizes to Docker/K8s.

#!/usr/bin/env bash
set -euo pipefail

cd /var/www/app

# 1. Pull new code into a release directory (or in place).
git pull --ff-only origin main

# 2. Install dependencies without dev packages.
composer install --no-dev --optimize-autoloader --no-interaction

# 3. Run migrations. (Design migrations to be backward-compatible so
#    old-code workers still running don't break against the new schema.)
php artisan migrate --force

# 4. Rebuild caches.
php artisan config:cache
php artisan route:cache
php artisan event:cache

# 5. Tell workers to gracefully recycle into the new code.
#    Each worker finishes its current job, then exits; Supervisor
#    restarts it on the new code. NO running job is interrupted.
php artisan queue:restart
#    (Horizon: php artisan horizon:terminate)

The backward-compatible migration rule

During the window between migrate and the last old-code worker recycling, both old and new code run against the new schema simultaneously. So migrations must be backward-compatible:

  • Adding a nullable column or new table: safe.
  • Renaming or dropping a column the old code still reads/writes: unsafe — do it in two deploys (add new, migrate code to use it, then drop old in a later deploy). This "expand/contract" pattern keeps graceful worker recycling truly graceful.

What the user never notices

Because queue:restart is graceful, at no point is a job killed mid-execution. Jobs in flight complete on the old code; new jobs start on the new code. No double-charges, no half-sent emails, no corrupted rows. That's the entire payoff.


12. Observability: knowing it worked

Graceful shutdown is invisible when it works, which makes it easy to think it works while it silently doesn't (the classic shell-form Docker CMD swallowing SIGTERM). Verify it.

Log worker lifecycle events

use Illuminate\Queue\Events\JobFailed;
use Illuminate\Queue\Events\WorkerStopping;
use Illuminate\Support\Facades\Event;
use Illuminate\Support\Facades\Log;

Event::listen(function (WorkerStopping $e) {
    Log::info('worker.stopping', ['exit_status' => $e->status]);
});
Event::listen(function (JobFailed $e) {
    Log::error('job.failed', [
        'job'  => $e->job->resolveName(),
        'uuid' => $e->job->uuid(),
        'ex'   => $e->exception->getMessage(),
    ]);
});

A clean shutdown logs worker.stopping with status 0 and no job.failed from interruption. If you see jobs failing with MaxAttemptsExceededException or timeouts clustered around deploy times, your grace window is too short or SIGTERM isn't reaching PHP.
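Spotting failures that cluster around deploys is easier with a small script than by eye. The sketch below assumes you can export failure and deploy times as Unix timestamps; the function name and the 5-minute window are hypothetical choices:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: count failed jobs that occurred shortly after each deploy.
 *
 * @param list<int> $failureTimes Unix timestamps of failed jobs
 * @param list<int> $deployTimes  Unix timestamps of deploys
 * @return array<int, int> deploy timestamp => failures inside the window after it
 */
function failuresAfterDeploys(array $failureTimes, array $deployTimes, int $windowSeconds = 300): array
{
    $counts = [];

    foreach ($deployTimes as $deploy) {
        $counts[$deploy] = count(array_filter(
            $failureTimes,
            fn (int $t): bool => $t >= $deploy && $t <= $deploy + $windowSeconds
        ));
    }

    return $counts;
}
```

A deploy whose count is consistently above the background failure rate is a hint that jobs are being interrupted during that deploy.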

Test SIGTERM handling directly

Before trusting it in production, prove the loop sees the signal. Start a worker, dispatch a slow job, and signal it:

# Terminal 1
php artisan queue:work redis --timeout=300

# Terminal 2: dispatch a job that sleeps 30s, then immediately:
kill -TERM $(pgrep -f 'queue:work')

# Expected: the worker prints the job completing (~30s later), THEN exits.
# It does NOT exit immediately, and does NOT report the job as failed.

Run the same test with docker stop against your actual image to catch the PID 1 problem. Pass a grace period longer than the test job; with the 10-second default, a 30-second job is killed even when signals are wired correctly:

docker run -d --name t --init your-image   # then dispatch a 30s job into it
time docker stop -t 60 t                   # should return once the job finishes (~30s)

If docker stop returns almost immediately with the job unfinished, or runs out the full 60 seconds, SIGTERM isn't reaching PHP. Fix your CMD/init setup.

Monitor the failed_jobs table and reserved jobs

  • Alert on growth in failed_jobs.
  • For Redis, watch for jobs stuck in the reserved set past retry_after (a symptom of hard-killed workers).
  • Horizon's dashboard surfaces throughput, wait times, and failures directly.

13. Checklist

Use this as a pre-production audit.

Worker configuration

  • Using queue:work (not queue:listen) in production.
  • --timeout set and less than the connection's retry_after in config/queue.php.
  • --max-time and/or --max-jobs set so workers recycle and shed leaked memory.
  • --memory set as a safety valve.
  • pcntl extension installed in the production PHP image (it's what makes signal handling work).

Process manager (Supervisor)

  • stopsignal=TERM (never KILL).
  • stopwaitsecs greater than your longest possible job.
  • stopasgroup=true and killasgroup=true.

Docker

  • CMD/command in exec form (JSON array), ideally with init: true on top (init alone does not fix a shell-form CMD).
  • STOPSIGNAL SIGTERM.
  • stop_grace_period (compose) / -t (CLI) longer than the longest job.

Kubernetes

  • terminationGracePeriodSeconds longer than longest job (+ any preStop time).
  • Container command in exec form so PHP is PID 1.
  • Optional preStop hook; if it runs queue:restart, remember that this recycles every worker on the shared cache, not only the terminating pod.

Deploy process

  • php artisan queue:restart (or horizon:terminate) runs on every deploy, after migrations.
  • Shared cache (Redis/Memcached/DB) so queue:restart reaches all workers.
  • Migrations are backward-compatible (expand/contract) so old- and new-code workers coexist.

Job design

  • Jobs are idempotent (status guards + idempotency keys on external calls).
  • Multi-step local state wrapped in DB::transaction().
  • Large work chunked into small jobs (or batched).
  • tries/backoff/timeout/retryUntil set; failures land in failed_jobs.
  • Locks (WithoutOverlapping) have a sane expireAfter.

Verification

  • Manually tested kill -TERM finishes the current job before exiting.
  • Manually tested docker stop waits for the job (proves SIGTERM reaches PHP).
  • Monitoring on failed_jobs growth and worker lifecycle logs.

Closing thoughts

Graceful shutdown for Laravel queue workers comes down to a chain of cooperating timeouts and one architectural insight:

The worker only stops between jobs. So every layer above it — Supervisor, Docker, Kubernetes — must send SIGTERM (not SIGKILL) and then wait longer than your longest job before escalating. And because that wait can never be infinite, your jobs must be idempotent so the rare hard kill is harmless.

Get the signal right (SIGTERM reaches PHP as PID 1), get the grace window right (longer than your longest job), run queue:restart on every deploy, and design jobs that don't care if they run twice. Do that, and you can deploy at 2pm on a Tuesday without anyone noticing.

Bekzod Erkinov


Author

Founder of NextGenBeing. Software engineer working with Laravel, Python, and cloud infrastructure. Writes about patterns that actually hold up in production. Based in Tashkent, Uzbekistan.
