Bekzod Erkinov
Graceful Shutdown Patterns for Laravel Queue Workers in Production
Queue workers are long-running PHP processes. Unlike a web request that lives for a few hundred milliseconds, a worker boots once and then loops for hours or days, pulling jobs off a queue and executing them. That longevity is exactly what makes them powerful — and exactly what makes shutting them down dangerous.
Every deploy, every container restart, every autoscaling event, and every server reboot kills your workers. If that kill lands in the middle of a job, you can corrupt data, double-charge a customer, send a half-finished email, or lose work silently. Graceful shutdown is the discipline of stopping a worker between jobs — never during one — and of designing your jobs so that even a hard kill is survivable.
This tutorial covers the whole picture: how Laravel's worker loop actually responds to OS signals, how to configure Supervisor, Docker, and Kubernetes so they cooperate with that loop, how queue:restart and Horizon fit in, and how to design jobs that tolerate interruption. By the end you'll be able to deploy in the middle of the day without holding your breath.
Table of contents
- Why "just kill it" is wrong
- How a Laravel worker actually shuts down
- The signals that matter
- Worker flags that change shutdown behavior
- queue:restart: the soft restart
- Supervisor configuration
- Docker: the PID 1 problem
- Kubernetes: termination grace periods and preStop hooks
- Laravel Horizon
- Designing jobs that survive interruption
- A complete production deployment flow
- Observability: knowing it worked
- Checklist
1. Why "just kill it" is wrong
Consider this job:
public function handle(): void
{
    $order = Order::find($this->orderId);
    $this->paymentGateway->charge($order); // (A) external side effect
    $order->update(['status' => 'paid']); // (B) local state
    Mail::to($order->user)->send(new ReceiptMail($order)); // (C) external side effect
}
If the process receives SIGKILL (signal 9, which cannot be caught) between (A) and (B), the customer has been charged but your database still says the order is unpaid. When the job retries — or when the customer clicks "pay" again — you charge them twice.
The job is not the only thing at risk. The PHP process holds:
- An open database connection (possibly inside an uncommitted transaction).
- An open Redis connection.
- A reserved job on the queue (in Redis, the job is moved to a "reserved" set with a timeout).
A clean shutdown lets the worker finish the current job, release its connections, and exit on its own terms. A hard kill leaves reserved jobs dangling until their visibility timeout expires, and may leave half-applied transactions.
The goal of graceful shutdown is simple to state:
Stop the worker only at a safe point — after the current job completes and before the next one starts.
Everything below is in service of that one sentence.
2. How a Laravel worker actually shuts down
Let's look at what php artisan queue:work does internally, because the configuration only makes sense once you understand the loop.
The worker runs an effectively infinite while (true) loop. Each iteration:
- Checks whether it should pause, quit, or keep going.
- Pulls the next job off the queue (sleeping for --sleep seconds if the queue is empty).
- Runs the job in the same process, with a timeout alarm armed around it.
- Loops again.
The critical method is roughly this (simplified from Illuminate\Queue\Worker::daemon()):
public function daemon($connectionName, $queue, WorkerOptions $options)
{
    // Register POSIX signal handlers if the pcntl extension is available
    if ($this->supportsAsyncSignals()) {
        $this->listenForSignals();
    }

    $lastRestart = $this->getTimestampOfLastQueueRestart();

    // Track uptime and job count for --max-time and --max-jobs
    [$startTime, $jobsProcessed] = [hrtime(true) / 1e9, 0];

    while (true) {
        // 1. Is it safe to fetch a job right now?
        if (! $this->daemonShouldRun($options, $connectionName, $queue)) {
            $status = $this->pauseWorker($options, $lastRestart);

            if (! is_null($status)) {
                return $this->stop($status, $options);
            }

            continue;
        }

        // 2. Fetch and run the next job
        $job = $this->getNextJob(
            $this->manager->connection($connectionName), $queue
        );

        if ($job) {
            $jobsProcessed++;
            $this->runJob($job, $connectionName, $options);
        } else {
            $this->sleep($options->sleep);
        }

        // 3. After each job, decide whether to keep looping
        $status = $this->stopIfNecessary($options, $lastRestart, $startTime, $jobsProcessed, $job);

        if (! is_null($status)) {
            return $this->stop($status, $options);
        }
    }
}
Three things make graceful shutdown possible:
listenForSignals()
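If the pcntl extension is loaded, the worker registers handlers for OS signals. Verify this in your production image rather than assuming it: without pcntl the worker cannot catch signals at all, and not every PHP image ships with the extension enabled.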
protected function listenForSignals()
{
    pcntl_async_signals(true);

    pcntl_signal(SIGQUIT, fn () => $this->shouldQuit = true);
    pcntl_signal(SIGTERM, fn () => $this->shouldQuit = true);
    pcntl_signal(SIGUSR2, fn () => $this->paused = true);
    pcntl_signal(SIGCONT, fn () => $this->paused = false);
}
Notice what the handler does: it does not stop anything immediately. It just flips a boolean, $this->shouldQuit = true. pcntl_async_signals(true) means PHP delivers the signal to this handler asynchronously between opcodes, but the handler is deliberately tiny. The actual stopping happens later, in the loop, at a safe point.
This is the heart of graceful shutdown.
SIGTERM doesn't kill the worker. It sets a flag. The worker reads that flag after the current job finishes and exits cleanly.
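The flag-and-check pattern can be reproduced outside Laravel in a few lines. The following is a minimal sketch, not Laravel's Worker class: the job names and the loop are made up for illustration, the script sends SIGTERM to itself to simulate a deploy, and it assumes the pcntl and posix extensions are available in the CLI.

```php
<?php
// Sketch only: mimics the flag-and-check pattern, not Laravel's actual Worker.
// Assumes the pcntl and posix extensions are loaded (CLI).

pcntl_async_signals(true);

$shouldQuit = false;
pcntl_signal(SIGTERM, function () use (&$shouldQuit) {
    $shouldQuit = true; // the handler only records the request
});

$completed = [];

foreach (['job-1', 'job-2', 'job-3'] as $job) { // hypothetical job names
    if ($job === 'job-2') {
        // Simulate a deploy sending SIGTERM while job-2 is in flight.
        posix_kill(posix_getpid(), SIGTERM);
    }

    usleep(50_000);      // stand-in for the job's real work
    $completed[] = $job; // the in-flight job still runs to completion

    if ($shouldQuit) {
        break;           // safe point: between jobs
    }
}

echo implode(',', $completed), PHP_EOL;
```

In this sketch job-2 finishes even though the signal arrived while it was running, and job-3 is never started, which is the behavior the worker loop aims for.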
stopIfNecessary()
After every job, the worker asks: should I stop now? It returns a non-null status when any of these is true:
protected function stopIfNecessary($options, $lastRestart, $startTime, $jobsProcessed, $job)
{
    if ($this->shouldQuit) {
        return static::EXIT_SUCCESS; // SIGTERM/SIGQUIT was received
    } elseif ($this->memoryExceeded($options->memory)) {
        return static::EXIT_MEMORY_LIMIT; // --memory exceeded
    } elseif ($this->queueShouldRestart($lastRestart)) {
        return static::EXIT_SUCCESS; // queue:restart was called
    } elseif ($options->stopWhenEmpty && is_null($job)) {
        return static::EXIT_SUCCESS; // --stop-when-empty and queue drained
    } elseif ($options->maxTime && hrtime(true) / 1e9 - $startTime >= $options->maxTime) {
        return static::EXIT_SUCCESS; // --max-time reached
    } elseif ($options->maxJobs && $jobsProcessed >= $options->maxJobs) {
        return static::EXIT_SUCCESS; // --max-jobs reached
    }

    return null; // keep looping
}
WorkerStopping event
When the worker calls stop(), it fires the Illuminate\Queue\Events\WorkerStopping event before calling exit. You can hook into this for cleanup or logging:
use Illuminate\Queue\Events\WorkerStopping;
use Illuminate\Support\Facades\Event;
use Illuminate\Support\Facades\Log;

Event::listen(function (WorkerStopping $event) {
    Log::info('Queue worker stopping', ['status' => $event->status]);
});
The one exception: a job already in progress
Here's the subtlety people miss. The signal flag is only checked between jobs. If a SIGTERM arrives while a job is actively running, the worker finishes that job first, then sees the flag, then exits. That's the desired behavior — but it means the shutdown is only as fast as your slowest job. A job that takes 10 minutes will delay shutdown by up to 10 minutes. This is the single most important fact for configuring the timeouts in the rest of this tutorial.
3. The signals that matter
| Signal | Number | Catchable? | Worker behavior |
|---|---|---|---|
| SIGTERM | 15 | Yes | Graceful: finish current job, then exit. The standard "please stop" signal. |
| SIGQUIT | 3 | Yes | Same as SIGTERM for the worker (finish job, then exit). |
| SIGUSR2 | 12 | Yes | Pause: stop fetching new jobs but keep the process alive. Used by Horizon. |
| SIGCONT | 18 | Yes | Resume after a pause. |
| SIGINT | 2 | Yes | Sent by Ctrl+C in a terminal. The listenForSignals() handlers shown above do not include it, so expect the default action (immediate termination, possibly mid-job) rather than a graceful stop. |
| SIGKILL | 9 | No | Immediate, unconditional kill. The kernel destroys the process. Job is interrupted mid-execution. |
The whole game is: make sure your process manager sends SIGTERM and then waits long enough before escalating to SIGKILL. Every orchestrator — Supervisor, Docker, Kubernetes, systemd — has a configurable grace window between the polite signal and the fatal one. Your job is to make that window longer than your longest-running job.
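To make the escalation concrete, here is a rough sketch of what a process manager does when it stops a worker: send SIGTERM, poll until the grace window closes, then fall back to SIGKILL. The helper names (spawnWorker, stopWithGrace) and the timings are invented for illustration; this is not how Supervisor or Docker is implemented, and it assumes the pcntl and posix extensions.

```php
<?php
// Sketch only: a toy "process manager" and a toy one-job "worker".
// Assumes the pcntl and posix extensions are loaded (CLI).

function spawnWorker(float $jobSeconds): int
{
    // Install the handler before forking so the child never sees an
    // unhandled SIGTERM (signal handlers are inherited across fork).
    pcntl_async_signals(true);
    pcntl_signal(SIGTERM, function () {
        // Deliberately empty: the toy worker always finishes its job first.
    });

    $pid = pcntl_fork();
    if ($pid === -1) {
        throw new RuntimeException('fork failed');
    }

    if ($pid === 0) {
        // Child: run one "job" to completion, then exit cleanly.
        $end = microtime(true) + $jobSeconds;
        while (microtime(true) < $end) {
            usleep(10_000);
        }
        exit(0);
    }

    return $pid; // parent
}

function stopWithGrace(int $pid, float $graceSeconds): string
{
    posix_kill($pid, SIGTERM); // polite request first
    $deadline = microtime(true) + $graceSeconds;

    while (microtime(true) < $deadline) {
        // WNOHANG: return immediately instead of blocking on the child.
        if (pcntl_waitpid($pid, $status, WNOHANG) === $pid) {
            return 'exited gracefully';
        }
        usleep(20_000);
    }

    posix_kill($pid, SIGKILL);    // grace window expired: escalate
    pcntl_waitpid($pid, $status); // reap the killed child
    return 'killed';
}

// Grace window longer than the job: the job completes.
$longWindow = stopWithGrace(spawnWorker(0.3), 2.0);

// Grace window shorter than the job: the job is cut off.
$shortWindow = stopWithGrace(spawnWorker(2.0), 0.2);

echo "long window: {$longWindow}, short window: {$shortWindow}", PHP_EOL;
```

The only difference between the two runs is the ratio of grace window to job length, which is the one knob every orchestrator in the following sections exposes under a different name.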
You can send these manually to test:
# Find the worker PID
ps aux | grep 'queue:work'
# Ask it to stop gracefully
kill -TERM <pid> # or: kill -SIGTERM <pid>
# Watch it: it will exit AFTER the current job finishes, not immediately
If you kill -9 <pid> instead, the job is severed mid-flight. Never do this in a deploy script.
4. Worker flags that change shutdown behavior
Several queue:work options interact directly with shutdown. Understanding them prevents the classic "my job got killed at the 60-second mark" bug.
--timeout (default: 60)
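The maximum number of seconds a single job may run. This is enforced with a separate alarm using pcntl_alarm, not by the shutdown loop. When a job exceeds its timeout, the worker process kills itself and exits with an error status, so the process manager starts a fresh one. The job is marked failed if it has used up its attempts; otherwise it becomes available for retry once retry_after elapses.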
Crucially, --timeout must be shorter than your queue connection's retry_after (in config/queue.php), otherwise a job can be retried while the original is still running — you'll process it twice.
// config/queue.php
'redis' => [
'driver' => 'redis',
'retry_after' => 90, // must be GREATER than the worker --timeout
// ...
],
Rule of thumb: retry_after > --timeout. For example --timeout=60 with retry_after=90.
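The alarm mechanism behind --timeout can be sketched in plain PHP. This is an illustration under assumptions, not Laravel's handler: the real one kills the worker process, whereas this one only sets a flag so the script can finish and report. The one-second timeout and the five-second "job" are arbitrary values, and the pcntl extension is assumed.

```php
<?php
// Sketch only: shows pcntl_alarm cutting a job short.
// Assumes the pcntl extension is loaded (CLI).

pcntl_async_signals(true);

$timedOut = false;
pcntl_signal(SIGALRM, function () use (&$timedOut) {
    $timedOut = true; // Laravel's handler would kill the process here
});

$timeoutSeconds = 1;          // stand-in for --timeout
pcntl_alarm($timeoutSeconds); // arm the alarm just before the job starts

$startedAt = microtime(true);

// A "job" that would run for 5 seconds if nothing stopped it.
while (! $timedOut && microtime(true) - $startedAt < 5) {
    usleep(10_000);
}

pcntl_alarm(0); // disarm once the job is over
$elapsed = microtime(true) - $startedAt;

printf("timed out: %s after %.1fs\n", $timedOut ? 'yes' : 'no', $elapsed);
```

Because the alarm fires independently of the worker loop, a timeout is not a graceful stop: it interrupts the job, which is why the retry_after margin matters.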
--max-time=3600
Tells the worker to exit (gracefully, between jobs) after it has been alive for this many seconds. Combined with Supervisor's auto-restart, this gives you a clean periodic recycle that frees leaked memory. The worker checks this in stopIfNecessary, so it never interrupts a job.
--max-jobs=1000
Exit gracefully after processing this many jobs. Same idea as --max-time, but counted in jobs rather than wall-clock time.
--memory=128
If process memory exceeds this many megabytes (checked between jobs), exit so the process manager restarts a fresh one. PHP doesn't reliably release memory back; long-lived workers leak, so this is your safety valve.
--rest=0
Seconds to sleep between jobs. Usually 0; raise it only to throttle.
--stop-when-empty
Exit once the queue is drained. Useful for ephemeral/batch workers (e.g. a Kubernetes Job that should terminate when there's nothing left to do), but not what you want for a long-running daemon.
Putting them together
A typical production daemon command:
php artisan queue:work redis \
--queue=high,default,low \
--tries=3 \
--backoff=10 \
--timeout=60 \
--max-time=3600 \
--max-jobs=1000 \
--memory=256 \
--sleep=3
This worker:
- Processes high before default before low.
- Attempts each job up to 3 times, with a 10s backoff between attempts.
- Kills any single job exceeding 60s.
- Recycles itself every hour or every 1000 jobs (whichever first), gracefully.
- Recycles if it crosses 256 MB.
Every one of those recycles happens between jobs, so they're safe. The only unsafe stop is an external SIGKILL, which the next sections are about preventing.
queue:work vs queue:listen. queue:work boots the framework once and is the production choice. queue:listen boots a fresh process per job (slow, but picks up code changes without a restart) and is for local development only. Everything in this tutorial assumes queue:work.
5. queue:restart — the soft restart
How do you tell a fleet of workers spread across many servers to restart and pick up newly deployed code, without signaling each process individually?
php artisan queue:restart
This command writes a timestamp to the cache (illuminate:queue:restart). Every worker, in its loop, compares that timestamp against the one it cached at boot (queueShouldRestart($lastRestart)). When the cached timestamp is newer, the worker finishes its current job and exits gracefully — and Supervisor (or whatever manager) immediately starts a fresh one running the new code.
protected function queueShouldRestart($lastRestart)
{
    return $this->getTimestampOfLastQueueRestart() != $lastRestart;
}
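The timestamp comparison is easy to see in isolation. In this minimal sketch a plain array stands in for the shared cache and the timestamps are arbitrary integers; only the cache key name is taken from the description above, and everything else is hypothetical.

```php
<?php
// Sketch only: an array stands in for the shared cache (Redis, Memcached, ...).
$sharedCache = ['illuminate:queue:restart' => 1000];

// Hypothetical reader, playing the role of getTimestampOfLastQueueRestart().
$readRestartTimestamp = fn (array $cache): ?int => $cache['illuminate:queue:restart'] ?? null;

// Worker boot: remember the value that was current at startup.
$lastRestart = $readRestartTimestamp($sharedCache);

$processed = [];

foreach (['job-1', 'job-2', 'job-3'] as $job) { // hypothetical job names
    if ($job === 'job-2') {
        // A deploy runs queue:restart while job-2 is in flight.
        $sharedCache['illuminate:queue:restart'] = 2000;
    }

    $processed[] = $job; // the in-flight job completes on the old code

    // Between jobs: has the shared timestamp changed since boot?
    if ($readRestartTimestamp($sharedCache) !== $lastRestart) {
        break; // exit so the process manager starts a worker on the new code
    }
}

echo implode(',', $processed), PHP_EOL;
```

The worker never receives a signal in this flow. It notices the change itself at its next safe point, which is why the cache has to be one that every worker can read.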
Key properties:
- It's asynchronous and eventually-consistent. Workers don't all stop at the same instant; each stops after its current job. A worker mid-job on a 5-minute task won't restart for up to 5 minutes.
- It requires a shared cache (Redis, Memcached, database) that all workers read. If each server has its own isolated array/file cache, queue:restart on one box won't reach workers on another.
- It is the canonical way to roll out code to workers. You must run it on every deploy, because queue:work holds your application code in memory and will keep running the old code until restarted.
# In your deploy script, after pulling new code:
php artisan migrate --force
php artisan queue:restart # <- tell workers to gracefully recycle into new code
Horizon has its own equivalent, php artisan horizon:terminate (covered below).
6. Supervisor configuration
Supervisor is the most common process manager for Laravel workers on a traditional (non-container) host. It keeps N worker processes alive, restarts them when they exit, and — importantly — controls the SIGTERM→SIGKILL grace window.
A production config at /etc/supervisor/conf.d/laravel-worker.conf:
[program:laravel-worker]
process_name=%(program_name)s_%(process_num)02d
command=php /var/www/app/artisan queue:work redis --queue=high,default --tries=3 --timeout=60 --max-time=3600 --sleep=3
directory=/var/www/app
autostart=true
autorestart=true
user=www-data
numprocs=8
redirect_stderr=true
stdout_logfile=/var/www/app/storage/logs/worker.log
stdout_logfile_maxbytes=50MB
stopwaitsecs=3600
stopsignal=TERM
killasgroup=true
stopasgroup=true
The graceful-shutdown-critical lines:
stopsignal=TERM
When you run supervisorctl stop laravel-worker (or Supervisor restarts the group), it sends SIGTERM. This is the default for most setups but make it explicit. Do not set it to KILL.
stopwaitsecs=3600
This is the single most important line. After sending SIGTERM, Supervisor waits up to stopwaitsecs seconds for the process to exit on its own. If the process hasn't exited by then, Supervisor escalates to SIGKILL.
stopwaitsecs must be greater than your longest-running job. If your longest job can run for 50 minutes, and stopwaitsecs is the default 10, Supervisor will SIGKILL your worker 10 seconds into a 50-minute job — exactly the interruption you were trying to avoid. Set it generously. Many teams set it to 3600 (an hour) so it effectively never escalates.
Note the relationship with --timeout: --timeout caps a single job at, say, 60s. If every job is capped at 60s, then stopwaitsecs=120 is plenty. But if you have a special long-running queue with --timeout=3600, then stopwaitsecs for that worker program must also be ≥ 3600. Match them per worker program.
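The ordering between these numbers is easy to get wrong when several worker programs have different limits. The check below is a small sketch: timeoutChainProblems is a hypothetical helper, not part of Laravel or Supervisor, and it encodes only the rules stated in this tutorial (realistic job duration < --timeout < retry_after, and grace window > --timeout).

```php
<?php
/**
 * Hypothetical helper: returns a list of problems, empty when the
 * timeouts are consistent. All values are in seconds.
 */
function timeoutChainProblems(int $longestJob, int $workerTimeout, int $retryAfter, int $graceWindow): array
{
    $problems = [];

    if ($workerTimeout <= $longestJob) {
        $problems[] = '--timeout must exceed the longest realistic job';
    }
    if ($retryAfter <= $workerTimeout) {
        $problems[] = 'retry_after must exceed --timeout';
    }
    if ($graceWindow <= $workerTimeout) {
        $problems[] = 'grace window (stopwaitsecs) must exceed --timeout';
    }

    return $problems;
}

// Default queue: 45s jobs, --timeout=60, retry_after=90, stopwaitsecs=120.
$defaultQueue = timeoutChainProblems(45, 60, 90, 120);

// Long-running queue left on default settings: --timeout=3600, but
// retry_after=90 and stopwaitsecs=10.
$longQueue = timeoutChainProblems(3000, 3600, 90, 10);

print_r($defaultQueue);
print_r($longQueue);
```

Running a check like this per worker program, for example in CI, is one way to catch a long-running queue that inherited the short defaults.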
stopasgroup=true and killasgroup=true
PHP may spawn child processes (e.g. if a job shells out). These ensure the signal reaches the whole process group, so children don't get orphaned and the SIGKILL escalation also applies to the group.
Reloading config
sudo supervisorctl reread # detect config changes
sudo supervisorctl update # apply them (restarts changed programs gracefully)
sudo supervisorctl status # check
sudo supervisorctl restart laravel-worker:* # restart all 8 processes
Deploy ordering with Supervisor. On deploy you typically run php artisan queue:restart (soft, graceful, per-job) rather than supervisorctl restart (which also respects stopwaitsecs but is a heavier hammer). queue:restart is enough because workers self-recycle into the new code.
7. Docker: the PID 1 problem
Containers introduce a subtle trap that breaks graceful shutdown even when your Laravel config is perfect.
The trap
When Docker stops a container (docker stop, or an orchestrator scaling down), it sends SIGTERM to PID 1 inside the container, waits --time seconds (default 10), then SIGKILLs everything.
If your Dockerfile uses the shell form of CMD:
# ❌ WRONG — runs via /bin/sh -c, so PID 1 is the shell, not PHP
CMD php artisan queue:work redis --timeout=60
then PID 1 is /bin/sh, and sh does not forward SIGTERM to its child PHP process. Docker's SIGTERM hits the shell, which ignores it; PHP never sees it; 10 seconds later Docker SIGKILLs the whole container — interrupting whatever job was running. Graceful shutdown silently never happens.
Fix 1: exec form
Use the exec form of CMD so PHP becomes PID 1 directly and receives the signal:
# ✅ exec form — PHP is PID 1 and receives SIGTERM
CMD ["php", "artisan", "queue:work", "redis", "--timeout=60", "--max-time=3600"]
Fix 2: a proper init (tini)
Even as PID 1, PHP isn't a great init process — it won't reap zombie children if your jobs shell out. The robust fix is a tiny init that forwards signals and reaps zombies. Docker has one built in:
docker run --init your-image
Or in docker-compose.yml:
services:
  worker:
    image: your-app
    init: true # use tini as PID 1; forwards signals, reaps zombies
    command: ["php", "artisan", "queue:work", "redis", "--timeout=60", "--max-time=3600"]
    stop_grace_period: 1h # <- Docker's equivalent of stopwaitsecs
    stop_signal: SIGTERM
stop_grace_period
This is Docker Compose's grace window — the Docker analogue of Supervisor's stopwaitsecs. The default is 10 seconds, which is almost certainly too short for a worker. Set it to comfortably exceed your longest job. With raw docker stop, the flag is --time / -t:
docker stop -t 3600 worker_container
STOPSIGNAL
You can also declare the stop signal in the image:
STOPSIGNAL SIGTERM
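Summary for Docker: exec-form CMD (with init: true on top if your jobs spawn child processes), STOPSIGNAL SIGTERM, and a generous stop_grace_period. If SIGTERM never reaches PHP, or the grace period is shorter than the job, your beautifully configured Laravel worker still gets SIGKILLed mid-job.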
8. Kubernetes: termination grace periods and preStop hooks
Kubernetes adds more moving parts, but the principle is identical: send SIGTERM, wait, then SIGKILL. The wait is terminationGracePeriodSeconds.
The pod termination sequence
When a worker pod is deleted (deploy, scale-down, node drain):
- The pod is marked Terminating and removed from Service endpoints. The grace-period clock starts here.
- The preStop hook (if any) runs to completion.
- Kubernetes sends SIGTERM to PID 1 of each container.
- Kubernetes waits for whatever remains of terminationGracePeriodSeconds (default 30).
- If the container is still alive, Kubernetes sends SIGKILL.
The default 30 seconds is dangerous for workers — a job longer than 30s gets killed. You must raise it.
A worker Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: laravel-worker
spec:
  replicas: 4
  selector:
    matchLabels: { app: laravel-worker }
  template:
    metadata:
      labels: { app: laravel-worker }
    spec:
      terminationGracePeriodSeconds: 3700 # > longest job + preStop sleep
      containers:
        - name: worker
          image: your-app:latest
          # exec form: PHP is PID 1 and receives SIGTERM
          command: ["php", "artisan", "queue:work", "redis",
                    "--timeout=60", "--max-time=3600", "--tries=3"]
          lifecycle:
            preStop:
              exec:
                # Optional: stop fetching new jobs immediately, let current finish
                command: ["php", "artisan", "queue:restart"]
Key points
- terminationGracePeriodSeconds is the Kubernetes equivalent of stopwaitsecs / stop_grace_period. Set it to longer than your longest job (plus any preStop time). The ~3700s in the manifest is sized for jobs that can run up to an hour; with every job capped by --timeout=60, as in this command, a much smaller value would do.
- exec-form command so PHP is PID 1 and receives SIGTERM. (Same PID 1 rule as Docker.) If you wrap your container with an entrypoint shell script, make sure it ends with exec php ... so PHP replaces the shell.
- preStop hook: optional but useful. Kubernetes runs it before sending SIGTERM. A common pattern is to call queue:restart (or sleep) so workers stop pulling new jobs as soon as termination begins, shortening the drain. Note that preStop time counts against the grace period, and that queue:restart works through the shared cache, so it recycles every worker reading that cache, not only the pod being terminated.
- Liveness/readiness probes: workers don't serve HTTP, so use a process-based or exec probe (e.g. check the PHP process exists) rather than an HTTP probe, or omit liveness and rely on --max-time recycling.
--max-time is your friend in Kubernetes
Because pods are cattle, a worker that gracefully self-terminates every hour (--max-time=3600) and is restarted by the Deployment controller is a clean, predictable lifecycle that naturally limits memory growth and picks up new config on schedule.
Why --stop-when-empty + Jobs for batch work
For finite batch processing, model the worker as a Kubernetes Job with --stop-when-empty so the pod terminates itself when the queue drains, rather than a Deployment that runs forever.
9. Laravel Horizon
Horizon is Laravel's Redis-backed queue dashboard and supervisor. It manages worker processes for you, so you don't write Supervisor queue:work lines directly — but graceful shutdown still applies, just through Horizon's own commands.
Running Horizon under Supervisor
You run the single horizon command under Supervisor (or systemd/Docker), and Horizon spawns and manages the actual worker processes:
[program:horizon]
process_name=%(program_name)s
command=php /var/www/app/artisan horizon
directory=/var/www/app
autostart=true
autorestart=true
user=www-data
redirect_stderr=true
stdout_logfile=/var/www/app/storage/logs/horizon.log
stopwaitsecs=3600
stopsignal=TERM
The same stopwaitsecs=3600 rule applies: give Horizon time to drain.
horizon:terminate — the graceful restart
This is Horizon's queue:restart. On deploy:
php artisan horizon:terminate
Horizon's master process receives the instruction, signals each of its worker processes to finish their current job, then exits. Supervisor restarts the horizon program with the new code. Run horizon:terminate on every deploy, after migrations.
When the master process itself receives SIGTERM, Horizon handles it gracefully too: it instructs its supervisors and workers to wind down, waiting for in-flight jobs.
--wait and graceful timeouts
horizon:terminate accepts a --wait option to block until in-flight jobs are done:
php artisan horizon:terminate --wait
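In config/horizon.php, each supervisor's timeout option sets the per-job timeout for its workers. (The waits option in the same file is unrelated to shutdown: it sets the wait-time thresholds for long-wait notifications.) The mental model is unchanged: Horizon never interrupts a running job on a graceful terminate; it waits for the current job and then recycles.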
Horizon and SIGTERM directly
If your orchestrator sends SIGTERM to the Horizon master (Docker stop, K8s pod deletion), Horizon catches it and performs the same graceful drain — provided PID 1 is actually the PHP/Horizon process (the same exec-form rule from the Docker/K8s sections).
Deploy with Horizon:
git pull → composer install → php artisan migrate --force → php artisan horizon:terminate. Supervisor brings Horizon back up on the new code.
10. Designing jobs that survive interruption
Configuration gets you graceful shutdown most of the time. But you will eventually hit a SIGKILL: a node dies, the OOM killer fires, someone runs kill -9, a job overruns the grace period. Robust systems assume jobs can be interrupted at any point and design for it. Graceful shutdown reduces the frequency; idempotency makes the rare hard kill harmless.
Make jobs idempotent
An idempotent job produces the same result whether it runs once or five times. This is the single most valuable property for queue reliability.
The dangerous version from the intro, made safe:
public function handle(): void
{
    $order = Order::findOrFail($this->orderId);

    // Guard: if already paid, do nothing. Safe to re-run.
    if ($order->status === 'paid') {
        return;
    }

    // Use an idempotency key so the gateway dedupes a retried charge.
    $charge = $this->paymentGateway->charge($order, idempotencyKey: "order-{$order->id}");

    DB::transaction(function () use ($order, $charge) {
        $order->update([
            'status' => 'paid',
            'charge_id' => $charge->id,
        ]);
    });

    // Dispatch the email as a SEPARATE job so a mail failure doesn't
    // re-trigger the charge on retry.
    SendReceiptEmail::dispatch($order->id);
}
Techniques in play:
- Status guard / dedupe check at the top — re-running is a no-op once done.
- Idempotency keys on external calls (Stripe, payment gateways, and most modern APIs support these) so the provider dedupes retried side effects.
- Split side effects into separate jobs so a retry of step C doesn't repeat step A. Each job should own one externally-observable side effect where possible.
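The first two techniques can be exercised without a framework. The sketch below uses in-memory stand-ins: FakeGateway and handlePayment are hypothetical names, the order is a plain array rather than an Eloquent model, and the "crash" is simulated with an exception thrown between the charge and the local save.

```php
<?php
// Sketch only: in-memory stand-ins for a payment gateway and an order row.

final class FakeGateway
{
    /** @var array<string, string> idempotency key => charge id */
    private array $chargesByKey = [];

    public int $chargesCreated = 0;

    public function charge(int $amountCents, string $idempotencyKey): string
    {
        // A repeated key returns the original charge instead of charging again.
        if (isset($this->chargesByKey[$idempotencyKey])) {
            return $this->chargesByKey[$idempotencyKey];
        }

        $this->chargesCreated++;

        return $this->chargesByKey[$idempotencyKey] = 'ch_' . $this->chargesCreated;
    }
}

function handlePayment(array &$order, FakeGateway $gateway, bool $crashBeforeSave = false): void
{
    // Status guard: once paid, re-running is a no-op.
    if ($order['status'] === 'paid') {
        return;
    }

    $chargeId = $gateway->charge($order['amount'], "order-{$order['id']}");

    if ($crashBeforeSave) {
        throw new RuntimeException('simulated hard kill between charge and save');
    }

    $order['status'] = 'paid';
    $order['charge_id'] = $chargeId;
}

$order = ['id' => 42, 'amount' => 1999, 'status' => 'pending', 'charge_id' => null];
$gateway = new FakeGateway();

try {
    handlePayment($order, $gateway, crashBeforeSave: true); // attempt 1 is interrupted
} catch (RuntimeException $e) {
    // The queue would retry the job here.
}

handlePayment($order, $gateway); // attempt 2: same key, gateway dedupes
handlePayment($order, $gateway); // duplicate delivery: status guard returns early

echo "charges created: {$gateway->chargesCreated}", PHP_EOL;
```

The interrupted attempt and the retry share one idempotency key, so the gateway stand-in records a single charge; the status guard then covers any delivery after the order is marked paid.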
Wrap local state changes in transactions
If a job mutates several rows, wrap them in DB::transaction(). A hard kill mid-transaction rolls back cleanly — the database is never left half-updated:
DB::transaction(function () {
    $this->debitAccount($this->from, $this->amount);
    $this->creditAccount($this->to, $this->amount);
});
Without the transaction, a SIGKILL between the debit and credit loses money.
Keep jobs short and chunked
A 30-minute job is a 30-minute window during which any shutdown must either wait (long drains) or kill (lost work). Break large work into many small jobs:
// Instead of one job that processes 1,000,000 rows...
User::where('needs_export', true)
    ->chunkById(500, function ($users) {
        ExportUserChunk::dispatch($users->pluck('id')->all());
    });
Each ExportUserChunk runs in seconds, so the worker reaches a safe stop point quickly. Shutdowns drain in seconds, not minutes, and a hard kill loses at most one small chunk's worth of work (which retries).
Use batches for "all-or-progress" semantics
Bus::batch() lets you dispatch many jobs, track progress, and handle partial completion — ideal when chunking, because an interrupted batch resumes from where it left off (completed chunk jobs aren't re-run).
Bus::batch($chunks->map(fn ($ids) => new ExportUserChunk($ids)))
    ->name('user-export')
    ->allowFailures()
    ->dispatch();
Set tries / backoff / retryUntil
Graceful shutdown and retries work together. If a job is interrupted (hard kill), it must be retried. Configure that explicitly rather than relying on infinite retries:
class ExportUserChunk implements ShouldQueue
{
    public int $tries = 3;
    public int $backoff = 30; // wait 30s between retries
    public int $timeout = 120; // hard cap per attempt

    // Optional time-based cutoff. When retryUntil() is defined it takes
    // precedence over $tries, so pick one policy per job.
    public function retryUntil(): \DateTime
    {
        return now()->addMinutes(10);
    }
}
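Make sure failures land in the failed_jobs table: give every job a finite attempt budget (the worker's --tries or a per-job $tries) so that a job which keeps failing is eventually recorded there instead of retrying indefinitely. Monitor that table.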
Respect WithoutOverlapping and locks
If a job acquires a lock, a hard kill can leave the lock held until it expires. Set a sane lock expiry so an interrupted job's lock self-heals:
public function middleware(): array
{
    return [(new WithoutOverlapping($this->orderId))->expireAfter(180)];
}
Watch for the --timeout vs retry_after double-run
Reiterating because it bites everyone: if a job's runtime can exceed the connection's retry_after, Laravel will make the job visible again and a second worker will start it while the first is still going. Always keep retry_after > the worker --timeout > realistic job duration. This isn't strictly "shutdown," but it's the same family of "a job running twice" bug.
11. A complete production deployment flow
Putting it together, here's a zero-interruption deploy for a Supervisor + Redis setup. The principle generalizes to Docker/K8s.
#!/usr/bin/env bash
set -euo pipefail
cd /var/www/app
# 1. Pull new code into a release directory (or in place).
git pull --ff-only origin main
# 2. Install dependencies without dev packages.
composer install --no-dev --optimize-autoloader --no-interaction
# 3. Run migrations. (Design migrations to be backward-compatible so
# old-code workers still running don't break against the new schema.)
php artisan migrate --force
# 4. Rebuild caches.
php artisan config:cache
php artisan route:cache
php artisan event:cache
# 5. Tell workers to gracefully recycle into the new code.
# Each worker finishes its current job, then exits; Supervisor
# restarts it on the new code. NO running job is interrupted.
php artisan queue:restart
# (Horizon: php artisan horizon:terminate)
The backward-compatible migration rule
During the window between migrate and the last old-code worker recycling, both old and new code run against the new schema simultaneously. So migrations must be backward-compatible:
- Adding a nullable column or new table: safe.
- Renaming or dropping a column the old code still reads/writes: unsafe — do it in two deploys (add new, migrate code to use it, then drop old in a later deploy). This "expand/contract" pattern keeps graceful worker recycling truly graceful.
What the user never notices
Because queue:restart is graceful, at no point is a job killed mid-execution. Jobs in flight complete on the old code; new jobs start on the new code. No double-charges, no half-sent emails, no corrupted rows. That's the entire payoff.
12. Observability: knowing it worked
Graceful shutdown is invisible when it works, which makes it easy to think it works while it silently doesn't (the classic shell-form Docker CMD swallowing SIGTERM). Verify it.
Log worker lifecycle events
use Illuminate\Queue\Events\JobFailed;
use Illuminate\Queue\Events\WorkerStopping;
use Illuminate\Support\Facades\Event;
use Illuminate\Support\Facades\Log;

// Logged once per worker exit; status 0 means a clean stop
Event::listen(function (WorkerStopping $e) {
    Log::info('worker.stopping', ['exit_status' => $e->status]);
});

// Logged for every job that ends up failed
Event::listen(function (JobFailed $e) {
    Log::error('job.failed', [
        'job' => $e->job->resolveName(),
        'uuid' => $e->job->uuid(),
        'ex' => $e->exception->getMessage(),
    ]);
});
A clean shutdown logs worker.stopping with status 0 and no job.failed from interruption. If you see jobs failing with "process killed" or timeouts clustered around deploy times, your grace window is too short or SIGTERM isn't reaching PHP.
Test SIGTERM handling directly
Before trusting it in production, prove the loop sees the signal. Start a worker, dispatch a slow job, and signal it:
# Terminal 1
php artisan queue:work redis --timeout=300
# Terminal 2: dispatch a job that sleeps 30s, then immediately:
kill -TERM $(pgrep -f 'queue:work')
# Expected: the worker prints the job completing (~30s later), THEN exits.
# It does NOT exit immediately, and does NOT report the job as failed.
Do the same docker stop test against your actual image to catch the PID 1 problem:
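docker run --name t --init your-image & # dispatch a slow job into it
time docker stop -t 120 t # -t must exceed the job; should return when the job ends, exit 0
With a 30-second job, docker stop -t 120 should return after roughly 30 seconds. If it only returns when the 120-second timeout expires (and the container exits with status 137), SIGTERM isn't reaching PHP: fix your CMD/init. If you omit -t, the default 10-second grace period kills the job regardless, which is the grace-period problem rather than the signal problem.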
Monitor the failed_jobs table and reserved jobs
- Alert on growth in failed_jobs.
- For Redis, watch for jobs stuck in the reserved set past retry_after (a symptom of hard-killed workers).
- Horizon's dashboard surfaces throughput, wait times, and failures directly.
13. Checklist
Use this as a pre-production audit.
Worker configuration
- Using queue:work (not queue:listen) in production.
- --timeout set and less than the connection's retry_after in config/queue.php.
- --max-time and/or --max-jobs set so workers recycle and shed leaked memory.
- --memory set as a safety valve.
- pcntl extension installed in the production PHP image (it's what makes signal handling work).
Process manager (Supervisor)
- stopsignal=TERM (never KILL).
- stopwaitsecs greater than your longest possible job.
- stopasgroup=true and killasgroup=true.
Docker
- CMD/command in exec form (JSON array), optionally with init: true on top.
- STOPSIGNAL SIGTERM.
- stop_grace_period (compose) / -t (CLI) longer than the longest job.
Kubernetes
- terminationGracePeriodSeconds longer than longest job (+ any preStop time).
- Container command in exec form so PHP is PID 1.
- Optional preStop running queue:restart to stop intake early.
Deploy process
- php artisan queue:restart (or horizon:terminate) runs on every deploy, after migrations.
- Shared cache (Redis/Memcached/DB) so queue:restart reaches all workers.
- Migrations are backward-compatible (expand/contract) so old- and new-code workers coexist.
Job design
- Jobs are idempotent (status guards + idempotency keys on external calls).
- Multi-step local state wrapped in DB::transaction().
- Large work chunked into small jobs (or batched).
- tries / backoff / timeout / retryUntil set; failures land in failed_jobs.
- Locks (WithoutOverlapping) have a sane expireAfter.
Verification
- Manually tested kill -TERM finishes the current job before exiting.
- Manually tested docker stop waits for the job (proves SIGTERM reaches PHP).
- Monitoring on failed_jobs growth and worker lifecycle logs.
Closing thoughts
Graceful shutdown for Laravel queue workers comes down to a chain of cooperating timeouts and one architectural insight:
The worker only stops between jobs. So every layer above it (Supervisor, Docker, Kubernetes) must send SIGTERM (not SIGKILL) and then wait longer than your longest job before escalating. And because that wait can never be infinite, your jobs must be idempotent so the rare hard kill is harmless.
Get the signal right (SIGTERM reaches PHP as PID 1), get the grace window right (longer than your longest job), run queue:restart on every deploy, and design jobs that don't care if they run twice. Do that, and you can deploy at 2pm on a Tuesday without anyone noticing.
Bekzod Erkinov
Author. Founder of NextGenBeing. Software engineer working with Laravel, Python, and cloud infrastructure. Writes about patterns that actually hold up in production. Based in Tashkent, Uzbekistan.