
When Should You Use Gunicorn + Uvicorn Together?

Published on March 2, 2026

By ToolsGuruHub

You've chosen FastAPI. You've written your async endpoints. You've tested locally with uvicorn main:app --reload and everything is beautiful. Now it's time to deploy, and you're reading through deployment guides that say: "Use Gunicorn with Uvicorn workers in production."

Wait - why? Uvicorn already runs your app. Why do you need another server wrapped around it? Isn't that just adding complexity for no reason?

Sometimes it is. Sometimes it's exactly what you need. The answer depends on where you're deploying, how you're scaling, and what happens when things go wrong at 3 AM. Let's walk through when the combination makes sense and when it doesn't.

What the Combined Setup Actually Looks Like

First, let's be clear about what "Gunicorn + Uvicorn" means in practice:

gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:8000

Here's what happens when you run this:

  1. Gunicorn's master process starts. It binds to port 8000.
  2. Gunicorn forks 4 worker processes (the -w 4).
  3. Each worker process is a UvicornWorker - which means each worker runs its own Uvicorn instance with its own async event loop.
  4. The master process doesn't handle HTTP requests. It manages the workers: monitoring health, restarting crashed workers, and handling system signals.

So you get: Gunicorn's process management wrapping Uvicorn's async request handling. The master is Gunicorn. The workers are Uvicorn. Each worker handles hundreds of concurrent async requests through its event loop.

This is different from uvicorn main:app --workers 4, which uses Uvicorn's own built-in multiprocess mode. Both give you 4 async workers. The difference is in who manages those workers.

Why Gunicorn's Process Management Matters

To understand when you need Gunicorn in front, you need to understand what process management actually involves in production.

Worker Crash Recovery

Your worker processes will crash. Maybe a third-party library segfaults. Maybe an edge case in your code raises an exception outside the request cycle, where no handler catches it. Maybe the OS kills a worker due to memory pressure (OOM killer).

Gunicorn's master process detects dead workers and immediately spawns replacements. It's been doing this reliably for over a decade. The detection is fast - Gunicorn monitors workers via a heartbeat mechanism. If a worker stops sending heartbeats (configurable via timeout), the master kills and restarts it.

Uvicorn's --workers mode also restarts crashed workers, but Gunicorn's implementation is more battle-tested and has more configurability around timeouts, graceful shutdown periods, and max requests per worker (useful for mitigating memory leaks).
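To make the master/worker relationship concrete, here is a toy sketch of the supervision loop a pre-fork master runs. This is heavily simplified and not Gunicorn's actual code: the real master uses SIGCHLD and a heartbeat file rather than polling, and the worker here is just a stand-in that crashes on purpose.

```python
import multiprocessing
import time

def flaky_worker(worker_id):
    """Stand-in for a real worker: does some work, then crashes."""
    time.sleep(0.1)
    raise SystemExit(1)  # simulate a segfault, OOM kill, or fatal error

def supervise(num_workers=2, max_restarts=3):
    """Toy pre-fork master: spawn workers, replace any that die.
    Gunicorn does this via signals and heartbeats, not a polling loop."""
    workers = {
        i: multiprocessing.Process(target=flaky_worker, args=(i,))
        for i in range(num_workers)
    }
    for p in workers.values():
        p.start()

    restarts = 0
    while restarts < max_restarts:
        time.sleep(0.05)
        for i, p in workers.items():
            if not p.is_alive():  # dead worker detected
                p.join()
                replacement = multiprocessing.Process(
                    target=flaky_worker, args=(i,))
                replacement.start()  # immediate respawn
                workers[i] = replacement
                restarts += 1
                if restarts >= max_restarts:
                    break

    for p in workers.values():  # clean shutdown of the toy
        if p.is_alive():
            p.terminate()
        p.join()
    return restarts
```

The point of the sketch is the shape of the loop: the master never touches a request, it only watches children and replaces them, which is exactly the division of labor in the Gunicorn + Uvicorn setup.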

Graceful Reloads (Zero-Downtime Deploys)

You push new code and need to reload the application without dropping active requests.

With Gunicorn, you send SIGHUP to the master process:

kill -HUP $(cat /var/run/gunicorn.pid)

Gunicorn then:

  1. Spawns new workers with the updated code
  2. Lets old workers finish their current requests (up to graceful_timeout)
  3. Shuts down old workers once they're done or the timeout expires

Active requests are not dropped. New requests go to new workers. This is a well-understood, reliable pattern.

With Uvicorn's --workers mode, graceful reload support exists but is less mature. In many containerized deployments, you sidestep this entirely by doing rolling deployments at the container level - but on VMs, Gunicorn's signal handling is significantly more robust.

Dynamic Worker Scaling

Gunicorn supports scaling workers up and down at runtime via signals:

kill -TTIN $(cat /var/run/gunicorn.pid)  # Add a worker
kill -TTOU $(cat /var/run/gunicorn.pid)  # Remove a worker

This is useful for manual scaling during traffic spikes on VMs. Uvicorn's --workers mode doesn't support this - you'd need to restart the entire process.

Max Requests (Memory Leak Protection)

Gunicorn's --max-requests flag automatically restarts a worker after it has handled N requests. Combined with --max-requests-jitter, workers restart at staggered intervals:

gunicorn main:app -k uvicorn.workers.UvicornWorker \
  --max-requests 10000 \
  --max-requests-jitter 1000

This is a pragmatic defense against slow memory leaks in your application or third-party libraries. Each worker gets recycled periodically, keeping memory usage stable over days and weeks. Uvicorn's --workers mode has --limit-max-requests but lacks the jitter option.

When You Should Use Gunicorn + Uvicorn Together

1. Deploying on Bare Metal or VMs

This is the clearest case. If you're running your app on a VM (EC2 instance, DigitalOcean droplet, on-prem server) without container orchestration:

Client -> Nginx -> Gunicorn (Uvicorn workers) -> Your FastAPI App

You need something to:

  • Run multiple worker processes to use all CPU cores
  • Restart workers that crash
  • Handle graceful reloads when you deploy new code
  • Manage the application lifecycle via systemd

Gunicorn fills all of these roles. Uvicorn alone doesn't cover graceful reloads or dynamic scaling well enough for production VMs.

A typical systemd service file:

[Unit]
Description=FastAPI App
After=network.target

[Service]
User=app
WorkingDirectory=/opt/myapp
ExecStart=/opt/myapp/venv/bin/gunicorn main:app \
  -k uvicorn.workers.UvicornWorker \
  -w 4 \
  -b 0.0.0.0:8000 \
  --access-logfile - \
  --error-logfile -
ExecReload=/bin/kill -HUP $MAINPID
Restart=always

[Install]
WantedBy=multi-user.target

2. Applications With Memory Leaks You Can't Fix Immediately

Every production application has dependencies. Some of those dependencies leak memory - slowly, over days. You've filed the issue upstream, but the fix isn't coming soon.

Gunicorn's --max-requests is your friend:

gunicorn main:app -k uvicorn.workers.UvicornWorker \
  -w 4 \
  --max-requests 5000 \
  --max-requests-jitter 500

Workers get recycled after roughly 5,000 to 5,500 requests (the jitter adds a random offset per worker so they don't all restart at once). Memory stays stable. Your ops team sleeps through the night.
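The jitter works by giving each worker its own randomized restart threshold. A minimal sketch of the idea (the parameter names mirror the Gunicorn flags; this is an illustration, not Gunicorn's source):

```python
import random

def restart_threshold(max_requests=5000, jitter=500):
    """Each worker draws its own restart point: the base max_requests
    plus a random offset in [0, jitter]. Staggered thresholds mean
    workers recycle one at a time instead of all simultaneously."""
    return max_requests + random.randint(0, jitter)

# Four workers end up with four (likely different) restart points
thresholds = [restart_threshold() for _ in range(4)]
```

Without the jitter, all workers would hit the limit at about the same time under steady traffic, momentarily leaving you with no warm workers.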

3. When You Need Advanced Configuration Hooks

Gunicorn supports server hooks that execute at specific points in the worker lifecycle:

# gunicorn_config.py
def on_starting(server):
    """Called just before the master process is initialized."""
    pass

def pre_fork(server, worker):
    """Called just before a worker is forked."""
    pass

def post_fork(server, worker):
    """Called just after a worker has been forked."""
    pass

def pre_exec(server):
    """Called just before a new master process is forked (for hot code reload)."""
    pass

def when_ready(server):
    """Called just after the server is started."""
    pass

def worker_exit(server, worker):
    """Called when a worker exits."""
    # Close database connections, flush logs, etc.
    pass

These hooks let you run initialization code, clean up resources, emit metrics, or implement custom health checks at precise points. Uvicorn's --workers mode provides ASGI lifespan events (startup/shutdown) within each worker, but not the process-level hooks that Gunicorn offers.
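For comparison, the per-worker hooks you do get without Gunicorn are the ASGI lifespan events. Stripped of any framework, the protocol looks roughly like this (a minimal hand-rolled ASGI app; FastAPI surfaces these same events through its lifespan context manager):

```python
async def app(scope, receive, send):
    """Minimal ASGI app that handles only the lifespan protocol."""
    if scope["type"] != "lifespan":
        return
    while True:
        message = await receive()
        if message["type"] == "lifespan.startup":
            # per-worker init: open DB pools, warm caches, etc.
            await send({"type": "lifespan.startup.complete"})
        elif message["type"] == "lifespan.shutdown":
            # per-worker cleanup: close pools, flush metrics
            await send({"type": "lifespan.shutdown.complete"})
            return
```

Note the scope: these events fire inside each worker process. There is no lifespan equivalent of Gunicorn's on_starting or pre_fork, which run in the master before any worker exists.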

4. Mixed Workloads With CPU-Bound Operations

If your FastAPI app has some endpoints that do real CPU work (data processing, PDF generation, image manipulation), the combined setup helps:

  • Multiple Uvicorn workers = multiple event loops on separate cores
  • CPU work in one worker doesn't block the event loops of other workers
  • Gunicorn's timeout kills and restarts workers that get stuck on expensive computations

Without Gunicorn, uvicorn --workers 4 offers only basic timeout handling. Gunicorn's timeout and graceful_timeout settings give you finer control over how long a worker may be unresponsive before it's killed and replaced.
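Worker timeouts are the backstop; the standard in-app mitigation is to push CPU-bound work off the event loop entirely. A sketch using only the stdlib (the function names here are illustrative, not from any framework):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# One pool per worker process, sized for the CPU work you expect
cpu_pool = ThreadPoolExecutor(max_workers=2)

def render_report(n: int) -> int:
    """Stand-in for CPU-heavy work (PDF generation, image resizing).
    Many such libraries release the GIL in C code; for pure-Python
    number crunching you'd use a ProcessPoolExecutor instead."""
    return sum(i % 7 for i in range(n))

async def handle_request(n: int) -> int:
    """Offload the heavy call so this worker's event loop stays
    free to serve other concurrent requests in the meantime."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(cpu_pool, render_report, n)
```

With the work in an executor, a slow report ties up a pool thread rather than the event loop, so Gunicorn's heartbeat keeps flowing and the worker isn't killed as unresponsive.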

When You Should NOT Use Gunicorn + Uvicorn Together

1. Kubernetes or Docker Swarm Deployments

In container orchestration, the recommended pattern is:

One container = One Uvicorn process
uvicorn main:app --host 0.0.0.0 --port 8000

Why skip Gunicorn here?

  • Crash recovery: Kubernetes restarts crashed pods automatically
  • Scaling: Kubernetes scales by adding pods, not workers inside a pod
  • Health checks: Kubernetes has liveness and readiness probes
  • Graceful shutdown: Kubernetes sends SIGTERM and waits for terminationGracePeriodSeconds
  • Zero-downtime deploys: Rolling deployments are built into Kubernetes

Adding Gunicorn inside the container adds a layer of process management that duplicates what the orchestrator already does. It also makes resource limits harder to reason about - Kubernetes sees one process (Gunicorn master), but you actually have N+1 processes competing for the pod's CPU/memory limits.

The exception: if you want to run multiple workers in a single pod to maximize CPU utilization (e.g., a large VM node with 16 cores and pods that get 4 cores each), you might use Gunicorn inside the container. But the Kubernetes community generally recommends scaling horizontally with more pods rather than vertically with more workers per pod.

2. Serverless or Edge Deployments

If you're deploying to AWS Lambda, Google Cloud Functions, Vercel, or similar serverless platforms, you don't need either server - the platform handles process lifecycle entirely. Your app is invoked as a function, not as a long-running server.

3. Simple Internal Tools or Low-Traffic APIs

If your API serves 10 requests per minute and runs on a small VM, the added complexity of Gunicorn isn't worth it. Run Uvicorn directly, put it behind a systemd service for auto-restart, and move on:

uvicorn main:app --host 0.0.0.0 --port 8000

Systemd already restarts crashed processes. For low-traffic internal tools, this is plenty.

4. Development and Staging

This should be obvious, but: never use the Gunicorn + Uvicorn combo for development. Use:

uvicorn main:app --reload

The --reload flag watches your files and restarts the server when code changes. Gunicorn's HUP-based reload is for production deploys, not development iteration.

Configuration: Getting the Combined Setup Right

If you've decided the combo is right for your deployment, here's a production-ready configuration:

# gunicorn_config.py
import multiprocessing

# Server socket
bind = "0.0.0.0:8000"

# Worker processes
workers = multiprocessing.cpu_count() * 2 + 1
worker_class = "uvicorn.workers.UvicornWorker"

# Worker lifecycle
max_requests = 10000
max_requests_jitter = 1000
timeout = 120
graceful_timeout = 30
keepalive = 5

# Logging
accesslog = "-"
errorlog = "-"
loglevel = "info"

# Security
limit_request_line = 8190
limit_request_fields = 100
limit_request_field_size = 8190

Worker Count Formula

The classic recommendation is (2 × CPU_CORES) + 1. But for async workers, this is a starting point, not a rule:

  • I/O-heavy apps (lots of await, little CPU): You can get away with fewer workers because each one handles hundreds of concurrent requests. Start with CPU_CORES + 1.
  • Mixed workloads (some CPU-bound endpoints): Use (2 × CPU_CORES) + 1 to absorb CPU spikes without blocking all event loops.
  • Memory-constrained environments: Each worker has its own memory footprint. Monitor RSS and adjust downward if you're hitting limits.

Always benchmark with your actual workload. Synthetic formulas are starting points, not answers.

Decision Framework

| Question | Gunicorn + Uvicorn | Uvicorn Alone |
|---|---|---|
| Deploying on a VM without orchestration? | Yes | No |
| Running in Kubernetes/Docker? | Usually no | Yes |
| Need graceful zero-downtime reloads? | Yes | Basic support |
| Need worker crash auto-recovery? | Yes (more mature) | Yes (simpler) |
| Need --max-requests with jitter? | Yes | Limited |
| Need dynamic worker scaling at runtime? | Yes (TTIN/TTOU) | No |
| Need server lifecycle hooks? | Yes | No (ASGI lifespan only) |
| Simple low-traffic internal API? | Overkill | Yes |
| Development environment? | Never | Yes (--reload) |

Final Recommendation

Use Gunicorn + Uvicorn workers when you're deploying an async Python app on VMs or bare metal and you need the operational maturity that comes with a battle-tested process manager. The combination gives you Uvicorn's async performance with Gunicorn's production-grade worker lifecycle management.

Use Uvicorn alone when you're in a containerized environment where the orchestrator already handles restarts, scaling, and health checks. Adding Gunicorn inside a Kubernetes pod usually duplicates functionality and adds unnecessary complexity.

The real insight is this: Gunicorn isn't competing with Uvicorn in this setup - it's complementing it. Gunicorn manages processes. Uvicorn handles requests. Each does what it's best at. The question isn't "which is better," it's "does my deployment need a process manager that runs inside the application layer?"

If your orchestrator already provides that - skip Gunicorn. If it doesn't - add it. That's the entire decision.