
Why FastAPI Should Not Be Run with Default Uvicorn in Production

Published on March 1, 2026

By ToolsGuruHub

It's Friday evening. Your FastAPI app has been running on a single Uvicorn process for three weeks. Traffic is growing. Then a Slack message from your CEO: "The API is down." You SSH into the server. The Uvicorn process is gone. No log, no error, no trace. A C extension in one of your dependencies hit a segfault, the process died, and nobody restarted it. Your entire API was a single process away from a total outage.

This scenario plays out more often than you'd think. And it's completely preventable. The problem isn't Uvicorn - it's running Uvicorn the way you'd run it during development and expecting production-grade behavior.

What "Default Uvicorn" Actually Means

When we say "default Uvicorn," we mean this:

uvicorn main:app --host 0.0.0.0 --port 8000

This starts one process with one event loop on one CPU core. No process supervision. No crash recovery. No graceful reloading. It's the equivalent of running Django with python manage.py runserver - perfectly fine for development, not designed for production traffic.

Let's be specific about what's missing.

Problem 1: Single Process = Single Point of Failure

The most dangerous aspect of running default Uvicorn is that your entire API runs in a single OS process. If that process dies - for any reason - your API is completely offline.

What kills a Uvicorn process in production:

  • Segfaults in C extensions (numpy, pandas, lxml, certain database drivers)
  • OOM killer - the Linux kernel kills your process when the server runs out of memory
  • Unhandled exceptions at the wrong level (not in your endpoint, but in middleware or ASGI lifecycle)
  • Stuck I/O - a blocking call that hangs indefinitely, eventually consuming all resources
  • OS-level issues - disk full, network interface reset, file descriptor exhaustion

With Gunicorn managing multiple workers, one worker crash is invisible to users. The master process spawns a replacement in milliseconds. With a single Uvicorn process, one crash = total downtime.
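What a supervisor buys you fits in a few lines. Here is a toy sketch of the core idea behind Gunicorn's master process - fork a worker, block until it dies for any reason, and immediately respawn it. This is an illustration of the pattern, not Gunicorn's actual implementation:

```python
import os

def supervise(worker, max_respawns=3):
    """Toy supervisor: fork a worker process, wait for it to die
    (clean exit, segfault, OOM kill - the parent can't tell and
    doesn't care), then spawn a replacement. A real master process
    does this in a loop for each of its workers, forever."""
    spawns = 0
    while spawns <= max_respawns:
        pid = os.fork()
        if pid == 0:            # child: run the worker body, then exit
            worker()
            os._exit(0)
        os.waitpid(pid, 0)      # parent: block until the worker dies
        spawns += 1             # ...and loop around to respawn it
    return spawns
```

(`os.fork` is Unix-only, which is fine here: that's where Gunicorn runs too.)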

The Math Is Brutal

If a single Uvicorn process has a 0.1% chance of crashing in any given hour (which is optimistic for a complex app with C dependencies), that's:

  • ~99.9% reliability per hour
  • a roughly 51% chance of at least one crash in a 720-hour month (1 - 0.999^720 ≈ 0.51)

And without supervision, every crash means downtime until a human notices and restarts the process - easily hours if it happens overnight.

With 4 Gunicorn workers, all 4 would need to crash simultaneously for a full outage, and the master respawns any single crashed worker immediately. For independent failures, the probability of a total outage drops to nearly zero.
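You can sanity-check failure odds like this in one line (the numbers here assume the 0.1% hourly crash chance above and a 30-day month):

```python
# Probability of at least one crash in a month, given a 0.1% hourly
# crash chance and no automatic restart.
p_hourly = 0.001
hours = 720  # 30-day month

p_no_crash = (1 - p_hourly) ** hours   # survives every hour
p_at_least_one = 1 - p_no_crash
print(f"{p_at_least_one:.0%}")         # roughly a coin flip each month
```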

Problem 2: You're Using 1 of N CPU Cores

Python's Global Interpreter Lock (GIL) means a single Python process can only use one CPU core for Python execution at a time. Yes, async I/O helps - while one coroutine awaits a database response, another can run. But:

  • You're limited to one core for CPU work (serialization, validation, business logic)
  • The event loop itself runs on one core
  • Your 4-core server is 75% idle; your 8-core server is 87.5% idle

This isn't just waste - it's a scalability ceiling. When traffic spikes, you can't utilize your hardware.

# Default Uvicorn: uses 1 of 4 cores
uvicorn main:app --host 0.0.0.0 --port 8000

# With Gunicorn: uses all 4 cores
gunicorn main:app -k uvicorn.workers.UvicornWorker -w 4

Each Gunicorn worker runs its own event loop on its own core. Four workers = four event loops = four cores handling requests concurrently.
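The usual starting point for the worker count is Gunicorn's (2 × cores) + 1 rule of thumb. A sketch of how you might compute it in a gunicorn_config.py (the helper function name is ours, not a Gunicorn API):

```python
# gunicorn_config.py (sketch)
import multiprocessing

def recommended_workers(cores: int) -> int:
    # Gunicorn's rule of thumb: (2 x cores) + 1.
    # Tune down for memory-heavy apps, up for I/O-bound ones.
    return cores * 2 + 1

workers = recommended_workers(multiprocessing.cpu_count())
worker_class = "uvicorn.workers.UvicornWorker"
bind = "0.0.0.0:8000"
```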

Problem 3: No Graceful Reload

You deploy new code. How do you restart Uvicorn?

# Option 1: Kill and restart (drops all in-flight requests)
kill $(pgrep uvicorn)
uvicorn main:app --host 0.0.0.0 --port 8000 &

# Option 2: Use --reload (filesystem watcher - NOT for production)
uvicorn main:app --reload

Both options are terrible for production:

  • Option 1 drops every request being processed at the moment you kill the process. Users see 502 errors. API consumers get connection resets.
  • Option 2 uses a filesystem watcher designed for development. It adds overhead, can trigger false restarts, and doesn't gracefully drain connections.

With Gunicorn, graceful reload is a single signal:

kill -HUP $(cat /var/run/gunicorn.pid)

Gunicorn starts new workers with the new code, lets old workers finish their current requests, then shuts down old workers. Zero dropped requests. Zero downtime. This is a solved problem - if you use the right tool.
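The "let old workers finish" part is the key idea. A stripped-down sketch of graceful draining - on SIGTERM, stop accepting new work and finish what's in flight instead of dying mid-request (again an illustration of the pattern, not Gunicorn's implementation):

```python
import signal

class GracefulWorker:
    """On SIGTERM, flip a flag instead of exiting immediately.
    The serving loop checks the flag: finish the current request,
    accept nothing new, then shut down cleanly."""

    def __init__(self):
        self.draining = False
        signal.signal(signal.SIGTERM, self._start_drain)

    def _start_drain(self, signum, frame):
        self.draining = True

    def should_accept(self) -> bool:
        return not self.draining
```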

Problem 4: No Worker Health Monitoring

In production, workers can get into bad states without actually crashing:

  • Memory leaks - a worker slowly consumes more and more RAM until the OOM killer eventually takes it out (taking all in-flight requests with it)
  • Hung workers - a worker gets stuck on a blocking call and stops responding, but the process is technically still alive
  • Zombie connections - the worker holds open connections that are no longer useful

Gunicorn addresses all of these:

# gunicorn_config.py
timeout = 120           # Kill workers that don't respond for 120 seconds
max_requests = 10000    # Recycle workers after 10k requests (memory leak protection)
max_requests_jitter = 1000  # Stagger restarts

The timeout catches hung workers. The max_requests prevents memory leaks from becoming incidents. The jitter ensures workers don't all restart at the same time (which would cause a brief capacity drop).
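Concretely, the jitter works by giving each worker its own randomized recycle threshold - max_requests plus a random offset up to the jitter value - so four workers recycle at four different request counts. A sketch of the effect:

```python
import random

def recycle_thresholds(max_requests, jitter, n_workers):
    """Mimic how jitter staggers worker recycling: each worker gets
    max_requests plus a random offset in [0, jitter], so they never
    all restart on the same request count (which would briefly drop
    serving capacity to zero)."""
    return [max_requests + random.randint(0, jitter)
            for _ in range(n_workers)]

# e.g. recycle_thresholds(10_000, 1_000, 4)
#      -> four values spread across [10000, 11000]
```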

Default Uvicorn has none of this.

Problem 5: No Dynamic Scaling or Signal Handling

Gunicorn supports rich signal handling for operational control:

  • HUP - Graceful reload (new code, no downtime)
  • TTIN - Add one worker
  • TTOU - Remove one worker
  • QUIT - Graceful shutdown
  • TERM - Fast shutdown
  • USR2 - Upgrade Gunicorn itself (binary upgrade)

Need to scale up during a traffic spike? kill -TTIN <pid>. Need to drain workers for maintenance? kill -QUIT <pid>.

Default Uvicorn supports TERM and INT for shutdown. That's it. No graceful reload, no dynamic scaling, no operational flexibility.
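In practice these signals are usually sent from deploy or autoscaling scripts. A hypothetical helper (the function name and pidfile path are ours) that nudges a running Gunicorn master up or down one worker at a time:

```python
import os
import signal

def scale_workers(pidfile: str, delta: int) -> None:
    """Send TTIN (add a worker) or TTOU (remove one) to the Gunicorn
    master process, once per worker to add or remove."""
    with open(pidfile) as f:
        master_pid = int(f.read().strip())
    sig = signal.SIGTTIN if delta > 0 else signal.SIGTTOU
    for _ in range(abs(delta)):
        os.kill(master_pid, sig)

# e.g. scale_workers("/var/run/gunicorn.pid", +2)  # add two workers
```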

"But I'm Running in Docker/Kubernetes"

This is the one legitimate exception. If your container orchestrator handles:

  • Crash recovery -> Kubernetes restarts failed pods
  • Scaling -> Horizontal Pod Autoscaler adds replicas
  • Health checks -> Liveness/readiness probes detect stuck processes
  • Rolling deploys -> New pods start before old ones stop

Then you can run a single Uvicorn process per container:

uvicorn main:app --host 0.0.0.0 --port 8000

The orchestrator is your process manager. Each container is a worker. Kubernetes does what Gunicorn's master process does - but at the container level.

However, even in Kubernetes, there are arguments for multi-worker containers:

  • Startup time: If your app takes 30 seconds to start (loading ML models, warming caches), spinning up 4 pods is slower than having 4 workers in one pod.
  • Resource efficiency: One pod with 4 workers shares memory for imported libraries (copy-on-write after fork). Four separate pods each load everything independently.
  • Connection pooling: One pod with 4 workers can share a database connection pool more efficiently than 4 pods with separate pools.

For most Kubernetes deployments, single-process containers are the standard pattern. But know the tradeoffs.

What Production Actually Needs

Here's the checklist for a production-ready FastAPI deployment:

Requirement              | Default Uvicorn | Gunicorn + UvicornWorker | Kubernetes + Uvicorn
Multi-core utilization   | No              | Yes                      | Yes (via pods)
Crash recovery           | No              | Yes                      | Yes
Graceful reload          | No              | Yes                      | Yes (rolling deploy)
Worker health monitoring | No              | Yes                      | Yes (probes)
Memory leak protection   | No              | Yes (max_requests)       | Partial (pod restart)
Dynamic scaling          | No              | Yes (TTIN/TTOU)          | Yes (HPA)
Signal handling          | Basic           | Comprehensive            | Managed by K8s

Default Uvicorn fails on every single production requirement. That's not because Uvicorn is bad - it's because it's solving a different problem. Uvicorn is an ASGI server. Gunicorn is a process manager. Production needs both.

The Fix: Three Options

Option 1: Gunicorn + UvicornWorker (VMs, bare metal)

gunicorn main:app \
  -k uvicorn.workers.UvicornWorker \
  -w $(( $(nproc) * 2 + 1 )) \
  -b 0.0.0.0:8000 \
  --timeout 120 \
  --max-requests 10000 \
  --max-requests-jitter 1000 \
  --access-logfile - \
  --error-logfile -

This is the standard production setup for VMs. Put Nginx in front for SSL and request buffering. Use systemd to manage the Gunicorn process. For a complete walkthrough, see our step-by-step deployment guide.

Option 2: Uvicorn in Kubernetes

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
spec:
  replicas: 4
  template:
    spec:
      containers:
        - name: api
          image: myapp:latest
          command: ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
          ports:
            - containerPort: 8000
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 10
            periodSeconds: 15
          readinessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 10
          resources:
            requests:
              cpu: "250m"
              memory: "256Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"

Option 3: Uvicorn with --workers (Simple Cases)

uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4

This gives you multi-process without Gunicorn. It's simpler but lacks Gunicorn's mature process management, max_requests, dynamic scaling, and server hooks. Use this for internal tools or low-criticality services where operational maturity isn't essential.

The Real Risk: "It Works in Staging"

The trap is that default Uvicorn works great - until it doesn't. In staging, with low traffic and minimal load, a single Uvicorn process is fine. It handles your QA team's test requests without breaking a sweat.

The failures only show up under real conditions:

  • Traffic spikes expose the single-core bottleneck
  • Long-running processes reveal the lack of timeout handling
  • Rare C extension bugs cause segfaults that crash the only process
  • Memory leaks accumulate over days, not hours

By the time you discover these issues, you're debugging in production at 3 AM. The fix takes 5 minutes if you set it up correctly from the start.

FAQ

Is uvicorn --workers 4 the same as Gunicorn?

No. uvicorn --workers 4 gives you 4 processes, but Uvicorn's process management is simpler than Gunicorn's. You miss out on max_requests with jitter, dynamic worker scaling via signals, and server lifecycle hooks. For a detailed comparison, see our Gunicorn vs Uvicorn article.

What if my app is small and has low traffic?

You can run default Uvicorn under systemd for auto-restart. But it costs almost nothing to run the same app under Gunicorn with -k uvicorn.workers.UvicornWorker, and that setup saves you when traffic grows unexpectedly.

Does this apply to Django with ASGI too?

Yes. Django 3.0+ supports ASGI, and the same principles apply. Run Django under Gunicorn with UvicornWorker or Daphne for production.

Can Uvicorn handle WebSockets in production?

Yes, Uvicorn handles WebSockets well. The issue isn't protocol support - it's process management. A WebSocket server that crashes and doesn't restart drops all connected clients. Gunicorn's crash recovery fixes this.

What about Hypercorn?

Hypercorn is another ASGI server that supports HTTP/2 natively. It has its own multi-worker mode. The same principles apply: don't run a single process without supervision in production.