Scaling Node.js Applications
Node.js is single-threaded by default, but production applications need to scale. This chapter covers patterns for utilizing multiple cores, distributing load, and architecting scalable systems.
The Scaling Challenge
Node.js runs JavaScript on a single thread (the event loop). To fully utilize a multi-core server, you need multiple Node.js processes:
┌─────────────────────────────────────────────────┐
│ 8-Core Server │
├─────────────────────────────────────────────────┤
│ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ │
│ │Node │ │Node │ │Node │ │Node │ │
│ │ 1 │ │ 2 │ │ 3 │ │ 4 │ ... idle ... │
│ └─────┘ └─────┘ └─────┘ └─────┘ │
│ │
│ Single process = 1 core used = 7 cores wasted │
└─────────────────────────────────────────────────┘
Cluster Module
What is the Cluster Module?
The Cluster Module is Node.js's built-in solution for creating multiple child processes (workers) that share the same server port. It allows you to spawn a pool of identical workers managed by a primary (master) process.
Think of it like a restaurant kitchen:
┌─────────────────────────────────────────────────────────┐
│ RESTAURANT KITCHEN │
├─────────────────────────────────────────────────────────┤
│ │
│ HEAD CHEF (Primary Process) │
│ ├── Receives all orders │
│ ├── Distributes work │
│ └── Replaces chefs who get sick │
│ │
│ LINE COOKS (Worker Processes) │
│ ├── Chef 1: Cooking Order A │
│ ├── Chef 2: Cooking Order B │
│ ├── Chef 3: Cooking Order C │
│ └── Chef 4: Cooking Order D │
│ │
│ One kitchen (port), multiple cooks (processes) │
└─────────────────────────────────────────────────────────┘
Why Do We Need It?
Problem 1: Node.js is Single-Threaded
By default, a Node.js process runs your JavaScript on a single CPU core. On an 8-core machine, that leaves 87.5% of your CPU capacity idle:
Before Clustering:
┌────────────────────────────────────────┐
│ CPU Core 1: [████████ Node.js ████████] │ ← 100% utilized
│ CPU Core 2: [ ] │ ← 0% (wasted)
│ CPU Core 3: [ ] │ ← 0% (wasted)
│ CPU Core 4: [ ] │ ← 0% (wasted)
│ CPU Core 5: [ ] │ ← 0% (wasted)
│ CPU Core 6: [ ] │ ← 0% (wasted)
│ CPU Core 7: [ ] │ ← 0% (wasted)
│ CPU Core 8: [ ] │ ← 0% (wasted)
└────────────────────────────────────────┘
After Clustering:
┌────────────────────────────────────────┐
│ CPU Core 1: [████████ Worker 1 ████████] │ ← 100% utilized
│ CPU Core 2: [████████ Worker 2 ████████] │ ← 100% utilized
│ CPU Core 3: [████████ Worker 3 ████████] │ ← 100% utilized
│ CPU Core 4: [████████ Worker 4 ████████] │ ← 100% utilized
│ CPU Core 5: [████████ Worker 5 ████████] │ ← 100% utilized
│ CPU Core 6: [████████ Worker 6 ████████] │ ← 100% utilized
│ CPU Core 7: [████████ Worker 7 ████████] │ ← 100% utilized
│ CPU Core 8: [████████ Worker 8 ████████] │ ← 100% utilized
└────────────────────────────────────────┘
Problem 2: One Crash Takes Down Everything
With a single process, one uncaught exception crashes your entire application. With clustering:
- Worker crashes → Primary spawns a replacement
- Other workers continue serving requests
- Users may never notice!
Problem 3: Deployment Requires Downtime
Without clustering, deploying new code means:
- Stop server → Requests fail
- Deploy code
- Start server → Back online
With clustering (zero-downtime deploy):
- Start new workers with new code
- New workers handle new requests
- Old workers finish current requests
- Old workers gracefully exit
- Zero dropped requests!
Basic Cluster Example
Node.js's built-in cluster module forks multiple worker processes:
const cluster = require("cluster");
const http = require("http");
const os = require("os");
const numCPUs = os.cpus().length;
if (cluster.isPrimary) {
console.log(`Primary ${process.pid} starting ${numCPUs} workers`);
// Fork workers
for (let i = 0; i < numCPUs; i++) {
cluster.fork();
}
// Handle worker death
cluster.on("exit", (worker, code, signal) => {
console.log(`Worker ${worker.process.pid} died (${signal || code})`);
console.log("Starting replacement worker");
cluster.fork();
});
} else {
// Workers share the TCP connection
http
.createServer((req, res) => {
res.writeHead(200);
res.end(`Hello from worker ${process.pid}\n`);
})
.listen(8000);
console.log(`Worker ${process.pid} started`);
}
How Clustering Works
┌─────────────────┐
│ Primary │
Port 8000 │ (manager) │
─────────────► │ │
└────────┬────────┘
│
┌──────────────┼──────────────┐
│ │ │
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Worker 1 │ │ Worker 2 │ │ Worker 3 │
│ PID 101 │ │ PID 102 │ │ PID 103 │
└──────────┘ └──────────┘ └──────────┘
- Primary process manages workers
- Workers share the server port
- The primary distributes incoming connections to workers round-robin by default (on all platforms except Windows, where the OS hands them out)
Production-Ready Cluster
const cluster = require("cluster");
const os = require("os");
class ClusterManager {
constructor(options = {}) {
this.workers = new Map();
this.numWorkers = options.workers || os.cpus().length;
this.shutdownTimeout = options.shutdownTimeout || 10000;
}
start(workerScript) {
if (!cluster.isPrimary) {
require(workerScript);
return;
}
console.log(`Primary ${process.pid} starting`);
// Fork workers
for (let i = 0; i < this.numWorkers; i++) {
this.forkWorker();
}
// Handle worker exit
cluster.on("exit", (worker, code, signal) => {
this.workers.delete(worker.id);
if (worker.exitedAfterDisconnect) {
console.log(`Worker ${worker.process.pid} gracefully exited`);
} else {
console.error(`Worker ${worker.process.pid} crashed (${code})`);
this.forkWorker();
}
});
// Graceful shutdown
process.on("SIGTERM", () => this.shutdown());
process.on("SIGINT", () => this.shutdown());
}
forkWorker() {
const worker = cluster.fork();
this.workers.set(worker.id, worker);
worker.on("message", (msg) => {
if (msg.type === "ready") {
console.log(`Worker ${worker.process.pid} ready`);
}
});
return worker;
}
async shutdown() {
console.log("Shutting down cluster...");
// Tell workers to finish current requests
for (const worker of this.workers.values()) {
worker.send({ type: "shutdown" });
worker.disconnect();
}
// Wait for graceful shutdown or force kill
setTimeout(() => {
console.log("Force killing remaining workers");
for (const worker of this.workers.values()) {
worker.kill("SIGKILL");
}
process.exit(0);
}, this.shutdownTimeout);
}
// Zero-downtime restart
async reload() {
console.log("Reloading workers...");
const oldWorkers = [...this.workers.values()];
// Fork new workers first
for (let i = 0; i < this.numWorkers; i++) {
const newWorker = this.forkWorker();
// Wait for new worker to be ready
await new Promise((resolve) => {
newWorker.once("message", (msg) => {
if (msg.type === "ready") resolve();
});
});
}
// Then gracefully shutdown old workers
for (const worker of oldWorkers) {
worker.send({ type: "shutdown" });
worker.disconnect();
}
}
}
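// Usage sketch (hypothetical): run this entry file on every process.
// const ClusterManager = require("./cluster-manager");
// const manager = new ClusterManager({ workers: 4 });
// manager.start(require.resolve("./worker.js"));
// // In the primary, reload() can be wired to a signal for zero-downtime deploys:
// // process.on("SIGUSR2", () => manager.reload());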
module.exports = ClusterManager;
Worker with Graceful Shutdown
// worker.js
const http = require("http");
let isShuttingDown = false;
const connections = new Set();
const server = http.createServer((req, res) => {
if (isShuttingDown) {
res.setHeader("Connection", "close");
}
// Track the underlying socket so idle connections can be closed on shutdown
connections.add(res.socket);
res.on("finish", () => connections.delete(res.socket));
// Simulate work
setTimeout(() => {
res.end(`Hello from ${process.pid}\n`);
}, 100);
});
server.listen(8000, () => {
// Signal ready to primary
process.send?.({ type: "ready" });
});
// Handle shutdown signal
process.on("message", (msg) => {
if (msg.type === "shutdown") {
isShuttingDown = true;
// Stop accepting new connections
server.close(() => {
console.log(`Worker ${process.pid} closed`);
process.exit(0);
});
// Close idle connections (sockets with no response currently in flight)
for (const conn of connections) {
if (!conn._httpMessage) {
conn.end();
}
}
}
});
Process Management with PM2
What is PM2?
PM2 (Process Manager 2) is a production-ready process manager for Node.js applications. It's like having an expert ops engineer watching your app 24/7.
Think of PM2 as your application's guardian:
┌────────────────────────────────────────────────────┐
│ PM2 │
├────────────────────────────────────────────────────┤
│ ✓ Auto-restart on crash │
│ ✓ Built-in clustering (no code changes!) │
│ ✓ Zero-downtime reloads │
│ ✓ Log aggregation & management │
│ ✓ Startup scripts (survive server reboot) │
│ ✓ Memory monitoring & auto-restart │
│ ✓ Real-time monitoring dashboard │
└────────────────────────────────────────────────────┘
Why Do We Need PM2?
Problem: Running Node.js in Production is Hard
Without PM2, you have to handle:
Manual Production Checklist:
────────────────────────
□ What happens when the app crashes? (restart it manually?)
□ How do I use all CPU cores? (write cluster code?)
□ How do I deploy without downtime? (???)
□ Where do my logs go? (console.log to /dev/null?)
□ What happens when the server reboots? (start manually?)
□ How do I know if the app is using too much memory?
VS
With PM2:
────────
$ pm2 start app.js -i max # Cluster mode, all cores
$ pm2 reload app # Zero-downtime deploy
$ pm2 logs # Aggregated logs
$ pm2 monit # Real-time monitoring
$ pm2 startup # Survive reboots
✔ All handled automatically!
PM2 vs Manual Clustering:
// Without PM2: 50+ lines of cluster code
const cluster = require('cluster');
const os = require('os');
if (cluster.isPrimary) {
for (let i = 0; i < os.cpus().length; i++) {
cluster.fork();
}
cluster.on('exit', (worker) => {
console.log(`Worker ${worker.process.pid} died`);
cluster.fork(); // Respawn
});
} else {
require('./app');
}
// ... plus graceful shutdown, health checks, logging...
// With PM2: 0 lines of cluster code
$ pm2 start app.js -i max
// Done! PM2 handles everything.
Getting Started with PM2
# Install
npm install -g pm2
# Start with cluster mode
pm2 start app.js -i max # Use all CPU cores
pm2 start app.js -i 4 # Use 4 workers
# Zero-downtime reload
pm2 reload app
# Monitor
pm2 monit
pm2 logs
# Startup script
pm2 startup
pm2 save
Ecosystem Configuration
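Save the configuration below as ecosystem.config.js and start it with the desired environment:
pm2 start ecosystem.config.js --env production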
// ecosystem.config.js
module.exports = {
apps: [
{
name: "api",
script: "./src/server.js",
instances: "max",
exec_mode: "cluster",
// Environment
env: {
NODE_ENV: "development",
},
env_production: {
NODE_ENV: "production",
},
// Restart policy
max_memory_restart: "1G",
restart_delay: 3000,
max_restarts: 10,
// Graceful shutdown
kill_timeout: 5000,
wait_ready: true, // app must signal readiness with process.send("ready")
listen_timeout: 10000,
// Logs
error_file: "./logs/error.log",
out_file: "./logs/out.log",
log_date_format: "YYYY-MM-DD HH:mm:ss",
// Source maps
source_map_support: true,
},
],
};
Load Balancing
What is Load Balancing?
Load Balancing is the practice of distributing incoming traffic across multiple servers or processes to ensure no single server becomes overwhelmed.
Think of it like checkout lanes at a grocery store:
WITHOUT LOAD BALANCING
─────────────────────
│
Customers ──────────► │ LANE 1 │ ← 50 people waiting!
│ │
│ LANE 2 │ ← Empty (closed)
│ │
│ LANE 3 │ ← Empty (closed)
WITH LOAD BALANCING
───────────────────
│
│ ┌───► LANE 1 (17 people)
Customers ──────────► │ GREETER │──► LANE 2 (17 people)
│ └───► LANE 3 (16 people)
Why Do We Need Load Balancing?
1. Handle More Traffic Than One Server Can Manage
A single Node.js server might handle 10,000 requests/second. But what if you have 100,000 requests/second?
100,000 req/sec
│
▼
┌───────────┐
│ Load │
│ Balancer │
└─────┬─────┘
│
┌─────┼─────┬─────────┐
│ │ │ │
▼ ▼ ▼ ▼
10K 10K 10K ... 10K (10 servers)
2. Eliminate Single Points of Failure
Without Load Balancer: With Load Balancer:
│ │
▼ ▼
┌─────────┐ ┌───────────┐
│ Server │ ← Dies │ LB │
└─────────┘ └─────┬─────┘
│ │
▼ ┌─────┴─────┐
💀 ALL TRAFFIC │ │
FAILS ▼ ▼
┌────────┐ ┌────────┐
│Server 1│ │Server 2│
│ 💀 │ │ ✓ │ ← Still serving!
└────────┘ └────────┘
3. Enable Zero-Downtime Deployments
You can deploy to servers one at a time while others handle traffic.
4. Geographic Distribution
Route users to the nearest data center for lower latency.
Load Balancing Algorithms Explained
| Algorithm | How It Works | Best For |
|---|---|---|
| Round Robin | Request 1 → Server A, Request 2 → Server B, Request 3 → Server A... | Servers with equal capacity |
| Least Connections | Send to the server with fewest active connections | Requests with varying processing time |
| IP Hash | Hash client IP to always route same client to same server | When you need session stickiness |
| Weighted | Assign weights (Server A: 3, Server B: 1) → A gets 75% of traffic | Servers with different capacities |
| Random | Pick a server at random | Simple, stateless deployments |
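To make the first two strategies concrete, here is a minimal sketch of how a balancer might pick a backend (illustrative only; the server list and connection counters are hypothetical):
// Round robin: rotate through the list
const servers = ["http://10.0.0.1:3000", "http://10.0.0.2:3000", "http://10.0.0.3:3000"];
let next = 0;
function pickRoundRobin() {
  const server = servers[next];
  next = (next + 1) % servers.length;
  return server;
}
// Least connections: pick the backend with the fewest in-flight requests
const active = new Map(servers.map((s) => [s, 0]));
function pickLeastConnections() {
  return [...active.entries()].reduce((a, b) => (b[1] < a[1] ? b : a))[0];
}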
Nginx as Reverse Proxy
# /etc/nginx/conf.d/app.conf
upstream nodejs_cluster {
least_conn; # Load balancing method
server 127.0.0.1:3001;
server 127.0.0.1:3002;
server 127.0.0.1:3003;
server 127.0.0.1:3004;
keepalive 64; # Connection pool
}
server {
listen 80;
server_name api.example.com;
location / {
proxy_pass http://nodejs_cluster;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection 'upgrade';
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_cache_bypass $http_upgrade;
# Timeouts
proxy_connect_timeout 60s;
proxy_send_timeout 60s;
proxy_read_timeout 60s;
}
}
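One detail follows from this setup on the Node.js side: behind the proxy, the app sees Nginx as its client, so req.ip reports 127.0.0.1. If the upstream servers run Express (an assumption; other frameworks have equivalents), tell it to trust the X-Forwarded-* headers set above:
// Trust headers set by the Nginx reverse proxy so req.ip reflects the real client
app.set("trust proxy", true); // or restrict to the proxy address, e.g. "127.0.0.1"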
Microservices Patterns
What are Microservices?
Microservices is an architectural style where an application is composed of small, independent services that communicate over a network. Each service is:
- Independently deployable – Deploy user service without touching order service
- Loosely coupled – Services communicate through well-defined APIs
- Organized around business capabilities – User management, Orders, Payments, etc.
- Owned by a small team – "You build it, you run it"
MONOLITH MICROSERVICES
───────── ─────────────
┌─────────────────────┐ ┌─────────┐ ┌─────────┐
│ │ │ Users │ │ Orders │
│ One Big App │ └────┬────┘ └────┬────┘
│ │ │ │
│ • Users │ ───► │ API │
│ • Orders │ ┌────┴────────────┴────┐
│ • Products │ │ Gateway │
│ • Payments │ └──────────────────────┘
│ • Inventory │ │ │
│ │ ┌────┴────┐ ┌────┴────┐
└─────────────────────┘ │Products │ │Payments │
└─────────┘ └─────────┘
Why Do We Need Microservices?
1. Independent Scaling
Different parts of your system have different load patterns:
Black Friday Traffic:
┌──────────────────────────────────────────────────────┐
│ │
│ Product Browsing: ████████████████████████ (HIGH) │
│ User Login: ████████ (MEDIUM) │
│ Checkout: ████████████████ (HIGH) │
│ Admin Dashboard: ██ (LOW) │
│ │
│ With Microservices: │
│ • Scale Product Service to 20 instances │
│ • Scale Checkout to 15 instances │
│ • Keep Admin at 2 instances │
│ • Save money on infrastructure! │
│ │
└─────────────────┴──────────────────┴────────────────┘
2. Technology Flexibility
Each service can use the best tool for its job:
┌─────────────────┬──────────────────┬────────────────┐
│ User Service │ Search Service │ ML Predictions │
├─────────────────┼──────────────────┼────────────────┤
│ Node.js │ Java + Elastic │ Python + ML │
│ PostgreSQL │ Elasticsearch │ TensorFlow │
│ │ │ Redis │
└─────────────────┴──────────────────┴────────────────┘
3. Fault Isolation
One service failing doesn't crash everything:
Recommendation service is down:
┌─────────────────────────────────────────────────┐
│ Website still works: │
│ ✓ Users can log in │
│ ✓ Users can browse products │
│ ✓ Users can add to cart │
│ ✓ Users can checkout │
│ ✗ "Recommended for you" shows: "Coming soon" │
└─────────────────────────────────────────────────┘
4. Team Autonomy
- Team A owns User Service (deploy anytime)
- Team B owns Payment Service (deploy anytime)
- No coordination needed between teams!
- Each team can move at their own pace
Warning
The Tradeoff: Microservices add complexity! You now need:
- Service discovery
- Distributed tracing
- Network failure handling
- Data consistency across services
- Deployment orchestration
Don't adopt microservices until your team/system has outgrown a monolith.
Service Decomposition
┌─────────────────────────────────────────────────────────┐
│ API Gateway │
│ (auth, routing, rate limiting) │
└────────────────────────┬────────────────────────────────┘
│
┌──────────────────┼──────────────────┐
│ │ │
▼ ▼ ▼
┌───────────┐ ┌───────────┐ ┌───────────┐
│ User │ │ Order │ │ Product │
│ Service │ │ Service │ │ Service │
└─────┬─────┘ └─────┬─────┘ └─────┬─────┘
│ │ │
▼ ▼ ▼
┌───────┐ ┌───────┐ ┌───────┐
│ DB │ │ DB │ │ DB │
└───────┘ └───────┘
Service Communication
When services need to talk to each other, you have several options. Each has tradeoffs:
┌─────────────────────────────────────────────────────────────────────┐
│ COMMUNICATION PATTERNS │
├─────────────────┬───────────────────┬───────────────────────────────┤
│ HTTP/REST │ gRPC │ Message Queue │
├─────────────────┼───────────────────┼───────────────────────────────┤
│ ✓ Simple │ ✓ Fast (binary) │ ✓ Async/decoupled │
│ ✓ Human-readable│ ✓ Type-safe │ ✓ Resilient to failures │
│ ✓ Debuggable │ ✓ Streaming │ ✓ Handles load spikes │
│ ✗ Verbose │ ✗ Complex setup │ ✗ Eventually consistent │
│ ✗ Slower │ ✗ Hard to debug │ ✗ Harder to debug │
├─────────────────┼───────────────────┼───────────────────────────────┤
│ Best for: │ Best for: │ Best for: │
│ External APIs │ Internal services │ Background jobs │
│ CRUD operations │ High-throughput │ Event-driven systems │
└─────────────────┴───────────────────┴───────────────────────────────┘
HTTP/REST with API Gateway
What is an API Gateway?
An API Gateway is the single entry point for all client requests. It handles cross-cutting concerns so your services don't have to:
Without Gateway: With Gateway:
Client ──► Auth Service Client ──► API Gateway ──► Services
Client ──► User Service │
Client ──► Order Service ├── Authentication
Client ──► Product Service ├── Rate Limiting
├── Request Routing
Each service handles: ├── SSL Termination
• Its own auth ├── Logging
• Its own rate limiting └── Circuit Breaking
• Its own SSL
• Duplicated code everywhere! Services focus on business logic!
// api-gateway.js
const express = require("express");
const httpProxy = require("http-proxy");
const proxy = httpProxy.createProxyServer();
const app = express();
const services = {
users: "http://user-service:3001",
orders: "http://order-service:3002",
products: "http://product-service:3003",
};
app.use("/api/:service/*", (req, res) => {
const target = services[req.params.service];
if (!target) {
return res.status(404).json({ error: "Service not found" });
}
proxy.web(req, res, { target });
});
proxy.on("error", (err, req, res) => {
res.status(502).json({ error: "Service unavailable" });
});
app.listen(8080); // gateway listen port (illustrative)
gRPC for Internal Communication
What is gRPC?
gRPC (Google Remote Procedure Call) is a high-performance framework for service-to-service communication. Instead of sending JSON over HTTP, it uses:
- Protocol Buffers (protobuf): A binary serialization format (10x smaller than JSON)
- HTTP/2: Enables multiplexing, streaming, and header compression
- Strong typing: Both client and server are generated from the same schema
HTTP/REST: gRPC:
┌─────────────────────────┐ ┌─────────────────────────┐
│ GET /users/123 │ │ Call: GetUser(id: 123) │
│ │ │ │
│ Response (JSON): │ │ Response (Binary): │
│ { │ │ 0x0A 0x03 0x31 0x32 │
│ "id": "123", │ │ 0x33 0x12 0x08 0x4A... │
│ "name": "John Doe", │ │ │
│ "email": "john@..." │ │ (same data, 10x smaller)│
│ } │ │ │
│ ~200 bytes │ │ ~20 bytes │
└─────────────────────────┘ └─────────────────────────┘
Why use gRPC over REST for internal services?
- Speed: Binary format + HTTP/2 = faster serialization and transfer
- Type Safety: The proto file IS the contract. No surprises.
- Streaming: Built-in support for client/server/bidirectional streaming
- Code Generation: Client and server stubs generated automatically
// user.proto
syntax = "proto3";
service UserService {
rpc GetUser (GetUserRequest) returns (User);
rpc CreateUser (CreateUserRequest) returns (User);
}
message GetUserRequest {
string id = 1;
}
message User {
string id = 1;
string name = 2;
string email = 3;
}
// user-service.js
const grpc = require("@grpc/grpc-js");
const protoLoader = require("@grpc/proto-loader");
const packageDefinition = protoLoader.loadSync("user.proto");
const userProto = grpc.loadPackageDefinition(packageDefinition);
const server = new grpc.Server();
server.addService(userProto.UserService.service, {
getUser: async (call, callback) => {
const user = await db.users.findById(call.request.id);
callback(null, user);
},
createUser: async (call, callback) => {
const user = await db.users.create(call.request);
callback(null, user);
},
});
server.bindAsync("0.0.0.0:50051", grpc.ServerCredentials.createInsecure(), () =>
server.start(),
);
// client.js (assumes user.proto is loaded into userProto the same way as on the server)
const client = new userProto.UserService(
"localhost:50051",
grpc.credentials.createInsecure(),
);
client.getUser({ id: "123" }, (err, user) => {
console.log(user);
});
Service Discovery
What is Service Discovery?
In a microservices architecture, services need to find each other. But with dynamic scaling, IP addresses and ports constantly change. Service Discovery is the mechanism for services to register themselves and find other services.
WITHOUT Service Discovery: WITH Service Discovery:
──────────────────────── ─────────────────────────
// Hardcoded addresses 😱 // Dynamic lookup ✓
const userService = const userService =
'http://192.168.1.45:3001'; registry.discover('users');
// What happens when: // The registry knows:
// - Server IP changes? // - Which instances exist
// - Service moves? // - Which are healthy
// - You scale to 10 instances? // - Load balances between them
How Service Discovery Works:
1. REGISTRATION 2. HEARTBEAT
───────────── ─────────
┌─────────────┐ ┌─────────────┐
│ Service │ ──"I'm alive"───► │ Registry │
│ Instance │ at :3001 │ │
└─────────────┘ └─────────────┘
│
Stores: {
'user-service': [
{ host: '...', port: 3001,
lastSeen: now() }
]
}
3. DISCOVERY 4. HEALTH CHECK
───────── ────────────
┌─────────────┐ ┌─────────────┐
│ Client │ ──"Where's user │ Registry │
│ Service │ service?"──────► │ │
└─────────────┘ └──────┬──────┘
▲ │
│ Removes stale
└── Returns healthy instance instances
Service Registry Implementation
// service-registry.js
class ServiceRegistry {
constructor() {
this.services = new Map();
this.healthCheckInterval = 30000;
}
register(name, host, port, metadata = {}) {
const id = `${name}-${host}:${port}`;
const instance = {
id,
name,
host,
port,
metadata,
lastHeartbeat: Date.now(),
};
if (!this.services.has(name)) {
this.services.set(name, new Map());
}
this.services.get(name).set(id, instance);
console.log(`Registered: ${id}`);
return id;
}
deregister(id) {
for (const instances of this.services.values()) {
if (instances.has(id)) {
instances.delete(id);
console.log(`Deregistered: ${id}`);
return true;
}
}
return false;
}
heartbeat(id) {
for (const instances of this.services.values()) {
if (instances.has(id)) {
instances.get(id).lastHeartbeat = Date.now();
return true;
}
}
return false;
}
discover(name) {
const instances = this.services.get(name);
if (!instances || instances.size === 0) {
return null;
}
// Pick a random healthy instance (simple load spreading)
const healthy = [...instances.values()].filter(
(i) => Date.now() - i.lastHeartbeat < this.healthCheckInterval * 2,
);
if (healthy.length === 0) return null;
const index = Math.floor(Math.random() * healthy.length);
return healthy[index];
}
discoverAll(name) {
const instances = this.services.get(name);
if (!instances) return [];
return [...instances.values()].filter(
(i) => Date.now() - i.lastHeartbeat < this.healthCheckInterval * 2,
);
}
}
// Usage in service
const registry = new ServiceRegistry();
// On startup
const instanceId = registry.register("user-service", "localhost", 3001);
// Heartbeat loop
setInterval(() => {
registry.heartbeat(instanceId);
}, 15000);
// On shutdown
process.on("SIGTERM", () => {
registry.deregister(instanceId);
process.exit(0);
});
// Client discovering service
const userService = registry.discover("user-service");
if (userService) {
await fetch(`http://${userService.host}:${userService.port}/users/123`);
}
Worker Threads
What are Worker Threads?
Worker Threads allow you to run JavaScript code in parallel threads within the same Node.js process. Unlike the Cluster module (which creates separate processes), Worker Threads:
- Share memory (via SharedArrayBuffer)
- Have lower overhead than processes
- Run in the same process but on different threads
CLUSTER (Multi-Process) WORKER THREADS (Multi-Thread)
────────────────────── ────────────────────────────
┌────────────────────────────┐ ┌─────────────────────────────────┐
│ Process 1 (Primary) │ │ Single Process │
│ ┌──────────────────────┐ │ │ ┌────────────────────────────┐ │
│ │ Memory Space 1 │ │ │ │ Shared Memory Space │ │
│ └──────────────────────┘ │ │ │ │ │
└────────────────────────────┘ │ │ Thread 1 Thread 2 │ │
│ │ ▼ ▼ │ │
┌────────────────────────────┐ │ │ ┌────┐ ┌────┐ │ │
│ Process 2 (Worker) │ │ │ │Task│ │Task│ │ │
│ ┌──────────────────────┐ │ │ │ │ A │ │ B │ │ │
│ │ Memory Space 2 │ │ │ │ └────┘ └────┘ │ │
│ └──────────────────────┘ │ │ │ │ │
└────────────────────────────┘ │ └────────────────────────────┘ │
└─────────────────────────────────┘
Memory is COPIED between Memory can be SHARED between
processes (expensive) threads (fast)
Why Do We Need Worker Threads?
The Event Loop Problem
Node.js has a single-threaded event loop. CPU-intensive operations block it:
// This BLOCKS the event loop for ~5 seconds!
function fibonacci(n) {
if (n <= 1) return n;
return fibonacci(n - 1) + fibonacci(n - 2);
}
app.get("/compute", (req, res) => {
const result = fibonacci(45); // Blocks EVERYTHING
res.json({ result });
});
app.get("/health", (req, res) => {
res.json({ status: "ok" }); // Can't respond until fibonacci finishes!
});
Timeline of a Blocked Event Loop:
0s 1s 2s 3s 4s 5s
│─────────│─────────│─────────│─────────│─────────│
│ │
│ [══════ fibonacci(45) blocking ════════════] │
│ │
│ Request A: /compute ────────────────────► Response
│ Request B: /health ─────waiting──────────► Response
│ Request C: /api/users ───waiting──────────► Response
│ │
│ ALL requests wait for fibonacci to complete! │
└──────────────────────────────────────────────────┘
With Worker Threads:
0s 1s 2s 3s 4s 5s
│─────────│─────────│─────────│─────────│─────────│
│ │
│ Main Thread (Event Loop): │
│ ├─► Request B: /health ──► Response (instant) │
│ ├─► Request C: /api ─────► Response (instant) │
│ └─► Request A: waiting for worker... │
│ │
│ Worker Thread: │
│ └─► [════ fibonacci(45) ═══════════════] │
│ │ │
│ └──► Result to A │
│ │
│ Event loop stays responsive! │
└──────────────────────────────────────────────────┘
When to Use Worker Threads vs Cluster
| Scenario | Use Cluster | Use Worker Threads |
|---|---|---|
| I/O-bound work (HTTP, DB) | ✓ | ✗ |
| CPU-bound work (crypto, parsing) | ✗ | ✓ |
| Need to share memory | ✗ | ✓ |
| Want process isolation | ✓ | ✗ |
| Image/video processing | ✗ | ✓ |
| Machine learning inference | ✗ | ✓ |
| Multiple HTTP servers | ✓ | ✗ |
Note
Rule of thumb:
- If you're waiting on external resources (network, disk) → Cluster
- If you're crunching numbers/data → Worker Threads
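To make this concrete, here is a minimal sketch (hypothetical, reusing the earlier fibonacci route) of offloading the computation so /health stays responsive; it assumes a fibonacci-worker.js that runs fibonacci(workerData) and posts the result back:
const { Worker } = require("worker_threads");
app.get("/compute", (req, res) => {
  const worker = new Worker("./fibonacci-worker.js", { workerData: 45 });
  worker.once("message", (result) => res.json({ result }));
  worker.once("error", (err) => res.status(500).json({ error: err.message }));
});
app.get("/health", (req, res) => {
  res.json({ status: "ok" }); // responds immediately; the event loop stays free
});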
Basic Worker Thread Example
For CPU-intensive tasks, use Worker Threads to avoid blocking the event loop:
const {
Worker,
isMainThread,
parentPort,
workerData,
} = require("worker_threads");
if (isMainThread) {
// Main thread
function runWorker(data) {
return new Promise((resolve, reject) => {
const worker = new Worker(__filename, { workerData: data });
worker.on("message", resolve);
worker.on("error", reject);
worker.on("exit", (code) => {
if (code !== 0) {
reject(new Error(`Worker stopped with code ${code}`));
}
});
});
}
// Use worker
async function processData(items) {
const results = await Promise.all(items.map((item) => runWorker(item)));
return results;
}
} else {
// Worker thread: heavyComputation stands in for your CPU-bound function
const result = heavyComputation(workerData);
parentPort.postMessage(result);
}
Worker Pool
const { Worker } = require("worker_threads");
const os = require("os");
class WorkerPool {
constructor(workerScript, poolSize = os.cpus().length) {
this.workerScript = workerScript;
this.poolSize = poolSize;
this.workers = [];
this.queue = [];
for (let i = 0; i < poolSize; i++) {
this.addWorker();
}
}
addWorker() {
const worker = new Worker(this.workerScript);
worker.busy = false;
worker.on("message", (result) => {
worker.busy = false;
worker.currentResolve(result);
this.processQueue();
});
worker.on("error", (err) => {
worker.busy = false;
worker.currentReject(err);
this.processQueue();
});
this.workers.push(worker);
}
run(data) {
return new Promise((resolve, reject) => {
this.queue.push({ data, resolve, reject });
this.processQueue();
});
}
processQueue() {
if (this.queue.length === 0) return;
const availableWorker = this.workers.find((w) => !w.busy);
if (!availableWorker) return;
const { data, resolve, reject } = this.queue.shift();
availableWorker.busy = true;
availableWorker.currentResolve = resolve;
availableWorker.currentReject = reject;
availableWorker.postMessage(data);
}
async destroy() {
await Promise.all(this.workers.map((w) => w.terminate()));
}
}
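// heavy-task.js: a hypothetical sketch of the worker script loaded below
const { parentPort } = require("worker_threads");
parentPort.on("message", (data) => {
  // Stand-in for real CPU-bound work
  parentPort.postMessage({ task: data.task, done: true });
});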
// Usage
const pool = new WorkerPool("./heavy-task.js", 4);
const results = await Promise.all([
pool.run({ task: 1 }),
pool.run({ task: 2 }),
pool.run({ task: 3 }),
// ... more tasks
]);
await pool.destroy();
Summary
Scaling Decision Tree
Use this flowchart to decide which scaling pattern fits your needs:
START HERE
│
▼
┌───────────────────────┐
│ Is your bottleneck │
│ CPU or I/O? │
└───────────┬───────────┘
┌───┴───┐
│ │
CPU ▼ ▼ I/O
┌────────────┐ ┌────────────┐
│ Worker │ │ Cluster │
│ Threads │ │ + PM2 │
└────────────┘ └──────┬─────┘
│
▼
┌───────────────────────┐
│ Need more than 1 │
│ machine can handle? │
└───────────┬───────────┘
┌───┴───┐
│ │
NO ▼ ▼ YES
┌────────────┐ ┌────────────┐
│ Stay with │ │ Load │
│ Cluster │ │ Balancer │
└────────────┘ └──────┬─────┘
│
▼
┌───────────────────────┐
│ Need independent │
│ scaling per feature?│
└───────────┬───────────┘
┌───┴───┐
│ │
NO ▼ ▼ YES
┌────────────┐ ┌────────────┐
│ Monolith │ │ Micro- │
│ is fine! │ │ services │
└────────────┘ └────────────┘
Quick Reference Table
Scaling patterns for Node.js:
| Pattern | Use Case |
|---|---|
| Cluster Module | Utilize all CPU cores |
| PM2 | Production process management |
| Nginx | Load balancing, SSL termination |
| Microservices | Independent scaling, team autonomy |
| Worker Threads | CPU-intensive tasks |
Key takeaways:
- One process per core is the baseline for Node.js scaling
- Graceful shutdown prevents request loss during deploys
- Load balancers distribute traffic across processes/servers
- Microservices allow independent scaling of bottlenecks
- Worker threads offload CPU work without blocking
Warning
Don't prematurely optimize. Start with a single process, measure performance, and scale based on real bottlenecks. Often, the bottleneck is database or network I/O, not CPU.