Scaling Node.js Applications

Node.js is single-threaded by default, but production applications need to scale. This chapter covers patterns for utilizing multiple cores, distributing load, and architecting scalable systems.

The Scaling Challenge

Node.js runs JavaScript on a single thread (the event loop). To fully utilize a multi-core server, you need multiple Node.js processes:

PLAINTEXT
┌─────────────────────────────────────────────────┐
│              8-Core Server                       │
├─────────────────────────────────────────────────┤
│  ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐               │
│  │Node │ │Node │ │Node │ │Node │               │
│  │  1  │ │  2  │ │  3  │ │  4  │  ... idle ... │
│  └─────┘ └─────┘ └─────┘ └─────┘               │
│                                                  │
│  Single process = 1 core used = 7 cores wasted  │
└─────────────────────────────────────────────────┘

Cluster Module

What is the Cluster Module?

The Cluster Module is Node.js's built-in solution for creating multiple child processes (workers) that share the same server port. It allows you to spawn a pool of identical workers managed by a primary (master) process.

Think of it like a restaurant kitchen:

PLAINTEXT
┌─────────────────────────────────────────────────────────┐
│                    RESTAURANT KITCHEN                    │
├─────────────────────────────────────────────────────────┤
│                                                          │
│    HEAD CHEF (Primary Process)                          │
│    ├── Receives all orders                              │
│    ├── Distributes work                                  │
│    └── Replaces chefs who get sick                      │
│                                                          │
│    LINE COOKS (Worker Processes)                        │
│    ├── Chef 1: Cooking Order A                          │
│    ├── Chef 2: Cooking Order B                          │
│    ├── Chef 3: Cooking Order C                          │
│    └── Chef 4: Cooking Order D                          │
│                                                          │
│    One kitchen (port), multiple cooks (processes)       │
└─────────────────────────────────────────────────────────┘

Why Do We Need It?

Problem 1: Node.js is Single-Threaded

By default, Node.js runs on a single CPU core. On an 8-core machine, 87.5% of your CPU power sits idle:

PLAINTEXT
Before Clustering:
┌────────────────────────────────────────┐
│ CPU Core 1: [████████ Node.js ████████] │  ← 100% utilized
│ CPU Core 2: [                         ] │  ← 0% (wasted)
│ CPU Core 3: [                         ] │  ← 0% (wasted)
│ CPU Core 4: [                         ] │  ← 0% (wasted)
│ CPU Core 5: [                         ] │  ← 0% (wasted)
│ CPU Core 6: [                         ] │  ← 0% (wasted)
│ CPU Core 7: [                         ] │  ← 0% (wasted)
│ CPU Core 8: [                         ] │  ← 0% (wasted)
└────────────────────────────────────────┘
 
After Clustering:
┌────────────────────────────────────────┐
│ CPU Core 1: [████████ Worker 1 ████████] │  ← 100% utilized
│ CPU Core 2: [████████ Worker 2 ████████] │  ← 100% utilized
│ CPU Core 3: [████████ Worker 3 ████████] │  ← 100% utilized
│ CPU Core 4: [████████ Worker 4 ████████] │  ← 100% utilized
│ CPU Core 5: [████████ Worker 5 ████████] │  ← 100% utilized
│ CPU Core 6: [████████ Worker 6 ████████] │  ← 100% utilized
│ CPU Core 7: [████████ Worker 7 ████████] │  ← 100% utilized
│ CPU Core 8: [████████ Worker 8 ████████] │  ← 100% utilized
└────────────────────────────────────────┘

Problem 2: One Crash Takes Down Everything

With a single process, one uncaught exception crashes your entire application. With clustering:

  • Worker crashes → Primary spawns a replacement
  • Other workers continue serving requests
  • Users may never notice!

Problem 3: Deployment Requires Downtime

Without clustering, deploying new code means:

  1. Stop server → Requests fail
  2. Deploy code
  3. Start server → Back online

With clustering (zero-downtime deploy):

  1. Start new workers with new code
  2. New workers handle new requests
  3. Old workers finish current requests
  4. Old workers gracefully exit
  5. Zero dropped requests!

Basic Cluster Example

Node.js's built-in cluster module forks multiple worker processes:

JAVASCRIPT
const cluster = require("cluster");
const http = require("http");
const os = require("os");
 
const numCPUs = os.cpus().length;
 
if (cluster.isPrimary) {
  console.log(`Primary ${process.pid} starting ${numCPUs} workers`);
 
  // Fork workers
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
 
  // Handle worker death
  cluster.on("exit", (worker, code, signal) => {
    console.log(`Worker ${worker.process.pid} died (${signal || code})`);
    console.log("Starting replacement worker");
    cluster.fork();
  });
} else {
  // Workers share the TCP connection
  http
    .createServer((req, res) => {
      res.writeHead(200);
      res.end(`Hello from worker ${process.pid}\n`);
    })
    .listen(8000);
 
  console.log(`Worker ${process.pid} started`);
}

How Clustering Works

PLAINTEXT
                    ┌─────────────────┐
                    │    Primary      │
        Port 8000   │   (manager)     │
    ─────────────►  │                 │
                    └────────┬────────┘

              ┌──────────────┼──────────────┐
              │              │              │
              ▼              ▼              ▼
        ┌──────────┐   ┌──────────┐   ┌──────────┐
        │ Worker 1 │   │ Worker 2 │   │ Worker 3 │
        │  PID 101 │   │  PID 102 │   │  PID 103 │
        └──────────┘   └──────────┘   └──────────┘
 
- Primary process manages workers
- Workers share the server port
- OS distributes connections (round-robin on Linux)
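
By default, the primary accepts connections and hands them to workers round-robin on every platform except Windows, where scheduling is left to the OS. The policy can also be set explicitly before forking; a small sketch using the documented constants:

JAVASCRIPT
const cluster = require("cluster");
 
// Must be set before the first call to cluster.fork()
cluster.schedulingPolicy = cluster.SCHED_RR;   // primary distributes connections round-robin
// cluster.schedulingPolicy = cluster.SCHED_NONE; // hand connection scheduling to the OS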

Production-Ready Cluster

JAVASCRIPT
const cluster = require("cluster");
const os = require("os");
 
class ClusterManager {
  constructor(options = {}) {
    this.workers = new Map();
    this.numWorkers = options.workers || os.cpus().length;
    this.shutdownTimeout = options.shutdownTimeout || 10000;
    this.isShuttingDown = false;
  }
 
  start(workerScript) {
    if (!cluster.isPrimary) {
      require(workerScript);
      return;
    }
 
    console.log(`Primary ${process.pid} starting`);
 
    // Fork workers
    for (let i = 0; i < this.numWorkers; i++) {
      this.forkWorker();
    }
 
    // Handle worker exit
    cluster.on("exit", (worker, code, signal) => {
      this.workers.delete(worker.id);
 
      if (worker.exitedAfterDisconnect) {
        console.log(`Worker ${worker.process.pid} gracefully exited`);
      } else {
        console.error(`Worker ${worker.process.pid} crashed (${code})`);
        this.forkWorker();
      }
    });
 
    // Graceful shutdown
    process.on("SIGTERM", () => this.shutdown());
    process.on("SIGINT", () => this.shutdown());
  }
 
  forkWorker() {
    const worker = cluster.fork();
    this.workers.set(worker.id, worker);
 
    worker.on("message", (msg) => {
      if (msg.type === "ready") {
        console.log(`Worker ${worker.process.pid} ready`);
      }
    });
 
    return worker;
  }
 
  async shutdown() {
    console.log("Shutting down cluster...");
    this.isShuttingDown = true;
 
    // Tell workers to finish current requests
    for (const worker of this.workers.values()) {
      worker.send({ type: "shutdown" });
      worker.disconnect();
    }
 
    // Wait for graceful shutdown or force kill
    setTimeout(() => {
      console.log("Force killing remaining workers");
      for (const worker of this.workers.values()) {
        worker.kill("SIGKILL");
      }
      process.exit(0);
    }, this.shutdownTimeout);
  }
 
  // Zero-downtime restart
  async reload() {
    console.log("Reloading workers...");
    const oldWorkers = [...this.workers.values()];
 
    // Fork new workers first
    for (let i = 0; i < this.numWorkers; i++) {
      const newWorker = this.forkWorker();
 
      // Wait for new worker to be ready
      await new Promise((resolve) => {
        newWorker.once("message", (msg) => {
          if (msg.type === "ready") resolve();
        });
      });
    }
 
    // Then gracefully shutdown old workers
    for (const worker of oldWorkers) {
      worker.send({ type: "shutdown" });
      worker.disconnect();
    }
  }
}
 
module.exports = ClusterManager;
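
A minimal entry point using this class could look like the following; the file names and the SIGUSR2 convention are illustrative, not part of the class itself:

JAVASCRIPT
// cluster.js (illustrative) — the primary and every worker run this same file
const cluster = require("cluster");
const ClusterManager = require("./cluster-manager");
 
const manager = new ClusterManager({ workers: 4, shutdownTimeout: 10000 });
 
// In the primary this forks workers; in a worker it simply requires worker.js
manager.start(require.resolve("./worker.js"));
 
// Optional: trigger a zero-downtime reload when the primary receives SIGUSR2
if (cluster.isPrimary) {
  process.on("SIGUSR2", () => manager.reload());
}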

Worker with Graceful Shutdown

JAVASCRIPT
// worker.js
const http = require("http");
 
let isShuttingDown = false;
const connections = new Set();
 
const server = http.createServer((req, res) => {
  if (isShuttingDown) {
    res.setHeader("Connection", "close");
  }
 
  // Track connection
  connections.add(res.connection);
  res.on("finish", () => connections.delete(res.connection));
 
  // Simulate work
  setTimeout(() => {
    res.end(`Hello from ${process.pid}\n`);
  }, 100);
});
 
server.listen(8000, () => {
  // Signal ready to primary
  process.send?.({ type: "ready" });
});
 
// Handle shutdown signal
process.on("message", (msg) => {
  if (msg.type === "shutdown") {
    isShuttingDown = true;
 
    // Stop accepting new connections
    server.close(() => {
      console.log(`Worker ${process.pid} closed`);
      process.exit(0);
    });
 
    // Close idle connections
    for (const conn of connections) {
      if (!conn._httpMessage) {
        conn.end();
      }
    }
  }
});

Process Management with PM2

What is PM2?

PM2 (Process Manager 2) is a production-ready process manager for Node.js applications. It's like having an expert ops engineer watching your app 24/7.

Think of PM2 as your application's guardian:

PLAINTEXT
┌────────────────────────────────────────────────────┐
│                        PM2                           │
├────────────────────────────────────────────────────┤
│  ✓ Auto-restart on crash                             │
│  ✓ Built-in clustering (no code changes!)            │
│  ✓ Zero-downtime reloads                             │
│  ✓ Log aggregation & management                      │
│  ✓ Startup scripts (survive server reboot)           │
│  ✓ Memory monitoring & auto-restart                  │
│  ✓ Real-time monitoring dashboard                    │
└────────────────────────────────────────────────────┘

Why Do We Need PM2?

Problem: Running Node.js in Production is Hard

Without PM2, you have to handle:

PLAINTEXT
Manual Production Checklist:
────────────────────────
□ What happens when the app crashes? (restart it manually?)
□ How do I use all CPU cores? (write cluster code?)
□ How do I deploy without downtime? (???)
□ Where do my logs go? (console.log to /dev/null?)
□ What happens when the server reboots? (start manually?)
□ How do I know if the app is using too much memory?
 
                         VS
 
With PM2:
────────
$ pm2 start app.js -i max    # Cluster mode, all cores
$ pm2 reload app             # Zero-downtime deploy
$ pm2 logs                   # Aggregated logs
$ pm2 monit                  # Real-time monitoring
$ pm2 startup                # Survive reboots
 
✔ All handled automatically!

PM2 vs Manual Clustering:

PLAINTEXT
// Without PM2: 50+ lines of cluster code
const cluster = require('cluster');
const os = require('os');
 
if (cluster.isPrimary) {
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
  cluster.on('exit', (worker) => {
    console.log(`Worker ${worker.process.pid} died`);
    cluster.fork(); // Respawn
  });
} else {
  require('./app');
}
// ... plus graceful shutdown, health checks, logging...
 
// With PM2: 0 lines of cluster code
$ pm2 start app.js -i max
// Done! PM2 handles everything.

Getting Started with PM2

BASH
# Install
npm install -g pm2
 
# Start with cluster mode
pm2 start app.js -i max  # Use all CPU cores
pm2 start app.js -i 4    # Use 4 workers
 
# Zero-downtime reload
pm2 reload app
 
# Monitor
pm2 monit
pm2 logs
 
# Startup script
pm2 startup
pm2 save

Ecosystem Configuration

JAVASCRIPT
// ecosystem.config.js
module.exports = {
  apps: [
    {
      name: "api",
      script: "./src/server.js",
      instances: "max",
      exec_mode: "cluster",
 
      // Environment
      env: {
        NODE_ENV: "development",
      },
      env_production: {
        NODE_ENV: "production",
      },
 
      // Restart policy
      max_memory_restart: "1G",
      restart_delay: 3000,
      max_restarts: 10,
 
      // Graceful shutdown
      kill_timeout: 5000,
      wait_ready: true,
      listen_timeout: 10000,
 
      // Logs
      error_file: "./logs/error.log",
      out_file: "./logs/out.log",
      log_date_format: "YYYY-MM-DD HH:mm:ss",
 
      // Source maps
      source_map_support: true,
    },
  ],
};
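
Because the config sets wait_ready: true, PM2 holds traffic until the app explicitly signals readiness (or listen_timeout expires). A minimal sketch of the matching server code, assuming an Express app exported from ./app:

JAVASCRIPT
// src/server.js (sketch)
const app = require("./app"); // assumed Express app
 
const server = app.listen(process.env.PORT || 3000, () => {
  // With wait_ready: true, PM2 waits for this message before routing traffic
  if (process.send) process.send("ready");
});
 
// PM2 signals the process on stop/reload; finish in-flight requests within kill_timeout
process.on("SIGINT", () => {
  server.close(() => process.exit(0));
});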

Load Balancing

What is Load Balancing?

Load Balancing is the practice of distributing incoming traffic across multiple servers or processes to ensure no single server becomes overwhelmed.

Think of it like checkout lanes at a grocery store:

PLAINTEXT
                    WITHOUT LOAD BALANCING
                    ─────────────────────

    Customers ──────────► │ LANE 1 │ ← 50 people waiting!
                          │        │
                          │ LANE 2 │ ← Empty (closed)
                          │        │
                          │ LANE 3 │ ← Empty (closed)
 
                    WITH LOAD BALANCING
                    ───────────────────

                          │         ┌───► LANE 1 (17 people)
    Customers ──────────► │ GREETER │──► LANE 2 (17 people)
                          │         └───► LANE 3 (16 people)

Why Do We Need Load Balancing?

1. Handle More Traffic Than One Server Can Manage

A single Node.js server might handle 10,000 requests/second. But what if you have 100,000 requests/second?

PLAINTEXT
    100,000 req/sec
          │
          ▼
    ┌───────────┐
    │   Load    │
    │  Balancer │
    └─────┬─────┘
          │
    ┌─────┼─────┬─────────┐
    │     │     │         │
    ▼     ▼     ▼         ▼
   10K   10K   10K  ...  10K   (10 servers)

2. Eliminate Single Points of Failure

PLAINTEXT
Without Load Balancer:        With Load Balancer:
        │                              │
        ▼                              ▼
   ┌─────────┐                   ┌───────────┐
   │ Server  │ ← Dies          │    LB      │
   └─────────┘                   └─────┬─────┘
        │                              │
        ▼                        ┌─────┴─────┐
   💀 ALL TRAFFIC               │           │
      FAILS                     ▼           ▼
                           ┌────────┐  ┌────────┐
                           │Server 1│  │Server 2│
                           │  💀    │  │   ✓    │ ← Still serving!
                           └────────┘  └────────┘

3. Enable Zero-Downtime Deployments

You can deploy to servers one at a time while others handle traffic.

4. Geographic Distribution

Route users to the nearest data center for lower latency.

Load Balancing Algorithms Explained

Algorithm          | How It Works                                                           | Best For
-------------------|------------------------------------------------------------------------|---------------------------------------
Round Robin        | Request 1 → Server A, Request 2 → Server B, Request 3 → Server A...    | Servers with equal capacity
Least Connections  | Send to the server with the fewest active connections                  | Requests with varying processing time
IP Hash            | Hash the client IP so the same client always reaches the same server   | When you need session stickiness
Weighted           | Assign weights (Server A: 3, Server B: 1) → A gets 75% of traffic      | Servers with different capacities
Random             | Pick a server at random                                                 | Simple, stateless deployments
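
To make the rotation concrete, here is a tiny round-robin selector; it is an illustration only, since real load balancers also track health, weights, and connection counts:

JAVASCRIPT
// Round-robin: hand out backends in a fixed rotation
class RoundRobinBalancer {
  constructor(servers) {
    this.servers = servers;
    this.index = 0;
  }
 
  next() {
    const server = this.servers[this.index % this.servers.length];
    this.index += 1;
    return server;
  }
}
 
const lb = new RoundRobinBalancer(["10.0.0.1:3000", "10.0.0.2:3000", "10.0.0.3:3000"]);
console.log(lb.next()); // 10.0.0.1:3000
console.log(lb.next()); // 10.0.0.2:3000
console.log(lb.next()); // 10.0.0.3:3000
console.log(lb.next()); // 10.0.0.1:3000 again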

Nginx as Reverse Proxy

NGINX
# /etc/nginx/conf.d/app.conf
upstream nodejs_cluster {
    least_conn;  # Load balancing method
 
    server 127.0.0.1:3001;
    server 127.0.0.1:3002;
    server 127.0.0.1:3003;
    server 127.0.0.1:3004;
 
    keepalive 64;  # Connection pool
}
 
server {
    listen 80;
    server_name api.example.com;
 
    location / {
        proxy_pass http://nodejs_cluster;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_cache_bypass $http_upgrade;
 
        # Timeouts
        proxy_connect_timeout 60s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
    }
}

Load Balancing Strategies

Strategy           | Description                                        | Use Case
-------------------|----------------------------------------------------|------------------------
Round Robin        | Distribute evenly in rotation                      | Equal-capacity servers
Least Connections  | Send to the server with fewest active connections  | Variable request times
IP Hash            | Same client IP always goes to the same server      | Session affinity
Weighted           | Distribute based on server capacity                | Mixed hardware

Microservices Patterns

What are Microservices?

Microservices is an architectural style where an application is composed of small, independent services that communicate over a network. Each service is:

  • Independently deployable – Deploy user service without touching order service
  • Loosely coupled – Services communicate through well-defined APIs
  • Organized around business capabilities – User management, Orders, Payments, etc.
  • Owned by a small team – "You build it, you run it"
PLAINTEXT
     MONOLITH                          MICROSERVICES
     ─────────                          ─────────────
┌─────────────────────┐          ┌─────────┐  ┌─────────┐
│                     │          │  Users  │  │ Orders  │
│   One Big App       │          └────┬────┘  └────┬────┘
│                     │               │            │
│  • Users            │   ───►        │    API     │
│  • Orders           │          ┌────┴────────────┴────┐
│  • Products         │          │      Gateway         │
│  • Payments         │          └──────────────────────┘
│  • Inventory        │               │            │
│                     │          ┌────┴────┐  ┌────┴────┐
└─────────────────────┘          │Products │  │Payments │
                                 └─────────┘  └─────────┘

Why Do We Need Microservices?

1. Independent Scaling

Different parts of your system have different load patterns:

PLAINTEXT
Black Friday Traffic:
┌──────────────────────────────────────────────────────┐
│                                                       │
│  Product Browsing:  ████████████████████████ (HIGH)  │
│  User Login:        ████████ (MEDIUM)                │
│  Checkout:          ████████████████ (HIGH)          │
│  Admin Dashboard:   ██ (LOW)                         │
│                                                       │
│  With Microservices:                                 │
│  • Scale Product Service to 20 instances             │
│  • Scale Checkout to 15 instances                    │
│  • Keep Admin at 2 instances                         │
│  • Save money on infrastructure!                     │
│                                                       │
└──────────────────────────────────────────────────────┘

2. Technology Flexibility

Each service can use the best tool for its job:

PLAINTEXT
┌─────────────────┬──────────────────┬────────────────┐
│  User Service   │  Search Service  │ ML Predictions │
├─────────────────┼──────────────────┼────────────────┤
│  Node.js        │  Java + Elastic  │  Python + ML   │
│  PostgreSQL     │  Elasticsearch   │  TensorFlow    │
│                 │                  │  Redis         │
└─────────────────┴──────────────────┴────────────────┘

3. Fault Isolation

One service failing doesn't crash everything:

PLAINTEXT
Recommendation service is down:
 
┌─────────────────────────────────────────────────┐
│  Website still works:                           │
│  ✓ Users can log in                            │
│  ✓ Users can browse products                   │
│  ✓ Users can add to cart                       │
│  ✓ Users can checkout                          │
│  ✗ "Recommended for you" shows: "Coming soon"  │
└─────────────────────────────────────────────────┘

4. Team Autonomy

  • Team A owns User Service (deploy anytime)
  • Team B owns Payment Service (deploy anytime)
  • No coordination needed between teams!
  • Each team can move at their own pace

Warning

The Tradeoff: Microservices add complexity! You now need:

  • Service discovery
  • Distributed tracing
  • Network failure handling
  • Data consistency across services
  • Deployment orchestration

Don't adopt microservices until your team/system has outgrown a monolith.

Service Decomposition

PLAINTEXT
┌─────────────────────────────────────────────────────────┐
│                   API Gateway                            │
│              (auth, routing, rate limiting)              │
└────────────────────────┬────────────────────────────────┘

      ┌──────────────────┼──────────────────┐
      │                  │                  │
      ▼                  ▼                  ▼
┌───────────┐      ┌───────────┐      ┌───────────┐
│   User    │      │   Order   │      │  Product  │
│  Service  │      │  Service  │      │  Service  │
└─────┬─────┘      └─────┬─────┘      └─────┬─────┘
      │                  │                  │
      ▼                  ▼                  ▼
  ┌───────┐          ┌───────┐          ┌───────┐
  │  DB   │          │  DB   │          │  DB   │
  └───────┘          └───────┘          └───────┘

Service Communication

When services need to talk to each other, you have several options. Each has tradeoffs:

PLAINTEXT
┌─────────────────────────────────────────────────────────────────────┐
│                   COMMUNICATION PATTERNS                             │
├─────────────────┬───────────────────┬───────────────────────────────┤
│   HTTP/REST     │      gRPC         │     Message Queue              │
├─────────────────┼───────────────────┼───────────────────────────────┤
│ ✓ Simple        │ ✓ Fast (binary)   │ ✓ Async/decoupled             │
│ ✓ Human-readable│ ✓ Type-safe       │ ✓ Resilient to failures       │
│ ✓ Debuggable    │ ✓ Streaming       │ ✓ Handles load spikes         │
│ ✗ Verbose       │ ✗ Complex setup   │ ✗ Eventually consistent       │
│ ✗ Slower        │ ✗ Hard to debug   │ ✗ Harder to debug             │
├─────────────────┼───────────────────┼───────────────────────────────┤
│ Best for:       │ Best for:         │ Best for:                     │
│ External APIs   │ Internal services │ Background jobs               │
│ CRUD operations │ High-throughput   │ Event-driven systems          │
└─────────────────┴───────────────────┴───────────────────────────────┘

HTTP/REST with API Gateway

What is an API Gateway?

An API Gateway is the single entry point for all client requests. It handles cross-cutting concerns so your services don't have to:

PLAINTEXT
Without Gateway:                 With Gateway:
 
Client ──► Auth Service          Client ──► API Gateway ──► Services
Client ──► User Service                     │
Client ──► Order Service                    ├── Authentication
Client ──► Product Service                  ├── Rate Limiting
                                            ├── Request Routing
Each service handles:                       ├── SSL Termination
• Its own auth                              ├── Logging
• Its own rate limiting                     └── Circuit Breaking
• Its own SSL
• Duplicated code everywhere!    Services focus on business logic!
JAVASCRIPT
// api-gateway.js
const express = require("express");
const httpProxy = require("http-proxy");
 
const proxy = httpProxy.createProxyServer();
const app = express();
 
const services = {
  users: "http://user-service:3001",
  orders: "http://order-service:3002",
  products: "http://product-service:3003",
};
 
app.use("/api/:service/*", (req, res) => {
  const target = services[req.params.service];
  if (!target) {
    return res.status(404).json({ error: "Service not found" });
  }
 
  proxy.web(req, res, { target });
});
 
proxy.on("error", (err, req, res) => {
  res.status(502).json({ error: "Service unavailable" });
});

gRPC for Internal Communication

What is gRPC?

gRPC (Google Remote Procedure Call) is a high-performance framework for service-to-service communication. Instead of sending JSON over HTTP, it uses:

  • Protocol Buffers (protobuf): A binary serialization format (10x smaller than JSON)
  • HTTP/2: Enables multiplexing, streaming, and header compression
  • Strong typing: Both client and server are generated from the same schema
PLAINTEXT
HTTP/REST:                           gRPC:
┌─────────────────────────┐          ┌─────────────────────────┐
│ GET /users/123          │          │ Call: GetUser(id: 123)  │
│                         │          │                         │
│ Response (JSON):        │          │ Response (Binary):      │
│ {                       │          │ 0x0A 0x03 0x31 0x32     │
│   "id": "123",          │          │ 0x33 0x12 0x08 0x4A...  │
│   "name": "John Doe",   │          │                         │
│   "email": "john@..."   │          │ (same data, 10x smaller)│
│ }                       │          │                         │
│ ~200 bytes              │          │ ~20 bytes               │
└─────────────────────────┘          └─────────────────────────┘

Why use gRPC over REST for internal services?

  1. Speed: Binary format + HTTP/2 = faster serialization and transfer
  2. Type Safety: The proto file IS the contract. No surprises.
  3. Streaming: Built-in support for client/server/bidirectional streaming
  4. Code Generation: Client and server stubs generated automatically
PROTOBUF
// user.proto
syntax = "proto3";
 
service UserService {
  rpc GetUser (GetUserRequest) returns (User);
  rpc CreateUser (CreateUserRequest) returns (User);
}
 
message GetUserRequest {
  string id = 1;
}
 
message User {
  string id = 1;
  string name = 2;
  string email = 3;
}
JAVASCRIPT
// user-service.js
const grpc = require("@grpc/grpc-js");
const protoLoader = require("@grpc/proto-loader");
 
const packageDefinition = protoLoader.loadSync("user.proto");
const userProto = grpc.loadPackageDefinition(packageDefinition);
 
const server = new grpc.Server();
 
server.addService(userProto.UserService.service, {
  // "db" is a placeholder for your data-access layer
  getUser: async (call, callback) => {
    const user = await db.users.findById(call.request.id);
    callback(null, user);
  },
 
  createUser: async (call, callback) => {
    const user = await db.users.create(call.request);
    callback(null, user);
  },
});
 
server.bindAsync("0.0.0.0:50051", grpc.ServerCredentials.createInsecure(), () =>
  server.start(),
);
 
// client.js
const client = new userProto.UserService(
  "localhost:50051",
  grpc.credentials.createInsecure(),
);
 
client.getUser({ id: "123" }, (err, user) => {
  console.log(user);
});
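
Since the generated client methods are callback-based, they can be wrapped in Promises with util.promisify if you prefer async/await; a small sketch:

JAVASCRIPT
// Promise-friendly wrapper around the callback-style client (sketch)
const { promisify } = require("util");
 
const getUser = promisify(client.getUser).bind(client);
 
async function main() {
  const user = await getUser({ id: "123" });
  console.log(user.name);
}
 
main();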

Service Discovery

What is Service Discovery?

In a microservices architecture, services need to find each other. But with dynamic scaling, IP addresses and ports constantly change. Service Discovery is the mechanism for services to register themselves and find other services.

PLAINTEXT
WITHOUT Service Discovery:          WITH Service Discovery:
────────────────────────           ─────────────────────────
 
// Hardcoded addresses 😱           // Dynamic lookup ✓
const userService =                 const userService =
  'http://192.168.1.45:3001';        registry.discover('users');
 
// What happens when:               // The registry knows:
// - Server IP changes?             // - Which instances exist
// - Service moves?                 // - Which are healthy
// - You scale to 10 instances?     // - Load balances between them

How Service Discovery Works:

PLAINTEXT
1. REGISTRATION                     2. HEARTBEAT
   ─────────────                       ─────────
 
┌─────────────┐                    ┌─────────────┐
│   Service   │ ──"I'm alive"───► │  Registry   │
│  Instance   │    at :3001        │             │
└─────────────┘                    └─────────────┘

                                   Stores: {
                                     'user-service': [
                                       { host: '...', port: 3001,
                                         lastSeen: now() }
                                     ]
                                   }
 
3. DISCOVERY                        4. HEALTH CHECK
   ─────────                           ────────────
 
┌─────────────┐                    ┌─────────────┐
│   Client    │ ──"Where's user    │  Registry   │
│  Service    │   service?"──────► │             │
└─────────────┘                    └──────┬──────┘
      ▲                                   │
      │                            Removes stale
      └── Returns healthy instance        instances

Service Registry Example

JAVASCRIPT
// service-registry.js
class ServiceRegistry {
  constructor() {
    this.services = new Map();
    this.healthCheckInterval = 30000;
  }
 
  register(name, host, port, metadata = {}) {
    const id = `${name}-${host}:${port}`;
    const instance = {
      id,
      name,
      host,
      port,
      metadata,
      lastHeartbeat: Date.now(),
    };
 
    if (!this.services.has(name)) {
      this.services.set(name, new Map());
    }
    this.services.get(name).set(id, instance);
 
    console.log(`Registered: ${id}`);
    return id;
  }
 
  deregister(id) {
    for (const instances of this.services.values()) {
      if (instances.has(id)) {
        instances.delete(id);
        console.log(`Deregistered: ${id}`);
        return true;
      }
    }
    return false;
  }
 
  heartbeat(id) {
    for (const instances of this.services.values()) {
      if (instances.has(id)) {
        instances.get(id).lastHeartbeat = Date.now();
        return true;
      }
    }
    return false;
  }
 
  discover(name) {
    const instances = this.services.get(name);
    if (!instances || instances.size === 0) {
      return null;
    }
 
    // Pick a random healthy instance (simple load balancing)
    const healthy = [...instances.values()].filter(
      (i) => Date.now() - i.lastHeartbeat < this.healthCheckInterval * 2,
    );
 
    if (healthy.length === 0) return null;
 
    const index = Math.floor(Math.random() * healthy.length);
    return healthy[index];
  }
 
  discoverAll(name) {
    const instances = this.services.get(name);
    if (!instances) return [];
 
    return [...instances.values()].filter(
      (i) => Date.now() - i.lastHeartbeat < this.healthCheckInterval * 2,
    );
  }
}
 
// Usage in service
const registry = new ServiceRegistry();
 
// On startup
const instanceId = registry.register("user-service", "localhost", 3001);
 
// Heartbeat loop
setInterval(() => {
  registry.heartbeat(instanceId);
}, 15000);
 
// On shutdown
process.on("SIGTERM", () => {
  registry.deregister(instanceId);
  process.exit(0);
});
 
// Client discovering service
const userService = registry.discover("user-service");
if (userService) {
  await fetch(`http://${userService.host}:${userService.port}/users/123`);
}
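
The registry above only filters out stale instances at lookup time. If you also want them removed, a periodic sweep can be layered on top of the same class (sketch):

JAVASCRIPT
// Periodically prune instances whose heartbeats have stopped (sketch)
setInterval(() => {
  for (const instances of registry.services.values()) {
    for (const instance of instances.values()) {
      if (Date.now() - instance.lastHeartbeat > registry.healthCheckInterval * 2) {
        registry.deregister(instance.id);
      }
    }
  }
}, registry.healthCheckInterval);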

Worker Threads

What are Worker Threads?

Worker Threads allow you to run JavaScript code in parallel threads within the same Node.js process. Unlike the Cluster module (which creates separate processes), Worker Threads:

  • Share memory (via SharedArrayBuffer)
  • Have lower overhead than processes
  • Run in the same process but on different threads
PLAINTEXT
      CLUSTER (Multi-Process)           WORKER THREADS (Multi-Thread)
      ──────────────────────           ────────────────────────────
 
┌────────────────────────────┐    ┌─────────────────────────────────┐
│     Process 1 (Primary)    │    │        Single Process           │
│  ┌──────────────────────┐  │    │  ┌────────────────────────────┐ │
│  │ Memory Space 1       │  │    │  │     Shared Memory Space    │ │
│  └──────────────────────┘  │    │  │                            │ │
└────────────────────────────┘    │  │  Thread 1    Thread 2      │ │
                                  │  │    ▼           ▼           │ │
┌────────────────────────────┐    │  │  ┌────┐     ┌────┐        │ │
│     Process 2 (Worker)     │    │  │  │Task│     │Task│        │ │
│  ┌──────────────────────┐  │    │  │  │ A  │     │ B  │        │ │
│  │ Memory Space 2       │  │    │  │  └────┘     └────┘        │ │
│  └──────────────────────┘  │    │  │                            │ │
└────────────────────────────┘    │  └────────────────────────────┘ │
                                  └─────────────────────────────────┘
  Memory is COPIED between          Memory can be SHARED between
  processes (expensive)             threads (fast)
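
For example, a SharedArrayBuffer passed to a worker is shared rather than copied, so both threads operate on the same bytes; a small sketch:

JAVASCRIPT
const { Worker, isMainThread, workerData } = require("worker_threads");
 
if (isMainThread) {
  // 4 bytes of memory shared between the main thread and the worker
  const shared = new SharedArrayBuffer(4);
  const counter = new Int32Array(shared);
 
  const worker = new Worker(__filename, { workerData: shared });
  worker.on("exit", () => {
    console.log(Atomics.load(counter, 0)); // 1000 — written by the worker, read here
  });
} else {
  const counter = new Int32Array(workerData);
  for (let i = 0; i < 1000; i++) {
    Atomics.add(counter, 0, 1); // atomic increment visible to the main thread
  }
}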

Why Do We Need Worker Threads?

The Event Loop Problem

Node.js has a single-threaded event loop. CPU-intensive operations block it:

JAVASCRIPT
// This BLOCKS the event loop for ~5 seconds!
function fibonacci(n) {
  if (n <= 1) return n;
  return fibonacci(n - 1) + fibonacci(n - 2);
}
 
app.get("/compute", (req, res) => {
  const result = fibonacci(45); // Blocks EVERYTHING
  res.json({ result });
});
 
app.get("/health", (req, res) => {
  res.json({ status: "ok" }); // Can't respond until fibonacci finishes!
});

Timeline of a Blocked Event Loop:

PLAINTEXT
0s        1s        2s        3s        4s        5s
│─────────│─────────│─────────│─────────│─────────│
│                                                  │
│  [══════ fibonacci(45) blocking ════════════]   │
│                                                  │
│  Request A: /compute ────────────────────► Response
│  Request B: /health  ─────waiting──────────► Response
│  Request C: /api/users ───waiting──────────► Response
│                                                  │
│  ALL requests wait for fibonacci to complete!   │
└──────────────────────────────────────────────────┘

With Worker Threads:

PLAINTEXT
0s        1s        2s        3s        4s        5s
│─────────│─────────│─────────│─────────│─────────│
│                                                  │
│  Main Thread (Event Loop):                       │
│  ├─► Request B: /health ──► Response (instant)   │
│  ├─► Request C: /api ─────► Response (instant)   │
│  └─► Request A: waiting for worker...            │
│                                                  │
│  Worker Thread:                                  │
│  └─► [════ fibonacci(45) ═══════════════]       │
│                              │                   │
│                              └──► Result to A    │
│                                                  │
│  Event loop stays responsive!                    │
└──────────────────────────────────────────────────┘
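
One way to achieve this is to push the computation onto a worker thread and respond when it posts back. A sketch, assuming the Express app from the blocking example above and a fibonacci-worker.js file that runs fibonacci(workerData) and posts the result:

JAVASCRIPT
const { Worker } = require("worker_threads");
 
app.get("/compute", (req, res) => {
  // Spawn a worker per request (fine for illustration; use a worker pool in production)
  const worker = new Worker("./fibonacci-worker.js", { workerData: 45 });
  worker.once("message", (result) => res.json({ result }));
  worker.once("error", (err) => res.status(500).json({ error: err.message }));
});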

When to Use Worker Threads vs Cluster

Scenario                          | Use Cluster | Use Worker Threads
----------------------------------|-------------|--------------------
I/O-bound work (HTTP, DB)         |      ✓      |
CPU-bound work (crypto, parsing)  |             |         ✓
Need to share memory              |             |         ✓
Want process isolation            |      ✓      |
Image/video processing            |             |         ✓
Machine learning inference        |             |         ✓
Multiple HTTP servers             |      ✓      |

Note

Rule of thumb:

  • If you're waiting on external resources (network, disk) → Cluster
  • If you're crunching numbers/data → Worker Threads

Basic Worker Thread Example

For CPU-intensive tasks, use Worker Threads to avoid blocking the event loop:

JAVASCRIPT
const {
  Worker,
  isMainThread,
  parentPort,
  workerData,
} = require("worker_threads");
 
if (isMainThread) {
  // Main thread
  function runWorker(data) {
    return new Promise((resolve, reject) => {
      const worker = new Worker(__filename, { workerData: data });
      worker.on("message", resolve);
      worker.on("error", reject);
      worker.on("exit", (code) => {
        if (code !== 0) {
          reject(new Error(`Worker stopped with code ${code}`));
        }
      });
    });
  }
 
  // Use worker
  async function processData(items) {
    const results = await Promise.all(items.map((item) => runWorker(item)));
    return results;
  }
} else {
  // Worker thread: heavyComputation is a placeholder for your CPU-bound function
  const result = heavyComputation(workerData);
  parentPort.postMessage(result);
}

Worker Pool

JAVASCRIPT
const { Worker } = require("worker_threads");
const os = require("os");
 
class WorkerPool {
  constructor(workerScript, poolSize = os.cpus().length) {
    this.workerScript = workerScript;
    this.poolSize = poolSize;
    this.workers = [];
    this.queue = [];
 
    for (let i = 0; i < poolSize; i++) {
      this.addWorker();
    }
  }
 
  addWorker() {
    const worker = new Worker(this.workerScript);
    worker.busy = false;
 
    worker.on("message", (result) => {
      worker.busy = false;
      worker.currentResolve(result);
      this.processQueue();
    });
 
    worker.on("error", (err) => {
      worker.busy = false;
      worker.currentReject(err);
      this.processQueue();
    });
 
    this.workers.push(worker);
  }
 
  run(data) {
    return new Promise((resolve, reject) => {
      this.queue.push({ data, resolve, reject });
      this.processQueue();
    });
  }
 
  processQueue() {
    if (this.queue.length === 0) return;
 
    const availableWorker = this.workers.find((w) => !w.busy);
    if (!availableWorker) return;
 
    const { data, resolve, reject } = this.queue.shift();
    availableWorker.busy = true;
    availableWorker.currentResolve = resolve;
    availableWorker.currentReject = reject;
    availableWorker.postMessage(data);
  }
 
  async destroy() {
    await Promise.all(this.workers.map((w) => w.terminate()));
  }
}
 
// Usage
const pool = new WorkerPool("./heavy-task.js", 4);
 
const results = await Promise.all([
  pool.run({ task: 1 }),
  pool.run({ task: 2 }),
  pool.run({ task: 3 }),
  // ... more tasks
]);
 
await pool.destroy();
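
For completeness, heavy-task.js just needs to listen for messages from the pool and post a result back; a minimal sketch where the actual computation is a placeholder:

JAVASCRIPT
// heavy-task.js (sketch) — pairs with the WorkerPool above
const { parentPort } = require("worker_threads");
 
parentPort.on("message", (data) => {
  // Replace this with the real CPU-bound work
  const result = { task: data.task, done: true };
  parentPort.postMessage(result);
});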

Summary

Scaling Decision Tree

Use this flowchart to decide which scaling pattern fits your needs:

PLAINTEXT
                  START HERE
                      │
                      ▼
        ┌──────────────────────────┐
        │  Is your bottleneck      │
        │  CPU or I/O?             │
        └────────────┬─────────────┘
                 ┌───┴───┐
                 │       │
             CPU ▼       ▼ I/O
     ┌────────────┐  ┌────────────┐
     │  Worker    │  │  Cluster   │
     │  Threads   │  │  + PM2     │
     └────────────┘  └──────┬─────┘
                            │
                            ▼
            ┌──────────────────────────┐
            │  Need more than one      │
            │  machine can handle?     │
            └────────────┬─────────────┘
                     ┌───┴───┐
                     │       │
                  NO ▼       ▼ YES
         ┌────────────┐  ┌────────────┐
         │  Stay with │  │  Load      │
         │  Cluster   │  │  Balancer  │
         └────────────┘  └──────┬─────┘
                                │
                                ▼
                ┌──────────────────────────┐
                │  Need independent        │
                │  scaling per feature?    │
                └────────────┬─────────────┘
                         ┌───┴───┐
                         │       │
                      NO ▼       ▼ YES
             ┌────────────┐  ┌────────────┐
             │  Monolith  │  │  Micro-    │
             │  is fine!  │  │  services  │
             └────────────┘  └────────────┘

Quick Reference Table

Scaling patterns for Node.js:

Pattern         | Use Case
----------------|-------------------------------------
Cluster Module  | Utilize all CPU cores
PM2             | Production process management
Nginx           | Load balancing, SSL termination
Microservices   | Independent scaling, team autonomy
Worker Threads  | CPU-intensive tasks

Key takeaways:

  1. One process per core is the baseline for Node.js scaling
  2. Graceful shutdown prevents request loss during deploys
  3. Load balancers distribute traffic across processes/servers
  4. Microservices allow independent scaling of bottlenecks
  5. Worker threads offload CPU work without blocking

Warning

Don't prematurely optimize. Start with a single process, measure performance, and scale based on real bottlenecks. Often, the bottleneck is database or network I/O, not CPU.