Joins Revisited
Joins Revisited
You've learned the basics of joins. Now let's explore advanced techniques: complex join conditions, mixing join types, self-joins for hierarchical data, and solving real-world problems with joins.
Review: Join Types
| Join Type | Returns |
|---|---|
| INNER JOIN | Only matching rows from both tables |
| LEFT JOIN | All from left + matching from right |
| RIGHT JOIN | All from right + matching from left |
| FULL JOIN | All from both, NULLs where no match |
| CROSS JOIN | All combinations (Cartesian product) |
Complex Join Conditions
Multiple Conditions
Joins can have multiple conditions connected with AND:
-- Find orders where the order date matches the customer's join date
SELECT
c.first_name,
c.last_name,
o.order_id,
o.order_date
FROM customers c
JOIN orders o
ON c.customer_id = o.customer_id
AND DATE(o.order_date) = c.join_date;Non-Equality Joins
Joins don't require equality:
-- Match books to price ranges
CREATE TABLE price_ranges (
range_name VARCHAR(20),
min_price DECIMAL(8,2),
max_price DECIMAL(8,2)
);
INSERT INTO price_ranges VALUES
('Budget', 0, 13.99),
('Standard', 14.00, 17.99),
('Premium', 18.00, 99.99);
SELECT
b.title,
b.price,
pr.range_name
FROM books b
JOIN price_ranges pr
ON b.price >= pr.min_price
AND b.price <= pr.max_price;| title | price | range_name |
|---|---|---|
| Animal Farm | 12.99 | Budget |
| Pride and Prejudice | 14.99 | Standard |
| 1984 | 15.99 | Standard |
| Norwegian Wood | 16.99 | Standard |
| Kafka on the Shore | 17.99 | Standard |
| One Hundred Years... | 18.99 | Premium |
BETWEEN in Joins
-- Same result using BETWEEN
SELECT b.title, b.price, pr.range_name
FROM books b
JOIN price_ranges pr ON b.price BETWEEN pr.min_price AND pr.max_price;Self-Joins
A table joining to itself. Essential for hierarchical or comparative queries.
Comparing Rows Within Same Table
-- Find book pairs by the same author
SELECT
b1.title AS book1,
b2.title AS book2,
b1.author_id
FROM books b1
JOIN books b2
ON b1.author_id = b2.author_id
AND b1.book_id < b2.book_id; -- Avoid duplicates and self-matches| book1 | book2 | author_id |
|---|---|---|
| 1984 | Animal Farm | 1 |
| Norwegian Wood | Kafka on the Shore | 4 |
Note
The condition b1.book_id < b2.book_id ensures each pair appears only once and a book isn't paired with itself.
Finding Price Differences
-- Books that cost more than another book by the same author
SELECT
b1.title AS expensive_book,
b1.price AS higher_price,
b2.title AS cheaper_book,
b2.price AS lower_price,
b1.price - b2.price AS price_diff
FROM books b1
JOIN books b2
ON b1.author_id = b2.author_id
AND b1.price > b2.price;Hierarchical Data
Self-joins are common for organizational structures:
-- Employees and their managers (if we had an employees table)
SELECT
e.name AS employee,
m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.employee_id;Mixing Join Types
You can combine different join types in one query:
-- All customers, their orders (if any), and order items (if any)
SELECT
c.first_name,
c.last_name,
o.order_id,
b.title,
oi.quantity
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
LEFT JOIN order_items oi ON o.order_id = oi.order_id
LEFT JOIN books b ON oi.book_id = b.book_id
ORDER BY c.last_name, c.first_name, o.order_id;Carol (with no orders) appears with NULLs for order, item, and book columns.
Inner Join After Left Join
Be careful—an INNER JOIN after a LEFT JOIN can eliminate the rows you wanted to keep:
-- This LOSES customers without orders!
SELECT c.first_name, o.order_id, oi.quantity
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
INNER JOIN order_items oi ON o.order_id = oi.order_id; -- INNER removes NULLs
-- Keep all customers by using LEFT JOIN consistently
SELECT c.first_name, o.order_id, oi.quantity
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
LEFT JOIN order_items oi ON o.order_id = oi.order_id;Joining Three or More Tables
Real queries often need multiple tables:
-- Complete order report
SELECT
c.first_name || ' ' || c.last_name AS customer,
o.order_id,
o.order_date,
b.title,
a.first_name || ' ' || a.last_name AS author,
oi.quantity,
oi.unit_price,
oi.quantity * oi.unit_price AS line_total
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN books b ON oi.book_id = b.book_id
JOIN authors a ON b.author_id = a.author_id
ORDER BY o.order_date, o.order_id;CROSS JOIN Use Cases
Though often accidental, CROSS JOINs have legitimate uses:
Generating All Combinations
-- All possible author-category combinations for a matrix
SELECT
a.last_name AS author,
c.category_name
FROM authors a
CROSS JOIN categories c;With Date Ranges
-- Generate report structure for all months and products
WITH months AS (
SELECT generate_series(
'2024-01-01'::DATE,
'2024-12-01'::DATE,
'1 month'::INTERVAL
) AS month
)
SELECT
m.month,
p.product_name,
COALESCE(SUM(s.quantity), 0) AS units_sold
FROM months m
CROSS JOIN products p
LEFT JOIN sales s
ON s.product_id = p.product_id
AND DATE_TRUNC('month', s.sale_date) = m.month
GROUP BY m.month, p.product_name;Using USING and NATURAL JOIN
USING Clause
When join columns have identical names:
-- These are equivalent
SELECT * FROM books JOIN authors ON books.author_id = authors.author_id;
SELECT * FROM books JOIN authors USING (author_id);With USING, the column appears once in the result, not twice.
NATURAL JOIN
Automatically joins on all matching column names:
SELECT * FROM orders NATURAL JOIN customers;
-- Joins on customer_id (the common column)Warning
Avoid NATURAL JOIN in production code. Adding a column with a matching name later will silently change your query's behavior.
Lateral Joins
LATERAL joins allow the subquery to reference columns from preceding
tables:
-- Top 2 most expensive books per author
SELECT
a.first_name,
a.last_name,
top_books.title,
top_books.price
FROM authors a
CROSS JOIN LATERAL (
SELECT title, price
FROM books b
WHERE b.author_id = a.author_id
ORDER BY price DESC
LIMIT 2
) AS top_books;This is like a correlated subquery but in the FROM clause.
Join Algorithms (How Databases Execute Joins)
Understanding join algorithms helps with performance tuning:
Nested Loop Join
- For each row in table A, scan table B for matches
- Fast for small tables or when B has an index
- Slow for large unindexed tables
Hash Join
- Build hash table from smaller table
- Probe with larger table
- Good for large tables, equality joins
Merge Join
- Sort both tables on join key
- Merge sorted results
- Efficient when data is already sorted
Note
You don't choose the algorithm directly—the query optimizer does. But you can influence it through indexes and query structure.
Performance Tips
1. Index Foreign Keys
CREATE INDEX idx_books_author_id ON books(author_id);
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
CREATE INDEX idx_order_items_order_id ON order_items(order_id);2. Filter Early
-- Better: filter in the join or WHERE
SELECT * FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date >= '2024-01-01';
-- Worse: filtering in a derived table after joining everything3. Select Only Needed Columns
-- Good
SELECT c.first_name, c.last_name, COUNT(o.order_id)
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name;
-- Avoid
SELECT * FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id;4. Use EXPLAIN
EXPLAIN ANALYZE
SELECT * FROM books b
JOIN authors a ON b.author_id = a.author_id
WHERE b.price > 15;Practical Examples
Example 1: Finding Orphan Records
-- Order items referencing deleted books
SELECT oi.*
FROM order_items oi
LEFT JOIN books b ON oi.book_id = b.book_id
WHERE b.book_id IS NULL;
-- Customers with no orders
SELECT c.customer_id, c.first_name, c.last_name
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_id IS NULL;Example 2: Comparison Report
-- Compare this month's sales to last month by product
WITH current_month AS (
SELECT book_id, SUM(quantity) AS qty
FROM order_items oi
JOIN orders o ON oi.order_id = o.order_id
WHERE o.order_date >= DATE_TRUNC('month', CURRENT_DATE)
GROUP BY book_id
),
last_month AS (
SELECT book_id, SUM(quantity) AS qty
FROM order_items oi
JOIN orders o ON oi.order_id = o.order_id
WHERE o.order_date >= DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '1 month'
AND o.order_date < DATE_TRUNC('month', CURRENT_DATE)
GROUP BY book_id
)
SELECT
b.title,
COALESCE(cm.qty, 0) AS this_month,
COALESCE(lm.qty, 0) AS last_month,
COALESCE(cm.qty, 0) - COALESCE(lm.qty, 0) AS change
FROM books b
LEFT JOIN current_month cm ON b.book_id = cm.book_id
LEFT JOIN last_month lm ON b.book_id = lm.book_id
ORDER BY change DESC;Example 3: Full Customer Order History
SELECT
c.customer_id,
c.first_name,
c.last_name,
COUNT(DISTINCT o.order_id) AS order_count,
COUNT(oi.order_item_id) AS items_purchased,
COALESCE(SUM(oi.quantity), 0) AS total_units,
COALESCE(SUM(oi.quantity * oi.unit_price), 0) AS total_spent,
STRING_AGG(DISTINCT b.title, ', ' ORDER BY b.title) AS books_purchased
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
LEFT JOIN order_items oi ON o.order_id = oi.order_id
LEFT JOIN books b ON oi.book_id = b.book_id
GROUP BY c.customer_id, c.first_name, c.last_name
ORDER BY total_spent DESC;What's Next
The next chapter covers conditional logic with CASE expressions—adding if/then/else logic directly into your SQL queries.