Subqueries
Subqueries
A subquery is a query nested inside another query. Subqueries let you use the result of one query as input to another, enabling complex filtering and data transformations that would otherwise require multiple steps.
Subquery Basics
A subquery is enclosed in parentheses and can appear in several places:
-- In WHERE clause
SELECT * FROM books
WHERE author_id IN (SELECT author_id FROM authors WHERE nationality = 'British');
-- In FROM clause (derived table)
SELECT * FROM (SELECT * FROM books WHERE price > 15) AS expensive_books;
-- In SELECT clause (scalar subquery)
SELECT title, (SELECT MAX(price) FROM books) AS max_price FROM books;Subqueries in WHERE Clauses
Single-Value Subqueries
When a subquery returns exactly one value, use it with comparison operators:
-- Find books priced above average
SELECT title, price
FROM books
WHERE price > (SELECT AVG(price) FROM books);| title | price |
|---|---|
| One Hundred Years of Solitude | 18.99 |
| Norwegian Wood | 16.99 |
| Kafka on the Shore | 17.99 |
-- Find the most recent order
SELECT * FROM orders
WHERE order_date = (SELECT MAX(order_date) FROM orders);Warning
If a subquery used with =, >, < etc. returns more than one row, you'll get an error. Use IN, ANY, or ALL for multi-row results.
Multi-Value Subqueries with IN
When a subquery returns multiple values, use IN:
-- Books by British authors
SELECT title, price
FROM books
WHERE author_id IN (
SELECT author_id
FROM authors
WHERE nationality = 'British'
);| title | price |
|---|---|
| 1984 | 15.99 |
| Animal Farm | 12.99 |
| Pride and Prejudice | 14.99 |
-- Customers who have placed orders
SELECT first_name, last_name
FROM customers
WHERE customer_id IN (
SELECT DISTINCT customer_id FROM orders
);NOT IN for Exclusion
-- Books that have never been ordered
SELECT title
FROM books
WHERE book_id NOT IN (
SELECT DISTINCT book_id FROM order_items
);NOT IN and NULL
If the subquery returns any NULL values, NOT IN returns no results! This is because x NOT IN (1, 2, NULL) evaluates to UNKNOWN.
-- Safer: use NOT EXISTS instead
SELECT title FROM books b
WHERE NOT EXISTS (
SELECT 1 FROM order_items oi WHERE oi.book_id = b.book_id
);ANY and ALL Operators
ANY (or SOME) returns true if the comparison is true for at least one
value:
-- Books cheaper than ANY Murakami book
SELECT title, price
FROM books
WHERE price < ANY (
SELECT price FROM books WHERE author_id = 4
);ALL returns true if the comparison is true for every value:
-- Books cheaper than ALL Murakami books
SELECT title, price
FROM books
WHERE price < ALL (
SELECT price FROM books WHERE author_id = 4
);Correlated Subqueries
A correlated subquery references the outer query. It executes once for each row in the outer query.
-- Books priced above their author's average
SELECT b.title, b.price, b.author_id
FROM books b
WHERE b.price > (
SELECT AVG(b2.price)
FROM books b2
WHERE b2.author_id = b.author_id -- References outer query
);The subquery WHERE b2.author_id = b.author_id correlates with the outer query—it calculates a different average for each author.
EXISTS and NOT EXISTS
EXISTS tests whether a subquery returns any rows:
-- Customers who have placed at least one order
SELECT c.first_name, c.last_name
FROM customers c
WHERE EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
);NOT EXISTS finds rows with no matching subquery results:
-- Authors with no books in our catalog
SELECT a.first_name, a.last_name
FROM authors a
WHERE NOT EXISTS (
SELECT 1
FROM books b
WHERE b.author_id = a.author_id
);Note
EXISTS only checks for existence—it doesn't matter what columns the subquery
selects. SELECT 1, SELECT *, or SELECT column all work the same.
Derived Tables (Subqueries in FROM)
A subquery in the FROM clause creates a temporary table for the outer query:
-- Average of per-author book counts
SELECT AVG(book_count) AS avg_books_per_author
FROM (
SELECT author_id, COUNT(*) AS book_count
FROM books
GROUP BY author_id
) AS author_counts;Warning
Derived tables MUST have an alias in most databases. The AS author_counts is
required.
Practical Uses of Derived Tables
-- Top customers with their ranking
SELECT
customer_name,
total_spent,
RANK() OVER (ORDER BY total_spent DESC) AS spending_rank
FROM (
SELECT
CONCAT(c.first_name, ' ', c.last_name) AS customer_name,
COALESCE(SUM(o.total_amount), 0) AS total_spent
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name
) AS customer_totals;Scalar Subqueries in SELECT
A scalar subquery returns exactly one value and can appear in the SELECT clause:
SELECT
title,
price,
(SELECT AVG(price) FROM books) AS avg_price,
price - (SELECT AVG(price) FROM books) AS diff_from_avg
FROM books;| title | price | avg_price | diff_from_avg |
|---|---|---|---|
| 1984 | 15.99 | 16.32 | -0.33 |
| Animal Farm | 12.99 | 16.32 | -3.33 |
| ... | ... | ... | ... |
Correlated Scalar Subqueries
-- Each book with its author's total book count
SELECT
b.title,
b.author_id,
(SELECT COUNT(*) FROM books b2 WHERE b2.author_id = b.author_id) AS author_book_count
FROM books b;| title | author_id | author_book_count |
|---|---|---|
| 1984 | 1 | 2 |
| Animal Farm | 1 | 2 |
| Pride and Prejudice | 2 | 1 |
| Norwegian Wood | 4 | 2 |
| Kafka on the Shore | 4 | 2 |
Common Table Expressions (CTEs)
CTEs provide a cleaner alternative to derived tables:
WITH author_stats AS (
SELECT
author_id,
COUNT(*) AS book_count,
AVG(price) AS avg_price
FROM books
GROUP BY author_id
)
SELECT
a.first_name,
a.last_name,
s.book_count,
s.avg_price
FROM authors a
JOIN author_stats s ON a.author_id = s.author_id;Multiple CTEs
WITH
order_totals AS (
SELECT customer_id, SUM(total_amount) AS total_spent
FROM orders
GROUP BY customer_id
),
customer_ranks AS (
SELECT
customer_id,
total_spent,
RANK() OVER (ORDER BY total_spent DESC) AS rank
FROM order_totals
)
SELECT
c.first_name,
c.last_name,
cr.total_spent,
cr.rank
FROM customers c
JOIN customer_ranks cr ON c.customer_id = cr.customer_id;Recursive CTEs
For hierarchical data like organizational charts or category trees:
-- Example: Employee hierarchy
WITH RECURSIVE employee_tree AS (
-- Base case: top-level managers
SELECT employee_id, name, manager_id, 0 AS level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive case: employees under managers
SELECT e.employee_id, e.name, e.manager_id, et.level + 1
FROM employees e
JOIN employee_tree et ON e.manager_id = et.employee_id
)
SELECT * FROM employee_tree ORDER BY level, name;Subquery vs. Join
Many subqueries can be rewritten as joins:
-- Subquery version
SELECT title FROM books
WHERE author_id IN (
SELECT author_id FROM authors WHERE nationality = 'British'
);
-- Join version
SELECT b.title
FROM books b
JOIN authors a ON b.author_id = a.author_id
WHERE a.nationality = 'British';When to Use Which?
| Use Subquery When... | Use Join When... |
|---|---|
| You need values from one table only | You need columns from multiple tables |
| Logic is clearer with nested structure | Performance is critical (often faster) |
| Using EXISTS/NOT EXISTS | Joining on multiple conditions |
| Computing aggregates for filtering | Many-to-many relationships |
Practical Examples
Example 1: Above-Average Analysis
-- Books priced above their category's average
WITH category_avg AS (
SELECT
author_id,
AVG(price) AS avg_price
FROM books
GROUP BY author_id
)
SELECT
b.title,
b.price,
a.avg_price AS author_avg,
b.price - a.avg_price AS above_avg
FROM books b
JOIN category_avg a ON b.author_id = a.author_id
WHERE b.price > a.avg_price;Example 2: Customers Without Recent Orders
-- Customers who haven't ordered in the last 30 days
SELECT first_name, last_name, email
FROM customers c
WHERE NOT EXISTS (
SELECT 1 FROM orders o
WHERE o.customer_id = c.customer_id
AND o.order_date > CURRENT_DATE - INTERVAL 30 DAY
);Example 3: Running Totals with Subquery
-- Cumulative order totals per customer
SELECT
o.order_id,
o.customer_id,
o.order_date,
o.total_amount,
(
SELECT SUM(o2.total_amount)
FROM orders o2
WHERE o2.customer_id = o.customer_id
AND o2.order_date <= o.order_date
) AS running_total
FROM orders o
ORDER BY o.customer_id, o.order_date;Example 4: Percentile Ranking
-- Find books in the top 25% by price
SELECT title, price
FROM books b
WHERE price >= (
SELECT price FROM books
ORDER BY price DESC
LIMIT 1 OFFSET (SELECT CEIL(COUNT(*) * 0.25) - 1 FROM books)
)
ORDER BY price DESC;What's Next
The next chapter revisits joins with more advanced techniques—self-joins, multiple join types in one query, and complex join conditions.