Working with Sets
Working with Sets
SQL is built on set theory. Beyond joins, SQL provides set operators that combine the results of multiple queries. These operators—UNION, INTERSECT, and EXCEPT—treat query results as mathematical sets.
Set Operations vs. Joins
Joins combine columns from different tables horizontally:
Table A JOIN Table B
[col1, col2] + [col3, col4] = [col1, col2, col3, col4]Set operations combine rows vertically:
Query A results
+
Query B results
=
Combined rowsFor set operations to work, both queries must have the same number of columns with compatible data types.
UNION: Combining Results
UNION combines results from two queries and removes duplicates:
-- All names from both customers and authors
SELECT first_name, last_name FROM customers
UNION
SELECT first_name, last_name FROM authors;| first_name | last_name |
|---|---|
| Alice | Chen |
| Bob | Martinez |
| Carol | Taylor |
| George | Orwell |
| Jane | Austen |
| Gabriel | García Márquez |
| Haruki | Murakami |
UNION ALL: Keep Duplicates
UNION ALL keeps all rows, including duplicates:
SELECT first_name, last_name FROM customers
UNION ALL
SELECT first_name, last_name FROM authors;Note
UNION ALL is faster than UNION because it doesn't need to check for and remove duplicates. Use UNION ALL when you know there are no duplicates or when duplicates are acceptable.
Practical UNION Example
-- Create a unified contact list from multiple sources
SELECT
'Customer' AS contact_type,
first_name,
last_name,
email
FROM customers
UNION
SELECT
'Author' AS contact_type,
first_name,
last_name,
NULL AS email -- Authors don't have email in our schema
FROM authors
ORDER BY last_name, first_name;| contact_type | first_name | last_name | |
|---|---|---|---|
| Jane | Austen | Author | NULL |
| Alice | Chen | Customer | alice.chen@email.com |
| Gabriel | García Márquez | Author | NULL |
| ... | ... | ... | ... |
INTERSECT: Finding Common Rows
INTERSECT returns only rows that appear in both query results:
-- Find first names that appear in both customers and authors
SELECT first_name FROM customers
INTERSECT
SELECT first_name FROM authors;If a customer named "George" existed, "George" would appear in the result.
Practical INTERSECT Example
-- Books that have been ordered AND are currently in stock
SELECT book_id, title FROM books WHERE stock_quantity > 0
INTERSECT
SELECT DISTINCT b.book_id, b.title
FROM books b
JOIN order_items oi ON b.book_id = oi.book_id;Warning
MySQL doesn't support INTERSECT directly. Use a JOIN or EXISTS subquery instead:
-- MySQL alternative to INTERSECT
SELECT DISTINCT b.book_id, b.title
FROM books b
JOIN order_items oi ON b.book_id = oi.book_id
WHERE b.stock_quantity > 0;EXCEPT: Finding Differences
EXCEPT (called MINUS in Oracle) returns rows from the first query
that don't appear in the second:
-- Find books that have never been ordered
SELECT book_id, title FROM books
EXCEPT
SELECT DISTINCT b.book_id, b.title
FROM books b
JOIN order_items oi ON b.book_id = oi.book_id;Order Matters with EXCEPT
EXCEPT is not symmetric:
-- Query A EXCEPT Query B: rows in A but not in B
-- Query B EXCEPT Query A: rows in B but not in A (different result!)-- Books not ordered (A - B)
SELECT book_id FROM books
EXCEPT
SELECT book_id FROM order_items;
-- Order items for deleted books (B - A) - different question!
SELECT book_id FROM order_items
EXCEPT
SELECT book_id FROM books;Warning
MySQL doesn't support EXCEPT. Use NOT EXISTS or NOT IN instead:
-- MySQL alternative to EXCEPT
SELECT book_id, title FROM books b
WHERE NOT EXISTS (
SELECT 1 FROM order_items oi WHERE oi.book_id = b.book_id
);Rules for Set Operations
1. Same Number of Columns
Both queries must select the same number of columns:
-- Error: different column counts
SELECT first_name, last_name, email FROM customers
UNION
SELECT first_name, last_name FROM authors; -- Missing column!
-- Fix: add NULL or a placeholder
SELECT first_name, last_name, email FROM customers
UNION
SELECT first_name, last_name, NULL AS email FROM authors;2. Compatible Data Types
Corresponding columns must have compatible types:
-- Error: can't union INT with VARCHAR
SELECT customer_id FROM customers
UNION
SELECT email FROM customers;
-- Works: both are strings
SELECT first_name FROM customers
UNION
SELECT last_name FROM authors;3. Column Names Come from First Query
The result uses column names from the first query:
SELECT first_name AS name FROM customers
UNION
SELECT last_name AS family_name FROM authors;
-- Result column is named "name", not "family_name"Ordering Set Operation Results
ORDER BY applies to the entire combined result and must come last:
SELECT first_name, last_name, 'Customer' AS type FROM customers
UNION
SELECT first_name, last_name, 'Author' AS type FROM authors
ORDER BY last_name, first_name; -- Sorts the combined resultWarning
You cannot ORDER BY individual queries in a UNION. Only one ORDER BY is allowed, at the very end.
Combining Multiple Set Operations
You can chain set operations:
SELECT column1 FROM table1
UNION
SELECT column1 FROM table2
UNION
SELECT column1 FROM table3
EXCEPT
SELECT column1 FROM table4;Precedence
In standard SQL, INTERSECT has higher precedence than UNION and EXCEPT. Use parentheses for clarity:
-- Without parentheses, INTERSECT binds first
SELECT * FROM A
UNION
SELECT * FROM B
INTERSECT
SELECT * FROM C;
-- Equivalent to: A UNION (B INTERSECT C)
-- Use parentheses for different grouping
(SELECT * FROM A
UNION
SELECT * FROM B)
INTERSECT
SELECT * FROM C;Practical Examples
Example 1: Unified Activity Feed
-- Combine different types of activities into one feed
SELECT
order_date AS activity_date,
'Order Placed' AS activity_type,
CONCAT('Order #', order_id, ' - $', total_amount) AS description
FROM orders
UNION ALL
SELECT
join_date AS activity_date,
'New Customer' AS activity_type,
CONCAT(first_name, ' ', last_name, ' joined') AS description
FROM customers
ORDER BY activity_date DESC
LIMIT 10;Example 2: Finding Gaps
-- Find authors with no books in our catalog
SELECT author_id, first_name, last_name
FROM authors
WHERE author_id NOT IN (
SELECT DISTINCT author_id FROM books
);
-- Using EXCEPT (where supported)
SELECT author_id FROM authors
EXCEPT
SELECT author_id FROM books;Example 3: Comparing Two Time Periods
-- Customers who ordered in January but not February
SELECT DISTINCT customer_id
FROM orders
WHERE order_date >= '2024-01-01' AND order_date < '2024-02-01'
EXCEPT
SELECT DISTINCT customer_id
FROM orders
WHERE order_date >= '2024-02-01' AND order_date < '2024-03-01';Example 4: Finding Common Customers
-- Customers who bought both Orwell books
SELECT customer_id
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
WHERE oi.book_id = 1 -- 1984
INTERSECT
SELECT customer_id
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
WHERE oi.book_id = 2; -- Animal FarmSet Operations vs. Alternatives
Often you can achieve the same result multiple ways:
UNION vs. OR
-- Using UNION
SELECT * FROM books WHERE author_id = 1
UNION
SELECT * FROM books WHERE author_id = 4;
-- Using OR (simpler for same table)
SELECT * FROM books WHERE author_id = 1 OR author_id = 4;
-- Using IN (even simpler)
SELECT * FROM books WHERE author_id IN (1, 4);INTERSECT vs. JOIN
-- Using INTERSECT
SELECT customer_id FROM orders WHERE status = 'completed'
INTERSECT
SELECT customer_id FROM orders WHERE total_amount > 30;
-- Using AND (simpler for same table)
SELECT DISTINCT customer_id FROM orders
WHERE status = 'completed' AND total_amount > 30;EXCEPT vs. NOT IN / NOT EXISTS
-- Using EXCEPT
SELECT book_id FROM books
EXCEPT
SELECT book_id FROM order_items;
-- Using NOT IN
SELECT book_id FROM books
WHERE book_id NOT IN (SELECT book_id FROM order_items);
-- Using NOT EXISTS
SELECT book_id FROM books b
WHERE NOT EXISTS (
SELECT 1 FROM order_items oi WHERE oi.book_id = b.book_id
);When to Use Set Operations
Use set operations when:
- Combining results from different tables with similar structures
- The logic is clearer with set terminology
- You need to remove duplicates (UNION) or find common elements (INTERSECT)
Use JOINs/subqueries when:
- Working with a single table
- You need columns from multiple tables in the result
- Your database doesn't support certain set operations
What's Next
The next chapter covers data generation and conversion—string manipulation, date functions, and converting between data types.