Grouping and Aggregates
Grouping and Aggregates
So far, our queries have returned individual rows. But often you need summarized data: total sales, average prices, customer counts. Aggregate functions and GROUP BY let you collapse many rows into summary statistics.
Aggregate Functions
Aggregate functions operate on sets of rows and return a single value.
COUNT: Counting Rows
-- Count all rows
SELECT COUNT(*) AS total_books FROM books;| total_books |
|---|
| 6 |
-- Count non-NULL values in a column
SELECT COUNT(birth_date) AS authors_with_birthdate FROM authors;
-- Count distinct values
SELECT COUNT(DISTINCT author_id) AS unique_authors FROM books;Note
COUNT(*) counts all rows, including those with NULL values. COUNT(column)
counts only non-NULL values in that column.
SUM: Total of Values
SELECT
SUM(stock_quantity) AS total_inventory,
SUM(price * stock_quantity) AS total_inventory_value
FROM books;| total_inventory | total_inventory_value |
|---|---|
| 183 | 2867.20 |
AVG: Average Value
SELECT
AVG(price) AS average_price,
ROUND(AVG(price), 2) AS avg_rounded
FROM books;| average_price | avg_rounded |
|---|---|
| 16.32333... | 16.32 |
MIN and MAX: Extremes
SELECT
MIN(price) AS cheapest,
MAX(price) AS most_expensive,
MIN(publication_year) AS oldest,
MAX(publication_year) AS newest
FROM books;| cheapest | most_expensive | oldest | newest |
|---|---|---|---|
| 12.99 | 18.99 | 1813 | 2002 |
Combining Aggregates
SELECT
COUNT(*) AS total_orders,
SUM(total_amount) AS revenue,
AVG(total_amount) AS avg_order_value,
MIN(total_amount) AS smallest_order,
MAX(total_amount) AS largest_order
FROM orders
WHERE status = 'completed';GROUP BY: Aggregating by Categories
GROUP BY divides rows into groups and applies aggregate functions to
each group:
SELECT
author_id,
COUNT(*) AS book_count,
AVG(price) AS avg_price
FROM books
GROUP BY author_id;| author_id | book_count | avg_price |
|---|---|---|
| 1 | 2 | 14.49 |
| 2 | 1 | 14.99 |
| 3 | 1 | 18.99 |
| 4 | 2 | 17.49 |
The GROUP BY Rule
When using GROUP BY, every column in SELECT must either:
- Be in the GROUP BY clause, OR
- Be inside an aggregate function
-- ERROR: title is not grouped or aggregated
SELECT author_id, title, COUNT(*)
FROM books
GROUP BY author_id;
-- CORRECT: all non-aggregated columns are grouped
SELECT author_id, COUNT(*) AS book_count
FROM books
GROUP BY author_id;Grouping by Multiple Columns
-- Sales by customer and status
SELECT
customer_id,
status,
COUNT(*) AS order_count,
SUM(total_amount) AS total_spent
FROM orders
GROUP BY customer_id, status;| customer_id | status | order_count | total_spent |
|---|---|---|---|
| 1 | completed | 1 | 31.98 |
| 1 | shipped | 1 | 46.97 |
| 2 | completed | 1 | 18.99 |
Grouping with Joins
SELECT
a.first_name,
a.last_name,
COUNT(b.book_id) AS book_count,
ROUND(AVG(b.price), 2) AS avg_price
FROM authors a
LEFT JOIN books b ON a.author_id = b.author_id
GROUP BY a.author_id, a.first_name, a.last_name;| first_name | last_name | book_count | avg_price |
|---|---|---|---|
| George | Orwell | 2 | 14.49 |
| Jane | Austen | 1 | 14.99 |
| Gabriel | García Márquez | 1 | 18.99 |
| Haruki | Murakami | 2 | 17.49 |
HAVING: Filtering Groups
WHERE filters rows before grouping. HAVING filters groups
after aggregation.
-- Authors with more than one book
SELECT
author_id,
COUNT(*) AS book_count
FROM books
GROUP BY author_id
HAVING COUNT(*) > 1;| author_id | book_count |
|---|---|
| 1 | 2 |
| 4 | 2 |
WHERE vs. HAVING
SELECT
author_id,
COUNT(*) AS book_count,
AVG(price) AS avg_price
FROM books
WHERE price > 13 -- Filter ROWS first
GROUP BY author_id
HAVING COUNT(*) >= 1 -- Then filter GROUPS
AND AVG(price) > 15;Understanding the Difference
-- WHERE: "Only consider books priced over $15"
-- HAVING: "Only show authors whose average book price is over $15"
-- These give different results:
-- High-priced books by author
SELECT author_id, COUNT(*) FROM books
WHERE price > 15
GROUP BY author_id;
-- Authors with high average prices
SELECT author_id, AVG(price) FROM books
GROUP BY author_id
HAVING AVG(price) > 15;Warning
You cannot use column aliases in HAVING because HAVING is processed before SELECT. Repeat the aggregate expression:
-- This won't work:
SELECT author_id, COUNT(*) AS cnt FROM books
GROUP BY author_id
HAVING cnt > 1; -- Error: cnt doesn't exist yet
-- Do this instead:
SELECT author_id, COUNT(*) AS cnt FROM books
GROUP BY author_id
HAVING COUNT(*) > 1; -- Repeat the expressionQuery Execution Order (Revisited)
Understanding execution order clarifies why WHERE and HAVING behave differently:
- FROM - Identify tables
- WHERE - Filter individual rows
- GROUP BY - Create groups
- HAVING - Filter groups
- SELECT - Choose columns, apply aliases
- DISTINCT - Remove duplicates
- ORDER BY - Sort results
- LIMIT - Restrict output
Aggregate Functions with NULL
Aggregates generally ignore NULL values:
-- Given: prices = [10.00, NULL, 20.00, NULL, 30.00]
SELECT
COUNT(*) AS all_rows, -- 5
COUNT(price) AS non_null, -- 3
SUM(price) AS total, -- 60.00
AVG(price) AS average; -- 20.00 (60/3, not 60/5)Note
AVG ignores NULLs, which might not be what you want. To treat NULLs as zero:
AVG(COALESCE(price, 0))
Advanced Grouping
ROLLUP: Subtotals and Grand Totals
-- MySQL / PostgreSQL
SELECT
author_id,
publication_year,
COUNT(*) AS book_count,
SUM(price) AS total_price
FROM books
GROUP BY ROLLUP(author_id, publication_year);This produces:
- Rows for each author/year combination
- Subtotals for each author (year = NULL)
- Grand total (author_id and year both NULL)
CUBE: All Combinations
SELECT
author_id,
publication_year,
COUNT(*) AS book_count
FROM books
GROUP BY CUBE(author_id, publication_year);CUBE produces all possible subtotal combinations.
GROUPING SETS: Specific Combinations
SELECT
author_id,
publication_year,
COUNT(*) AS book_count
FROM books
GROUP BY GROUPING SETS (
(author_id, publication_year),
(author_id),
(publication_year),
()
);Practical Examples
Example 1: Sales Report
SELECT
YEAR(order_date) AS year,
MONTH(order_date) AS month,
COUNT(*) AS order_count,
SUM(total_amount) AS revenue,
ROUND(AVG(total_amount), 2) AS avg_order
FROM orders
GROUP BY YEAR(order_date), MONTH(order_date)
ORDER BY year, month;Example 2: Customer Analysis
SELECT
c.customer_id,
CONCAT(c.first_name, ' ', c.last_name) AS customer_name,
COUNT(o.order_id) AS total_orders,
COALESCE(SUM(o.total_amount), 0) AS lifetime_value,
COALESCE(ROUND(AVG(o.total_amount), 2), 0) AS avg_order_value,
MAX(o.order_date) AS last_order
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name
ORDER BY lifetime_value DESC;| customer_name | total_orders | lifetime_value | avg_order_value | last_order |
|---|---|---|---|---|
| Alice Chen | 2 | 78.95 | 39.48 | 2024-01-17 |
| Bob Martinez | 1 | 18.99 | 18.99 | 2024-01-16 |
| Carol Taylor | 0 | 0.00 | 0.00 | NULL |
Example 3: Inventory Summary
SELECT
CASE
WHEN stock_quantity = 0 THEN 'Out of Stock'
WHEN stock_quantity < 25 THEN 'Low Stock'
WHEN stock_quantity < 40 THEN 'Normal'
ELSE 'Well Stocked'
END AS stock_status,
COUNT(*) AS book_count,
SUM(stock_quantity) AS total_units,
ROUND(AVG(price), 2) AS avg_price
FROM books
GROUP BY
CASE
WHEN stock_quantity = 0 THEN 'Out of Stock'
WHEN stock_quantity < 25 THEN 'Low Stock'
WHEN stock_quantity < 40 THEN 'Normal'
ELSE 'Well Stocked'
END;Example 4: Top Selling Books
SELECT
b.title,
COUNT(oi.order_item_id) AS times_ordered,
SUM(oi.quantity) AS total_sold,
SUM(oi.quantity * oi.unit_price) AS revenue
FROM books b
LEFT JOIN order_items oi ON b.book_id = oi.book_id
GROUP BY b.book_id, b.title
HAVING SUM(oi.quantity) > 0
ORDER BY total_sold DESC
LIMIT 5;Example 5: Author Performance
SELECT
CONCAT(a.first_name, ' ', a.last_name) AS author,
COUNT(DISTINCT b.book_id) AS books_in_catalog,
COALESCE(SUM(oi.quantity), 0) AS total_copies_sold,
COALESCE(SUM(oi.quantity * oi.unit_price), 0) AS total_revenue
FROM authors a
LEFT JOIN books b ON a.author_id = b.author_id
LEFT JOIN order_items oi ON b.book_id = oi.book_id
GROUP BY a.author_id, a.first_name, a.last_name
ORDER BY total_revenue DESC;What's Next
The next chapter covers subqueries—queries nested inside other queries—which enable powerful filtering and data transformation.