Learning Guides
Menu

Grouping and Aggregates

7 min readLearning SQL

Grouping and Aggregates

So far, our queries have returned individual rows. But often you need summarized data: total sales, average prices, customer counts. Aggregate functions and GROUP BY let you collapse many rows into summary statistics.

Aggregate Functions

Aggregate functions operate on sets of rows and return a single value.

COUNT: Counting Rows

SQL
-- Count all rows
SELECT COUNT(*) AS total_books FROM books;
total_books
6
SQL
-- Count non-NULL values in a column
SELECT COUNT(birth_date) AS authors_with_birthdate FROM authors;
 
-- Count distinct values
SELECT COUNT(DISTINCT author_id) AS unique_authors FROM books;

Note

COUNT(*) counts all rows, including those with NULL values. COUNT(column) counts only non-NULL values in that column.

SUM: Total of Values

SQL
SELECT
    SUM(stock_quantity) AS total_inventory,
    SUM(price * stock_quantity) AS total_inventory_value
FROM books;
total_inventorytotal_inventory_value
1832867.20

AVG: Average Value

SQL
SELECT
    AVG(price) AS average_price,
    ROUND(AVG(price), 2) AS avg_rounded
FROM books;
average_priceavg_rounded
16.32333...16.32

MIN and MAX: Extremes

SQL
SELECT
    MIN(price) AS cheapest,
    MAX(price) AS most_expensive,
    MIN(publication_year) AS oldest,
    MAX(publication_year) AS newest
FROM books;
cheapestmost_expensiveoldestnewest
12.9918.9918132002

Combining Aggregates

SQL
SELECT
    COUNT(*) AS total_orders,
    SUM(total_amount) AS revenue,
    AVG(total_amount) AS avg_order_value,
    MIN(total_amount) AS smallest_order,
    MAX(total_amount) AS largest_order
FROM orders
WHERE status = 'completed';

GROUP BY: Aggregating by Categories

GROUP BY divides rows into groups and applies aggregate functions to each group:

SQL
SELECT
    author_id,
    COUNT(*) AS book_count,
    AVG(price) AS avg_price
FROM books
GROUP BY author_id;
author_idbook_countavg_price
1214.49
2114.99
3118.99
4217.49

The GROUP BY Rule

When using GROUP BY, every column in SELECT must either:

  1. Be in the GROUP BY clause, OR
  2. Be inside an aggregate function
SQL
-- ERROR: title is not grouped or aggregated
SELECT author_id, title, COUNT(*)
FROM books
GROUP BY author_id;
 
-- CORRECT: all non-aggregated columns are grouped
SELECT author_id, COUNT(*) AS book_count
FROM books
GROUP BY author_id;

Grouping by Multiple Columns

SQL
-- Sales by customer and status
SELECT
    customer_id,
    status,
    COUNT(*) AS order_count,
    SUM(total_amount) AS total_spent
FROM orders
GROUP BY customer_id, status;
customer_idstatusorder_counttotal_spent
1completed131.98
1shipped146.97
2completed118.99

Grouping with Joins

SQL
SELECT
    a.first_name,
    a.last_name,
    COUNT(b.book_id) AS book_count,
    ROUND(AVG(b.price), 2) AS avg_price
FROM authors a
LEFT JOIN books b ON a.author_id = b.author_id
GROUP BY a.author_id, a.first_name, a.last_name;
first_namelast_namebook_countavg_price
GeorgeOrwell214.49
JaneAusten114.99
GabrielGarcía Márquez118.99
HarukiMurakami217.49

HAVING: Filtering Groups

WHERE filters rows before grouping. HAVING filters groups after aggregation.

SQL
-- Authors with more than one book
SELECT
    author_id,
    COUNT(*) AS book_count
FROM books
GROUP BY author_id
HAVING COUNT(*) > 1;
author_idbook_count
12
42

WHERE vs. HAVING

SQL
SELECT
    author_id,
    COUNT(*) AS book_count,
    AVG(price) AS avg_price
FROM books
WHERE price > 13                    -- Filter ROWS first
GROUP BY author_id
HAVING COUNT(*) >= 1                -- Then filter GROUPS
   AND AVG(price) > 15;

Understanding the Difference

SQL
-- WHERE: "Only consider books priced over $15"
-- HAVING: "Only show authors whose average book price is over $15"
 
-- These give different results:
 
-- High-priced books by author
SELECT author_id, COUNT(*) FROM books
WHERE price > 15
GROUP BY author_id;
 
-- Authors with high average prices
SELECT author_id, AVG(price) FROM books
GROUP BY author_id
HAVING AVG(price) > 15;

Warning

You cannot use column aliases in HAVING because HAVING is processed before SELECT. Repeat the aggregate expression:

SQL
-- This won't work:
SELECT author_id, COUNT(*) AS cnt FROM books
GROUP BY author_id
HAVING cnt > 1;  -- Error: cnt doesn't exist yet
 
-- Do this instead:
SELECT author_id, COUNT(*) AS cnt FROM books
GROUP BY author_id
HAVING COUNT(*) > 1;  -- Repeat the expression

Query Execution Order (Revisited)

Understanding execution order clarifies why WHERE and HAVING behave differently:

  1. FROM - Identify tables
  2. WHERE - Filter individual rows
  3. GROUP BY - Create groups
  4. HAVING - Filter groups
  5. SELECT - Choose columns, apply aliases
  6. DISTINCT - Remove duplicates
  7. ORDER BY - Sort results
  8. LIMIT - Restrict output

Aggregate Functions with NULL

Aggregates generally ignore NULL values:

SQL
-- Given: prices = [10.00, NULL, 20.00, NULL, 30.00]
 
SELECT
    COUNT(*) AS all_rows,        -- 5
    COUNT(price) AS non_null,    -- 3
    SUM(price) AS total,         -- 60.00
    AVG(price) AS average;       -- 20.00 (60/3, not 60/5)

Note

AVG ignores NULLs, which might not be what you want. To treat NULLs as zero: AVG(COALESCE(price, 0))

Advanced Grouping

ROLLUP: Subtotals and Grand Totals

SQL
-- MySQL / PostgreSQL
SELECT
    author_id,
    publication_year,
    COUNT(*) AS book_count,
    SUM(price) AS total_price
FROM books
GROUP BY ROLLUP(author_id, publication_year);

This produces:

  • Rows for each author/year combination
  • Subtotals for each author (year = NULL)
  • Grand total (author_id and year both NULL)

CUBE: All Combinations

SQL
SELECT
    author_id,
    publication_year,
    COUNT(*) AS book_count
FROM books
GROUP BY CUBE(author_id, publication_year);

CUBE produces all possible subtotal combinations.

GROUPING SETS: Specific Combinations

SQL
SELECT
    author_id,
    publication_year,
    COUNT(*) AS book_count
FROM books
GROUP BY GROUPING SETS (
    (author_id, publication_year),
    (author_id),
    (publication_year),
    ()
);

Practical Examples

Example 1: Sales Report

SQL
SELECT
    YEAR(order_date) AS year,
    MONTH(order_date) AS month,
    COUNT(*) AS order_count,
    SUM(total_amount) AS revenue,
    ROUND(AVG(total_amount), 2) AS avg_order
FROM orders
GROUP BY YEAR(order_date), MONTH(order_date)
ORDER BY year, month;

Example 2: Customer Analysis

SQL
SELECT
    c.customer_id,
    CONCAT(c.first_name, ' ', c.last_name) AS customer_name,
    COUNT(o.order_id) AS total_orders,
    COALESCE(SUM(o.total_amount), 0) AS lifetime_value,
    COALESCE(ROUND(AVG(o.total_amount), 2), 0) AS avg_order_value,
    MAX(o.order_date) AS last_order
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name
ORDER BY lifetime_value DESC;
customer_nametotal_orderslifetime_valueavg_order_valuelast_order
Alice Chen278.9539.482024-01-17
Bob Martinez118.9918.992024-01-16
Carol Taylor00.000.00NULL

Example 3: Inventory Summary

SQL
SELECT
    CASE
        WHEN stock_quantity = 0 THEN 'Out of Stock'
        WHEN stock_quantity < 25 THEN 'Low Stock'
        WHEN stock_quantity < 40 THEN 'Normal'
        ELSE 'Well Stocked'
    END AS stock_status,
    COUNT(*) AS book_count,
    SUM(stock_quantity) AS total_units,
    ROUND(AVG(price), 2) AS avg_price
FROM books
GROUP BY
    CASE
        WHEN stock_quantity = 0 THEN 'Out of Stock'
        WHEN stock_quantity < 25 THEN 'Low Stock'
        WHEN stock_quantity < 40 THEN 'Normal'
        ELSE 'Well Stocked'
    END;

Example 4: Top Selling Books

SQL
SELECT
    b.title,
    COUNT(oi.order_item_id) AS times_ordered,
    SUM(oi.quantity) AS total_sold,
    SUM(oi.quantity * oi.unit_price) AS revenue
FROM books b
LEFT JOIN order_items oi ON b.book_id = oi.book_id
GROUP BY b.book_id, b.title
HAVING SUM(oi.quantity) > 0
ORDER BY total_sold DESC
LIMIT 5;

Example 5: Author Performance

SQL
SELECT
    CONCAT(a.first_name, ' ', a.last_name) AS author,
    COUNT(DISTINCT b.book_id) AS books_in_catalog,
    COALESCE(SUM(oi.quantity), 0) AS total_copies_sold,
    COALESCE(SUM(oi.quantity * oi.unit_price), 0) AS total_revenue
FROM authors a
LEFT JOIN books b ON a.author_id = b.author_id
LEFT JOIN order_items oi ON b.book_id = oi.book_id
GROUP BY a.author_id, a.first_name, a.last_name
ORDER BY total_revenue DESC;

What's Next

The next chapter covers subqueries—queries nested inside other queries—which enable powerful filtering and data transformation.