query-optimization
$npx skills add blunotech-dev/agents --skill query-optimizationOptimize slow SQL queries using plan analysis, indexing, join tuning, subquery/CTE simplification, and pagination improvements. Use when the user shares slow queries, EXPLAIN output, or reports high execution time, sequential scans, or inefficient joins.
| name | description | category |
|---|---|---|
| query-optimization | Optimize slow SQL queries using plan analysis, indexing, join tuning, subquery/CTE simplification, and pagination improvements. Use when the user shares slow queries, EXPLAIN output, or reports high execution time, sequential scans, or inefficient joins. | Backend |
Query Optimization Skill
Phase 1 — Discovery
Ask only what context doesn't reveal:
- Database engine and version? Planner behavior, available hints, and index types differ significantly. Postgres 14+ has improved parallel query and CTE materialization defaults vs. older versions.
- Do you have
EXPLAIN ANALYZEoutput? Without it, optimization is guesswork. Ask them to run it and share — not justEXPLAIN(estimates only), butEXPLAIN (ANALYZE, BUFFERS)which shows actual row counts and I/O. - Table row counts for involved tables? The planner's choices make sense or don't in context of data volume.
- Is this a one-time slow query or a recurring endpoint? One-time queries tolerate different tradeoffs (parallel workers, temp indexes) vs. a hot path that runs thousands of times per minute.
- Are statistics up to date? Ask if
ANALYZEhas been run recently on large tables. Stale statistics cause the planner to make systematically wrong choices.
Phase 2 — Plan Analysis
Reading EXPLAIN ANALYZE — what to look for first
Rows estimate vs. actual rows — the most important signal:
A large discrepancy between rows=X (estimate) and actual rows=Y means the planner is working with bad statistics. The rest of the plan is built on a wrong assumption.
Seq Scan on orders (cost=0.00..4821.00 rows=5 width=32)
(actual time=0.041..89.3 rows=48291 loops=1)
Planner expected 5 rows, got 48,291. Every join and operation downstream was sized wrong. Fix: ANALYZE orders; then re-run.
Where to look for the bottleneck — start at the innermost node: Cost accumulates upward. The highest-cost leaf node is usually the problem. Don't optimize the outer join if the inner sequential scan is doing 90% of the work.
loops=N multiplies everything:
A node with actual time=5ms loops=1000 costs 5 seconds total. Nested loop joins on large tables create this silently. The per-loop time looks fine; the total doesn't appear until you multiply.
Buffers: shared hit=X read=Y:
read = disk I/O. High read count means data isn't cached — either the working set is too large for shared_buffers or the index isn't being used and is doing full table reads.
Sequential scan: when it's correct vs. wrong
A sequential scan is correct when:
- Returning >5-10% of the table's rows (index scan has higher overhead at that selectivity)
- The table fits in memory and is small
- Statistics show low cardinality on the filter column
A sequential scan is wrong when:
- The filter is highly selective (returns <1% of rows) and no index covers it
- The table is large and only a few rows match
Trap: adding an index on a low-cardinality column (e.g., status with values active/inactive on a 50/50 split) and wondering why the planner ignores it — it shouldn't use it. Partial indexes fix this:
CREATE INDEX idx_orders_pending ON orders(created_at) WHERE status = 'pending';
-- Planner uses this when filtering on status = 'pending' specifically
Phase 3 — Fix Patterns
Join optimization
Join order matters; the planner gets it wrong on complex queries:
Postgres's planner considers all join orderings up to join_collapse_limit (default 8) tables. Beyond that it uses a greedy heuristic and can pick suboptimal order. For queries joining 10+ tables, explicitly reordering joins or using CTEs to materialize intermediate results can help.
Nested loop vs. hash join vs. merge join:
- Nested loop: good for small outer set + indexed inner. Bad when outer set is large.
- Hash join: good for large unsorted sets. Requires memory — if
work_memis too low, spills to disk (showsBatches: Nin plan where N > 1). - Merge join: requires both sides sorted. Good for sorted indexed columns. Rare in practice.
If you see a hash join with Batches: 4, increase work_mem for that session before optimizing indexes:
SET work_mem = '64MB';
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;
The implicit cross join trap:
A missing join condition between two tables in a multi-table query produces a cartesian product. It shows up as an astronomically high row estimate in the plan. Look for Nested Loop with rows= in the millions when your tables are thousands of rows each.
Subquery elimination
Correlated subqueries — the hidden N+1 in SQL:
-- Bad: executes subquery once per outer row
SELECT p.*, (SELECT COUNT(*) FROM comments c WHERE c.post_id = p.id) AS comment_count
FROM posts p;
-- Good: single aggregation join
SELECT p.*, COALESCE(c.comment_count, 0) AS comment_count
FROM posts p
LEFT JOIN (
SELECT post_id, COUNT(*) AS comment_count
FROM comments GROUP BY post_id
) c ON c.post_id = p.id;
EXISTS vs IN vs JOIN for filtering:
IN (subquery)materializes the full subquery result — problematic if large.EXISTSshort-circuits on first match — correct for "does at least one exist" checks.JOIN+DISTINCTorGROUP BYis often faster than either for large sets with indexes.- For
NOT IN: a single NULL in the subquery result makesNOT INreturn no rows. Always useNOT EXISTSinstead.
CTEs: materialization trap (Postgres < 12): Before Postgres 12, CTEs were always materialized (optimization fence). The planner couldn't push predicates inside them. A CTE that looked like a clean refactor was actually forcing a full intermediate result. In Postgres 12+, CTEs are inlined by default unless they're recursive or contain side effects. If you're on < 12, rewrite performance-critical CTEs as subqueries.
In Postgres 12+, you can force materialization when you want it (to avoid re-execution):
WITH expensive AS MATERIALIZED (
SELECT ... FROM large_table WHERE ...
)
SELECT * FROM expensive JOIN other ON ...;
Pagination optimization
OFFSET pagination degrades linearly — the fundamental problem:
OFFSET 10000 LIMIT 20 scans and discards 10,000 rows every time. Page 500 is 500x slower than page 1. This is not fixable with indexes.
Keyset (cursor) pagination — the correct replacement:
-- Instead of: SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 400
-- Use: pass the last seen value as a cursor
SELECT * FROM posts
WHERE created_at < :last_seen_created_at
ORDER BY created_at DESC
LIMIT 20;
Requirements: sort column must be indexed, and must be unique or combined with a tiebreaker (usually id):
WHERE (created_at, id) < (:last_created_at, :last_id)
ORDER BY created_at DESC, id DESC
LIMIT 20;
Limitation: keyset pagination doesn't support jumping to arbitrary pages. Only valid for "load more" / infinite scroll patterns. If the product requires page numbers, OFFSET is unavoidable — mitigate with a covering index and a hard cap on max offset.
COUNT(*) for total pages is expensive — avoid or cache it:
SELECT COUNT(*) FROM posts WHERE ... with the same filter as your paginated query often does a full index scan. Options:
- Return an estimated count from
pg_stat_user_tablesfor rough totals - Cache the count and invalidate on write
- Return
has_next_pageboolean instead of total count (requires fetchingLIMIT + 1rows)
Index patterns for common slow query shapes
Covering index — eliminates table heap fetch: If a query selects only a few columns and filters on others, a covering index that includes all referenced columns lets Postgres answer entirely from the index without touching the table:
-- Query: SELECT id, status FROM orders WHERE user_id = ? AND created_at > ?
CREATE INDEX idx_orders_user_covering ON orders(user_id, created_at) INCLUDE (status, id);
-- INCLUDE columns are stored but not part of the sort key
Index on expression — when the filter transforms the column:
-- This won't use an index on email:
WHERE LOWER(email) = 'foo@example.com'
-- Fix:
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
Any function applied to a column in a WHERE clause — DATE(), EXTRACT(), ::text cast — disables index usage unless an expression index matches exactly.
Why an index is ignored even when it exists:
- Filter is not selective enough (planner prefers seq scan)
- Cast mismatch:
WHERE id = '123'on an integer column causes implicit cast, disabling the index LIKE '%suffix'— leading wildcard disables B-tree index; usepg_trgm+ GIN index for arbitrary substring search- Index has not been analyzed recently; planner thinks it's larger or smaller than it is
ORconditions:WHERE a = 1 OR b = 2won't use indexes onaandbseparately; useUNION ALLor a multi-column index
Phase 4 — Output
Produce whichever the user needs:
- Plan annotation — walk through their
EXPLAIN ANALYZEoutput and flag the bottleneck node, the rows estimate mismatch, and the fix - Rewritten query — subquery → join conversion, correlated subquery elimination, or CTE inlining
- Index recommendation — specific
CREATE INDEXstatement with column order rationale and whetherINCLUDEor a partial index applies - Pagination rewrite — keyset cursor implementation replacing
OFFSET, including tiebreaker handling - Session-level diagnostic commands — the exact
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)invocation to run, plusANALYZE tablenameif statistics look stale
Always include:
- Expected plan change after the fix (seq scan → index scan, nested loop → hash join)
- A note if the fix requires
work_memtuning rather than an index change - Postgres version caveat if CTE materialization behavior is relevant
- Warning if
NOT INis used with a nullable subquery column