Mastering Database Indexing: From B-Trees to Specialized Indexes
Everything you need to know about database indexes - when to use them, how they work, and optimization strategies.
Database indexing is the single most important performance optimization technique available to developers. A well-indexed database can handle millions of queries per second, while a poorly indexed one crawls to a halt with just thousands.
Yet, indexing is often misunderstood. Many developers add indexes randomly hoping for performance gains, while others avoid them entirely fearing write performance penalties. This guide will demystify indexing and give you the knowledge to make smart indexing decisions.
Proper indexing can improve query performance by 100x-1000x. Poor indexing can make even simple queries take seconds instead of milliseconds.
How Indexes Work: The Fundamentals
Think of a database index like an index in a book. Instead of reading every page to find a topic, you look it up in the index and jump directly to the relevant pages.
B-Tree: The Most Common Index Type
B-Tree (Balanced Tree) indexes are the default in most databases. They maintain sorted data and allow for efficient equality, range, and prefix matching.
B-Tree Structure:
Root Node
├── Branch Node (100-200)
│ ├── Leaf Node (1-50) → Row pointers
│ ├── Leaf Node (51-100) → Row pointers
│ └── Leaf Node (101-150) → Row pointers
└── Branch Node (201-300)
├── Leaf Node (151-200) → Row pointers
└── Leaf Node (201-250) → Row pointersOperations: O(log n) for search, insert, delete. Excellent for range queries and ordered results.
Index Types: Choosing the Right Tool
1. Single-Column Indexes
The most basic type. Index one column for fast lookups.
CREATE INDEX idx_users_email ON users(email);
Best for: Primary key lookups, foreign key joins, unique constraints
2. Composite (Multi-Column) Indexes
Index multiple columns together. Order matters!
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);
Best for: WHERE clauses with multiple conditions, ORDER BY optimization
3. Partial Indexes
Index only rows that match a condition. Smaller, faster indexes.
CREATE INDEX idx_active_users ON users(email) WHERE active = true;
Best for: Frequently queried subsets (active users, recent orders)
4. Unique Indexes
Enforce uniqueness and provide fast lookups.
CREATE UNIQUE INDEX idx_unique_email ON users(email);
Best for: Natural keys, business constraints
Specialized Indexes for Specific Use Cases
Full-Text Search Indexes
For searching within text content.
-- PostgreSQL
CREATE INDEX idx_posts_content ON posts USING gin(to_tsvector('english', content));
-- MySQL
ALTER TABLE posts ADD FULLTEXT(content);Use for: Blog posts, comments, product descriptions
Spatial (GIS) Indexes
For geographic data and location-based queries.
-- PostgreSQL CREATE INDEX idx_places_location ON places USING gist(location); -- MySQL ALTER TABLE places ADD SPATIAL INDEX(location);
Use for: "Find restaurants within 5 miles", mapping applications
Hash Indexes
Fast equality lookups, but no range queries.
CREATE INDEX idx_users_email_hash ON users USING hash(email);
Use for: Simple equality lookups where range queries aren't needed
The Indexing Decision Framework
When to Create an Index
If a column appears in WHERE, JOIN, or ORDER BY clauses regularly
Columns where queries return < 5-10% of total rows
When query execution time exceeds acceptable thresholds
Common Indexing Mistakes to Avoid
❌ Indexing Every Column
Indexes speed up reads but slow down writes. Every index has a maintenance cost.
❌ Wrong Column Order in Composite Indexes
Index order should match query patterns. (user_id, created_at) ≠ (created_at, user_id)
❌ Ignoring Index Maintenance
Indexes fragment over time. Rebuild them periodically for optimal performance.
❌ Not Monitoring Index Usage
Unused indexes waste space and slow down writes. Remove them!
Index Maintenance and Monitoring
Monitoring Queries
pg_stat_user_indexesperformance_schemasys.dm_db_index_usage_statsMaintenance Tasks
Advanced Indexing Strategies
Covering Indexes
Indexes that include all columns needed for a query, eliminating the need to access the table.
-- Query: SELECT name, email FROM users WHERE status = 'active' CREATE INDEX idx_users_covering ON users(status, name, email); -- Index contains all needed data!
Index-Only Scans
When the query can be satisfied entirely from the index without touching the table.
EXPLAIN SELECT count(*) FROM users WHERE created_at > '2025-01-01'; -- "Index Only Scan" - no table access needed!
Partial Indexes with WHERE
Index only the rows you actually query, reducing index size and maintenance.
-- Only index active users (assume 10% of users are active) CREATE INDEX idx_active_users ON users(email) WHERE active = true; -- Smaller index, faster queries, less maintenance
The Cost-Benefit Analysis
Index Costs vs Benefits
Costs
- • Storage space (indexes can be 10-50% of table size)
- • Write performance penalty (INSERT/UPDATE/DELETE slower)
- • Maintenance overhead (rebuilds, statistics updates)
- • Query optimizer complexity
Benefits
- • Query performance improvements (10x-1000x faster)
- • Reduced CPU and I/O usage
- • Better user experience
- • Lower infrastructure costs
Real-World Examples
E-commerce Product Search
-- Query: Products by category, sorted by price SELECT * FROM products WHERE category_id = ? AND active = true ORDER BY price ASC LIMIT 20; -- Optimal index: CREATE INDEX idx_products_category_price ON products(category_id, price) WHERE active = true;
Composite index handles WHERE + ORDER BY. Partial index reduces size.
User Timeline
-- Query: Recent posts from followed users SELECT * FROM posts WHERE user_id IN (followed_user_ids) ORDER BY created_at DESC LIMIT 50; -- Index per user for fast timeline generation: CREATE INDEX idx_posts_user_date ON posts(user_id, created_at DESC);
Composite index with descending order matches query pattern perfectly.
Tools and Best Practices
Use EXPLAIN plans to understand if indexes are being used effectively.
Track index usage and remove unused indexes regularly.
Load test with production-like data volumes before deploying indexes.
Document why each index exists and what queries it optimizes.
🎯 The Bottom Line
Indexing is both an art and a science. Start with your most important queries, measure the impact, and iterate. A few well-placed indexes can transform your application's performance.
Remember: indexes are not free. Each one has a cost. But the right indexes at the right time can be the difference between success and failure.