Computed Truth
The SQL Query Cost Estimator proves that **Index Seek (O(log N))** operations are exponentially faster than **Full Table Scans (O(N))** as data grows. For a table with 1M rows, an Index Seek is ~20,000x cheaper than a Full Scan. However, at >10-15% selectivity, the Database Optimizer often forces a partial scan due to Random I/O penalties.
SQL Query Cost Estimator
Estimate Query Cost
The Technical Proof
Database cost models rely on Page Reads as the primary currency. The formula differs by access method:
1. Full Table Scan (Heap Scan) - \( O(N) \)
$$ Cost_{scan} = \frac{N_{rows} \times Row_{width}}{Page_{size}} $$
Every page is read sequentially. High throughput, but reads unnecessary data.
2. Index Seek (B-Tree Loop) - \( O(\log N) \)
$$ Cost_{seek} = Height_{tree} + \lceil Selectivity \times N_{rows} \rceil $$
Where \( Height_{tree} \approx \log_{branching\_factor}(Pages) \). Usually 3-4 reads deep. Efficient for fetching specific rows.
Note: This model assumes a Cold Buffer Cache (worst-case I/O). In a Warm Cache scenario, logical reads are CPU-bound, not I/O-bound.
Step-by-Step Logic
- Input Analysis: User provides Table Size (\( N \)) and Selectivity (\( S \)).
- Determine Access Path:
- If "Scan" is selected, the engine must traverse all Leaf Pages.
- If "Index Seek" is selected, the engine traverses the B-Tree Root \(\to\) Intermediate \(\to\) Leaf.
- Calculate Page Cost:
- Assume average row width of 200 bytes.
- Rows per page = \( Page_{size} / 200 \).
- Total Pages = \( N / Rows_{page} \).
- Apply Optimizer Heuristics: If Selectivity \( > 15\% \), warn that Random I/O cost exceeds Sequential Scan cost, likely causing an Index Scan flip.
- Convert to Time: Map Page Reads to hardware latency (HDD vs SSD standards).