SQL Query Optimization

Problem: how is a given SQL query actually processed?

SQL Query rewriting
- Make everything into joins!
De-correlation: Correlated subqueries are de-correlated using “Magic” de-correlation
Iterator-based (= pipelined) evaluation
- Process one tuple at a time, passing each one up the operator chain
- Starts producing results sooner, but is not necessarily faster in total
Bottom-up Evaluation
- Process the bottommost query, then up one level, etc.
- Use temporary files to store intermediate results
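
The two evaluation styles above can be contrasted in a small sketch. Everything here is an illustrative assumption (the operator names `scan`, `select`, `project`, the `employees` table, and Python lists standing in for temporary files); it is not how any particular engine implements them.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]

def scan(table: Iterable[Row]) -> Iterator[Row]:
    """Leaf operator: hand rows upward one at a time."""
    for row in table:
        yield row

def select(child: Iterator[Row], pred: Callable[[Row], bool]) -> Iterator[Row]:
    """Pipelined selection: a qualifying row is passed up immediately."""
    for row in child:
        if pred(row):
            yield row

def project(child: Iterator[Row], cols: List[str]) -> Iterator[Row]:
    """Pipelined projection: keep only the requested columns."""
    for row in child:
        yield {c: row[c] for c in cols}

def bottom_up(table: Iterable[Row], pred: Callable[[Row], bool],
              cols: List[str]) -> List[Row]:
    """Bottom-up: finish each operator completely before starting the next.
    The intermediate lists stand in for temporary files."""
    scanned = list(table)
    selected = [r for r in scanned if pred(r)]
    return [{c: r[c] for c in cols} for r in selected]

# Hypothetical example data.
employees = [
    {"name": "Ann", "dept": "db", "salary": 120},
    {"name": "Bob", "dept": "os", "salary": 90},
    {"name": "Cy", "dept": "db", "salary": 100},
]
in_db = lambda r: r["dept"] == "db"

pipeline = project(select(scan(employees), in_db), ["name"])
first = next(pipeline)   # produced after reading a single input row
rest = list(pipeline)    # the remaining rows, pulled on demand
materialized = bottom_up(employees, in_db, ["name"])
```

Both variants return the same rows; the pipelined one simply makes the first row available before the rest of the input has been read.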

Heuristics-based Optimization

Idea: estimate size of intermediate results to calculate total operation count
- Cardinality estimation
Given knowledge: $|\pi_A R|$ (number of distinct $A$ values in $R$) and $|R|$ (number of tuples in $R$)
Principles:
- Preservation of value sets

Selection

$$
\begin{aligned}
|\sigma_{A = val}(R)| &\approx |R| \cdot \frac{1}{|\pi_A R|} = \frac{\text{size of } R}{\text{distinct } A \text{ values in } R} \\
|\sigma_{A \neq val}(R)| &\approx |R| \cdot \left(1 - \frac{1}{|\pi_A R|}\right)
\end{aligned}
$$
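
A minimal sketch of the two selection estimates, assuming values of $A$ are uniformly distributed. The second estimate is read here as the complement case $A \neq val$; the function names and the numbers are hypothetical.

```python
def est_eq(n_rows: int, n_distinct: int) -> float:
    """Estimated size of sigma_{A = val}(R): |R| / |pi_A R|."""
    return n_rows / n_distinct

def est_neq(n_rows: int, n_distinct: int) -> float:
    """Estimated size of sigma_{A != val}(R): |R| * (1 - 1 / |pi_A R|)."""
    return n_rows * (1 - 1 / n_distinct)

# Hypothetical statistics: 1000 tuples, 50 distinct values of A.
eq_rows = est_eq(1000, 50)     # 20.0
neq_rows = est_neq(1000, 50)   # roughly 980; the two estimates sum to |R|
```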

Conjunction, Disjunction (AND, OR operations)

$$
\begin{aligned}
|\sigma_{A = u \land B = v}(R)| &\approx |R| \cdot \frac{1}{|\pi_A R| \cdot |\pi_B R|} && \text{(conjunction)} \\
|\sigma_{A = u \lor B = v}(R)| &\approx |R| \cdot \left(\frac{1}{|\pi_A R|} + \frac{1}{|\pi_B R|} - \frac{1}{|\pi_A R| \cdot |\pi_B R|}\right) && \text{(disjunction)}
\end{aligned}
$$
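
The conjunction and disjunction estimates can be sketched as below. This assumes the two predicates are independent, so their selectivities multiply for AND and combine by inclusion-exclusion for OR; the function names and statistics are made up for illustration.

```python
def est_and(n_rows: int, d_a: int, d_b: int) -> float:
    """sigma_{A = u AND B = v}: multiply the two selectivities."""
    return n_rows / (d_a * d_b)

def est_or(n_rows: int, d_a: int, d_b: int) -> float:
    """sigma_{A = u OR B = v}: inclusion-exclusion on the two selectivities."""
    s_a, s_b = 1 / d_a, 1 / d_b
    return n_rows * (s_a + s_b - s_a * s_b)

# Hypothetical statistics: |R| = 1000, 10 distinct A values, 20 distinct B values.
and_rows = est_and(1000, 10, 20)   # 5.0
or_rows = est_or(1000, 10, 20)     # roughly 145
```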

Range

Without $max$, $min$ values: just assume a selectivity of $\frac{1}{3}$
With $max =$ hi(R.A) and $min =$ lo(R.A)
- Sometimes the highest and lowest values are “invalid” (for example, placeholder values) → use the second highest and second lowest instead

$$
|\sigma_{A > v}(R)| \approx |R| \cdot \frac{max - v}{max - min}
$$
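
A small sketch of the range estimate, assuming values are spread uniformly between $min$ and $max$. Clamping the fraction to $[0, 1]$ is an addition of this sketch (it is not in the notes), and `est_gt` is a hypothetical name.

```python
from typing import Optional

def est_gt(n_rows: int, v: float,
           lo: Optional[float] = None, hi: Optional[float] = None) -> float:
    """Estimated size of sigma_{A > v}(R)."""
    if lo is None or hi is None or hi == lo:
        # No usable min/max statistics: fall back to the 1/3 guess.
        return n_rows / 3
    frac = (hi - v) / (hi - lo)
    # Assumption of this sketch: clamp so that v outside [lo, hi] stays sane.
    return n_rows * min(1.0, max(0.0, frac))

no_stats = est_gt(900, 75)                    # 300.0
with_stats = est_gt(1000, 75, lo=0, hi=100)   # 250.0
```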

Natural Join

$$
|R \bowtie S| \approx |R \times S| \cdot \frac{1}{\max(|\pi_A R|, |\pi_A S|)}
$$
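
As a quick numeric check of the join estimate, here is a sketch where $A$ is the shared join attribute. The name `est_join` and the statistics are hypothetical.

```python
def est_join(r_rows: int, s_rows: int, d_r: int, d_s: int) -> float:
    """|R join S| ~ |R| * |S| / max(|pi_A R|, |pi_A S|)."""
    return r_rows * s_rows / max(d_r, d_s)

# Hypothetical statistics: |R| = 1000 with 50 distinct A values,
# |S| = 500 with 100 distinct A values.
join_rows = est_join(1000, 500, 50, 100)   # 5000.0
```

The side with more distinct values determines the selectivity: each of its values is assumed to find at most one matching group on the other side.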

Multi-way Join

$$
|R(A, B) \bowtie S(B, C) \bowtie T(C, D)| \approx |R \times S \times T| \cdot \underbrace{\frac{1}{\max(|\pi_B R|, |\pi_B S|)}}_{\text{selectivity of first join}} \cdot \underbrace{\frac{1}{\max(|\pi_C S|, |\pi_C T|)}}_{\text{selectivity of second join}}
$$
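
The multi-way estimate is the cross-product size scaled by one selectivity factor per join attribute. A sketch under that reading, with a hypothetical function name and made-up statistics:

```python
from math import prod
from typing import List, Tuple

def est_chain_join(cards: List[int],
                   join_distincts: List[Tuple[int, int]]) -> float:
    """cards: sizes of the joined relations.
    join_distincts: for each join attribute, the distinct-value counts
    on its two sides, e.g. (|pi_B R|, |pi_B S|)."""
    size = float(prod(cards))           # |R x S x T|
    for d_left, d_right in join_distincts:
        size /= max(d_left, d_right)    # apply one join's selectivity
    return size

# Hypothetical statistics for R(A, B), S(B, C), T(C, D).
rst_rows = est_chain_join(
    [1000, 500, 200],
    [(50, 100),   # B: 50 distinct values in R, 100 in S
     (20, 10)],   # C: 20 distinct values in S, 10 in T
)                 # 50000.0
```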

Projection over Join

Due to the assumption of preservation of value sets, when $R (A, B)$ is joined with $S (B, C)$, $A$ does not appear in $S$, so the join is assumed to keep every $A$ value of $R$. Therefore we estimate:

$$
|\pi_A (R \bowtie S)| \approx |\pi_A R|
$$
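
A tiny check of this assumption on made-up data. Since $A$ is an attribute of $R$ only, the estimate is read here as $|\pi_A R|$. It holds exactly in this example because every $B$ value of $R$ has a partner in $S$; on real data it is only an approximation.

```python
# Hypothetical relations R(A, B) and S(B, C) as lists of tuples.
R = [(1, "x"), (2, "y"), (3, "x")]
S = [("x", 10), ("y", 20), ("y", 30)]

# Natural join on B.
joined = [(a, b, c) for (a, b) in R for (b2, c) in S if b == b2]

distinct_a_in_join = {a for (a, _, _) in joined}
distinct_a_in_r = {a for (a, _) in R}
```

Here the join has four tuples but still only three distinct $A$ values, the same as in $R$.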

Nowadays, histograms and ML models are used for better estimation

Q. Given $n$ relations to join, in what order should they be joined?

Brute Force: $\frac{( 2 n - 2 )!}{( n - 1 )!}$ plans (all join trees)
Left-Deep Plan: $n!$ plans
Greedy: on the order of $n^{2}$ steps
Dynamic Programming:
- Interesting orders (= sorted? deduplicated? etc.) need to be considered too!
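
To get a feel for these numbers, the sketch below counts plans and runs a small dynamic program over subsets of relations. It is a simplified illustration under loud assumptions: only left-deep plans are considered, the cost of a plan is the sum of its estimated intermediate result sizes, cross products are allowed, and interesting orders are ignored. All names and statistics are hypothetical.

```python
from itertools import combinations
from math import factorial, prod
from typing import Dict, FrozenSet, Tuple

def count_all_plans(n: int) -> int:
    """Number of join trees over n relations: (2n - 2)! / (n - 1)!."""
    return factorial(2 * n - 2) // factorial(n - 1)

def count_left_deep(n: int) -> int:
    """Number of left-deep join orders: n!."""
    return factorial(n)

def best_left_deep(cards: Dict[str, int],
                   join_distinct: Dict[FrozenSet[str], int]
                   ) -> Tuple[float, Tuple[str, ...]]:
    """cards: relation name -> number of tuples.
    join_distinct: pair of relations -> max distinct count of their
    join attribute (the divisor in the join size estimate)."""
    names = sorted(cards)

    def size(subset: FrozenSet[str]) -> float:
        s = float(prod(cards[r] for r in subset))
        for pair, d in join_distinct.items():
            if pair <= subset:        # both relations are in the subset
                s /= d
        return s

    # best[subset] = (cost, join order) of the cheapest left-deep plan.
    best = {frozenset([r]): (0.0, (r,)) for r in names}
    for k in range(2, len(names) + 1):
        for combo in combinations(names, k):
            subset = frozenset(combo)
            out = size(subset)
            # Try every relation as the last one joined in.
            best[subset] = min(
                (best[subset - {last}][0] + out,
                 best[subset - {last}][1] + (last,))
                for last in combo
            )
    return best[frozenset(names)]

plans_4 = (count_all_plans(4), count_left_deep(4))   # (120, 24)

cost, order = best_left_deep(
    {"R": 1000, "S": 500, "T": 200},
    {frozenset({"R", "S"}): 100, frozenset({"S", "T"}): 20},
)
```

The table holds one entry per non-empty subset of relations instead of one per join order, which is where the saving over enumerating all $n!$ orders comes from. Supporting interesting orders would mean keeping one best plan per subset and per order property rather than a single one.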