Notation
- : block count of table
- : tuple count of table
- because many tuples fit in a block
- : number of available blocks in memory
- Complexity is always the number of I/Os
Sorting Based Algorithms
alg. External Merge Sort
- Idea: Merge Sort, but with limited memory
- We are trying to sort relation on disk with blocks of main memory
- Pass 0: we read sequential blocks at a time, sort (using any Sorting Algorithms) and read onto disk into different runs. The number of runs is .
- Pass 1: We merge teach runs in memory. We need one block to write out the result, and blocks to merge simultaneously.
- Thus
- Repeat passes until there is only one output run. This is the sorted run.
- Complexity
- Number of passes:
- I/O Complexity:
- Memory requirement: (as much as possible)
alg. Sort-Merge Join Idea: while sorting the table, simultaneously join them in the process:
- Run external merge sort passes until the no. of total runs to merge is less than or equal to
- Merge each table in-memory, and compare them to the condition. Join if necessary.
- Complexity for 2 passes:
- IO Complexity in 2 passes:
- Memory requirement:
- Complexity for passes: same as external merge sort
- ! May degrade complexity if, e.g. the whole table needs joining.
Hash Based Algorithms
alg. Hash Join
- Idea: use a hash function that partitions the table into block size partitions, then merge:
- Probe: Partition both tables into block size different partitions using the same hash function
- Load one partition of into memory. Stream one block of at a time, and join.
- Complexity
- I/O:
- Memory requirement:
Index Based Algorithms
Equality and Range: use B+ Trees
Index Nested Loop Join
Zigzag Join