Notation

  • : block count of table
  • : tuple count of table
    • because many tuples fit in a block
  • : number of available blocks in memory
  • Complexity is always the number of I/Os

Sorting Based Algorithms

alg. External Merge Sort

  1. We are trying to sort relation on disk with blocks of main memory
  2. Pass 0: we read sequential blocks at a time, sort (using any Sorting Algorithms) and read onto disk into different runs. The number of runs is .
  3. Pass 1: We merge teach runs in memory. We need one block to write out the result, and blocks to merge simultaneously.
    1. Thus
  4. Repeat passes until there is only one output run. This is the sorted run.
  • Complexity
    • Number of passes:
    • I/O Complexity:
    • Memory requirement: (as much as possible)

alg. Sort-Merge Join Idea: while sorting the table, simultaneously join them in the process:

  1. Run external merge sort passes until the no. of total runs to merge is less than or equal to
  2. Merge each table in-memory, and compare them to the condition. Join if necessary.
  • Complexity for 2 passes:
    • IO Complexity in 2 passes:
    • Memory requirement:
  • Complexity for passes: same as external merge sort
  • ! May degrade complexity if, e.g. the whole table needs joining.

Hash Based Algorithms

alg. Hash Join

  • Idea: use a hash function that partitions the table into block size partitions, then merge:
  1. Probe: Partition both tables into block size different partitions using the same hash function
  2. Load one partition of into memory. Stream one block of at a time, and join.
  • Complexity
    • I/O:
    • Memory requirement:

Index Based Algorithms

Equality and Range: use B+ Trees

Index Nested Loop Join

Zigzag Join