def. Hash Function. A function that maps data to a hash table.

  • Data is denoted (=Universe)
    • Universe can be continuous or discrete
  • Hash table has slots, thus
    • ,
    • Access time is
  • is the hash function that takes in data point

Uniform Hash Function

def. Uniformity. A hash function is uniform iff:

e.g. Modular Hash Function.

  • If is random,

Universal Hash Function

alg. Universal Hashing. A hash function is universal iff:

  1. Initialization: Choose a random from family
  2. Use that for all future hashing needs for that dataset ⇒ probability of collision is

e.g. Universal Modular Hash Function.

def. Linear Congruence Hashing (integer key)

  1. Choose a very large prime number (bigger than the number of things you need to hash=)
  2. Construct a hash table of size
  3. Construct a family of hash functions is a universal family

def. Multiply-Shift Binary Hashing (integer key) (SotA)

Collision

def. Collision Probability.

Alternative Techniques

Double Hashing

  • Construction:
  • : the secondary hash table
  • : set that denotes all the elements hashed to . Depends on what , the data, actually is
    • For some slack, we usually make the secondary hash table

Bloom Filters

  1. From a family of hash functions choose different functions
  2. Initialize boolean array which has size
  • Insert(x)
    1. calculate and store them into .
    2. If there’s alreay a in the table, keep it that way
  • Search(x)
    1. calculate
    2. If all of returns , it’s highly likely that it is present.
      • (false positive rate=) Probability that search(x) returns True, even if : ……(1)
  • (1)…… is not really known. is limited by memory. So choose as big as possible
    • is good ⇒ false positive search(x) is
  • Probability that bit is