def. Hash Function. A function that maps data to a hash table.
- Data is denoted (=Universe)
- Universe can be continuous or discrete
- Hash table has slots, thus
- ,
- Access time is
- is the hash function that takes in data point
Uniform Hash Function
def. Uniformity. A hash function is uniform iff:
e.g. Modular Hash Function.
- If is random,
Universal Hash Function
alg. Universal Hashing. A hash function is universal iff:
- Initialization: Choose a random from family
- Use that for all future hashing needs for that dataset ⇒ probability of collision is
e.g. Universal Modular Hash Function.
def. Linear Congruence Hashing (integer key)
- Choose a very large prime number (bigger than the number of things you need to hash=)
- Construct a hash table of size
- Construct a family of hash functions ⇒ is a universal family
def. Multiply-Shift Binary Hashing (integer key) (SotA)
Collision
def. Collision Probability.
Alternative Techniques
Double Hashing
- Construction:
- : the secondary hash table
- : set that denotes all the elements hashed to . Depends on what , the data, actually is
- For some slack, we usually make the secondary hash table
Bloom Filters
- From a family of hash functions choose different functions
- Initialize boolean array which has size
Insert(x)
- calculate and store them into .
- If there’s alreay a in the table, keep it that way
Search(x)
- calculate
- If all of returns , it’s highly likely that it is present.
-
- (false positive rate=) Probability that
search(x)
returns True, even if : ……(1)
- (false positive rate=) Probability that
- (1)…… is not really known. is limited by memory. So choose as big as possible
- is good ⇒ false positive
search(x)
is
- is good ⇒ false positive
- Probability that bit is