Hash Tables
Hash tables are data structures that store key-value pairs and provide efficient lookup, insertion, and deletion operations. This module covers the fundamentals of hash tables, their implementations, collision resolution strategies, and common applications.
Module Contents
Learning Resources
Introduction to Hash Tables
A hash table (also known as a hash map) is a data structure that implements an associative array abstract data type, a structure that can map keys to values. It uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found.
Key Characteristics
- Fast operations: Average case O(1) for search, insert, and delete operations
- Key-value storage: Data is stored as key-value pairs
- Unordered: Elements are not stored in a specific order
- Dynamic size: Can grow or shrink as needed
- Collision handling: Must handle cases where different keys hash to the same index
Time and Space Complexity
Operation | Average Case | Worst Case |
---|---|---|
Search | O(1) | O(n) |
Insert | O(1) | O(n) |
Delete | O(1) | O(n) |
Space | O(n) | O(n) |
Note: The worst-case time complexity occurs when there are many collisions, causing the hash table to degenerate into a linked list.
Hash Functions
A hash function is a function that maps data of arbitrary size to fixed-size values. In the context of hash tables, it converts keys into array indices.
Properties of a Good Hash Function
- Deterministic: The same key should always produce the same hash value
- Efficient: It should be fast to compute
- Uniform distribution: It should distribute keys uniformly across the hash table to minimize collisions
- Low collision rate: Different keys should hash to different indices as much as possible
Common Hash Functions
1. Division Method
The simplest hash function is the division method, which takes the remainder of the key divided by the table size.
// Division method hash function
int hash(int key, int tableSize) {
return key % tableSize;
}
2. Multiplication Method
The multiplication method multiplies the key by a constant, takes the fractional part, and scales it to the table size.
// Multiplication method hash function
int hash(int key, int tableSize) {
double A = 0.6180339887; // (sqrt(5) - 1) / 2
double val = key * A;
val = val - Math.floor(val); // Extract fractional part
return (int)(tableSize * val);
}
3. Universal Hashing
Universal hashing uses a randomly selected hash function from a family of hash functions to minimize the probability of collisions.
4. String Hashing
For string keys, a common approach is to use polynomial hashing.
// String hash function
int hash(String key, int tableSize) {
int hash = 0;
for (int i = 0; i < key.length(); i++) {
hash = (hash * 31 + key.charAt(i)) % tableSize;
}
return hash;
}
Collision Resolution
A collision occurs when two different keys hash to the same index. There are several strategies to handle collisions in hash tables.
1. Separate Chaining
In separate chaining, each bucket in the hash table contains a linked list (or another data structure) of all key-value pairs that hash to the same index.
Advantages:
- Simple to implement
- Hash table never fills up
- Less sensitive to the hash function or load factors
Disadvantages:
- Requires additional memory for linked list pointers
- Cache performance can be poor due to linked list traversal
- If many keys hash to the same index, performance degrades to O(n)
// Separate chaining implementation
class HashTable<K, V> {
private static class Node<K, V> {
K key;
V value;
Node<K, V> next;
Node(K key, V value) {
this.key = key;
this.value = value;
}
}
private Node<K, V>[] buckets;
private int size;
private int capacity;
@SuppressWarnings("unchecked")
public HashTable(int capacity) {
this.capacity = capacity;
this.buckets = (Node<K, V>[]) new Node[capacity];
this.size = 0;
}
private int hash(K key) {
return Math.abs(key.hashCode()) % capacity;
}
public void put(K key, V value) {
int index = hash(key);
Node<K, V> head = buckets[index];
// Check if key already exists
Node<K, V> current = head;
while (current != null) {
if (current.key.equals(key)) {
current.value = value; // Update value
return;
}
current = current.next;
}
// Key doesn't exist, add new node at the beginning of the list
Node<K, V> newNode = new Node<>(key, value);
newNode.next = head;
buckets[index] = newNode;
size++;
}
public V get(K key) {
int index = hash(key);
Node<K, V> current = buckets[index];
while (current != null) {
if (current.key.equals(key)) {
return current.value;
}
current = current.next;
}
return null; // Key not found
}
public void remove(K key) {
int index = hash(key);
Node<K, V> head = buckets[index];
// If bucket is empty
if (head == null) {
return;
}
// If key is in the first node
if (head.key.equals(key)) {
buckets[index] = head.next;
size--;
return;
}
// Search for the key in the rest of the list
Node<K, V> prev = head;
Node<K, V> current = head.next;
while (current != null) {
if (current.key.equals(key)) {
prev.next = current.next;
size--;
return;
}
prev = current;
current = current.next;
}
}
}
2. Open Addressing
In open addressing, all key-value pairs are stored in the hash table itself. When a collision occurs, the algorithm searches for the next available slot according to a probing sequence.
a. Linear Probing
In linear probing, if a collision occurs at index i, the algorithm checks index i+1, then i+2, and so on, until an empty slot is found.
// Linear probing hash function
int linearProbe(int key, int i, int tableSize) {
return (hash(key) + i) % tableSize;
}
b. Quadratic Probing
In quadratic probing, if a collision occurs at index i, the algorithm checks index i+1², then i+2², and so on.
// Quadratic probing hash function
int quadraticProbe(int key, int i, int tableSize) {
return (hash(key) + i*i) % tableSize;
}
c. Double Hashing
Double hashing uses a second hash function to determine the step size for probing.
// Double hashing
int doubleHash(int key, int i, int tableSize) {
int hash1 = hash1(key);
int hash2 = hash2(key);
return (hash1 + i * hash2) % tableSize;
}
Advantages:
- Better cache performance
- No extra memory for pointers
Disadvantages:
- More sensitive to the hash function and load factor
- Can suffer from clustering (especially with linear probing)
- Requires a good probing strategy
- Table can fill up, requiring resizing
Implementation
Hash Table with Open Addressing in Java
// Hash Table implementation with linear probing
class HashTable<K, V> {
private static class Entry<K, V> {
K key;
V value;
boolean isDeleted;
Entry(K key, V value) {
this.key = key;
this.value = value;
this.isDeleted = false;
}
}
private Entry<K, V>[] table;
private int size;
private int capacity;
private final double loadFactor = 0.75;
@SuppressWarnings("unchecked")
public HashTable(int capacity) {
this.capacity = capacity;
this.table = (Entry<K, V>[]) new Entry[capacity];
this.size = 0;
}
private int hash(K key) {
return Math.abs(key.hashCode()) % capacity;
}
public void put(K key, V value) {
if (size >= capacity * loadFactor) {
resize();
}
int index = hash(key);
int i = 0;
while (i < capacity) {
int j = (index + i) % capacity; // Linear probing
if (table[j] == null || table[j].isDeleted) {
table[j] = new Entry<>(key, value);
size++;
return;
} else if (table[j].key.equals(key)) {
table[j].value = value; // Update value
return;
}
i++;
}
}
public V get(K key) {
int index = hash(key);
int i = 0;
while (i < capacity) {
int j = (index + i) % capacity;
if (table[j] == null) {
return null; // Key not found
} else if (!table[j].isDeleted && table[j].key.equals(key)) {
return table[j].value;
}
i++;
}
return null; // Key not found
}
public void remove(K key) {
int index = hash(key);
int i = 0;
while (i < capacity) {
int j = (index + i) % capacity;
if (table[j] == null) {
return; // Key not found
} else if (!table[j].isDeleted && table[j].key.equals(key)) {
table[j].isDeleted = true;
size--;
return;
}
i++;
}
}
@SuppressWarnings("unchecked")
private void resize() {
int oldCapacity = capacity;
capacity = capacity * 2;
Entry<K, V>[] oldTable = table;
table = (Entry<K, V>[]) new Entry[capacity];
size = 0;
for (int i = 0; i < oldCapacity; i++) {
if (oldTable[i] != null && !oldTable[i].isDeleted) {
put(oldTable[i].key, oldTable[i].value);
}
}
}
}
Hash Table in Python
# Hash Table implementation with separate chaining
class HashTable:
class Node:
def __init__(self, key, value):
self.key = key
self.value = value
self.next = None
def __init__(self, capacity=10):
self.capacity = capacity
self.size = 0
self.buckets = [None] * capacity
def _hash(self, key):
return hash(key) % self.capacity
def put(self, key, value):
index = self._hash(key)
# If bucket is empty
if self.buckets[index] is None:
self.buckets[index] = self.Node(key, value)
self.size += 1
return
# If key already exists, update value
current = self.buckets[index]
while current:
if current.key == key:
current.value = value
return
if current.next is None:
break
current = current.next
# Key doesn't exist, add to the end of the list
current.next = self.Node(key, value)
self.size += 1
def get(self, key):
index = self._hash(key)
current = self.buckets[index]
while current:
if current.key == key:
return current.value
current = current.next
return None # Key not found
def remove(self, key):
index = self._hash(key)
current = self.buckets[index]
prev = None
while current:
if current.key == key:
# If it's the first node in the bucket
if prev is None:
self.buckets[index] = current.next
else:
prev.next = current.next
self.size -= 1
return
prev = current
current = current.next
Applications
Hash tables are widely used in various applications due to their efficient lookup, insertion, and deletion operations.
1. Database Indexing
Hash tables are used to create indexes in databases for fast data retrieval. They allow for O(1) lookup time for exact match queries.
2. Caching
Hash tables are used in caching systems to store and retrieve data quickly. Examples include browser caches, DNS caches, and web server caches.
3. Symbol Tables in Compilers
Compilers use hash tables to store information about variables, functions, and other identifiers during the compilation process.
4. Associative Arrays
Hash tables are used to implement associative arrays or dictionaries in programming languages, such as Python's dict, Java's HashMap, and JavaScript's Object.
5. Counting Frequencies
Hash tables are used to count the frequency of elements in a collection, such as counting the occurrences of words in a document.
// Count word frequencies in a document
Map<String, Integer> wordFrequency = new HashMap<>();
for (String word : document.split("\s+")) {
word = word.toLowerCase().replaceAll("[^a-zA-Z]", "");
if (!word.isEmpty()) {
wordFrequency.put(word, wordFrequency.getOrDefault(word, 0) + 1);
}
}
6. De-duplication
Hash tables are used to remove duplicates from a collection by storing unique elements as keys.
// Remove duplicates from an array
public int[] removeDuplicates(int[] nums) {
Set<Integer> set = new HashSet<>();
for (int num : nums) {
set.add(num);
}
int[] result = new int[set.size()];
int i = 0;
for (int num : set) {
result[i++] = num;
}
return result;
}
Practice Problems
Apply your knowledge of hash tables by solving these carefully selected problems:
1. Two Sum
Given an array of integers and a target sum, return indices of the two numbers that add up to the target.
2. Valid Anagram
Given two strings, determine if one is an anagram of the other.
3. Group Anagrams
Given an array of strings, group anagrams together.
4. LRU Cache
Design and implement a data structure for Least Recently Used (LRU) cache.