Hash Tables (Dictionaries) in Python
A hash table, also known as a hash map, is a data structure that implements an associative array abstract data type, a structure that can map keys to values. A hash table uses a hash function to compute an index, also called a hash code, into an array of buckets or slots, from which the desired value can be found.
In Python, the built-in dict type is a highly optimized implementation of a hash table.
Key Concepts:
- Hash Function: A function that takes a key and computes an integer index (hash code) representing the position in the underlying array (bucket) where the corresponding value should be stored. A good hash function distributes keys uniformly across the buckets.
- Keys: Unique identifiers used to store and retrieve values. In Python dictionaries, keys must be hashable (i.e., immutable types like strings, numbers, tuples containing only hashable types).
- Buckets (Slots): The array positions where values are stored. The hash function maps keys to these bucket indices.
- Collisions: Occur when two different keys hash to the same bucket index. Efficient collision resolution strategies are crucial for hash table performance.
Collision Resolution:
When a collision occurs, the hash table must have a way to store and retrieve the multiple items that map to the same slot. Common techniques include:
- Chaining (Separate Chaining): Each bucket stores a pointer to a linked list (or another data structure) of key-value pairs that hash to that bucket. This is a common method used in many hash table implementations, including Python'''s dictionaries.
- Open Addressing: All entries are stored in the bucket array itself. When a collision occurs, the algorithm probes for the next available slot according to a predefined sequence (e.g., linear probing, quadratic probing, double hashing).
Python Dictionaries (dict)
Python'''s dict provides a powerful and easy-to-use hash table implementation. It handles hash function computation, resizing, and collision resolution internally.
Basic Dictionary Operations:
# Creating a dictionary
my_dict = {}
student = {"name": "Alice", "age": 25, "major": "Computer Science"}
print(f"Initial student dict: {student}")
# Adding or updating elements
student["grade"] = "A" # Add new key-value pair
student["age"] = 26 # Update existing value for key "age"
print(f"Updated student dict: {student}")
# Accessing elements
print(f"Student'''s name: {student["name"]}")
# print(student["id"]) # Would raise KeyError if "id" doesn'''t exist
# Safe access using .get()
major = student.get("major")
print(f"Student'''s major: {major}")
non_existent_key = student.get("id", "Not specified") # Returns default if key not found
print(f"Student'''s ID: {non_existent_key}")
# Deleting elements
removed_grade = student.pop("grade")
print(f"Removed grade: {removed_grade}")
print(f"Dict after pop: {student}")
# Using del keyword
del student["major"]
print(f"Dict after del '''major''': {student}")
# Checking for key existence
if "name" in student:
print("'''name''' is a key in the dictionary.")
if "country" not in student:
print("'''country''' is not a key.")
# Iterating through a dictionary
print("\nIterating through keys:")
for key in student:
print(f"Key: {key}, Value: {student[key]}")
print("\nIterating through key-value pairs (items):")
for key, value in student.items():
print(f"Key: {key}, Value: {value}")
print("\nIterating through values:")
for value in student.values():
print(f"Value: {value}")
# Dictionary length
print(f"\nNumber of items in dictionary: {len(student)}")
Advantages of Hash Tables:
- Fast Lookups, Insertions, and Deletions: Average time complexity is O(1) if the hash function is good and collisions are handled well.
- Flexible Keys: Can use various hashable types as keys (e.g., strings, numbers, tuples).
- Dynamic Sizing: Python dictionaries automatically resize to accommodate more elements.
Disadvantages of Hash Tables:
- Worst-case Performance: In the worst case (e.g., many collisions, or a poorly designed hash function), operations can degrade to O(n).
- Memory Overhead: To keep collisions low, hash tables often need to be larger than the number of elements stored (load factor considerations). Pointers in chaining also add overhead.
- Unordered (Historically): Traditionally, hash tables do not maintain the insertion order of elements. However, since Python 3.7 (and CPython 3.6), dictionaries remember insertion order.
- Keys must be Hashable: Mutable types like lists cannot be used as dictionary keys in Python.
Common Use Cases:
- Implementing caches (e.g., memoization).
- Database indexing.
- Counting frequencies of items.
- Associative arrays where key-value mapping is needed.
- Symbol tables in compilers.
- Representing objects or records (like JSON objects).
Hash tables, and Python'''s dictionaries, are one of the most useful and versatile data structures in programming due to their efficiency and ease of use.