Approximate Nearest Neighbor (ANN) algorithms are the secret sauce that makes vector search practical at scale. This lesson explores the major indexing algorithms: HNSW, IVF, LSH, and Product Quantization.
Exact search: Check every vector
┌─────────────────────────────────────────────┐
│ Query → Compare with ALL 10M vectors → Sort │
│ Time: O(n × d) = O(10M × 768) ≈ 1 second │
└─────────────────────────────────────────────┘
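That exact baseline is only a few lines of NumPy. The sketch below uses toy sizes (1,000 × 8); the same O(n × d) scan is what costs ~1 second at 10M × 768:

```python
import numpy as np

# Brute-force k-NN: compare the query against every stored vector.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 8)).astype(np.float32)
query = vectors[42] + 0.001  # a query almost identical to vector 42

dists = np.linalg.norm(vectors - query, axis=1)  # O(n * d) distance pass
top_k = np.argsort(dists)[:3]                    # sort, keep the 3 nearest
print(top_k[0])  # vector 42 comes back first
```

Every ANN structure below is a way to avoid that full distance pass.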
ANN: Smart data structures
┌─────────────────────────────────────────────┐
│ Query → Navigate index → Check ~1000 vectors│
│ Time: O(log n × d) ≈ 1-10 ms │
└─────────────────────────────────────────────┘
100-1000x speedup with 95-99% accuracy!

## HNSW (Hierarchical Navigable Small World)

HNSW is the most popular ANN algorithm, used by Pinecone, Weaviate, Qdrant, and others. It combines ideas from skip lists and navigable small-world graphs.
In a small-world graph, you can reach any node from any other node in just a few hops (like "six degrees of separation").
Multi-layer graph structure:
Layer 2 (sparse):  A ←──────────────────→ F
                    \                    /
                     \                  /
Layer 1 (medium):    A ← B ← C ← D ← E ← F
                      \   |   |   |   /
                       \  |   |   |  /
Layer 0 (dense):     A-B-C-D-E-F-G-H-I-J-K-L
- Higher layers: Long-distance connections (express lanes)
- Lower layers: Short-distance connections (local roads)
- Layer 0: All nodes with local connections

```python
def hnsw_search(query, graph, k, ef):
    """
    Simplified HNSW search (illustrative pseudocode).

    Args:
        query: Query vector
        graph: HNSW graph structure
        k:     Number of results
        ef:    Search breadth (exploration factor)
    """
    # Start at the entry point on the top layer
    current_node = graph.entry_point

    # Greedy descent through the upper layers (top layer down to layer 1)
    for layer in range(graph.max_layer, 0, -1):
        while True:
            # Move to the closest neighbor until none is closer
            neighbors = graph.get_neighbors(current_node, layer)
            closest = min(neighbors, key=lambda n: distance(query, n))
            if distance(query, closest) >= distance(query, current_node):
                break  # Local minimum on this layer
            current_node = closest

    # At layer 0: breadth-limited search over up to ef candidates
    candidates = [current_node]
    visited = {current_node}
    results = [(distance(query, current_node), current_node)]
    while candidates and len(visited) < ef:
        node = candidates.pop(0)
        for neighbor in graph.get_neighbors(node, 0):
            if neighbor not in visited:
                visited.add(neighbor)
                candidates.append(neighbor)
                results.append((distance(query, neighbor), neighbor))

    # Return the k closest of the explored nodes
    return sorted(results)[:k]
```

| Parameter | Effect | Typical Values |
|---|---|---|
| M | Max connections per node. Higher = better recall, more memory | 16-64 |
| ef_construction | Search breadth during index building. Higher = better quality, slower build | 100-400 |
| ef_search | Search breadth at query time. Higher = better recall, slower query | 50-200 |
```python
# HNSW in FAISS
import faiss

# Build index
dimension = 768
M = 32  # Connections per node
index = faiss.IndexHNSWFlat(dimension, M)

# Set construction parameter
index.hnsw.efConstruction = 200

# Add vectors (a float32 array of shape [n, dimension])
index.add(embeddings)

# Set search parameter (can change at query time)
index.hnsw.efSearch = 100

# Search (query is a float32 array of shape [num_queries, dimension])
D, I = index.search(query, k=10)
```

## IVF (Inverted File Index)

IVF clusters vectors and only searches within relevant clusters.
1. Training Phase: Cluster vectors using k-means
┌─────────────────────────────────────────────┐
│ 10M vectors → k-means with 1000 centroids │
│ Each vector assigned to nearest centroid │
└─────────────────────────────────────────────┘
2. Indexing: Store vectors in cluster buckets
Cluster 1: [v23, v145, v892, ...]
Cluster 2: [v7, v89, v234, ...]
...
Cluster 1000: [v12, v567, ...]
3. Search: Find closest centroids, search only those clusters
Query → Find 10 closest centroids → Search ~10,000 vectors
   (Instead of all 10M vectors)

```python
# IVF in FAISS
import faiss

dimension = 768
nlist = 1000  # Number of clusters

# Create IVF index
quantizer = faiss.IndexFlatL2(dimension)  # Used to find the closest centroids
index = faiss.IndexIVFFlat(quantizer, dimension, nlist)

# Must train before adding vectors
index.train(embeddings)  # Learn cluster centroids
index.add(embeddings)

# Set search parameter
index.nprobe = 50  # Search the 50 closest clusters
D, I = index.search(query, k=10)
```

## Product Quantization (PQ)

PQ compresses vectors to reduce memory while maintaining search quality.
Original vector: 768 dimensions × 4 bytes = 3072 bytes
PQ Compression:
1. Split vector into M subspaces (e.g., M=8)
[d1...d96] [d97...d192] ... [d673...d768]
2. Learn codebook for each subspace (256 codes each)
Subspace 1: 256 representative vectors
Subspace 2: 256 representative vectors
...
3. Replace each subspace with code ID (1 byte)
[code_1] [code_2] ... [code_8] = 8 bytes
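At toy scale, the steps above can be sketched with NumPy. This is an illustrative mini-PQ with made-up sizes (8-dim vectors, M=2 subspaces, a naive k-means), not FAISS's optimized implementation:

```python
import numpy as np

# Toy product quantization: 8-dim vectors, M=2 subspaces of 4 dims,
# K=256 centroids per subspace.
rng = np.random.default_rng(0)
data = rng.normal(size=(5000, 8)).astype(np.float32)
M, K, sub = 2, 256, 4

def kmeans(x, k, iters=10):
    """A few rounds of plain k-means (illustration only)."""
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((x[:, None] - centroids) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = x[labels == j].mean(axis=0)
    return centroids

# Step 2: learn one codebook per subspace
codebooks = [kmeans(data[:, m * sub:(m + 1) * sub], K) for m in range(M)]

def pq_encode(v):
    # Step 3: replace each subspace with the ID of its nearest centroid
    return np.array(
        [np.argmin(((codebooks[m] - v[m * sub:(m + 1) * sub]) ** 2).sum(-1))
         for m in range(M)],
        dtype=np.uint8,
    )

def pq_decode(codes):
    # Approximate reconstruction by looking codes up in the codebooks
    return np.concatenate([codebooks[m][codes[m]] for m in range(M)])

v = data[0]
codes = pq_encode(v)      # 32 bytes of float32 -> 2 bytes of codes
approx = pq_decode(codes)
print(codes.nbytes)  # → 2
```

Real PQ search doesn't reconstruct vectors like `pq_decode` does; it compares the query against the codebooks directly (asymmetric distance computation), which keeps queries fast on the compressed codes.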
Compression ratio: 3072 / 8 = 384x!

```python
# IVF with PQ in FAISS
import faiss

dimension = 768
nlist = 1000
m = 8      # Number of subquantizers
nbits = 8  # Bits per code (256 codes per subquantizer)

quantizer = faiss.IndexFlatL2(dimension)  # Coarse quantizer for IVF
index = faiss.IndexIVFPQ(quantizer, dimension, nlist, m, nbits)
index.train(embeddings)
index.add(embeddings)

# Much smaller memory footprint for the vectors themselves:
# 10M vectors: 768 dims × 4 bytes × 10M = ~30 GB → 8 bytes × 10M = ~80 MB
```

## LSH (Locality-Sensitive Hashing)

LSH uses hash functions that map similar vectors to the same buckets.
Key property: Similar vectors → Same hash bucket (with high probability)
1. Create random hyperplanes
2. Hash vector based on which side of each hyperplane it falls
3. Similar vectors likely on same side → same bucket
Example:
v1 = [0.5, 0.8, 0.3] → hash = "10110" → bucket 22
v2 = [0.6, 0.75, 0.35] → hash = "10110" → bucket 22 (similar!)
v3 = [-0.5, -0.8, 0.9] → hash = "01001" → bucket 9 (different)

LSH is less accurate than HNSW/IVF but very fast for certain use cases.
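A minimal version of this hashing scheme is a few lines of NumPy. This sketch computes random-hyperplane signatures only; production LSH systems use many hash tables to trade memory for recall:

```python
import numpy as np

def lsh_signature(v, hyperplanes):
    """One bit per hyperplane: 1 if the vector is on its positive side."""
    return ''.join('1' if np.dot(h, v) >= 0 else '0' for h in hyperplanes)

rng = np.random.default_rng(0)
hyperplanes = rng.normal(size=(5, 3))  # 5 random hyperplanes in 3-D

v1 = np.array([0.5, 0.8, 0.3])
v2 = np.array([0.6, 0.75, 0.35])   # points almost the same way as v1
v3 = np.array([-0.5, -0.8, 0.9])   # points a very different way

sig1, sig2, sig3 = (lsh_signature(v, hyperplanes) for v in (v1, v2, v3))
print(sig1, sig2, sig3)
# Each bit differs between two vectors with probability angle/pi,
# so the near-duplicate pair shares far more bits (and hence a
# bucket) than the dissimilar pair.
```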
## Algorithm Comparison

| Algorithm | Speed | Accuracy | Memory | Best For |
|---|---|---|---|---|
| HNSW | Very Fast | Excellent | High | General purpose, production |
| IVF | Fast | Good | Medium | Large datasets, memory-constrained |
| IVF-PQ | Fast | Moderate | Very Low | Billion+ scale, limited memory |
| LSH | Very Fast | Moderate | Low | Duplicate detection, streaming |
## Tuning HNSW Parameters

```python
# Guidelines for HNSW tuning
def recommend_hnsw_params(
    num_vectors: int,
    target_recall: float = 0.95,
    query_latency_budget_ms: float = 10,
) -> dict:
    """Recommend HNSW parameters based on requirements."""
    if num_vectors < 100_000:
        return {"M": 16, "ef_construction": 100, "ef_search": 50}
    elif num_vectors < 1_000_000:
        if target_recall > 0.98:
            return {"M": 32, "ef_construction": 200, "ef_search": 100}
        else:
            return {"M": 16, "ef_construction": 100, "ef_search": 50}
    else:  # 1M+ vectors
        if query_latency_budget_ms > 50:
            # Quality priority
            return {"M": 48, "ef_construction": 400, "ef_search": 200}
        else:
            # Speed priority
            return {"M": 24, "ef_construction": 200, "ef_search": 100}

# Rules of thumb:
# - ef_search should be >= k (the number of results you want)
# - Higher M = more memory but better recall
# - ef_construction only matters at build time
```

## Measuring Recall

```python
import numpy as np

def evaluate_ann_recall(
    index_approx,
    index_exact,
    test_queries: np.ndarray,
    k: int = 10,
) -> float:
    """Calculate recall: what fraction of the true top-k are found?"""
    total_recall = 0.0
    for query in test_queries:
        # Ground truth from exact search
        _, true_neighbors = index_exact.search(query.reshape(1, -1), k)
        true_set = set(true_neighbors[0])

        # Approximate search results
        _, approx_neighbors = index_approx.search(query.reshape(1, -1), k)
        approx_set = set(approx_neighbors[0])

        # Recall for this query
        total_recall += len(true_set & approx_set) / k
    return total_recall / len(test_queries)

# Typical results:
# HNSW with good params: 95-99% recall
# IVF with nprobe=50:    90-95% recall
# IVF-PQ:                85-95% recall
```

## Estimating Memory Usage

```python
def estimate_memory_gb(
    num_vectors: int,
    dimensions: int,
    index_type: str = "hnsw",
    M: int = 16,
) -> float:
    """Estimate memory usage for different index types."""
    # Base vector storage (float32)
    vector_bytes = num_vectors * dimensions * 4
    if index_type == "flat":
        # Just the raw vectors
        return vector_bytes / (1024**3)
    elif index_type == "hnsw":
        # Vectors + graph links: each node has ~M connections on layer 0
        # and ~M/2 on higher layers, so roughly 2*M links per node
        # at 8 bytes per link
        graph_bytes = num_vectors * 2 * M * 8
        return (vector_bytes + graph_bytes) / (1024**3)
    elif index_type == "ivf_pq":
        # Compressed vectors (e.g., 64 bytes per vector with m=64)
        m = 64  # subquantizers
        compressed_bytes = num_vectors * m
        # Plus the codebooks
        codebook_bytes = 256 * (dimensions // m) * 4 * m
        return (compressed_bytes + codebook_bytes) / (1024**3)
    raise ValueError(f"unknown index_type: {index_type}")

# Example: 10M vectors, 768 dimensions
print(estimate_memory_gb(10_000_000, 768, "flat"))      # ~28.6 GB
print(estimate_memory_gb(10_000_000, 768, "hnsw", 32))  # ~33.4 GB
print(estimate_memory_gb(10_000_000, 768, "ivf_pq"))    # ~0.6 GB
```

In the next lesson, we'll explore distance metrics in more detail and when to use each.