Vector search finds the most similar vectors to a query vector. This lesson covers the fundamentals of similarity search, different distance metrics, and when to use each.
Given a query vector q and a set of vectors V, find the k vectors in V that are most similar to q.
```
Query: q = [0.2, 0.5, -0.1, 0.8]

Database vectors:
v1 = [0.3, 0.4, -0.2, 0.7]     → distance ≈ 0.20 ✓ (close)
v2 = [0.9, -0.3, 0.5, 0.1]     → distance ≈ 1.41
v3 = [0.25, 0.48, -0.12, 0.78] → distance ≈ 0.06 ✓ (closest)
...

Return the top-k closest vectors.
```

Euclidean (L2) distance is the straight-line distance between two points in space.
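As a quick check, the nearest-neighbour ranking above can be computed directly with NumPy:

```python
import numpy as np

q = np.array([0.2, 0.5, -0.1, 0.8])
vs = np.array([
    [0.3, 0.4, -0.2, 0.7],      # v1
    [0.9, -0.3, 0.5, 0.1],      # v2
    [0.25, 0.48, -0.12, 0.78],  # v3
])

# L2 distance from the query to every database vector
dists = np.linalg.norm(vs - q, axis=1)
print(dists.round(2))     # [0.2  1.41 0.06]

# Ranked nearest-first: v3, then v1, then v2
print(np.argsort(dists))  # [2 0 1]
```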
```python
import numpy as np

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """
    L2 distance: sqrt(Σ(a_i - b_i)²)
    - Range: [0, ∞)
    - Lower is more similar
    """
    return np.sqrt(np.sum((a - b) ** 2))

# Alternative using numpy:
#   distance = np.linalg.norm(a - b)

# Example
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(euclidean_distance(a, b))  # 5.196
```

When to use: when absolute magnitude matters. Good for un-normalized embeddings.
Measures the angle between two vectors, ignoring magnitude.
```python
def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """
    cos(θ) = (a · b) / (||a|| × ||b||)
    - Range: [-1, 1]
    -  1 = identical direction
    -  0 = orthogonal (unrelated)
    - -1 = opposite direction
    """
    dot_product = np.dot(a, b)
    norm_a = np.linalg.norm(a)
    norm_b = np.linalg.norm(b)
    return dot_product / (norm_a * norm_b)

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine distance = 1 - cosine_similarity"""
    return 1 - cosine_similarity(a, b)

# Example
a = np.array([1, 2, 3])
b = np.array([2, 4, 6])  # Same direction, different magnitude
print(cosine_similarity(a, b))  # 1.0 (identical direction)

c = np.array([1, 0, 0])
d = np.array([0, 1, 0])  # Orthogonal
print(cosine_similarity(c, d))  # 0.0
```

When to use: the most common choice for text embeddings. Recommended default for RAG.
Simple multiplication of corresponding elements, summed.
```python
def dot_product(a: np.ndarray, b: np.ndarray) -> float:
    """
    a · b = Σ(a_i × b_i)
    - Range: (-∞, ∞)
    - Higher is more similar
    - Affected by magnitude
    """
    return np.dot(a, b)

# Key insight: for NORMALIZED vectors,
#   cosine_similarity(a, b) == dot_product(a, b)
a_normalized = a / np.linalg.norm(a)
b_normalized = b / np.linalg.norm(b)
print(cosine_similarity(a, b))                  # Same as...
print(dot_product(a_normalized, b_normalized))  # ...this!
```

When to use: when vectors are pre-normalized. Faster than cosine (no normalization needed at query time).
```python
def manhattan_distance(a: np.ndarray, b: np.ndarray) -> float:
    """
    L1 distance: Σ|a_i - b_i|
    - Range: [0, ∞)
    - Also called "taxicab" or "city block" distance
    """
    return np.sum(np.abs(a - b))
```

When to use: high-dimensional sparse data. Less common for dense embeddings.
| Metric | Range | Magnitude Sensitive | Best For |
|---|---|---|---|
| Cosine | [-1, 1] | No | Text embeddings (default) |
| Dot Product | (-∞, ∞) | Yes | Normalized vectors, speed |
| Euclidean (L2) | [0, ∞) | Yes | Clustering, image features |
| Manhattan (L1) | [0, ∞) | Yes | Sparse, high-dimensional |
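To make the "magnitude sensitive" column concrete, here is a small sketch (with made-up vectors) of how cosine and Euclidean can disagree about which vector is closest when magnitudes differ:

```python
import numpy as np

def cos_sim(a: np.ndarray, b: np.ndarray) -> float:
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
scaled = np.array([10.0, 20.0, 30.0])  # same direction, 10x the magnitude
nearby = np.array([1.5, 2.0, 3.0])     # close point, slightly different direction

# Cosine calls `scaled` a perfect match; Euclidean calls it far away.
print(cos_sim(a, scaled), np.linalg.norm(a - scaled))  # 1.0, ~33.67
print(cos_sim(a, nearby), np.linalg.norm(a - nearby))  # ~0.992, 0.5
```

So a cosine index would rank `scaled` first, while an L2 index would rank `nearby` first; pick the metric that matches what "similar" means for your data.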
```python
def normalize_vector(v: np.ndarray) -> np.ndarray:
    """L2 normalize a vector to unit length."""
    norm = np.linalg.norm(v)
    if norm == 0:
        return v
    return v / norm

# Pre-normalize all vectors at indexing time
normalized_embeddings = np.array([normalize_vector(e) for e in embeddings])

# At query time, only a dot product is needed (faster!)
query = normalize_vector(query_embedding)
similarities = np.dot(normalized_embeddings, query)

# Top-k
top_k_indices = np.argsort(similarities)[-k:][::-1]
```

```python
def exact_knn(query: np.ndarray, vectors: np.ndarray, k: int) -> np.ndarray:
    """
    Compute the distance to ALL vectors, return the top k.
    Complexity: O(n × d)
      n = number of vectors
      d = dimensionality
    """
    distances = np.linalg.norm(vectors - query, axis=1)
    return np.argsort(distances)[:k]

# For 1M vectors, 768 dimensions:
# 1M × 768 = 768M operations per query
# ~100-500ms per query (not scalable!)
```

Approximate nearest neighbor (ANN) search trades exact accuracy for speed: it returns neighbors that are "probably" in the true top-k.
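One easy win within exact search: the full `argsort` is O(n log n), but when only the top k matter, `np.argpartition` finds them in O(n). A minor optimization sketch:

```python
import numpy as np

def exact_knn_fast(query: np.ndarray, vectors: np.ndarray, k: int) -> np.ndarray:
    """Same result as sorting everything, but only partially orders the array."""
    distances = np.linalg.norm(vectors - query, axis=1)
    # Partition so the k smallest distances come first (in arbitrary order)...
    top_k = np.argpartition(distances, k)[:k]
    # ...then sort just those k by distance
    return top_k[np.argsort(distances[top_k])]

rng = np.random.default_rng(0)
vectors = rng.standard_normal((10_000, 64))
query = rng.standard_normal(64)

fast = exact_knn_fast(query, vectors, k=5)
full = np.argsort(np.linalg.norm(vectors - query, axis=1))[:5]
print(np.array_equal(fast, full))  # True
```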
Trade-off:
- Exact KNN: 100% recall, O(n) time
- ANN: 95-99% recall, O(log n) time
For 1M vectors:
- Exact: ~200ms
- ANN (HNSW): ~1-5ms
That's a 40-200x speedup!

```python
# Using FAISS (Facebook AI Similarity Search)
import faiss

# Create index
dimension = 768
index = faiss.IndexFlatL2(dimension)  # Exact search

# Add vectors (FAISS expects float32)
index.add(embeddings)  # numpy array of shape (n, dimension)

# Search
k = 5
query = np.array([query_embedding], dtype='float32')
distances, indices = index.search(query, k)
# indices:   [[23, 142, 7, 891, 45]]           # Top-k indices
# distances: [[0.12, 0.18, 0.23, 0.31, 0.35]]  # Distances

# For approximate search (much faster)
nlist = 100  # Number of clusters
quantizer = faiss.IndexFlatL2(dimension)
index = faiss.IndexIVFFlat(quantizer, dimension, nlist)
index.train(embeddings)  # Required for IVF
index.add(embeddings)

# Set search parameters
index.nprobe = 10  # Number of clusters to search
distances, indices = index.search(query, k)
```

MMR balances relevance with diversity to avoid redundant results.
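To build intuition for what IVF trades away, here is a toy NumPy re-implementation of the cluster-and-probe idea (an illustration, not FAISS's actual implementation): vectors are bucketed by a few k-means iterations, only `nprobe` buckets are scanned at query time, and recall is measured against exact search.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 2000, 32, 10
xs = rng.standard_normal((n, d)).astype("float32")
q = rng.standard_normal(d).astype("float32")

# Exact top-k (ground truth)
exact = np.argsort(np.linalg.norm(xs - q, axis=1))[:k]

# Toy IVF: a few k-means iterations, then probe only the nearest cells
nlist, nprobe = 16, 4
centroids = xs[rng.choice(n, nlist, replace=False)].copy()
for _ in range(5):
    # Assign every vector to its nearest centroid
    assign = np.argmin(((xs[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
    for c in range(nlist):
        members = xs[assign == c]
        if len(members):
            centroids[c] = members.mean(axis=0)

# Query time: scan only the nprobe cells whose centroids are closest to q
probe = np.argsort(np.linalg.norm(centroids - q, axis=1))[:nprobe]
cand = np.where(np.isin(assign, probe))[0]
approx = cand[np.argsort(np.linalg.norm(xs[cand] - q, axis=1))[:k]]

recall = len(set(exact) & set(approx)) / k
print(f"scanned {len(cand)}/{n} vectors, recall@{k} = {recall:.2f}")
```

Raising `nprobe` scans more cells, pushing recall toward 1.0 at the cost of speed; this is exactly the knob `index.nprobe` exposes in FAISS.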
```python
def mmr(
    query_embedding: np.ndarray,
    candidate_embeddings: np.ndarray,
    k: int = 5,
    lambda_param: float = 0.5,
) -> list[int]:
    """
    Maximum Marginal Relevance.
    MMR = λ × sim(query, doc) - (1-λ) × max(sim(doc, selected))
      λ = 1:   pure relevance
      λ = 0:   pure diversity
      λ = 0.5: balanced (recommended)
    """
    # Normalize for cosine similarity
    query = query_embedding / np.linalg.norm(query_embedding)
    candidates = candidate_embeddings / np.linalg.norm(
        candidate_embeddings, axis=1, keepdims=True
    )

    # Query similarities
    query_sims = np.dot(candidates, query)

    selected = []
    remaining = list(range(len(candidates)))

    for _ in range(k):
        if not remaining:
            break
        if not selected:
            # First: most similar to query
            idx = remaining[np.argmax(query_sims[remaining])]
        else:
            # MMR selection
            scores = []
            for idx in remaining:
                relevance = query_sims[idx]
                # Max similarity to already selected
                redundancy = max(
                    np.dot(candidates[idx], candidates[s]) for s in selected
                )
                mmr_score = lambda_param * relevance - (1 - lambda_param) * redundancy
                scores.append((idx, mmr_score))
            idx = max(scores, key=lambda x: x[1])[0]
        selected.append(idx)
        remaining.remove(idx)

    return selected
```

Filtered search combines vector similarity with metadata filters.
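A minimal demo of the selection behavior, with the MMR loop restated compactly so the snippet is self-contained (candidate vectors are made up: one result is a near-duplicate of the top hit):

```python
import numpy as np

def mmr_select(query, cands, k=3, lam=0.5):
    """Compact restatement of the MMR loop, for a self-contained demo."""
    q = query / np.linalg.norm(query)
    c = cands / np.linalg.norm(cands, axis=1, keepdims=True)
    query_sims = c @ q
    selected, remaining = [], list(range(len(c)))
    for _ in range(min(k, len(c))):
        if not selected:
            best = remaining[int(np.argmax(query_sims[remaining]))]
        else:
            def score(i):
                redundancy = max(float(c[i] @ c[s]) for s in selected)
                return lam * query_sims[i] - (1 - lam) * redundancy
            best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

query = np.array([1.0, 0.0, 0.0])
cands = np.array([
    [0.9, 0.1, 0.0],   # 0: highly relevant
    [0.9, 0.1, 0.01],  # 1: near-duplicate of 0
    [0.6, 0.0, 0.8],   # 2: less relevant, but different
])

print(mmr_select(query, cands, k=2, lam=1.0))  # pure relevance  → [0, 1]
print(mmr_select(query, cands, k=2, lam=0.3))  # diversity-aware → [0, 2]
```

With pure relevance the near-duplicate fills the second slot; lowering λ penalizes its redundancy and lets the dissimilar candidate in.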
```python
# Two approaches:
# 1. Pre-filtering: filter first, then search
#    - Faster when the filter is very selective
#    - May return fewer than k results
# 2. Post-filtering: search first, then filter
#    - Always returns results
#    - May need to search more candidates initially

# Most vector DBs support pre-filtering (generic example):
results = collection.query(
    query_embedding=query,
    top_k=10,
    where={
        "category": "technical",
        "date": {"$gte": "2024-01-01"}
    }
)
```
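Post-filtering can be sketched in plain NumPy: oversample the vector search, then drop hits whose metadata fails the predicate (function and field names here are illustrative):

```python
import numpy as np

def post_filtered_search(query, vectors, metadata, predicate, k=5, oversample=4):
    """Search for k*oversample neighbours, keep the first k that pass the filter."""
    dists = np.linalg.norm(vectors - query, axis=1)
    candidates = np.argsort(dists)[: k * oversample]
    hits = [int(i) for i in candidates if predicate(metadata[i])]
    return hits[:k]  # may be fewer than k if the filter is very selective

rng = np.random.default_rng(0)
vectors = rng.standard_normal((100, 8))
metadata = [{"category": "technical" if i % 2 == 0 else "other"} for i in range(100)]

hits = post_filtered_search(
    rng.standard_normal(8), vectors, metadata,
    predicate=lambda m: m["category"] == "technical", k=5,
)
print(hits)  # up to 5 indices, all with category == "technical"
```

The `oversample` factor is the knob: too low and selective filters starve the result set; too high and you pay for distance computations you throw away.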
```python
# Hybrid approach (Qdrant example):
from qdrant_client.models import Filter, FieldCondition, MatchValue

results = client.search(
    collection_name="documents",
    query_vector=query,
    query_filter=Filter(
        must=[
            FieldCondition(key="category", match=MatchValue(value="technical"))
        ]
    ),
    limit=10
)
```

Batch search handles multiple queries at once (more efficient than one request per query).
```python
# FAISS
queries = np.array([q1, q2, q3], dtype='float32')  # Shape: (3, 768)
distances, indices = index.search(queries, k=5)
# indices shape: (3, 5) - 5 results for each of 3 queries

# Pinecone
results = index.query(
    queries=[
        {"id": "q1", "values": q1},
        {"id": "q2", "values": q2}
    ],
    top_k=5
)
```

In the next lesson, we'll dive deep into the indexing algorithms (HNSW, IVF, LSH) that make fast approximate search possible.