Nearly every AI application that retrieves relevant information relies on a vector database: RAG systems, semantic search, recommendation engines. With the explosion of LLM applications, understanding vector databases has become essential for engineers.
Here’s how they work and how to use them effectively.
What Are Vector Databases?
The Core Concept
vector_database_basics:
  what:
    - Databases optimized for storing and querying vectors
    - Vectors are numerical representations of data
    - Enable similarity search at scale
  why_needed:
    - Traditional databases: exact match, range queries
    - Vector databases: "find similar items"
    - Enable semantic understanding in applications
  key_operation:
    query: "Find the k nearest neighbors to this vector"
    unlike: "Find records where field = value"
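To make that key operation concrete, here is a minimal brute-force k-nearest-neighbor sketch in NumPy. This is the exact O(n) scan that vector databases replace with approximate indexes:

import numpy as np

def knn_search(query, vectors, k=5):
    """Exact k-NN by cosine similarity: the O(n) scan that indexes avoid."""
    # Normalize rows so a plain dot product equals cosine similarity
    vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    query = query / np.linalg.norm(query)
    similarities = vectors @ query
    top_k = np.argsort(similarities)[-k:][::-1]  # indices of the k best matches
    return top_k, similarities[top_k]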
Embeddings
# Converting text to vectors using embeddings
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

# Text becomes a dense vector
text = "How do I configure Kubernetes networking?"
embedding = model.encode(text)

print(embedding.shape)  # (384,) - 384 dimensions
print(embedding[:5])    # [-0.012, 0.234, -0.089, 0.156, 0.078]
Similarity Search
similarity_metrics:
  cosine_similarity:
    what: Angle between vectors
    range: -1 to 1 (1 = identical direction)
    use_case: Text similarity (normalized vectors)
  euclidean_distance:
    what: Straight-line distance between points
    range: 0 to infinity (0 = identical)
    use_case: Image similarity, general purpose
  dot_product:
    what: Cosine of the angle scaled by both vector magnitudes
    range: -infinity to infinity
    use_case: Recommendation systems (when magnitude matters)
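Each metric is a line or two of NumPy; a quick sketch:

import numpy as np

def cosine_similarity(a, b):
    # Direction only: magnitude is divided out
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def euclidean_distance(a, b):
    # Straight-line distance between the two points
    return np.linalg.norm(a - b)

def dot_product(a, b):
    # Cosine similarity scaled by both magnitudes
    return np.dot(a, b)

For unit-length vectors the three agree on ranking: cosine similarity equals the dot product, and squared Euclidean distance is 2 minus twice that value. This is why many systems normalize embeddings and treat the metrics as interchangeable.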
Vector Database Options
Landscape
vector_databases:
  dedicated:
    pinecone:
      type: Managed service
      strengths: Easy to use, fast, scalable
      considerations: Vendor lock-in, cost
    weaviate:
      type: Open source / managed
      strengths: GraphQL, hybrid search, modules
      considerations: Complexity
    qdrant:
      type: Open source / cloud
      strengths: Rust-based, fast, filtering
      considerations: Newer ecosystem
    milvus:
      type: Open source
      strengths: Highly scalable, mature
      considerations: Operational complexity
  extensions:
    pgvector:
      type: PostgreSQL extension
      strengths: Use existing Postgres, ACID
      considerations: Scale limits
    elasticsearch:
      type: Dense vector support
      strengths: Existing infrastructure, hybrid search
      considerations: Not vector-first design
Choosing a Solution
decision_criteria:
  use_pgvector_when:  # see the pgvector sketch after this list
    - Already using PostgreSQL
    - Fewer than ~1M vectors
    - Want ACID transactions
    - Simple use case
  use_dedicated_when:
    - More than ~1M vectors
    - Need low latency at scale
    - Vector search is core functionality
    - Complex filtering requirements
  use_managed_when:
    - Don't want operational overhead
    - Need to scale quickly
    - Budget allows
  use_self_hosted_when:
    - Data sovereignty requirements
    - Cost optimization at scale
    - Existing infrastructure team
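If pgvector wins that decision, everything stays inside Postgres. A minimal sketch, assuming psycopg2 and hypothetical table and connection names; <=> is pgvector's cosine-distance operator, and embedding/query_embedding come from the encoding example earlier:

import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical connection string
cur = conn.cursor()

# One-time setup: enable the extension, create a table with a vector column
cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
cur.execute("CREATE TABLE IF NOT EXISTS chunks (id text PRIMARY KEY, embedding vector(384))")

def to_literal(vec):
    # pgvector accepts a '[v1,v2,...]' text literal
    return "[" + ",".join(str(x) for x in vec) + "]"

cur.execute("INSERT INTO chunks (id, embedding) VALUES (%s, %s)",
            ("doc-1", to_literal(embedding)))

# k-NN query ordered by cosine distance
cur.execute("SELECT id FROM chunks ORDER BY embedding <=> %s::vector LIMIT 5",
            (to_literal(query_embedding),))
print(cur.fetchall())
conn.commit()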
Implementation Patterns
Basic Operations
# Using Pinecone (v3+ client) as an example
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="your-api-key")

# Create index
pc.create_index(
    name="documents",
    dimension=384,  # Match your embedding model
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)
index = pc.Index("documents")

# Upsert vectors
index.upsert(vectors=[
    {
        "id": "doc-1",
        "values": embedding_1,
        "metadata": {"source": "docs", "category": "kubernetes"}
    },
    {
        "id": "doc-2",
        "values": embedding_2,
        "metadata": {"source": "blog", "category": "docker"}
    }
])

# Query with filtering
results = index.query(
    vector=query_embedding,
    top_k=5,
    filter={"category": "kubernetes"},
    include_metadata=True
)
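Each hit in results.matches carries its id, a similarity score, and (because include_metadata=True) the stored metadata, so results can be routed or filtered without a second lookup.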
Hybrid Search
# Combining vector and keyword search
class HybridSearch:
    def __init__(self, vector_db, keyword_db, encoder):
        self.vector_db = vector_db
        self.keyword_db = keyword_db
        self.encoder = encoder  # embedding model, e.g. a SentenceTransformer

    def search(self, query, k=10, alpha=0.5):
        """
        alpha=1: pure vector search
        alpha=0: pure keyword search
        """
        # Get vector results (fetch extra candidates for the fusion step)
        query_embedding = self.encoder.encode(query)
        vector_results = self.vector_db.query(query_embedding, top_k=k * 2)

        # Get keyword results
        keyword_results = self.keyword_db.search(query, top_k=k * 2)

        # Combine scores
        combined = self.reciprocal_rank_fusion(
            vector_results,
            keyword_results,
            alpha
        )
        return combined[:k]

    def reciprocal_rank_fusion(self, vec_results, kw_results, alpha):
        """Combine rankings from both methods (60 is the standard RRF constant)."""
        scores = {}
        for rank, result in enumerate(vec_results):
            scores[result.id] = alpha * (1 / (rank + 60))
        for rank, result in enumerate(kw_results):
            scores[result.id] = scores.get(result.id, 0.0) + (1 - alpha) * (1 / (rank + 60))
        return sorted(scores.items(), key=lambda x: x[1], reverse=True)
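Wiring it up is one call; a hypothetical example, assuming any vector and keyword backends that expose the query/search methods used above:

from sentence_transformers import SentenceTransformer

searcher = HybridSearch(
    vector_db=my_vector_index,  # hypothetical: anything with .query(embedding, top_k)
    keyword_db=my_bm25_index,   # hypothetical: anything with .search(query, top_k)
    encoder=SentenceTransformer('all-MiniLM-L6-v2'),
)
hits = searcher.search("kubernetes networking", k=10, alpha=0.7)  # lean vector-heavy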
Indexing Strategies
Index Types
index_types:
  flat:
    description: Exact search, no approximation
    pros: Perfect accuracy
    cons: Slow at scale (O(n) per query)
    use: < 10K vectors, accuracy critical
  ivf:
    description: Inverted file, cluster-based
    pros: Good balance of speed/accuracy
    cons: Requires training, parameter tuning
    use: 10K - 1M vectors
  hnsw:
    description: Hierarchical navigable small world
    pros: Fast, high accuracy
    cons: Memory intensive
    use: Most common for production
  pq:
    description: Product quantization, compression
    pros: Memory efficient
    cons: Lower accuracy
    use: Very large datasets, memory constrained
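The flat-versus-HNSW trade-off is easy to see side by side in FAISS; a sketch with random data (faiss-cpu):

import faiss
import numpy as np

d = 384
vectors = np.random.random((100_000, d)).astype("float32")
query = np.random.random((1, d)).astype("float32")

# Flat: exact search, scans all 100K vectors per query
flat = faiss.IndexFlatL2(d)
flat.add(vectors)
exact_dist, exact_ids = flat.search(query, 5)

# HNSW: approximate graph search, far fewer comparisons per query
hnsw = faiss.IndexHNSWFlat(d, 16)  # M=16 connections per node
hnsw.hnsw.efConstruction = 100
hnsw.hnsw.efSearch = 50
hnsw.add(vectors)
approx_dist, approx_ids = hnsw.search(query, 5)

# Overlap between the two result sets estimates recall
print(len(set(exact_ids[0]) & set(approx_ids[0])) / 5)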
HNSW Configuration
# HNSW parameters
hnsw_config = {
    "M": 16,                 # Number of connections per layer
    "ef_construction": 100,  # Build quality (higher = better, slower)
    "ef_search": 50,         # Search quality (higher = better, slower)
}
# Trade-offs:
# Higher M: Better recall, more memory, slower
# Higher ef: Better recall, slower search
# Typical production: M=16, ef_construction=100-200, ef_search=50-100
Production Considerations
Chunking Strategies
class DocumentChunker:
    def __init__(self, chunk_size=500, overlap=50):
        self.chunk_size = chunk_size
        self.overlap = overlap

    def chunk(self, document):
        """Split document into overlapping chunks."""
        chunks = []
        text = document.content
        start = 0
        while start < len(text):
            end = start + self.chunk_size
            # Try to break at sentence boundary
            if end < len(text):
                last_period = text.rfind('.', start, end)
                if last_period > start + self.chunk_size // 2:
                    end = last_period + 1
            chunks.append({
                "text": text[start:end],
                "start": start,
                "end": end,
                "document_id": document.id
            })
            start = end - self.overlap
        return chunks
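Feeding the chunker's output into the earlier embedding model takes a few lines; document is assumed to expose .content and .id as above:

chunker = DocumentChunker(chunk_size=500, overlap=50)
chunks = chunker.chunk(document)
# Batch-encode all chunk texts at once; far faster than one call per chunk
embeddings = model.encode([c["text"] for c in chunks])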
Metadata Design
metadata_best_practices:
  include:
    - Source document ID
    - Chunk position/index
    - Creation/update timestamp
    - Access permissions
    - Content type
    - Category/tags
  avoid:
    - Large text blobs (use IDs instead)
    - Frequently changing data
    - Complex nested structures
  enable:
    - Pre-filtering before vector search
    - Post-processing enrichment
    - Access control
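As a concrete illustration (field names are ours; the filter syntax is Pinecone-style), a chunk record following these guidelines and a pre-filtered query:

vector_record = {
    "id": "doc-42-chunk-3",
    "values": chunk_embedding,  # assumed: embedding computed earlier
    "metadata": {
        "document_id": "doc-42",     # ID, not the full text
        "chunk_index": 3,            # position within the document
        "updated_at": "2024-01-15",  # refresh tracking
        "acl": ["team-platform"],    # access control
        "content_type": "markdown",
        "tags": ["kubernetes", "networking"],
    },
}

# Pre-filter on metadata before the similarity search runs
results = index.query(
    vector=query_embedding,
    top_k=5,
    filter={"tags": {"$in": ["kubernetes"]}, "acl": {"$in": ["team-platform"]}},
    include_metadata=True,
)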
Refresh and Updates
class VectorIndexManager:
    def __init__(self, vector_db, document_store, chunker, encoder):
        self.vector_db = vector_db
        self.document_store = document_store
        self.chunker = chunker  # e.g. the DocumentChunker above
        self.encoder = encoder  # embedding model

    def refresh_document(self, doc_id):
        """Re-index a document after update."""
        # Get updated document
        document = self.document_store.get(doc_id)
        # Delete old vectors
        self.vector_db.delete(filter={"document_id": doc_id})
        # Create new chunks and vectors
        chunks = self.chunker.chunk(document)
        vectors = self.create_vectors(chunks)
        # Upsert new vectors
        self.vector_db.upsert(vectors)

    def create_vectors(self, chunks):
        """Embed each chunk and carry its fields along as metadata."""
        return [
            {
                "id": f"{chunk['document_id']}-{chunk['start']}",
                "values": self.encoder.encode(chunk["text"]),
                "metadata": chunk,
            }
            for chunk in chunks
        ]

    def full_reindex(self):
        """Rebuild entire index (for embedding model updates)."""
        # This is expensive - plan for maintenance windows
        pass
Key Takeaways
- Vector databases enable semantic search through embedding similarity
- Choose based on scale: pgvector for small, dedicated for large
- HNSW is the most common production index type
- Chunk documents appropriately for your use case
- Hybrid search combines semantic and keyword for better results
- Design metadata for filtering and post-processing
- Plan for index updates and maintenance
- Test with realistic data volumes and query patterns
- Monitor latency and recall in production
Vector databases are infrastructure. Understand them to build effective AI applications.