Nearly every AI application that retrieves relevant information relies on a vector database: RAG systems, semantic search, recommendation engines. With the explosion of LLM applications, understanding vector databases has become essential for engineers.
Here’s how they work and how to use them effectively.
What Are Vector Databases?
The Core Concept
vector_database_basics:
  what:
    - Databases optimized for storing and querying vectors
    - Vectors are numerical representations of data
    - Enable similarity search at scale
  why_needed:
    - Traditional databases: exact match, range queries
    - Vector databases: "find similar items"
    - Enable semantic understanding in applications
  key_operation:
    query: "Find the k nearest neighbors to this vector"
    unlike: "Find records where field = value"
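To make that key operation concrete, here is a minimal brute-force k-nearest-neighbor sketch in NumPy. This is the exact O(n) scan that vector databases replace with approximate indexes:

import numpy as np

def knn_search(query, vectors, k=5):
    """Exact k-NN by cosine similarity: the O(n) scan that indexes avoid."""
    # Normalize rows so a plain dot product equals cosine similarity
    vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    query = query / np.linalg.norm(query)
    similarities = vectors @ query
    top_k = np.argsort(similarities)[-k:][::-1]  # indices of the k best matches
    return top_k, similarities[top_k]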
Embeddings
# Converting text to vectors using embeddings
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

# Text becomes a dense vector
text = "How do I configure Kubernetes networking?"
embedding = model.encode(text)

print(embedding.shape)  # (384,) - 384 dimensions
print(embedding[:5])    # [-0.012, 0.234, -0.089, 0.156, 0.078]
Similarity Search
similarity_metrics:
  cosine_similarity:
    what: Angle between vectors
    range: -1 to 1 (1 = identical direction)
    use_case: Text similarity (normalized vectors)
  euclidean_distance:
    what: Straight-line distance between points
    range: 0 to infinity (0 = identical)
    use_case: Image similarity, general purpose
  dot_product:
    what: Cosine of the angle scaled by both vector magnitudes
    range: -infinity to infinity
    use_case: Recommendation systems (when magnitude matters)
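Each metric is a line or two of NumPy; a quick sketch:

import numpy as np

def cosine_similarity(a, b):
    # Direction only: magnitude is divided out
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def euclidean_distance(a, b):
    # Straight-line distance between the two points
    return np.linalg.norm(a - b)

def dot_product(a, b):
    # Cosine similarity scaled by both magnitudes
    return np.dot(a, b)

For unit-length vectors the three agree on ranking: cosine similarity equals the dot product, and squared Euclidean distance is 2 minus twice that value. This is why many systems normalize embeddings and treat the metrics as interchangeable.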
Vector Database Options
Landscape
vector_databases:
  dedicated:
    pinecone:
      type: Managed service
      strengths: Easy to use, fast, scalable
      considerations: Vendor lock-in, cost
    weaviate:
      type: Open source / managed
      strengths: GraphQL, hybrid search, modules
      considerations: Complexity
    qdrant:
      type: Open source / cloud
      strengths: Rust-based, fast, filtering
      considerations: Newer ecosystem
    milvus:
      type: Open source
      strengths: Highly scalable, mature
      considerations: Operational complexity
  extensions:
    pgvector:
      type: PostgreSQL extension
      strengths: Use existing Postgres, ACID
      considerations: Scale limits
    elasticsearch:
      type: Dense vector support
      strengths: Existing infrastructure, hybrid search
      considerations: Not vector-first design
Choosing a Solution
decision_criteria:
  use_pgvector_when:  # see the pgvector sketch after this list
    - Already using PostgreSQL
    - Fewer than ~1M vectors
    - Want ACID transactions
    - Simple use case
  use_dedicated_when:
    - More than ~1M vectors
    - Need low latency at scale
    - Vector search is core functionality
    - Complex filtering requirements
  use_managed_when:
    - Don't want operational overhead
    - Need to scale quickly
    - Budget allows
  use_self_hosted_when:
    - Data sovereignty requirements
    - Cost optimization at scale
    - Existing infrastructure team
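If pgvector wins that decision, everything stays inside Postgres. A minimal sketch, assuming psycopg2 and hypothetical table and connection names; <=> is pgvector's cosine-distance operator, and embedding/query_embedding come from the encoding example earlier:

import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical connection string
cur = conn.cursor()

# One-time setup: enable the extension, create a table with a vector column
cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
cur.execute("CREATE TABLE IF NOT EXISTS chunks (id text PRIMARY KEY, embedding vector(384))")

def to_literal(vec):
    # pgvector accepts a '[v1,v2,...]' text literal
    return "[" + ",".join(str(x) for x in vec) + "]"

cur.execute("INSERT INTO chunks (id, embedding) VALUES (%s, %s)",
            ("doc-1", to_literal(embedding)))

# k-NN query ordered by cosine distance
cur.execute("SELECT id FROM chunks ORDER BY embedding <=> %s::vector LIMIT 5",
            (to_literal(query_embedding),))
print(cur.fetchall())
conn.commit()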
Implementation Patterns
Basic Operations
# Using Pinecone (v3+ client) as an example
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="your-api-key")

# Create index
pc.create_index(
    name="documents",
    dimension=384,  # Match your embedding model
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)
index = pc.Index("documents")

# Upsert vectors
index.upsert(vectors=[
    {
        "id": "doc-1",
        "values": embedding_1,
        "metadata": {"source": "docs", "category": "kubernetes"}
    },
    {
        "id": "doc-2",
        "values": embedding_2,
        "metadata": {"source": "blog", "category": "docker"}
    }
])

# Query with filtering
results = index.query(
    vector=query_embedding,
    top_k=5,
    filter={"category": "kubernetes"},
    include_metadata=True
)
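Each hit in results.matches carries its id, a similarity score, and (because include_metadata=True) the stored metadata, so results can be routed or filtered without a second lookup.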
Hybrid Search
# Combining vector and keyword search
class HybridSearch:
    def __init__(self, vector_db, keyword_db, encoder):
        self.vector_db = vector_db
        self.keyword_db = keyword_db
        self.encoder = encoder  # embedding model, e.g. a SentenceTransformer

    def search(self, query, k=10, alpha=0.5):
        """
        alpha=1: pure vector search
        alpha=0: pure keyword search
        """
        # Get vector results (fetch extra candidates for the fusion step)
        query_embedding = self.encoder.encode(query)
        vector_results = self.vector_db.query(query_embedding, top_k=k * 2)

        # Get keyword results
        keyword_results = self.keyword_db.search(query, top_k=k * 2)

        # Combine scores
        combined = self.reciprocal_rank_fusion(
            vector_results,
            keyword_results,
            alpha
        )
        return combined[:k]

    def reciprocal_rank_fusion(self, vec_results, kw_results, alpha):
        """Combine rankings from both methods (60 is the standard RRF constant)."""
        scores = {}
        for rank, result in enumerate(vec_results):
            scores[result.id] = alpha * (1 / (rank + 60))
        for rank, result in enumerate(kw_results):
            scores[result.id] = scores.get(result.id, 0.0) + (1 - alpha) * (1 / (rank + 60))
        return sorted(scores.items(), key=lambda x: x[1], reverse=True)
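Wiring it up is one call; a hypothetical example, assuming any vector and keyword backends that expose the query/search methods used above:

from sentence_transformers import SentenceTransformer

searcher = HybridSearch(
    vector_db=my_vector_index,  # hypothetical: anything with .query(embedding, top_k)
    keyword_db=my_bm25_index,   # hypothetical: anything with .search(query, top_k)
    encoder=SentenceTransformer('all-MiniLM-L6-v2'),
)
hits = searcher.search("kubernetes networking", k=10, alpha=0.7)  # lean vector-heavy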
Indexing Strategies
Index Types
index_types:
  flat:
    description: Exact search, no approximation
    pros: Perfect accuracy
    cons: Slow at scale (O(n) per query)
    use: < 10K vectors, accuracy critical
  ivf:
    description: Inverted file, cluster-based
    pros: Good balance of speed/accuracy
    cons: Requires training, parameter tuning
    use: 10K - 1M vectors
  hnsw:
    description: Hierarchical navigable small world
    pros: Fast, high accuracy
    cons: Memory intensive
    use: Most common for production
  pq:
    description: Product quantization, compression
    pros: Memory efficient
    cons: Lower accuracy
    use: Very large datasets, memory constrained
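The flat-versus-HNSW trade-off is easy to see side by side in FAISS; a sketch with random data (faiss-cpu):

import faiss
import numpy as np

d = 384
vectors = np.random.random((100_000, d)).astype("float32")
query = np.random.random((1, d)).astype("float32")

# Flat: exact search, scans all 100K vectors per query
flat = faiss.IndexFlatL2(d)
flat.add(vectors)
exact_dist, exact_ids = flat.search(query, 5)

# HNSW: approximate graph search, far fewer comparisons per query
hnsw = faiss.IndexHNSWFlat(d, 16)  # M=16 connections per node
hnsw.hnsw.efConstruction = 100
hnsw.hnsw.efSearch = 50
hnsw.add(vectors)
approx_dist, approx_ids = hnsw.search(query, 5)

# Overlap between the two result sets estimates recall
print(len(set(exact_ids[0]) & set(approx_ids[0])) / 5)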
HNSW Configuration
# HNSW parameters
hnsw_config = {
    "M": 16,                 # Number of connections per layer
    "ef_construction": 100,  # Build quality (higher = better, slower)
    "ef_search": 50,         # Search quality (higher = better, slower)
}
# Trade-offs:
# Higher M: Better recall, more memory, slower
# Higher ef: Better recall, slower search
# Typical production: M=16, ef_construction=100-200, ef_search=50-100
Production Considerations
Chunking Strategies
class DocumentChunker:
    def __init__(self, chunk_size=500, overlap=50):
        self.chunk_size = chunk_size
        self.overlap = overlap

    def chunk(self, document):
        """Split document into overlapping chunks."""
        chunks = []
        text = document.content
        start = 0
        while start < len(text):
            end = start + self.chunk_size
            # Try to break at sentence boundary
            if end < len(text):
                last_period = text.rfind('.', start, end)
                if last_period > start + self.chunk_size // 2:
                    end = last_period + 1
            chunks.append({
                "text": text[start:end],
                "start": start,
                "end": end,
                "document_id": document.id
            })
            start = end - self.overlap
        return chunks
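Feeding the chunker's output into the earlier embedding model takes a few lines; document is assumed to expose .content and .id as above:

chunker = DocumentChunker(chunk_size=500, overlap=50)
chunks = chunker.chunk(document)
# Batch-encode all chunk texts at once; far faster than one call per chunk
embeddings = model.encode([c["text"] for c in chunks])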
Metadata Design
metadata_best_practices:
  include:
    - Source document ID
    - Chunk position/index
    - Creation/update timestamp
    - Access permissions
    - Content type
    - Category/tags
  avoid:
    - Large text blobs (use IDs instead)
    - Frequently changing data
    - Complex nested structures
  enable:
    - Pre-filtering before vector search
    - Post-processing enrichment
    - Access control
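As a concrete illustration (field names are ours; the filter syntax is Pinecone-style), a chunk record following these guidelines and a pre-filtered query:

vector_record = {
    "id": "doc-42-chunk-3",
    "values": chunk_embedding,  # assumed: embedding computed earlier
    "metadata": {
        "document_id": "doc-42",     # ID, not the full text
        "chunk_index": 3,            # position within the document
        "updated_at": "2024-01-15",  # refresh tracking
        "acl": ["team-platform"],    # access control
        "content_type": "markdown",
        "tags": ["kubernetes", "networking"],
    },
}

# Pre-filter on metadata before the similarity search runs
results = index.query(
    vector=query_embedding,
    top_k=5,
    filter={"tags": {"$in": ["kubernetes"]}, "acl": {"$in": ["team-platform"]}},
    include_metadata=True,
)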
Refresh and Updates
class VectorIndexManager:
    def __init__(self, vector_db, document_store, chunker, encoder):
        self.vector_db = vector_db
        self.document_store = document_store
        self.chunker = chunker  # e.g. the DocumentChunker above
        self.encoder = encoder  # embedding model

    def refresh_document(self, doc_id):
        """Re-index a document after update."""
        # Get updated document
        document = self.document_store.get(doc_id)
        # Delete old vectors
        self.vector_db.delete(filter={"document_id": doc_id})
        # Create new chunks and vectors
        chunks = self.chunker.chunk(document)
        vectors = self.create_vectors(chunks)
        # Upsert new vectors
        self.vector_db.upsert(vectors)

    def create_vectors(self, chunks):
        """Embed each chunk and carry its fields along as metadata."""
        return [
            {
                "id": f"{chunk['document_id']}-{chunk['start']}",
                "values": self.encoder.encode(chunk["text"]),
                "metadata": chunk,
            }
            for chunk in chunks
        ]

    def full_reindex(self):
        """Rebuild entire index (for embedding model updates)."""
        # This is expensive - plan for maintenance windows
        pass
Key Takeaways
- Vector databases enable semantic search through embedding similarity
- Choose based on scale: pgvector for small, dedicated for large
- HNSW is the most common production index type
- Chunk documents appropriately for your use case
- Hybrid search combines semantic and keyword for better results
- Design metadata for filtering and post-processing
- Plan for index updates and maintenance
- Test with realistic data volumes and query patterns
- Monitor latency and recall in production
Vector databases are infrastructure. Understand them to build effective AI applications.