Definition
A vector database is a storage system optimized for saving, indexing, and querying high-dimensional vectors (embeddings). The fundamental operation is similarity search: given a query vector, find the k most similar vectors in the database.
It’s the core infrastructure for RAG systems, semantic search, recommendations, and other AI applications requiring meaning-based retrieval instead of keyword matching.
How It Works
Indexing: vectors are organized into data structures that enable efficient search. Common algorithms: HNSW (Hierarchical Navigable Small World), IVF (Inverted File Index), and PQ (Product Quantization).
Similarity search: given a query vector, the index narrows the search to probable candidates, then computes exact distances to rank them. Common metrics: cosine similarity, Euclidean distance, dot product.
Approximate Nearest Neighbor (ANN): to scale to millions/billions of vectors, precision is sacrificed for speed. The database returns “sufficiently close” results, not guaranteed nearest neighbors.
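The IVF idea above can be illustrated with a minimal sketch: partition vectors into inverted lists by nearest centroid, then search only the `nprobe` closest lists instead of everything. The centroids here are hand-picked for the example (in practice they come from k-means), and a query near a partition boundary can miss true neighbors — that is exactly the precision-for-speed trade ANN makes.

```python
import math

def l2(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_ivf(vectors, centroids):
    """Assign each vector to its nearest centroid, forming inverted lists."""
    lists = {i: [] for i in range(len(centroids))}
    for idx, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))
        lists[nearest].append(idx)
    return lists

def ivf_search(query, vectors, centroids, lists, k, nprobe=1):
    """Approximate search: visit only the nprobe closest inverted lists."""
    probed = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))[:nprobe]
    candidates = [i for c in probed for i in lists[c]]
    candidates.sort(key=lambda i: l2(query, vectors[i]))
    return candidates[:k]

vectors = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (11.0, 10.0)]
centroids = [(0.5, 0.0), (10.5, 10.0)]  # assumed precomputed; k-means in practice
lists = build_ivf(vectors, centroids)
print(ivf_search((1.2, 0.0), vectors, centroids, lists, k=1))  # -> [1]
```

Raising `nprobe` scans more lists, trading speed back for recall — the same knob real IVF indexes expose.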
Main Solutions
Specialized databases: Pinecone, Weaviate, Qdrant, Milvus, Chroma. Optimized for vector workloads, managed or self-hosted.
Extensions of existing databases: pgvector (PostgreSQL), Atlas Vector Search (MongoDB), OpenSearch. Vector search without additional infrastructure.
In-memory: FAISS (Facebook), Annoy (Spotify). Libraries for vector search, not complete databases. Suitable for prototypes or datasets fitting in RAM.
Selection Criteria
Scale: how many vectors? Thousands (in-memory suffices), millions (specialized database), billions (enterprise solutions).
Latency requirements: what p99 latency is acceptable? HNSW offers low query latency; IVF is more memory-efficient but typically slower per query.
Filtering: filter by metadata alongside similarity? Not all solutions handle hybrid search (vector + filter) well.
Managed vs self-hosted: Pinecone is fully managed, Qdrant/Milvus require operations. Trade-off cost vs control.
Costs: pricing models vary. Per-query, per-vector-stored, per-compute. At scale, differences are significant.
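The filtering criterion above can be made concrete with a sketch of the simplest strategy, pre-filtering: restrict candidates by metadata first, then rank by similarity. The record layout and field names here are illustrative, not any particular database's API; real engines also support post-filtering and filter-aware index traversal, each with different recall and performance behavior.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def filtered_search(query, records, predicate, k):
    """Pre-filter: keep only records whose metadata passes, then rank by similarity."""
    candidates = [r for r in records if predicate(r["meta"])]
    candidates.sort(key=lambda r: cosine(query, r["vec"]), reverse=True)
    return [r["id"] for r in candidates[:k]]

records = [
    {"id": "a", "vec": (1.0, 0.0), "meta": {"lang": "en"}},
    {"id": "b", "vec": (0.9, 0.1), "meta": {"lang": "fr"}},
    {"id": "c", "vec": (0.0, 1.0), "meta": {"lang": "en"}},
]
# Only English documents compete, even though "b" scores higher than "c".
print(filtered_search((1.0, 0.0), records, lambda m: m["lang"] == "en", k=1))  # -> ['a']
```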
Practical Considerations
Dimensionality: at float32 precision, a 1536-dimensional vector (e.g. OpenAI embeddings) occupies ~6KB. 10 million vectors = ~60GB for the raw vectors alone, plus indexes and metadata.
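The arithmetic behind that estimate, assuming float32 (4 bytes per dimension):

```python
# Back-of-the-envelope storage estimate for raw vectors only.
dims = 1536
bytes_per_vector = dims * 4            # float32: 4 bytes/dim -> 6144 bytes ≈ 6 KB
n_vectors = 10_000_000
total_gb = n_vectors * bytes_per_vector / 1e9
print(f"{bytes_per_vector} bytes/vector, {total_gb:.1f} GB total")  # ~61.4 GB
```

Quantization (e.g. PQ or int8) can shrink this severalfold at some cost in recall.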
Index build time: building an HNSW index over millions of vectors can take hours. Plan for incremental updates or periodic re-indexing.
Recall vs latency tradeoff: index parameters (ef_construction, M for HNSW) balance accuracy and speed. Tuning specific to use case.
Common Misconceptions
"I need a vector database for every RAG project"
For prototypes or small corpora, FAISS in-memory or pgvector on existing PostgreSQL may suffice. Dedicated database justified by scale or latency requirements.
"All vector databases are equivalent"
Differences in performance, features (hybrid search, multi-tenancy, filtering), operational complexity, and costs are significant. Benchmark on your use case.
"Similarity search is exact"
Most production implementations use ANN, which is approximate. Recall (fraction of true nearest neighbors found) is configurable but rarely 100%.
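Recall@k is simple to measure: compare what the ANN index returned against an exact brute-force top-k computed offline. A minimal sketch:

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true k nearest neighbors that the ANN search returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

# Exact top-5 vs what an ANN index returned: 4 of 5 found -> recall 0.8
print(recall_at_k([1, 2, 3, 4, 9], [1, 2, 3, 4, 5]))  # -> 0.8
```

This is the metric ANN Benchmarks plots against queries-per-second when comparing implementations.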
Related Terms
- Embeddings: vectors being stored
- RAG: architectural pattern using vector database for retrieval
Sources
- ANN Benchmarks: algorithm and implementation comparison
- Malkov, Y. & Yashunin, D. (2018). Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs. IEEE TPAMI