Retrieval-augmented generation (RAG), semantic search, and AI agents all depend on one thing: the ability to quickly find the most relevant vectors in a large dataset. As embedding datasets grow from millions to billions of records, purely in-memory vector indexes become financially unsustainable. DiskANN solves this problem by storing vector indexes on SSD rather than RAM, enabling web-scale search on commodity hardware.
What is DiskANN?
DiskANN is a graph-based vector search algorithm developed by Microsoft Research that was first published at NeurIPS 2019. It performs approximate nearest neighbor (ANN) search over billion-scale vector datasets, using SSD as its primary storage medium and keeping only a compressed representation of the index in RAM.
Before DiskANN, graph indexes like HNSW, and most indexes built with libraries like FAISS, typically kept the entire vector index in DRAM. At a billion vectors, that demands hundreds of gigabytes of RAM and an infrastructure that’s expensive to provision and scale. DiskANN breaks this constraint by shifting the bulk of the index to disk while maintaining recall and latency characteristics competitive with in-memory approaches. It’s built on top of the Vamana algorithm, a novel graph construction method that produces a single-layer directed graph well-suited for efficient disk-based traversal.
Key benchmarks from the original NeurIPS 2019 paper:
- 1B+ vectors indexed on a single machine with 64GB RAM
- 5-10x more vectors per machine vs. DRAM-only solutions at equivalent latency
- <5 ms query latency with 95%+ recall on the SIFT-1B benchmark
DiskANN is now the basis of vector search infrastructure at Microsoft (used in Bing and Microsoft 365) and has been adopted by Couchbase, Azure Cosmos DB, Azure Database for PostgreSQL, TimescaleDB’s pgvectorscale, and other databases.
How DiskANN works
DiskANN combines two techniques: the Vamana graph for index structure and navigation, and product quantization (PQ) for in-memory vector compression.
Building the Vamana graph: Vamana constructs a single-layer directed graph where each node represents a vector. It starts from a randomly connected graph, then makes two passes over the dataset, relinking each node with a pruning rule controlled by a parameter α. The first pass uses α = 1 and removes redundant edges. The second uses α > 1, which relaxes the pruning and keeps more long-range edges, letting the search algorithm jump quickly to the right region of the graph without expensive multi-hop traversal. Unlike HNSW’s multi-layer hierarchy, DiskANN’s single-layer structure makes it practical on disk.
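The pruning rule can be sketched in a few lines. The following is a simplified Python illustration of the RobustPrune step described in the DiskANN paper, not Microsoft's implementation; the function name, the toy points, and the use of Euclidean distance are assumptions made for the example. The parameter alpha controls how aggressively candidates are discarded: a candidate is dropped when an already-kept neighbor is sufficiently closer to it than the node being linked.

```python
import math


def robust_prune(p, candidates, points, alpha, max_degree):
    """Choose up to max_degree out-neighbors for node p from candidate node ids."""
    # Candidates sorted by distance to p, closest first (p itself excluded).
    pool = sorted(
        (c for c in set(candidates) if c != p),
        key=lambda c: math.dist(points[p], points[c]),
    )
    kept = []
    while pool and len(kept) < max_degree:
        best = pool.pop(0)  # closest remaining candidate becomes a neighbor
        kept.append(best)
        # Keep only candidates that `best` does not already cover, i.e. those
        # for which alpha * d(best, c) is still larger than d(p, c).
        pool = [
            c for c in pool
            if alpha * math.dist(points[best], points[c]) > math.dist(points[p], points[c])
        ]
    return kept


# Toy data: node 0 at the origin, two nearby points, and one far point.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
strict = robust_prune(0, [1, 2, 3], points, alpha=1.0, max_degree=3)
relaxed = robust_prune(0, [1, 2, 3], points, alpha=1.2, max_degree=3)
print(strict, relaxed)  # [1] [1, 3]
```

With alpha = 1.0, only the nearest neighbor survives. With alpha = 1.2, the far point at x = 10 is kept as a long-range edge, which is the effect the second construction pass relies on.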
The RAM/SSD split: PQ-compressed vectors (e.g., 32 bytes vs. 512 bytes for a 128-dim float32 vector) are held in RAM for fast approximate distance calculations. The full Vamana graph and the full-precision vectors live on SSD, with each node’s neighbor list stored next to its full-precision vector so that a single read returns both.
Query execution: Search interleaves two kinds of work. The algorithm uses the PQ-compressed vectors in RAM to estimate distances and choose which nodes to expand next, so comparing candidates costs no disk reads. The only SSD reads are for the neighbor lists of the nodes it expands, issued a few at a time. Because those reads also return the full-precision vectors, the algorithm finishes by computing exact distances for the visited nodes and reranking them. This combination is what preserves high recall while keeping RAM requirements low.
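The rerank step can be illustrated in isolation. The sketch below is a toy, not DiskANN itself: coordinate rounding stands in for product quantization, plain Python dicts stand in for RAM and SSD, and graph traversal is replaced by a brute-force scan over the compressed codes. All names (`shortlist_then_rerank`, `ram_codes`, `ssd_vectors`) are hypothetical. It shows why exact distances on a small shortlist recover a result that the approximate ranking alone would miss.

```python
import math


def shortlist_then_rerank(query, ram_codes, ssd_vectors, num_candidates, k):
    """Pick candidates with approximate distances, then rerank them exactly."""
    # Step 1: rank every id using only the compressed codes held in "RAM".
    shortlist = sorted(
        ram_codes, key=lambda n: math.dist(query, ram_codes[n])
    )[:num_candidates]
    # Step 2: fetch full-precision vectors for the shortlist only ("SSD" reads).
    fetched = {n: ssd_vectors[n] for n in shortlist}
    ranked = sorted(fetched, key=lambda n: math.dist(query, fetched[n]))
    return ranked[:k], len(fetched)


# Full-precision vectors (stand-in for SSD contents).
ssd_vectors = {0: (1.0, 1.0), 1: (1.6, 1.6), 2: (3.0, 1.0), 3: (1.0, 2.0)}
# Lossy codes (stand-in for PQ): each coordinate rounded to an integer.
ram_codes = {n: tuple(round(x) for x in v) for n, v in ssd_vectors.items()}

query = (1.4, 1.4)
approx_best = min(ram_codes, key=lambda n: math.dist(query, ram_codes[n]))
top, reads = shortlist_then_rerank(query, ram_codes, ssd_vectors, num_candidates=3, k=1)
print(approx_best, top, reads)  # 0 [1] 3
```

The compressed codes rank vector 0 first, but the exact rerank over three fetched candidates correctly returns vector 1, at the cost of three full-precision reads instead of four.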
FreshDiskANN: The original Vamana implementation produces a static index. FreshDiskANN, a follow-on from Microsoft Research, extends DiskANN to support concurrent real-time inserts, deletes, and updates without full index rebuilds. It maintains over 95% recall, making it practical for streaming datasets such as recommender systems and live document repositories.
DiskANN vs. HNSW vs. IVF
DiskANN, HNSW, and IVF are the three dominant ANN index types in production today.
| | DiskANN (Vamana) | HNSW | IVF |
| --- | --- | --- | --- |
| Primary storage | SSD + small RAM cache | RAM (full index in memory) | RAM or object storage |
| Max practical scale | Billions of vectors | 100-200M vectors (RAM-limited) | Hundreds of millions |
| Query latency | Low (5-15 ms typical) | Very low (1-5 ms) | Low-medium |
| Memory footprint | Low (PQ-compressed in RAM) | High (full vectors in DRAM) | Medium |
| Real-time updates | Via FreshDiskANN | Supported natively | Expensive (rebuild) |
| Filtered search | Via Filtered-DiskANN | Via post-filtering | Strong (IVF variants) |
| Best for | Billion-scale RAG, agents, recommendations on cost-effective hardware | Smaller datasets where ultra-low latency is critical and RAM is available | Filtered search with high filter ratios (>85%) |
If your dataset fits in RAM (under ~100M vectors), HNSW typically delivers the lowest latency. Once you’re indexing hundreds of millions of vectors, or when RAM cost is a constraint, DiskANN is the more practical choice. For workloads where 85%+ of the dataset is filtered out before search, IVF variants can outperform both.
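As a rough illustration, the guidance above can be written as a small decision helper. This is a hypothetical sketch, not a benchmark-backed rule: the function name is invented, and the hard cutoffs at 100M vectors and 85% filtered out simply mirror the approximate figures in the text.

```python
def suggest_index(num_vectors, filtered_out_fraction=0.0, ram_constrained=False):
    """Return a starting-point index type based on the rules of thumb above."""
    # Very selective filters: IVF variants can outperform graph indexes.
    if filtered_out_fraction >= 0.85:
        return "IVF"
    # Dataset fits in RAM and RAM cost is not a concern: HNSW for lowest latency.
    if num_vectors < 100_000_000 and not ram_constrained:
        return "HNSW"
    # Larger datasets, or RAM cost is a constraint.
    return "DiskANN"


print(suggest_index(10_000_000))            # HNSW
print(suggest_index(1_000_000_000))         # DiskANN
print(suggest_index(1_000_000_000, 0.9))    # IVF
```

Treat the output as a first guess to validate against your own recall and latency measurements.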
Performance and benchmarks
Original NeurIPS 2019 results (Microsoft Research): On the SIFT-1B dataset (1 billion 128-dimensional vectors), DiskANN achieved 5,000 QPS at 95%+ recall@1 with sub-5 ms average latency on a single machine with 64GB RAM and an NVMe SSD. That’s 5-10x more vectors per machine than DRAM-based solutions at equivalent performance.
Couchbase Hyperscale Vector Index benchmark (October 2025): Couchbase’s Vamana/DiskANN-based Hyperscale Vector Index, introduced in Couchbase 8.0, was independently benchmarked using VectorDBBench against a 1 billion-vector dataset:
- 700+ QPS with sub-second latency at 93% recall
- 350x faster than MongoDB Atlas, which returned 2 QPS with over 40 seconds of average latency at 89% recall under identical conditions
DiskANN use cases
RAG: Enterprise RAG systems that span billions of document chunks need high-recall retrieval without RAM-heavy infrastructure. DiskANN is well-suited to workloads where prompt content is unpredictable and broad semantic coverage is essential.
AI agents and contextual memory: Agentic systems accumulate interaction history, preferences, and task context as vectors over time. DiskANN lets agents search an unbounded and growing memory corpus without RAM becoming a bottleneck.
Semantic search and recommendations: E-commerce, media, and enterprise search platforms operating over hundreds of millions to billions of items benefit from DiskANN’s throughput and recall accuracy, especially when combined with metadata prefiltering via Filtered-DiskANN.
Privacy-first and on-premises AI: When data cannot leave a controlled environment, DiskANN’s ability to run on local SSD hardware makes privacy-preserving GenAI applications more practical than approaches requiring cloud-hosted, RAM-intensive clusters.
DiskANN in databases and platforms
Couchbase Hyperscale Vector Index: Couchbase 8.0 introduced the Hyperscale Vector Index (HVI), a hybrid Vamana + IVF implementation available in Couchbase Capella and self-managed deployments. It operates across partitioned disks for distributed processing and is specifically designed for RAG workloads that require broad semantic coverage.
Microsoft’s Azure Cosmos DB and Azure Database for PostgreSQL: Azure Cosmos DB uses DiskANN to power vector search in its NoSQL API. Azure Database for PostgreSQL offers it as a Vamana-based alternative to pgvector’s HNSW and IVFFlat.
Milvus and Zilliz: Milvus is an open-source vector database that supports DiskANN as an on-disk index type (DISKANN) for billion-scale collections with the Vamana graph on disk and PQ-compressed vectors in RAM. Zilliz Cloud is the fully managed enterprise version of Milvus.
pgvectorscale (TimescaleDB): pgvectorscale is a PostgreSQL extension developed by Timescale that implements StreamingDiskANN. It’s optimized for continuously updated time-series and streaming datasets.
How to tune DiskANN
DiskANN’s key parameters govern the trade-off between recall, latency, and throughput:
- MaxDegree: Maximum out-edges per graph node. Higher values improve recall but increase index size and SSD reads. Default: 56.
- SearchListSize: Candidate list size during search. Increase to 150-200 for high-recall workloads (RAG, agent memory); keep at 100 for maximum throughput. Default: 100.
- PQCodeBudgetGBRatio: Fraction of dataset size to cache as PQ-compressed vectors in RAM. Increase to 0.2 if RAM headroom allows and latency is critical. Default: 0.125.
- BeamWidthRatio: Parallel SSD reads per query step. Tune upward (6.0-8.0) to maximize QPS on high-throughput workloads. Default: 4.0.
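The adjustments above can be captured as named presets. The sketch below is only an illustration: `tuned_params` and the profile names are hypothetical, it does not call any database API, and where the text gives a range it picks the upper end. How these values are passed to a real system depends on the product you use.

```python
# Defaults as listed above.
DEFAULTS = {
    "MaxDegree": 56,
    "SearchListSize": 100,
    "PQCodeBudgetGBRatio": 0.125,
    "BeamWidthRatio": 4.0,
}


def tuned_params(profile="default"):
    """Return a copy of the defaults adjusted for one workload profile."""
    params = dict(DEFAULTS)
    if profile == "high_recall":        # RAG, agent memory
        params["SearchListSize"] = 200
    elif profile == "low_latency":      # needs spare RAM for a larger PQ cache
        params["PQCodeBudgetGBRatio"] = 0.2
    elif profile == "high_throughput":  # more parallel SSD reads per step
        params["BeamWidthRatio"] = 8.0
    elif profile != "default":
        raise ValueError(f"unknown profile: {profile}")
    return params


print(tuned_params("high_recall"))
```

Keeping each profile to a single changed parameter makes it easier to attribute a recall or latency change to the setting that caused it.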
For hardware sizing, budget roughly 750GB-1TB of NVMe SSD for a 1B 128-dimensional float32 dataset (512GB for full-precision vectors + ~224GB for graph edges), and 64-128GB of RAM for the PQ cache. DiskANN is not CPU-bound; 8-16 cores are sufficient for most deployments.
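The arithmetic behind these figures can be reproduced with a short calculation. This is a back-of-the-envelope sketch under stated assumptions: the function name is hypothetical, each graph edge is assumed to be a 4-byte neighbor id (which matches the ~224GB figure at a MaxDegree of 56), and the PQ cache is sized with the default PQCodeBudgetGBRatio of 0.125. It ignores file-system overhead and headroom, which is why the recommended SSD budget is higher than the raw total.

```python
def estimate_storage_gb(num_vectors, dims, max_degree=56, pq_ratio=0.125,
                        bytes_per_float=4, bytes_per_edge=4):
    """Estimate raw SSD and RAM needs in decimal gigabytes (1 GB = 1e9 bytes)."""
    vector_bytes = num_vectors * dims * bytes_per_float    # full-precision vectors
    graph_bytes = num_vectors * max_degree * bytes_per_edge  # neighbor ids
    pq_ram_bytes = vector_bytes * pq_ratio                  # PQ codes held in RAM
    return {
        "vectors_gb": vector_bytes / 1e9,
        "graph_gb": graph_bytes / 1e9,
        "ssd_total_gb": (vector_bytes + graph_bytes) / 1e9,
        "pq_ram_gb": pq_ram_bytes / 1e9,
    }


sizing = estimate_storage_gb(num_vectors=1_000_000_000, dims=128)
print(sizing)
# {'vectors_gb': 512.0, 'graph_gb': 224.0, 'ssd_total_gb': 736.0, 'pq_ram_gb': 64.0}
```

The raw total of 736GB sits just under the 750GB lower bound, and the 64GB PQ cache matches the lower end of the RAM recommendation.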
Key takeaways
DiskANN addresses the fundamental economic problem of in-memory ANN indexing by using inexpensive SSDs rather than expensive RAM. By combining the Vamana graph construction algorithm with product quantization, it achieves recall and latency competitive with in-memory approaches at a fraction of the infrastructure cost.
- DiskANN is a graph-based ANN algorithm from Microsoft Research (NeurIPS 2019) that is built on the Vamana directed graph construction algorithm.
- It stores the full index and full-precision vectors on SSD, caching only PQ-compressed vectors in RAM for fast approximate routing.
- It indexes 1B+ vectors on a single machine with 64GB RAM, achieving 95%+ recall@1 with sub-5 ms latency on the SIFT-1B benchmark.
- DiskANN indexes 5-10x more vectors per machine than DRAM-only algorithms at equivalent latency, directly reducing infrastructure cost.
- It’s the right choice when datasets exceed 100-200M vectors or when RAM cost is a constraint. HNSW is preferable for smaller latency-critical workloads.
- FreshDiskANN extends DiskANN to support real-time inserts, deletes, and updates without full index rebuilds.
- Couchbase’s Hyperscale Vector Index delivers 700+ QPS at 93% recall at billion-vector scale. This is 350x faster than MongoDB Atlas in independent VectorDBBench testing.
FAQs
What is the Vamana algorithm, and how does it differ from HNSW? Vamana builds a single-layer directed graph, while HNSW builds a multi-layer hierarchy. HNSW’s structure requires the entire index to be in RAM for efficient pointer traversal. Vamana’s single-layer design, with explicit long-range edges added during construction, enables the same fast navigation from disk without the RAM dependency.
How does product quantization work in DiskANN, and why is it necessary? PQ compresses each vector into a compact code (typically 16-32x smaller) by dividing it into sub-vectors and mapping each to a learned centroid. DiskANN stores these codes in RAM for fast approximate routing, then fetches full-precision vectors from SSD only for final reranking, keeping the in-memory footprint tractable even at billion-vector scale.
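A minimal sketch of the encoding step follows. It is a toy under stated assumptions: the codebooks are written by hand rather than learned (real systems typically learn them with k-means), the vector has 4 dimensions split into 2 sub-vectors, and the function names are hypothetical. The byte arithmetic at the end uses the 128-dimensional float32 example from earlier in the article, assuming 32 sub-vectors with one byte per code.

```python
import math


def pq_encode(vector, codebooks):
    """Map each sub-vector to the index of its nearest centroid."""
    sub_dim = len(vector) // len(codebooks)
    code = []
    for i, centroids in enumerate(codebooks):
        sub = vector[i * sub_dim:(i + 1) * sub_dim]
        nearest = min(range(len(centroids)),
                      key=lambda j: math.dist(sub, centroids[j]))
        code.append(nearest)
    return code


def pq_decode(code, codebooks):
    """Rebuild an approximate vector by concatenating the chosen centroids."""
    approx = []
    for idx, centroids in zip(code, codebooks):
        approx.extend(centroids[idx])
    return approx


codebooks = [
    [(0.0, 0.0), (1.0, 1.0)],  # centroids for dimensions 0-1
    [(0.0, 1.0), (1.0, 0.0)],  # centroids for dimensions 2-3
]
vector = [0.9, 1.2, 0.1, 0.8]
code = pq_encode(vector, codebooks)
approx = pq_decode(code, codebooks)
print(code, approx)  # [1, 0] [1.0, 1.0, 0.0, 1.0]

# Size comparison for a 128-dim float32 vector with 32 one-byte codes.
full_bytes = 128 * 4
code_bytes = 32 * 1
print(full_bytes, code_bytes, full_bytes // code_bytes)  # 512 32 16
```

Only the short code needs to stay in RAM; distances computed against the decoded approximation are good enough for routing, and the full-precision vector on SSD settles the final ranking.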
What are DiskANN’s limitations, and when is it not the best choice? DiskANN is less suitable for small datasets (under ~10M vectors), where HNSW offers lower latency with less overhead. It’s also less suitable for workloads with very high filter ratios (85-98%), where IVF variants outperform graph-based indexes. Query latency is also highly sensitive to disk speed, and SATA SSDs will significantly underperform NVMe.
How do I estimate hardware requirements for a DiskANN deployment? For a 1B 128-dimensional float32 dataset, budget roughly 750GB-1TB of NVMe SSD (512GB for vectors plus ~224GB for graph edges) and 64-128GB of RAM for the PQ cache. DiskANN is I/O-bound rather than CPU-bound, so 8-16 cores are sufficient for most production deployments.
Which databases support DiskANN? DiskANN is available in Couchbase 8.0 (Hyperscale Vector Index, benchmarked at 700+ QPS at billion-vector scale), Azure Cosmos DB, Azure Database for PostgreSQL, Milvus/Zilliz Cloud, and TimescaleDB’s pgvectorscale. Microsoft also uses DiskANN in Bing and Microsoft 365, making it one of the most widely deployed billion-scale vector search algorithms in enterprise infrastructure today.