A vector index is the data structure that makes semantic search fast at production scale. Storing raw embedding vectors and comparing them one by one against a query embedding works at small scale but collapses past a few thousand records. A vector index uses approximate-nearest-neighbour algorithms to return the top-K most similar items in single-digit milliseconds across millions of records.

The common algorithms.

  • HNSW (Hierarchical Navigable Small World). The production default. Excellent recall, fast queries, moderate memory footprint. Used by FAISS, Pinecone, Weaviate, pgvector, Qdrant.
  • IVF (Inverted File). Clusters embeddings into cells; queries scan only nearby cells. Faster build, lower recall. Useful at very large scale.
  • Flat / brute force. Scan every vector. Highest recall, lowest latency at small scale (under ~50k vectors), no index- build step.

Where to host it.

Postgres + pgvector is the right default for most teams: the vector index lives in the same database as the rest of the application data, transactions are atomic, and operational complexity is minimal. Dedicated vector stores (Pinecone, Weaviate, Qdrant) earn their cost above a few million vectors, or when retrieval is the bottleneck and the team needs specialised index tuning.

Hybrid retrieval.

A vector index alone misses exact-match queries — names, codes, and rare terms get poor recall in pure semantic search. The production answer is hybrid retrieval: combine vector index results with BM25 keyword scores via reciprocal-rank fusion. The eval harness measures which mix wins for the specific query distribution and the team tunes accordingly. See RAG in production for the full architecture decisions.

Frequently asked.

What is a vector index?
A vector index is the data structure that stores embedding vectors with approximate-nearest-neighbour lookups, returning the top-K most similar items in milliseconds across millions of records. It's the production infrastructure that makes semantic search and RAG fast.
Do I need a dedicated vector database?
Probably not at first. Postgres + pgvector is the right default for most teams: the vector index lives in the same database as application data, transactions are atomic, operational complexity is minimal. Dedicated stores (Pinecone, Weaviate, Qdrant) earn their cost above a few million vectors or when retrieval is the bottleneck.
What's the difference between a vector index and a vector store?
A vector index is the data structure (HNSW, IVF, flat). A vector store is the managed service or database that hosts it. Postgres + pgvector is a vector store with an HNSW vector index. Pinecone is a vector store that hides its index implementation behind an API. The choice between hosting models matters more than the algorithm choice for most teams.
Should I tune the vector index for higher recall or lower latency?
Measure both against your actual query distribution. HNSW exposes two knobs: efConstruction (build-time recall) and efSearch (query-time recall). Lower efSearch is faster but misses more relevant chunks; higher efSearch is slower but catches more. The eval harness picks the right point for your specific retrieval-quality vs. latency budget.