Embedding space is the high-dimensional vector geometry into which an embedding model places text. Each piece of text becomes a point, represented as a vector of typically 768 to 3072 dimensions, and semantically similar passages land near each other. Distance in this space is the proxy retrieval systems use for relevance.

How it's constructed.

Embedding models are trained on hundreds of millions of (anchor, positive, negative) triples: an anchor text, a positive that should land close to it, and a negative that should not. After training, the model emits a fixed-length vector for any input, with the property that nearby vectors mean "related" and distant vectors mean "unrelated".
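
A minimal sketch of the training signal, assuming a triplet margin loss over cosine distance. Production models often use other contrastive objectives; the toy 2-dimensional vectors and the margin of 0.2 are illustrative assumptions, not values from any particular model.

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity: 0 means same direction, 2 means opposite."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the positive is closer to the anchor than the negative by at least `margin`."""
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy vectors standing in for model outputs (hypothetical values).
anchor = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])   # points almost the same way as the anchor
negative = np.array([0.0, 1.0])   # orthogonal to the anchor

well_separated = triplet_loss(anchor, positive, negative)  # nothing left to learn
violated = triplet_loss(anchor, negative, positive)        # roles swapped: large loss
```

Training pushes the model's weights in the direction that lowers this loss across the whole dataset, which is what pulls related texts together and pushes unrelated ones apart.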

Properties worth knowing.

  • Cosine similarity is the standard metric. For unit-normalized vectors, dot product yields the same value, so the two are interchangeable.
  • Dimensionality matters. Higher-dimensional embeddings generally separate concepts better but cost more in storage and search.
  • Models are not interchangeable. Two embeddings from different models occupy different spaces and cannot be compared directly. Re-embed everything when you change models.
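
The first property can be checked directly. The sketch below uses random vectors as stand-ins for real embeddings (an assumption made purely for illustration) and compares cosine similarity on the raw vectors with a plain dot product on unit-normalized copies.

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
query = rng.normal(size=8)        # stand-in for a query embedding
docs = rng.normal(size=(5, 8))    # stand-ins for five document embeddings

# Cosine similarity computed on the raw vectors.
cos_scores = np.array([cosine_similarity(query, d) for d in docs])

# Normalize once, then a plain dot product gives the same scores.
query_unit = query / np.linalg.norm(query)
docs_unit = docs / np.linalg.norm(docs, axis=1, keepdims=True)
dot_scores = docs_unit @ query_unit

# Document indices ordered best-first.
ranking = np.argsort(-dot_scores)
```

Normalizing at index time and scoring with a dot product at query time avoids recomputing norms on every search, which is one reason many vector stores expect normalized inputs.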

Where it fails.

Embedding space captures topical similarity, which is not always the same as relevance. A query for "Q3 revenue numbers" finds documents about revenue and Q3, including ones that mention revenue prospects without ever stating numbers. The fix is reranking (a cross-encoder that judges relevance directly) plus hybrid retrieval (combining vector scores with keyword/BM25 scores as an exact-match fallback).
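
One common way to combine the two result lists is reciprocal rank fusion (RRF). The text above does not prescribe a fusion method, so treat this as one option among several. The document ids are hypothetical, k=60 is a conventional default, and the cross-encoder reranking step is left out.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids; each list is ordered best-first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document earns more credit the higher it ranks in each list.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical results for the query "Q3 revenue numbers".
vector_ranking = ["outlook_memo", "q3_report", "q2_report"]        # topical matches
keyword_ranking = ["q3_report", "q3_press_release", "outlook_memo"]  # exact-term matches

fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
```

Here the report that both retrievers rank highly rises above the memo that only the vector search favored. RRF works on ranks rather than raw scores, so it sidesteps the problem that cosine similarities and BM25 scores live on different scales.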

Frequently asked.

What is embedding space in plain terms?
Embedding space is the multi-dimensional 'map' an embedding model uses for text. Each piece of text becomes a point in that space, and texts about similar things land near each other. Retrieval systems use distance in this space as a proxy for 'how relevant is this passage to my query'.
How many dimensions does an embedding have?
Typically 768 to 3072. OpenAI's text-embedding-3-large produces 3072 dimensions and text-embedding-3-small produces 1536. Voyage-3 and BGE-Large both produce 1024. Higher-dimensional embeddings generally separate concepts better but cost more in storage and search latency. For most production RAG, 1024–1536 is the sweet spot.
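
The storage side of that trade-off is simple arithmetic. The sketch below assumes uncompressed float32 vectors (4 bytes per dimension) and ignores index overhead and metadata; quantized or compressed indexes will be smaller.

```python
def index_size_gib(num_vectors, dims, bytes_per_value=4):
    """Raw vector storage in GiB, assuming float32 values by default."""
    return num_vectors * dims * bytes_per_value / 2**30

# Raw storage for one million vectors at common dimensionalities.
for dims in (768, 1024, 1536, 3072):
    size = index_size_gib(1_000_000, dims)
    print(f"{dims:>4} dims: {size:.2f} GiB per million vectors")
```

Storage grows linearly with dimensionality, so a 3072-dimensional index takes three times the raw space of a 1024-dimensional one for the same corpus.
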
Can I mix embeddings from different models?
No. Embeddings from different models live in different spaces and cannot be compared directly. When you change embedding models, you must re-embed the entire knowledge base from scratch. Plan for this — model upgrades aren't free.
Why does embedding-only search fail on some queries?
Embedding space captures topical similarity, which does not guarantee exact-match relevance. Queries with proper names, codes, version numbers, or rare terms can get poor recall in pure semantic search. The production answer is hybrid retrieval (combining vector and BM25 keyword scores) plus reranking.