Embedding model · Morvion Glossary

An embedding model converts input data into a fixed-length numerical vector in a high-dimensional space, typically between 384 and 3,072 dimensions, such that semantically similar inputs end up near each other and unrelated inputs end up far apart. The vector is the substrate that makes retrieval, recommendation, clustering, and classification possible without keyword matching.

How an embedding model works.

The model is a neural network trained on large amounts of paired text so it learns the underlying structure of meaning. At inference time, raw input goes in and a vector comes out. Two documents about the same topic produce vectors that sit close together, typically measured by cosine distance, even if they share no words. That closeness is the signal a search system reads.
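To make the closeness signal concrete, here is a minimal sketch of cosine distance in plain Python. The three 4-dimensional vectors are invented for illustration and are not the output of any real model; production embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    """Cosine distance: 0.0 for identical direction, larger for less similar vectors."""
    return 1.0 - cosine_similarity(a, b)

# Hypothetical toy vectors (assumption: invented values, not real model output).
refund_policy = [0.9, 0.1, 0.0, 0.2]      # "How do I get a refund?"
money_back_terms = [0.8, 0.2, 0.1, 0.3]   # "Terms of the money-back guarantee"
office_parking = [0.0, 0.9, 0.8, 0.1]     # "Where can visitors park?"

near = cosine_distance(refund_policy, money_back_terms)
far = cosine_distance(refund_policy, office_parking)
print(f"same topic: {near:.3f}, different topic: {far:.3f}")
```

The two refund-related vectors share no words in their source text, yet their distance is far smaller than the distance to the parking vector, which is the property a retriever relies on.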

Why embedding models matter.

Every retrieval-augmented AI system depends on an embedding model. The model decides which records are relevant to a query, which means the model decides what the AI gets to read before it answers. A poor embedding choice quietly caps the quality ceiling of the whole system.

Picking an embedding model.

Domain match. Models trained on web text struggle on legal, medical, or code corpora. Use a domain-tuned model or fine-tune one when accuracy matters.
Dimension count. Larger vectors store more nuance and cost more storage. 768 to 1,536 is the sweet spot for most production systems in 2026.
Latency and cost. Hosted models are convenient and metered. Local models eliminate per-call cost at the price of GPU hosting. The trade-off is workflow-specific.
Version pinning. The embedding model is part of the production stack. Pin the version. A silent provider update changes the vector space, so new query vectors no longer line up with the vectors already stored in the index.
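The storage side of the dimension-count trade-off is simple arithmetic. The sketch below assumes each value is stored as a 4-byte float32 and counts only the raw vectors; index structures, metadata, and any quantization a particular vector database applies are left out.

```python
def index_size_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Raw vector storage in bytes, ignoring index overhead and metadata.

    bytes_per_value=4 assumes float32 storage (an assumption, not a rule).
    """
    return num_vectors * dimensions * bytes_per_value

# Compare the footprint of one million vectors at several dimension counts.
for dims in (384, 768, 1536, 3072):
    gib = index_size_bytes(1_000_000, dims) / 2**30
    print(f"{dims:>5} dims: {gib:.2f} GiB per million vectors")
```

Under these assumptions storage scales linearly with dimension count: one million 768-dimension vectors take 3,072,000,000 bytes, and doubling to 1,536 dimensions doubles that.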

Morvion default.

We default to a current hosted model from a stable provider, pinned to a specific version, with a documented re-embed plan for the day the version moves. We benchmark against the customer's real fixture set rather than against marketing leaderboards. If the workflow is regulated or offline, we run a local model on dedicated infrastructure.
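One way to make a version pin enforceable is to store it alongside the index and refuse to serve queries embedded under a different pin. The sketch below is illustrative only: `EmbeddingPin`, `PinMismatch`, `assert_same_pin`, and the provider, model, and version strings are all hypothetical names, not part of any real provider's API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EmbeddingPin:
    """Identifies exactly which embedding model produced a set of vectors."""
    provider: str
    model: str
    version: str
    dimensions: int

class PinMismatch(Exception):
    """Raised when query vectors and index vectors come from different pins."""

def assert_same_pin(index_pin, query_pin):
    """Fail loudly instead of comparing vectors from two different models."""
    if index_pin != query_pin:
        raise PinMismatch(
            f"index built with {index_pin}, query embedded with {query_pin}; "
            "re-embed the index before serving"
        )

# Hypothetical values (assumption: placeholder names, not a real model).
index_pin = EmbeddingPin("example-provider", "example-embed", "2026-01-15", 1024)
query_pin = EmbeddingPin("example-provider", "example-embed", "2026-01-15", 1024)

assert_same_pin(index_pin, query_pin)  # passes: both sides use the same pin
```

When the provider version moves, the check fails at query time, which is the trigger for running the re-embed plan rather than silently mixing two vector spaces.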

Frequently asked.

What is an embedding model?

An embedding model is a neural network that turns text, images, or other inputs into a fixed-length numerical vector, where semantically similar inputs land near each other in high-dimensional space. The vector is what a vector database, semantic search system, or RAG retriever actually compares.

Why does the embedding model choice matter so much?

It quietly determines the quality ceiling of every retrieval-augmented AI system that depends on it. A weak embedding for the domain means the right documents never surface, no matter how good the generation model is. The fix is benchmarking embedding models on the real fixture set before committing.

Should we use a hosted embedding model or a local one?

Hosted is faster to deploy and metered per call. Local is cheaper at scale and necessary when data cannot leave the perimeter. For most Morvion engagements the hosted path is the right starting point; we move to local when cost or compliance forces the move.

How big should the embedding be?

Between 768 and 1,536 dimensions is the practical range for production text embeddings in 2026. Larger vectors capture more nuance but cost more storage and slow down comparison. The smallest vector that still hits the retrieval accuracy target wins.
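Benchmarking on a fixture set and picking the smallest vector that hits the target can be sketched with recall@k, one common retrieval metric. This is an assumption about the metric, not a statement of the one Morvion uses; the document IDs, fixture runs, and the 0.9 target below are invented for illustration.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant_ids:
        raise ValueError("each fixture query needs at least one relevant document")
    top_k = set(retrieved_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)

def mean_recall_at_k(runs, k):
    """runs: list of (retrieved_ids, relevant_ids) pairs, one per fixture query."""
    return sum(recall_at_k(got, want, k) for got, want in runs) / len(runs)

def smallest_passing(scores_by_dims, target):
    """Smallest dimension count whose score meets the target, or None."""
    passing = [dims for dims, score in scores_by_dims.items() if score >= target]
    return min(passing) if passing else None

# Hypothetical fixture results: ranked IDs returned vs. IDs labeled relevant.
runs_384 = [(["d7", "d1", "d8"], ["d1", "d3"]), (["d9", "d5", "d4"], ["d2"])]
runs_768 = [(["d1", "d7", "d3"], ["d1", "d3"]), (["d9", "d2", "d4"], ["d2"])]

scores = {
    384: mean_recall_at_k(runs_384, k=3),
    768: mean_recall_at_k(runs_768, k=3),
}
choice = smallest_passing(scores, target=0.9)
print(scores, "->", choice)
```

With these made-up runs the 384-dimension candidate misses relevant documents and the 768-dimension candidate finds all of them, so 768 is selected. A real benchmark would use the customer's own queries and relevance labels.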
