June 2, 2026 · KoldOps

Every AI Team Builds a Context Engine. Most Don't Realize It.

Every team building with AI rebuilds the same six components. Connectors, retrieval, storage substrate, review and version, protocol, drift detection. Together they are a context engine. The category nobody named.

Tags: Context Engine, AI Substrate, Engineering Discipline

Every team building with AI ends up rebuilding the same six components. Connectors, a retrieval stack, a storage substrate, a review and version system, a protocol layer, a drift detector. Together they are a context engine. Most teams build one without ever calling it that. The unnamed pattern is the reason each company's AI initiative costs 4 to 9 months of engineering it did not budget for.

This piece names the pattern. Defines the term. Lists the six components and what each typically takes to build. Then explains why a category called "context engine, out of the box" is the natural answer to the unnamed cost.

The unnamed pattern

Walk into any engineering team that shipped an AI feature in the last 18 months and ask what they built. The answer rarely contains the phrase "context engine." It contains a sequence of project names: the Notion connector, the embedding pipeline, the RAG service, the document-review workflow, the MCP server, the staleness checker. Each project had a separate owner. Each shipped on a different timeline. Nobody drew a diagram that contained all of them.

If you draw the diagram, the shape is consistent across teams. The same six boxes show up. The boxes connect in the same way. The team that built it sometimes does not realize they built a coherent system, because they shipped it as six separate features.

The cost of not naming the pattern is real. Roadmaps treat each component as a standalone project that competes for headcount with feature work. Architectural decisions get made one component at a time, without the constraint that the components have to fit together. By month six, the components partially overlap, partially conflict, and require a refactor nobody scheduled.

What a context engine actually is

A context engine is the system that gathers, ranks, and delivers context to an AI agent at request time. The job: when the agent receives a query, the engine assembles the documents, structured data, prior conversations, and tool descriptions the agent needs to answer well, and delivers them in the right format and order.
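
The gather, rank, deliver loop can be sketched in a few lines. This is a minimal illustration, not a reference design: `ContextItem`, `assemble_context`, the precomputed `score`, and the character budget are all hypothetical names and assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    kind: str     # "document", "structured", "conversation", or "tool"
    text: str
    score: float  # relevance to the query, assumed to be computed upstream


def assemble_context(items, max_chars):
    """Rank candidate items and pack the best ones into a character budget.

    The budget counts item text only, not the separators between items.
    """
    ranked = sorted(items, key=lambda it: it.score, reverse=True)
    chosen, used = [], 0
    for item in ranked:
        block = f"[{item.kind}] {item.text}"
        if used + len(block) > max_chars:
            continue  # too large for what is left; a smaller item may still fit
        chosen.append(block)
        used += len(block)
    return "\n\n".join(chosen)
```

A real engine would budget in tokens rather than characters and would order items by more than score, but the shape of the job is the same.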

The term is distinct from two adjacent terms:

Prompt engineering is single-turn copywriting. The job is to write the instructions the model sees on a specific query. A context engine produces the context that prompt engineering operates on.
Retrieval-augmented generation (RAG) is one technique inside the engine. A context engine that only does RAG is missing five of the six components below.

"Context engineering" is the discipline. The context engine is the running system that practices it.

The six components

Each component takes between 1 and 8 weeks for a competent team to build from scratch the first time. Built in series and followed by integration testing, the total is typically 4 to 9 months. That is the unnamed cost of building a context engine by accident.

1. Source connectors

The connectors pull source documents and structured data from wherever the business keeps them. Notion, Slack, Drive, GitHub, Confluence, an ERP-adjacent system, a CMS, a wiki, an internal portal. Each source has a different API, a different auth pattern, a different rate limit, a different update model, a different way of representing deletions. Connectors handle all of that.

Typical build: 1 to 3 weeks per source. Plus ongoing maintenance every time a source vendor changes their API. Plus the queue infrastructure that runs the connectors on a schedule. Plus the dead-letter handling for the inevitable failures.
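
To make the deletion and dead-letter handling concrete, here is a minimal sketch of one sync pass. Everything in it is an assumption for illustration: `Change`, `make_store_writer`, and `sync_once` are hypothetical names, the store is a plain dict, and a real connector would add backoff, scheduling, and source-specific auth.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    doc_id: str
    body: Optional[str]  # None marks a deletion at the source


def make_store_writer(store):
    """Return a function that applies one change to an in-memory store."""
    def apply(change):
        if change.body is None:
            store.pop(change.doc_id, None)  # propagate the deletion
        else:
            store[change.doc_id] = change.body
    return apply


def sync_once(changes, apply, max_attempts=3):
    """Apply each change, retrying on failure; return the dead letters."""
    dead_letters = []
    for change in changes:
        for attempt in range(1, max_attempts + 1):
            try:
                apply(change)
                break
            except Exception as exc:
                if attempt == max_attempts:
                    # Out of retries: park it for a human or a later job.
                    dead_letters.append((change.doc_id, repr(exc)))
    return dead_letters
```

The point of returning the dead letters rather than raising is that one bad document should not stop the rest of the sync.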

2. Retrieval stack

The retrieval stack turns the corpus into something queryable at request time. The minimum production setup is BM25 plus vector similarity. The serious setup adds a graph layer for relationships and a reranker on top. Each layer has its own infrastructure, indexes, tuning parameters, and failure modes.

Typical build: 4 to 8 weeks. The architecture decisions (chunking strategy, embedding model, reranker model, hybrid retrieval weighting) are load-bearing and difficult to revisit later.
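
One common way to express hybrid retrieval weighting is reciprocal rank fusion. The sketch below is a weighted variant, chosen here for illustration rather than taken from any particular stack: `fuse_rankings` and `bm25_weight` are hypothetical names, and `k=60` is the conventional constant from the original RRF formulation.

```python
def fuse_rankings(bm25_ranked, vector_ranked, bm25_weight=0.5, k=60):
    """Weighted reciprocal rank fusion over two ranked lists of doc ids.

    Each list contributes weight / (k + rank) per document. Ties are
    broken by doc id so the output is deterministic.
    """
    scores = {}
    weighted_lists = (
        (bm25_weight, bm25_ranked),
        (1.0 - bm25_weight, vector_ranked),
    )
    for weight, ranking in weighted_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


fused = fuse_rankings(["a", "b", "c"], ["c", "a", "d"])
# fused == ["a", "c", "b", "d"]
```

Rank fusion sidesteps the problem that BM25 scores and cosine similarities live on different scales. The weight is still a load-bearing choice: it decides how much exact keyword matches count against semantic neighbors.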

3. Storage substrate

The substrate is the canonical record the connectors write into and the retrieval stack queries against. The substrate has to satisfy five properties: versioned, reviewable, retrieval-native, replayable, LLM-native. (Full treatment: Vector DBs Aren't Storage. They're Indexes.)

Most teams realize they need a substrate only after they have shipped the connectors and retrieval, when they discover that they cannot answer "what did this system know on March 14" or "who approved this change." Adding the substrate after the fact is a refactor of the entire data pipeline.

Typical build: 3 to 6 weeks if planned from the start. Multiple months if added after the connectors and retrieval are already live.
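
The "what did this system know on March 14" question is easy to answer if the substrate is an append-only log. A minimal in-memory sketch, assuming hypothetical `Revision` and `Substrate` names and an arbitrary year for the dates; a production substrate would sit on durable, versioned storage rather than a Python list.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Revision:
    doc_id: str
    body: str
    merged_on: date
    approved_by: str


class Substrate:
    def __init__(self):
        self._log = []  # append-only: revisions are never edited or removed

    def commit(self, revision):
        self._log.append(revision)

    def as_of(self, doc_id, day):
        """Return the latest revision merged on or before `day`, or None."""
        candidates = [
            r for r in self._log
            if r.doc_id == doc_id and r.merged_on <= day
        ]
        return max(candidates, key=lambda r: r.merged_on, default=None)


substrate = Substrate()
substrate.commit(Revision("credit-policy", "Hold at 30 days.", date(2026, 3, 1), "dana"))
substrate.commit(Revision("credit-policy", "Hold at 45 days.", date(2026, 3, 20), "lee"))
known = substrate.as_of("credit-policy", date(2026, 3, 14))
# known.body == "Hold at 30 days." and known.approved_by == "dana"
```

Because nothing is overwritten, the same query answers both the "what did it know" question and the "who approved this change" question.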

4. Review and version system

The substrate needs gates. Writes need named reviewers, named approvers, merge dates. The history needs to be reconstructable. This is the discipline that turns a wiki into a substrate. Without it, anyone with edit access can silently rewrite policy, and the AI starts citing the rewrite as if it were canonical.

Typical build: 2 to 4 weeks for the workflow plus integration with the team's existing identity layer. The cultural work of getting humans to actually use the review gates takes longer than the engineering.
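
A review gate reduces to a function that refuses to merge until the named roles are filled. The sketch below is illustrative only: `ProposedWrite`, `GateError`, and `merge` are hypothetical names, and the rule that an author cannot approve their own change is an added assumption, not something every team will want.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


class GateError(Exception):
    """Raised when a write does not satisfy the review policy."""


@dataclass
class ProposedWrite:
    doc_id: str
    body: str
    author: str
    reviewer: Optional[str] = None
    approver: Optional[str] = None
    merged_on: Optional[date] = None


def merge(write, today):
    """Stamp the merge date only if the write passed every gate."""
    if not write.reviewer:
        raise GateError("needs a named reviewer")
    if not write.approver:
        raise GateError("needs a named approver")
    if write.approver == write.author:
        # Assumed policy: no self-approval.
        raise GateError("author cannot approve their own change")
    write.merged_on = today
    return write
```

The engineering is small. What the gate buys is that every merged revision carries the names and date needed to reconstruct the history later.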

5. Protocol layer

The protocol layer exposes the substrate and the retrieval stack to agents. In 2026 this is usually MCP (Model Context Protocol). Earlier teams built custom HTTP endpoints, OpenAPI specs, custom function-calling shims. Each agent vendor (Anthropic, OpenAI, the open-weight ecosystem) has different expectations. The protocol layer normalizes them.

Typical build: 2 to 4 weeks for MCP. Much longer for a custom protocol. Plus the work of supporting multiple agent vendors when the team wants flexibility.
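
Part of the normalizing is mundane: the same tool has to be described in slightly different shapes for each consumer. A sketch, assuming a hypothetical neutral tool record with `name`, `description`, and `schema` keys. The target field names reflect the vendors' public tool-definition formats as understood at the time of writing and should be checked against current docs before relying on them.

```python
def to_mcp(tool):
    """Tool entry in the shape an MCP server lists its tools."""
    return {
        "name": tool["name"],
        "description": tool["description"],
        "inputSchema": tool["schema"],
    }


def to_anthropic(tool):
    """Tool definition in the shape used by Anthropic's Messages API."""
    return {
        "name": tool["name"],
        "description": tool["description"],
        "input_schema": tool["schema"],
    }


def to_openai(tool):
    """Tool definition in the shape used by OpenAI's Chat Completions API."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool["description"],
            "parameters": tool["schema"],
        },
    }


# Hypothetical tool, defined once in a neutral form.
search_tool = {
    "name": "search_substrate",
    "description": "Search approved documents in the substrate.",
    "schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}
```

All three formats carry the same JSON Schema for the arguments, which is why defining the tool once and translating at the edge is cheap. The expensive differences between vendors are in calling conventions and result handling, which this sketch does not cover.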

6. Drift detector

The drift detector flags when operations have diverged from the recorded substrate. Examples: the shop floor stops following the routing standard, the accounts receivable (AR) team waives the credit hold, or an agent starts producing outputs the substrate would have rejected. Without drift detection, the substrate and reality silently desync.

Typical build: 4 to 8 weeks for a basic version. The hard part is not the detection. It is the human workflow for responding to drift alerts.
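
The detection half really can be small. A minimal sketch, assuming each observed operation has already been checked against a recorded rule and reduced to a compliant or non-compliant event. `Event`, `drift_report`, and both thresholds are hypothetical and would need tuning per rule.

```python
from dataclasses import dataclass


@dataclass
class Event:
    rule_id: str     # which recorded rule the operation was checked against
    compliant: bool  # whether the operation followed that rule


def drift_report(events, threshold=0.2, min_events=5):
    """Return {rule_id: violation_rate} for rules drifting past the threshold.

    Rules with fewer than `min_events` observations are skipped so that
    one or two exceptions do not trigger an alert.
    """
    totals, violations = {}, {}
    for event in events:
        totals[event.rule_id] = totals.get(event.rule_id, 0) + 1
        if not event.compliant:
            violations[event.rule_id] = violations.get(event.rule_id, 0) + 1
    report = {}
    for rule_id, total in totals.items():
        rate = violations.get(rule_id, 0) / total
        if total >= min_events and rate > threshold:
            report[rule_id] = rate
    return report
```

Deciding who receives the report, and whether the right response is to correct the operation or to update the substrate, is the part this code cannot do.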

The total cost

Add the typical builds together, counting a single source connector: 16 to 33 weeks of engineering, before integration testing. Every additional source adds another 1 to 3 weeks. Integration testing is where the components reveal their incompatibilities, because each was designed without seeing the others. A typical integration phase adds another 4 to 8 weeks.
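
The arithmetic, spelled out. The ranges are the typical builds quoted in each section above; the only assumptions are that connectors are counted for one source and that a month is 52/12 weeks.

```python
# (low, high) build estimates in weeks, taken from the sections above.
BUILD_WEEKS = {
    "source connectors (one source)": (1, 3),
    "retrieval stack": (4, 8),
    "storage substrate": (3, 6),
    "review and version system": (2, 4),
    "protocol layer": (2, 4),
    "drift detector": (4, 8),
}
INTEGRATION_WEEKS = (4, 8)
WEEKS_PER_MONTH = 52 / 12

build = (
    sum(low for low, _ in BUILD_WEEKS.values()),
    sum(high for _, high in BUILD_WEEKS.values()),
)
with_integration = (
    build[0] + INTEGRATION_WEEKS[0],
    build[1] + INTEGRATION_WEEKS[1],
)
months = tuple(round(weeks / WEEKS_PER_MONTH, 1) for weeks in with_integration)

print(build)             # (16, 33)
print(with_integration)  # (20, 41)
print(months)            # (4.6, 9.5)
```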

Round numbers: 4 to 9 months. Plus ongoing maintenance, which compounds. Plus the cost of refactoring when the team learns, around month 12, that the architecture has to change because the AI workload has outgrown the original retrieval choice. The third version of an internally built context engine is the first one that actually works. Most teams ship the first version, declare victory, and starve maintenance for two quarters before the second version is even scoped.

The category that is missing

A context engine, out of the box, would ship the six components pre-wired. The connectors come with the box. The retrieval stack is tuned to a reasonable default. The substrate is git-backed and satisfies the five properties listed above. The review gates are configurable. The protocol layer speaks MCP. The drift detector is on by default with a defensible alerting threshold.

The team installs the engine, points it at their sources, sets the review policy, and starts shipping AI features against a substrate that already exists. Total time from procurement to first useful agent: days, not months. The 4-to-9-month cost moves from the customer's roadmap to the vendor's R&D, where it amortizes across all customers.

The category is emerging. A small number of vendors are building toward it. Most of what gets called a "context engine" today is one of the six components plus marketing.

Frequently asked

How is a context engine different from RAG?

RAG is one technique inside the retrieval stack. A context engine that only does RAG is missing the other five components. RAG is the verb; the context engine is the noun.

How is a context engine different from a vector database?

A vector database is one component inside the retrieval stack, which is one of the six components inside the context engine. A vector DB by itself is several refactors away from being a context engine.

How is a context engine different from a "memory" product?

Memory products (Mem0, Letta, others) typically focus on the storage substrate component and one slice of the retrieval stack, oriented to per-conversation or per-user state. A context engine is a system-level concept; memory is a single concern inside it.

What if my team is small and we only need RAG?

Then build RAG, ship it, and revisit when you discover the audit, version-control, or drift problems the substrate is supposed to solve. For small teams shipping internal tools where the AI's wrong answer has low cost, this is rational. For teams shipping AI into production for customers, the day you discover you need the substrate is the day you wish you had it from the start.

Is "context engine" a real term or marketing?

It is increasingly a real term. The exact phrase appears in research papers, vendor docs, and engineering blogs from late 2025 onward. The definition is still being staked. This piece is one of the stakes.

What's next

Map your current AI architecture against the six components. Identify which are built, which are bought, which are missing, and which partially overlap with another component because nobody drew the diagram. The map is the work plan.
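
The map can start as something this small. A sketch with hypothetical names (`COMPONENTS`, `work_plan`) and an assumed convention that any component you have not classified counts as missing.

```python
COMPONENTS = [
    "source connectors",
    "retrieval stack",
    "storage substrate",
    "review and version system",
    "protocol layer",
    "drift detector",
]
STATUSES = ("built", "bought", "missing", "overlapping")


def work_plan(status):
    """Group the six components by status; unclassified ones are missing."""
    plan = {name: [] for name in STATUSES}
    for component in COMPONENTS:
        state = status.get(component, "missing")
        if state not in plan:
            raise ValueError(f"unknown status {state!r} for {component}")
        plan[state].append(component)
    return plan


# Example team: connectors built in-house, retrieval bought, and a
# protocol layer that overlaps with another component.
plan = work_plan({
    "source connectors": "built",
    "retrieval stack": "bought",
    "protocol layer": "overlapping",
})
# plan["missing"] == ["storage substrate", "review and version system", "drift detector"]
```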

For the philosophical grounding on why the substrate component is the load-bearing one, see Decision-State, Airlocked to Code-State: Defining the AI Substrate. For the audit that scores the substrate specifically, see The 5-Question Substrate Audit. For the storage-for-AI category the substrate sits in, see Vector DBs Aren't Storage. They're Indexes.