KoldOps
June 5, 2026 · KoldOps

Why Manufacturing AI Projects Stall (and the State Layer That Lets Them Ship)

CNC shops, fabricators, and machine builders carry deep state in BOMs, routings, FAI records, customer portals, AS9100 evidence, and OSP cert chains. Stateless LLMs cannot reason against that. The fix is a substrate. KoldOps installs it.

A 60-person CNC shop runs a Claude pilot on quality records. The first two weeks look promising. Six months later, the agent cannot answer "have we ever waived this dimension for this customer before," it confuses last revision's PPAP with the current one, and the quality director quietly stops opening the dashboard. The model is not the problem. The state the operation actually carries (BOMs, routings, FAI records, customer-specific quality manuals, AS9100 evidence files, OSP cert chains, ECN history) has no substrate the agent can read. This piece names the pattern, lists the state a manufacturing operation carries, and lays out the buildout that makes an AI agent useful for a shop.

The pattern in manufacturing

Manufacturing operations have spent the last 18 months running AI pilots. The pilots usually start with a narrow, plausible use case: summarize the morning's quality reports, draft customer responses, extract data from inbound RFQs. The demos work. The pilots launch. The first month is encouraging.

Then comes the second month, where the agent is asked to reason across history. Compare this part's CMM data to the same part run six months ago. Pull the customer's PPM trend across the last four quarters. Find every job in the last two years that ran through the same OSP vendor with a heat-treat spec in this range. These are the questions the experienced engineer answers from memory and the paper file. The AI cannot, because the substrate underneath it is mush, and the retrieval stack is pulling one document at a time from a Google Drive that nobody has organized since 2022.

The pilot stalls. The team concludes the AI is not ready. The model is the same model that demoed brilliantly. The substrate is the problem.

The state a manufacturing operation actually carries

Below is the state inventory for a typical precision machining, fabrication, or assembly operation. Each row is a category an experienced engineer reasons across daily. Each row is a category a stateless LLM cannot reason across without a substrate that makes it queryable.

| State category | Volume and shape | What a stateless LLM does with it |
| --- | --- | --- |
| Bills of materials | Hundreds to thousands of multi-level BOMs, versioned, with effectivity dates and ECN history. | Extracts components from one BOM at a time. Cannot reason about commonality across products, cannot trace component-level history. |
| Routings and standards | Per part family, per work center. Setup times, run times, tooling lists. | Reads one routing. Cannot identify standard-time outliers, cannot suggest improvements from historical actuals. |
| Quality records | FAI reports, PPAP packages, CMM data, inspection records, MRR / NCR history per part and per customer. | Summarizes one record. Cannot pull defect trends across customers or part families, cannot answer "have we seen this NCR before." |
| Drawings and revisions | CAD files, customer-supplied prints, ECN records, GD&T notes, revision histories. | Reads the print handed to it. Cannot detect that a customer is on revision F but the floor is running revision E. |
| Customer portals and requirements | Customer-specific quality manuals, supplier portals, special process specs, PPAP submission requirements. | Reads the manual it is fed. Cannot reason about cross-customer requirement conflicts or surface customer-specific waivers from history. |
| AS9100 / ISO evidence | Audit findings, corrective actions, training records, calibration logs, supplier evaluations. | Reads one finding. Cannot prepare an audit response that traces an issue across three years of corrective actions. |
| ITAR / DFARS records | Citizenship attestations, country-of-origin certs, specialty-metals compliance, controlled-tech logs. | Cannot verify compliance for a new job by walking the existing evidence corpus. |
| OSP cert chains | Heat treat, plating, anodizing, NDT, passivation. Lot-level certs from outside processors. | Cannot assemble a complete cert chain for a job that touched three OSP vendors without manual stitch-up. |
| ERP / open jobs | Open quotes, WIP, on-hand inventory, customer orders, AR / AP. Often in Visual Job Shop, JobBOSS, Global Shop, ProShop, or E2. | Reads the snapshot handed to it. Cannot answer "what's our WIP by customer" without integration into the live ERP. |
| Tooling and fixtures | Fixture inventory by part family, tool life records, calibration logs. | No knowledge of which fixtures exist or what condition they are in without an indexed corpus. |
| Customer scoreboards | PPM, OTD, supplier scorecards customers issue back to the shop. | No memory of the trend. Each new scorecard is read in isolation. |

The pattern: every row is the kind of question an experienced production planner, quality director, or program manager routinely answers by walking to a filing cabinet, opening ProShop, calling the OSP vendor, and reading a 6-year-old AS9100 audit response. The AI was supposed to make those walks unnecessary. Without a substrate, it makes them worse, because the agent confidently fills in the blanks the experienced engineer would know to leave open.
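The difference between filling in a blank and leaving it open can be made concrete. The following is a minimal Python sketch, assuming a hypothetical in-memory list of lot-level certs; the `Cert` record, the `assemble_cert_chain` function, and the job, vendor, and lot values are all illustrative, not an actual KoldOps schema or ERP export. The point is only that the lookup reports a missing process as missing instead of guessing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    """One lot-level cert from an outside processor (hypothetical shape)."""
    job: str
    process: str  # e.g. "heat treat", "plating", "NDT"
    vendor: str
    lot: str


def assemble_cert_chain(job, required_processes, certs):
    """Return (found, missing) for a job. A gap is reported, never filled in."""
    by_process = {c.process: c for c in certs if c.job == job}
    found = [by_process[p] for p in required_processes if p in by_process]
    missing = [p for p in required_processes if p not in by_process]
    return found, missing


# Made-up sample data: the job needed three outside processes, two certs are on file.
certs = [
    Cert("J-1001", "heat treat", "Vendor A", "HT-553"),
    Cert("J-1001", "plating", "Vendor B", "PL-208"),
]
found, missing = assemble_cert_chain(
    "J-1001", ["heat treat", "plating", "NDT"], certs
)
print([c.process for c in found])  # ['heat treat', 'plating']
print(missing)                     # ['NDT']
```

An agent reading this result has an explicit "NDT cert not on file" to report, which is the blank the experienced engineer would leave open.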

The substrate a shop needs

The shape is the same as in every other vertical. The contents are manufacturing-specific.

  • Versioned BOM and routing repository in markdown. Each part family as a canonical document. ECN history visible. Routings cross-referenced to actual standard times by work center.
  • Quality corpus indexed by part, customer, and defect mode. FAIs, PPAPs, CMM data, MRR / NCR history. Searchable across time and across customers.
  • Drawing register with revision tracking. Customer's current revision, internal current revision, every prior revision retrievable with the date and ECN that drove it.
  • Customer-requirements substrate. Each customer as a first-class document. Their quality manual, their PPAP requirements, their special processes, their portal links. Effective-dated.
  • Compliance evidence corpus. AS9100 / ISO 13485 / ITAR / DFARS evidence indexed by clause and by date. The substrate that lets the AI prepare audit-ready answers without re-deriving them from the filing cabinet.
  • OSP vendor records with cert chains. Each outside processor as a first-class entity. Their certs, their lot-level traceability, their historical performance per process.
  • Live ERP integration for the operational layer. Open jobs, WIP, inventory pulled at request time. The substrate holds the policy; the ERP holds the current state.
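As an illustration of one item in the list above, here is a minimal Python sketch of a drawing register with revision tracking. It assumes a hypothetical `Revision` record holding the part, revision letter, effective date, and driving ECN; the part number, dates, and ECN identifiers are invented for the example. It shows the check a stateless model cannot make on its own: whether the floor is running the customer's current revision.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Revision:
    """One entry in the drawing register (hypothetical shape)."""
    part: str
    rev: str
    effective: date
    ecn: str


def current_revision(register, part, as_of):
    """Latest revision of `part` effective on or before `as_of`, or None."""
    candidates = [r for r in register if r.part == part and r.effective <= as_of]
    return max(candidates, key=lambda r: r.effective, default=None)


def revision_mismatch(register, part, floor_rev, as_of):
    """Return a description of the mismatch, or None if the floor is current."""
    current = current_revision(register, part, as_of)
    if current is None:
        raise LookupError(f"no revision on record for {part} as of {as_of}")
    if current.rev == floor_rev:
        return None
    return (
        f"{part}: floor is running rev {floor_rev}, customer is on rev "
        f"{current.rev} ({current.ecn}, effective {current.effective})"
    )


# Made-up register entries for one part.
register = [
    Revision("P-100", "E", date(2025, 3, 1), "ECN-0412"),
    Revision("P-100", "F", date(2025, 11, 15), "ECN-0467"),
]
print(revision_mismatch(register, "P-100", "E", date(2026, 1, 10)))
```

Because every prior revision stays in the register with its date and ECN, the same lookup with an earlier `as_of` date answers "what revision was current when this job ran."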

None of these are exotic. Each is a substrate the shop almost has, scattered across Visual or JobBOSS, a shared drive nobody curates, a quality manager's local folder, a customer portal that takes 90 seconds to log into, and the head of compliance's filing cabinet. The substrate work is to consolidate, version, and review-gate the scattered state into a layer the agent can read fluently.

What the engagement looks like

KoldOps installs manufacturing substrates as fixed-scope engagements. The starting point is one decision domain (usually quality records, customer-requirements management, or routing standards) plus one pilot part family or one pilot customer. The sequence:

  1. Business System Review (2 weeks). Map current state. Score the substrate against the 5-question audit. Identify the highest-ROI domain. Hand back a written report.
  2. Substrate buildout for the chosen domain (4 to 8 weeks). Consolidate scattered state into markdown, in git, with named-reviewer gates. Wire retrieval. Stand up an MCP layer. Integrate with the existing ERP for live operational data.
  3. Agent pilot on the pilot part family or customer (2 to 4 weeks). Point Claude (or whichever frontier model) at the substrate. Run against real production questions. Measure answer quality against the experienced engineer's judgment.
  4. Expansion to the next domain. The shop's team runs the pattern on their own where possible. KoldOps stays on as advisor.
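The "named-reviewer gate" in step 2 could take many forms. One minimal sketch in Python, assuming each markdown document starts with a simple `key: value` header between `---` lines: the field names `author`, `reviewer`, and `reviewed_on` are assumptions for illustration, as are the sample values, and a real gate would more likely run as a git hook or CI check around logic like this.

```python
REQUIRED_FIELDS = ("author", "reviewer", "reviewed_on")  # assumed field names


def parse_front_matter(text):
    """Parse a simple 'key: value' header delimited by '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def review_gate(text):
    """Return the reasons a document fails the gate; an empty list means pass."""
    fields = parse_front_matter(text)
    return [f"missing {name}" for name in REQUIRED_FIELDS if not fields.get(name)]


# Made-up documents: one reviewed, one not.
doc_ok = (
    "---\nauthor: process-engineer\nreviewer: quality-director\n"
    "reviewed_on: 2026-05-01\n---\n# Routing standard\n"
)
doc_bad = "---\nauthor: process-engineer\n---\n# Routing standard\n"

print(review_gate(doc_ok))   # []
print(review_gate(doc_bad))  # ['missing reviewer', 'missing reviewed_on']
```

A document that fails the gate stays out of the layer the agent reads, so every answer can be traced to a named reviewer and date.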

Total time from contract to a useful in-production agent on the first domain: 8 to 14 weeks. The substrate the shop owns at the end is portable, version-controlled, and works with any frontier model the shop chooses now or later.

Frequently asked

We already have Visual Job Shop / JobBOSS / ProShop / Global Shop. Why do we need a substrate?

Those are systems of record. They are excellent at what they are for (job tracking, scheduling, costing, MRP). They are not, by themselves, substrates an LLM can read fluently. The substrate KoldOps installs sits alongside the ERP, pulls operational data from it, and presents an LLM-native view the agent can reason against. Your ERP stays. The substrate is additive.
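A rough picture of "sits alongside the ERP, pulls operational data from it," as a Python sketch. The `ErpClient` interface, the `FakeErp` stand-in, the field names, and the customer names are all hypothetical; no real ERP's API is being described. A real integration would implement `open_jobs` against whatever export or API the shop's system offers.

```python
from collections import defaultdict
from typing import Protocol


class ErpClient(Protocol):
    """Assumed interface: anything that can list open jobs at request time."""
    def open_jobs(self) -> list[dict]: ...


class FakeErp:
    """Stand-in for a live ERP connection; returns a fixed, made-up snapshot."""
    def open_jobs(self) -> list[dict]:
        return [
            {"job": "J-1001", "customer": "Acme", "wip_value": 12000.0},
            {"job": "J-1002", "customer": "Acme", "wip_value": 3500.0},
            {"job": "J-1003", "customer": "Borealis", "wip_value": 8000.0},
        ]


def wip_by_customer(erp: ErpClient) -> dict[str, float]:
    """Total WIP value per customer, computed from the ERP's current state."""
    totals: dict[str, float] = defaultdict(float)
    for job in erp.open_jobs():
        totals[job["customer"]] += job["wip_value"]
    return dict(totals)


def render_wip_view(erp: ErpClient) -> str:
    """Render the totals as a short markdown view an agent can read."""
    lines = ["# WIP by customer"]
    for customer, value in sorted(wip_by_customer(erp).items()):
        lines.append(f"- {customer}: {value:,.2f}")
    return "\n".join(lines)


print(render_wip_view(FakeErp()))
```

The ERP remains the system of record in this arrangement. The view is rebuilt on each request, so nothing operational is copied into the substrate and left to go stale.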

We are AS9100 / ISO 13485. Will the substrate help us pass audits?

Directly, yes. Auditors ask "show me the evidence for clause 8.5.1" and the substrate returns the indexed corpus with author, date, and review trail per document. The clauses the substrate is most useful for are documented information (7.5), monitoring and measurement (9.1), and corrective action (10.2). The substrate does not replace the QMS itself; it makes the existing QMS auditable in minutes instead of hours.
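"Indexed by clause" can be sketched in a few lines of Python. The `Evidence` record and the sample entries below are invented for illustration; the assumption is only that each piece of evidence carries a clause number, an author, a reviewer, and a date, as the answer above describes.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Evidence:
    """One evidence document with its review trail (hypothetical shape)."""
    clause: str
    title: str
    author: str
    reviewer: str
    dated: date


def build_clause_index(records):
    """Group evidence by clause, newest first within each clause."""
    index = defaultdict(list)
    for record in records:
        index[record.clause].append(record)
    for items in index.values():
        items.sort(key=lambda r: r.dated, reverse=True)
    return dict(index)


def evidence_for(index, clause):
    """Everything on file for a clause; an empty list if nothing is indexed."""
    return index.get(clause, [])


# Made-up evidence entries.
records = [
    Evidence("8.5.1", "Work instruction WI-07 release", "process-engineer",
             "quality-director", date(2024, 2, 14)),
    Evidence("10.2", "Corrective action CAR-031", "quality-engineer",
             "quality-director", date(2025, 6, 20)),
    Evidence("8.5.1", "Work instruction WI-12 release", "process-engineer",
             "quality-director", date(2025, 9, 2)),
]
index = build_clause_index(records)
for item in evidence_for(index, "8.5.1"):
    print(item.dated, item.title, "| author:", item.author, "| reviewer:", item.reviewer)
```

A clause with nothing indexed returns an empty list, which is itself useful before an audit: it shows where the evidence gap is.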

Our customer-specific quality manuals are PDFs that change every six months. Can the substrate keep up?

Yes. The ingestion pipeline turns each PDF into a markdown representation on receipt, stores both, and version-controls the markdown so a customer's revision-3 manual and revision-4 manual are diffable. When the agent is asked "what does this customer require for first-article inspection," it pulls the current effective document with the date and reviewer attached.
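The two behaviors described here, diffable revisions and effective-dated lookup, can be sketched in Python with the standard library's `difflib`. The manual text, revision labels, and dates are made up, and the PDF-to-markdown conversion step is assumed to have already happened; this covers only what follows once the markdown exists.

```python
import difflib
from datetime import date

# Hypothetical versions of one customer's manual: (revision, effective date, markdown).
versions = [
    ("rev-3", date(2025, 10, 1),
     "## First-article inspection\nFAI required on first production lot.\n"),
    ("rev-4", date(2026, 4, 1),
     "## First-article inspection\n"
     "FAI required on first production lot and after any ECN.\n"),
]


def effective_manual(versions, as_of):
    """Latest version effective on or before `as_of`, or None if none applies."""
    candidates = [v for v in versions if v[1] <= as_of]
    return max(candidates, key=lambda v: v[1], default=None)


def diff_revisions(old, new):
    """Unified diff between two (revision, date, markdown) versions."""
    return list(difflib.unified_diff(
        old[2].splitlines(), new[2].splitlines(),
        fromfile=old[0], tofile=new[0], lineterm="",
    ))


print(effective_manual(versions, date(2026, 3, 1))[0])  # rev-3
print("\n".join(diff_revisions(versions[0], versions[1])))
```

In a git-backed substrate the diff comes from git itself; the sketch only shows why storing the markdown, and not just the PDF, is what makes a revision change reviewable line by line.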

We are ITAR. Can the substrate be on-premise?

Yes. The substrate is markdown in git, the retrieval stack is open-source, and the inference layer can run on a Mac Mini cluster or a dedicated GPU box on the shop floor. See on-premise AI for deployment options. Nothing in the substrate architecture requires the cloud.

Can we start with one customer or one part family?

Yes, and we recommend it. The substrate pattern proves itself fastest on a narrow domain. One customer, or one high-volume part family, or one OSP vendor. Once the substrate is providing useful answers on the pilot domain, expansion to the next domain is cheaper than the first.

What's next

If your shop has run an AI pilot that demoed well and stalled in production, run the substrate audit. 15 minutes. The result will tell you whether the work is upstream of the model (substrate) or downstream of it.

If the audit confirms a substrate gap, the next step is a Business System Review. Fixed scope, written report, no further commitment required. We map the decision domains, score the substrate as it stands, and hand back a prioritized buildout plan.

For the broad framing on why this problem exists across all industries, see "Your AI Demo Worked. Your AI Project Failed. Here's Why." For the substrate philosophy underneath, see "Decision-State, Airlocked to Code-State: Defining the AI Substrate."