OOD in industrial production: how to detect when a part goes out of the training domain

A machine vision system is trained on ten thousand images of conforming and defective parts, under precise lighting conditions, with a fixed acquisition geometry. It achieves 97% accuracy on the test set. It goes into production.

Three weeks later, a batch of parts arrives from a secondary supplier. The surface is slightly different, not defective, just different. The model has never seen it. Yet it continues to output predictions. With the same apparent confidence. Some are correct. Others are not. And nothing in the system indicates that anything has changed.

This is the OOD (Out-Of-Distribution) problem. And in industry, it is widely underestimated.

What "going out of the training domain" really means

The training domain is not a visible boundary

A deep learning model learns from a training dataset. This dataset implicitly defines a domain: lighting conditions, viewing angles, materials, textures, and the types of defects it contains. This domain is not written anywhere in the code. It is not parameterized. It is latent in the model's weights.

The problem is that an input outside this domain does not produce a system error. The model still processes the part. It still outputs a prediction. It classifies "conforming" or "defective" with the same apparent behavior as under nominal conditions.

What it does not say is that this prediction is not based on anything solid. It is operating outside of what it knows, and it is unaware of it.

The three most common causes of OOD in production

In an industrial context, OOD situations do not come from extreme cases. They come from the daily life of the line:

Change of supplier or raw material. A part that is geometrically identical but made from a slightly different alloy can have a different surface reflectance. The model has never seen the part with this radiometric signature.

Gradual drift in acquisition conditions. Lighting ages. The lens gets slightly dirty. The camera shifts by a few millimeters. Each change is invisible individually. Accumulated over three months, they shift the input distribution out of the training domain.

New defects not represented in training. A new type of defect, not present in the initial data, is treated by the model as a part to be classified. It will classify something. Not necessarily correctly.

In all three cases, the model remains silent about its own doubt. This is what the research community refers to as "silent failure"—a failure that triggers no alarm because the system does not know that it does not know.

Why post-mortem monitoring is not enough

The usual response to the OOD problem in production systems is monitoring. A dashboard calculates accuracy over a sliding window. If performance drops, an alert is raised.

This approach has two flaws.

The first: by the time the alert is raised, hundreds or thousands of parts have already been classified under degraded conditions. In quality control, this can represent a full day of poorly evaluated production.

The second: aggregate drift detection assumes that OOD errors will affect a large enough volume of parts to move an average metric. But if the change is gradual, localized to certain reference parts, or concentrated over a few hours in a day, the metric may never move enough to trigger an alert while silent errors accumulate.

Post-mortem monitoring reasons about cohorts. The OOD problem arises at the level of each individual part.
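To make the second flaw concrete, here is a back-of-the-envelope simulation with purely hypothetical numbers: 600 parts per hour, 97% accuracy under nominal conditions, a two-hour batch from a secondary supplier at 80% accuracy, and a dashboard that alerts when daily accuracy drops below 95%. None of these figures come from a real line.

```python
# Hypothetical line parameters, for illustration only.
PARTS_PER_HOUR = 600
NOMINAL_ERRORS_PER_HOUR = 18   # 97% accuracy under nominal conditions
OOD_ERRORS_PER_HOUR = 120      # 80% accuracy on the unfamiliar batch
OOD_HOURS = range(10, 12)      # secondary-supplier batch runs 10:00-12:00
ALERT_THRESHOLD = 0.95         # dashboard alerts if daily accuracy falls below this

def errors_in_hour(hour: int) -> int:
    """Misclassified parts in a given hour of the day."""
    return OOD_ERRORS_PER_HOUR if hour in OOD_HOURS else NOMINAL_ERRORS_PER_HOUR

def daily_report(hours: int = 24):
    """Return (daily accuracy, alert raised?, errors above the nominal rate)."""
    total = hours * PARTS_PER_HOUR
    errors = sum(errors_in_hour(h) for h in range(hours))
    accuracy = (total - errors) / total
    extra_errors = errors - hours * NOMINAL_ERRORS_PER_HOUR
    return accuracy, accuracy < ALERT_THRESHOLD, extra_errors

accuracy, alert, extra = daily_report()
print(f"daily accuracy {accuracy:.1%}, alert={alert}, extra errors={extra}")
# daily accuracy 95.6%, alert=False, extra errors=204
```

The daily figure stays above the alert threshold, yet around two hundred additional parts were misclassified in those two hours. Computing accuracy at all also presupposes labels, which are rarely available in real time.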

What per-prediction reliability changes in OOD detection

The principle: measure uncertainty, not performance

A per-prediction reliability brick does not assess the global performance of the model. It evaluates, for each individual prediction, the level of uncertainty associated with that specific output.

In industrial 2D vision, this translates into confidence metrics attached to each produced bounding box: the uncertainties σx, σy, σw, σh measure the dispersion of the prediction in its four spatial dimensions. A bounding box with high σ values is a direct signal that the model is in a zone where its predictions are unstable.
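The article does not specify how the σ values are produced. A minimal sketch, assuming the reliability layer can obtain K slightly different predictions of the same box (for example through test-time augmentation of the input, which keeps the model a black box), is to take the standard deviation of each coordinate across those K samples. `Box` and `box_uncertainty` are hypothetical names.

```python
from statistics import pstdev
from typing import Dict, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h) in pixels

def box_uncertainty(samples: Sequence[Box]) -> Dict[str, float]:
    """Dispersion of each box coordinate across K predictions of the same object."""
    if len(samples) < 2:
        raise ValueError("need at least two predictions to measure dispersion")
    xs, ys, ws, hs = zip(*samples)  # regroup the K boxes coordinate by coordinate
    return {
        "sigma_x": pstdev(xs),
        "sigma_y": pstdev(ys),
        "sigma_w": pstdev(ws),
        "sigma_h": pstdev(hs),
    }

# Nominal part: the K predictions agree to within a pixel
stable = [(100, 50, 40, 20), (101, 50, 40, 20), (99, 50, 40, 20)]
# Unfamiliar surface: the same object is localized differently each time
unstable = [(100, 50, 40, 20), (112, 44, 52, 20), (91, 57, 33, 20)]
print(box_uncertainty(stable)["sigma_x"])    # about 0.82
print(box_uncertainty(unstable)["sigma_x"])  # about 8.6
```

Population standard deviation is one simple dispersion measure; a production system might normalize σ by box size so that large and small objects are comparable.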

This signal appears before the prediction is proven correct or incorrect. It does not wait for comparison with a ground truth. It does not wait for a measurable statistical degradation on a cohort. It is available within 20 ms in edge mode, for each inference, in real time.

What this actually looks like on a line

Take the case of the supplier change mentioned earlier. The first parts of the new batch enter the vision system. The model processes them. Its predictions come out with a "conforming" or "defective" classification. But the uncertainty metrics associated with these predictions are abnormally high: σx and σy indicate an unusual dispersion in the localization of detected objects.

This signal is sent in real time. The system can alert the operator, quarantine the parts, or switch to reinforced human supervision, without waiting for global metrics to degrade, and without waiting for an entire batch to be compromised.

Model drift related to OOD is thus detectable from the very first affected parts, not after hours of production.
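The routing decision described above can be sketched as follows, assuming a single scalar uncertainty score per bounding box (hypothetically, the largest of σx, σy, σw, σh) and a threshold set at the 99th percentile of scores observed on nominal parts. The percentile, the 2× escalation factor and the three actions are illustrative choices, not an actual product policy.

```python
from enum import Enum
from statistics import quantiles

class Action(Enum):
    ACCEPT = "accept prediction"
    HUMAN_REVIEW = "reinforced human supervision"
    QUARANTINE = "quarantine part"

def calibrate_threshold(nominal_scores, pct=99):
    """pct-th percentile of uncertainty scores measured on in-domain (nominal) parts."""
    # quantiles(..., n=100) returns the 99 percentile cut points; index pct-1 is the pct-th.
    return quantiles(nominal_scores, n=100)[pct - 1]

def route(score, threshold, escalation_factor=2.0):
    """Map one part's uncertainty score to an action on the line (hypothetical policy)."""
    if score <= threshold:
        return Action.ACCEPT
    if score <= escalation_factor * threshold:
        return Action.HUMAN_REVIEW
    return Action.QUARANTINE

# Usage with synthetic nominal scores between 0.50 and 2.49
nominal = [0.5 + 0.01 * i for i in range(200)]
threshold = calibrate_threshold(nominal)
print(round(threshold, 2))  # 2.48
print([route(s, threshold).name for s in (1.0, 3.0, 10.0)])
```

Because each part is routed on its own score, the first unfamiliar part can be held back without waiting for any aggregate statistic to move.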

OOD detection by uncertainty vs aggregate monitoring

The table below summarizes the practical difference between the two approaches for a quality manager or a CTO who must decide how to equip their line.

| Criterion | Aggregate post-mortem monitoring | Per-prediction reliability (real-time OOD) |
|---|---|---|
| Detection granularity | Cohort (time window) | Individual part |
| Alert delay | Minutes to hours after the drift | From the first unstable prediction |
| Actionable signal without ground truth | ❌ Requires label comparison | ✅ Intrinsic uncertainty signal |
| Gradual model drift detection | ❌ Can go unnoticed | ✅ Detects slow shifts |
| Black-box compatibility (no model access) | Variable | ✅ Plug-and-play, no model modifications |
| Operational latency | N/A (post-mortem) | ✅ < 100 ms (20 ms in edge mode) |
| Documentation for EU AI Act | ❌ Aggregate metrics insufficient | ✅ Per-inference reliability log |

The right column is not theoretical. It corresponds to a deployed architecture, validated on production data.

What this looks like on real data: the VEDECOM PoC

The PoC conducted with the VEDECOM Institute on cooperative perception for autonomous vehicles is the available reference case. The context differs from stationary industrial vision, since it involves multi-sensor fusion in a dynamic environment, but the mechanism is the same: vision predictions exposed to out-of-distribution inputs (weather conditions, unseen angles, partial occlusions).

The results on production data, without retraining the client's model:

  • -83% false positives

  • -65% position errors (from 1.44 m to 0.51 m)

  • -63% orientation errors (from 6.28° to 2.35°)

  • Real-time execution: 20 ms on the edge

Benchmarked against 7 alternative fusion methods [Fadili et al., IRCE 2025]. These metrics are published.
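The relative reductions follow directly from the absolute figures listed above; the short check below recomputes them. The 83% false-positive figure cannot be recomputed this way, since the absolute counts are not given here.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction of the original error removed: (before - after) / before."""
    return (before - after) / before

position = relative_reduction(1.44, 0.51)     # metres
orientation = relative_reduction(6.28, 2.35)  # degrees
print(f"position error: -{position:.0%}")        # position error: -65%
print(f"orientation error: -{orientation:.0%}")  # orientation error: -63%
```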

Translated to a quality control line, an 83% reduction in false positives means correspondingly fewer unnecessary line stoppages, fewer conforming parts wrongly rejected, and fewer manual verification cycles. The per-inference output also provides an actionable reliability log to support EU AI Act traceability requirements for high-risk systems.

Published sector benchmarks on industrial quality control indicate 30% to 60% reductions in false rejections after implementing a reliability layer, with inter-batch stability improved by 20% to 35% [Jidoka Tech, 2025; McKinsey].

How to integrate OOD detection without modifying the existing model

This is often the first question from technical teams: do we need to retrain the model? Access the weights? Modify the pipeline?

The answer is no on all three points.

A plug-and-play reliability layer connects in parallel to the existing AI model. It optionally receives the context of the input data. It outputs confidence metrics. It does not have access to the model's weights, does not modify its architecture, and does not require any process changes on the line.

The typical deployment cycle is 2 weeks of specifications, 1 week of validation, and 2 weeks of PoC on real data: five weeks to obtain a first measurement of operational reliability on your existing vision model.

Black-box compatibility is central. It means that the reliability brick works with any vision model: Cognex, Keyence, in-house models, OEM solutions. It requires no access to the client's intellectual property and no access to production data. This matters for system integrators who work with third-party models and are contractually bound to deliver results to the end customer.
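The integration pattern can be sketched as a thin wrapper around any callable model. Everything here is hypothetical: `ReliabilityLayer`, the `score` function and the threshold stand in for whatever the actual reliability brick computes. The point is only that the layer sees inputs and outputs, never weights. The scoring step runs after the model call in this sketch because the score depends on the prediction; in a real deployment it could run in a separate process alongside the existing pipeline.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass
class ReliableOutput:
    prediction: Any     # the existing model's output, passed through unchanged
    uncertainty: float  # score computed by the reliability layer
    flagged: bool       # True when the score exceeds the calibrated threshold

class ReliabilityLayer:
    """Hypothetical wrapper: sees inputs and outputs of a black-box model, never its weights."""

    def __init__(self,
                 model: Callable[[Any], Any],
                 score: Callable[[Any, Any, Optional[dict]], float],
                 threshold: float):
        self._model = model          # e.g. a vendor SDK call or an in-house model
        self._score = score          # uncertainty estimate from (input, output, optional context)
        self._threshold = threshold

    def __call__(self, image: Any, context: Optional[dict] = None) -> ReliableOutput:
        prediction = self._model(image)  # existing model called exactly as before
        u = self._score(image, prediction, context)
        return ReliableOutput(prediction, u, u > self._threshold)

# Usage with stand-ins for the existing model and the scoring function
def existing_model(image):
    return "conforming"

def toy_score(image, prediction, context):
    # Hypothetical: pretend parts from supplier "B" yield unstable predictions
    return 0.9 if context and context.get("supplier") == "B" else 0.1

layer = ReliabilityLayer(existing_model, toy_score, threshold=0.5)
print(layer("img_001").flagged)                     # False
print(layer("img_002", {"supplier": "B"}).flagged)  # True
```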

FAQ

What is OOD (Out-Of-Distribution) in machine vision?

OOD refers to any situation where the input data presented to the AI model differs from the distribution on which it was trained. In machine vision, this occurs during supplier changes, gradual drifts in acquisition conditions (lighting, optics), or the appearance of new defects not represented in the training data. The model continues to produce predictions without indicating that it is operating outside its domain of validity.

Why doesn't an AI model detect on its own that it is out of domain?

Deep learning models optimize performance on their training distribution. They have no native mechanism to signal that an input is outside of what they have learned. This is called "silent failure": the model produces a confident prediction even on inputs for which its actual reliability is low. Uncertainty quantification, applied in real time to each prediction, is what makes it possible to detect this signal without access to ground truth.

How does per-prediction reliability detect OOD without retraining the model?

A per-prediction reliability brick measures the uncertainty associated with each model output, in parallel, without modifying the original AI model. In 2D vision, σx, σy, σw, σh metrics quantify the dispersion of each produced bounding box. When these confidence metrics exceed a threshold calibrated on nominal data, it is a signal that a prediction is unstable, potentially because the input is outside the training domain.

What is the difference between model drift and OOD?

OOD refers to a one-off event: a part or a batch that goes outside the training domain.
Model drift refers to a gradual and cumulative shift of the input distribution relative to the training domain. Both have the same consequence—increasingly unreliable predictions—but drift is harder to detect because it is not localized in time. Per-prediction reliability detects both: one-off OOD via a sudden spike in uncertainty metrics, and drift via a progressive and continuous degradation of these same metrics.
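The distinction between a spike and a slow drift can be sketched with two detectors over the same per-part uncertainty score: a fixed per-part threshold for one-off OOD, and an exponentially weighted moving average (EWMA) for drift. EWMA is one common choice among several (CUSUM or rolling windows would also work); the class name, thresholds and smoothing factor are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class UncertaintyMonitor:
    spike_threshold: float        # per-part limit, calibrated on nominal data
    drift_threshold: float        # limit on the smoothed uncertainty level
    alpha: float = 0.05           # EWMA weight: small alpha = long memory, catches slow drift
    ewma: Optional[float] = None  # running smoothed uncertainty

    def update(self, score: float) -> List[str]:
        """Feed one part's uncertainty score; return the alerts it triggers."""
        alerts = []
        if score > self.spike_threshold:
            alerts.append("spike")  # one-off OOD: this part alone is far outside nominal
        if self.ewma is None:
            self.ewma = score
        else:
            self.ewma = self.alpha * score + (1 - self.alpha) * self.ewma
        if self.ewma > self.drift_threshold:
            alerts.append("drift")  # the baseline itself has moved over many parts
        return alerts

m = UncertaintyMonitor(spike_threshold=3.0, drift_threshold=1.5, alpha=0.1)
for _ in range(50):
    m.update(1.0)       # nominal parts: no alert
print(m.update(5.0))    # ['spike'] (the EWMA only rises to 1.4)
ramp = [m.update(1.0 + 0.01 * i) for i in range(150)]  # slow rise, every part below 3.0
print(any("spike" in a for a in ramp), any("drift" in a for a in ramp))  # False True
```

The single spike barely moves the average, while the slow ramp never crosses the per-part limit but lifts the average past the drift limit: each detector catches what the other misses.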

Is OOD covered by EU AI Act requirements for high-risk systems?

Yes. The EU AI Act requires providers of high-risk AI systems to run post-market monitoring, to keep automatic logs of system operation, and to maintain appropriate accuracy and robustness throughout the system's lifecycle. Inputs outside the system's operational design domain (ODD) are precisely the situations this monitoring must catch. Real-time OOD detection using per-prediction confidence metrics contributes directly to these requirements: it allows logging unstable predictions, alerting on out-of-domain situations, and documenting system reliability in real operating conditions.
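As an illustration of what a per-inference reliability log could contain, here is a minimal sketch that emits one JSON Lines record per prediction. The schema, field names and `reliability_log_entry` function are hypothetical; the EU AI Act does not prescribe this format.

```python
import json
from datetime import datetime, timezone

def reliability_log_entry(part_id, prediction, sigmas, threshold, model_version):
    """Build one JSON Lines record for a single inference (hypothetical schema)."""
    score = max(sigmas.values())  # hypothetical aggregation: worst coordinate dispersion
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "part_id": part_id,
        "model_version": model_version,
        "prediction": prediction,
        "sigmas": sigmas,
        "uncertainty_score": score,
        "threshold": threshold,
        "out_of_domain": score > threshold,
    }, sort_keys=True)

line = reliability_log_entry(
    "P-001", "conforming",
    {"sigma_x": 0.4, "sigma_y": 0.3, "sigma_w": 6.2, "sigma_h": 0.5},
    threshold=2.5, model_version="v1.3",
)
print(line)  # one line to append to the audit log file
```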
