Offline inference (Machine Learning; MLOps)
noun phrase
Definition: An inference mode in which a trained model generates predictions in advance for a set or batch of unlabeled examples, after which those predictions are cached or stored for later retrieval rather than computed on demand at serving time [Google ML Glossary].
Example in context: “As the latency objective is highly relaxed, maximizing throughput is the primary goal in offline inference.” [Han et al. 2024]
Synonyms: static inference; batch inference
Related terms: online inference; dynamic inference; model serving; cached predictions; batch prediction
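The definition above can be sketched in a few lines of Python. This is a minimal illustration, not an implementation from any cited source: the names `model`, `precompute_predictions`, and `serve` are hypothetical, and the "model" is a trivial stand-in function.

```python
def model(x: float) -> float:
    """Stand-in for a trained model: here, a trivial linear function."""
    return 2.0 * x + 1.0

def precompute_predictions(examples):
    """Offline step: run the model over the entire batch in advance,
    caching results keyed by example id."""
    return {i: model(x) for i, x in enumerate(examples)}

# Offline inference: predictions are generated ahead of time for a fixed batch
# of examples, where throughput matters more than per-request latency.
cache = precompute_predictions([0.0, 1.5, 3.0])

def serve(example_id: int) -> float:
    """Serving time: a prediction is a cheap cache lookup, not a model call."""
    return cache[example_id]

print(serve(1))  # retrieves the precomputed prediction for example 1
```

By contrast, online (dynamic) inference would call `model(x)` on demand inside `serve`, trading the cache lookup for fresh computation at request time.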