Model cascading

Model cascading                           Machine Learning; MLOps

noun phrase

Definition: A multistage inference setup in which inputs are processed by a sequence of models, typically beginning with a smaller or cheaper model and escalating only selected cases to a larger or more accurate model, in order to balance cost, latency, and quality. Recent ML-systems literature defines these systems as model cascades built around routing or deferral rules that decide which requests should be handled locally and which should be passed to a stronger model [Rabanser et al. 2025].

Example in context:In contrast, model cascades leverage a deferral rule for selecting the most suitable model to process a given request …” [Rabanser et al. 2025].

Synonyms: model cascade; cascaded models

Related terms: model router; cascade classifier; staged inference; deferral rule; speculative decoding

Добавить комментарий 0

Ваш электронный адрес не будет опубликован. Обязательные поля помечены *