Gradient accumulation (Deep Learning)
noun phrase
Definition: A training technique in which gradients from multiple mini-batches are accumulated before performing an optimizer step, effectively simulating a larger batch size [PyTorch docs; Hugging Face training docs].
Example in context: “For all of our experiments, we train on various hardware but fix the batch size to 64 using gradient accumulation and leverage the hyperparameters in 12.” [Campos, Zhai 2023]
Related terms: accumulated gradients, mini-batch training, effective batch size scaling
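The definition above can be sketched in plain Python. This is a minimal illustrative example, not the PyTorch or Hugging Face API: names such as `grad_mse`, `train_step_accumulated`, and `accum_steps` are invented here. It shows that accumulating appropriately scaled gradients over several micro-batches and then taking one optimizer step matches a single step on the combined (larger) batch.

```python
# Minimal sketch of gradient accumulation for a 1-D linear model y = w * x.
# Framework-free; all function names are illustrative, not library APIs.

def grad_mse(w, xs, ys):
    """Mean gradient of (w*x - y)^2 over a mini-batch, w.r.t. w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def train_step_accumulated(w, micro_batches, lr=0.1):
    """Accumulate gradients over several micro-batches, then take ONE
    optimizer step, simulating a single step on the combined batch."""
    accum = 0.0
    for xs, ys in micro_batches:
        # Scale each micro-batch gradient by 1/num_batches so the sum
        # equals the mean gradient over the full batch (this assumes
        # equal-sized micro-batches).
        accum += grad_mse(w, xs, ys) / len(micro_batches)
    return w - lr * accum  # one update, as if the batch were larger

# Two micro-batches of size 2 simulate an effective batch size of 4.
batches = [([1.0, 2.0], [2.0, 4.0]), ([3.0, 4.0], [6.0, 8.0])]
w_accum = train_step_accumulated(0.0, batches)

# One step on the full batch of 4, for comparison.
xs_full = [1.0, 2.0, 3.0, 4.0]
ys_full = [2.0, 4.0, 6.0, 8.0]
w_full = 0.0 - 0.1 * grad_mse(0.0, xs_full, ys_full)
```

In frameworks such as PyTorch, the same effect is achieved by calling `backward()` on each micro-batch (gradients accumulate in `.grad` by default) and invoking the optimizer step only every N micro-batches, which is how a fixed effective batch size can be kept across hardware with different memory limits, as in the quoted example.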