Gradient accumulation (Deep Learning)
noun phrase
Definition: A training technique in which gradients from multiple mini-batches are accumulated before performing an optimizer step, effectively simulating a larger batch size [PyTorch docs; Hugging Face training docs].
Example in context: “For all of our experiments, we train on various hardware but fix the batch size to 64 using gradient accumulation and leverage the hyperparameters in 12.” [Campos, Zhai 2023]
Related terms: accumulated gradients, mini-batch training, effective batch size scaling
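The definition above can be sketched in plain Python. This is a minimal illustrative example, not the PyTorch or Hugging Face API: names such as `grad_mse`, `train_step_accumulated`, and `accum_steps` are invented here. It shows that accumulating appropriately scaled gradients over several micro-batches and then taking one optimizer step matches a single step on the combined (larger) batch.

```python
# Minimal sketch of gradient accumulation for a 1-D linear model y = w * x.
# Framework-free; all function names are illustrative, not library APIs.

def grad_mse(w, xs, ys):
    """Mean gradient of (w*x - y)^2 over a mini-batch, w.r.t. w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def train_step_accumulated(w, micro_batches, lr=0.1):
    """Accumulate gradients over several micro-batches, then take ONE
    optimizer step, simulating a single step on the combined batch."""
    accum = 0.0
    for xs, ys in micro_batches:
        # Scale each micro-batch gradient by 1/num_batches so the sum
        # equals the mean gradient over the full batch (this assumes
        # equal-sized micro-batches).
        accum += grad_mse(w, xs, ys) / len(micro_batches)
    return w - lr * accum  # one update, as if the batch were larger

# Two micro-batches of size 2 simulate an effective batch size of 4.
batches = [([1.0, 2.0], [2.0, 4.0]), ([3.0, 4.0], [6.0, 8.0])]
w_accum = train_step_accumulated(0.0, batches)

# One step on the full batch of 4, for comparison.
xs_full = [1.0, 2.0, 3.0, 4.0]
ys_full = [2.0, 4.0, 6.0, 8.0]
w_full = 0.0 - 0.1 * grad_mse(0.0, xs_full, ys_full)
```

In frameworks such as PyTorch, the same effect is achieved by calling `backward()` on each micro-batch (gradients accumulate in `.grad` by default) and invoking the optimizer step only every N micro-batches, which is how a fixed effective batch size can be kept across hardware with different memory limits, as in the quoted example.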