Given that it’s usually impractical to feed a deep learning model the entire training set at once (especially with 1000+ samples) due to the available RAM and other factors, I’d really like to know the effect of running backward propagation once per epoch, after all batches have been fed to the model, versus running it after each batch.
Thanks to anyone who answers this.
Generally, larger batches will provide a more accurate estimate of the gradient, while smaller batches will introduce more noise (which is often seen as beneficial up to a certain degree).
Chapter 8.1.3 of the Deep Learning Book discusses these effects in more detail.
Thank you for your reply, but I just wanted to know whether backward propagation should be carried out after each batch within one epoch, or only once per epoch, after all batches have been fed to the model.
Usually, you would update the model after each batch, but that’s not a hard rule as explained.
Depending on your use case, you might want to accumulate the gradients and update the model after a couple of batches (or all of them).
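To make the two schedules concrete, here is a minimal pure-Python sketch (all names are illustrative, using a toy one-parameter model with manually derived gradients rather than a real framework): one function updates the weight after every batch, the other accumulates gradients across all batches and applies a single update per epoch.

```python
def grad(w, batch):
    # Gradient of the mean squared error for the toy model y ≈ w * x
    return sum(2 * x * (w * x - y) for x, y in batch) / len(batch)

def train_per_batch(w, batches, lr=0.01):
    # Backward pass + parameter update after every batch (usual mini-batch SGD)
    for batch in batches:
        w -= lr * grad(w, batch)
    return w

def train_accumulated(w, batches, lr=0.01):
    # Accumulate (average) gradients over all batches, then one update per epoch
    total = sum(grad(w, b) for b in batches) / len(batches)
    return w - lr * total

# Toy data for the target relation y = 2 * x, split into two batches
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
batches = [data[:2], data[2:]]

w_per_batch = train_per_batch(0.0, batches)      # two smaller, noisier steps
w_accumulated = train_accumulated(0.0, batches)  # one step on the averaged gradient
```

After one epoch the two weights differ, because per-batch updates take the second step from an already-updated weight, while accumulation takes a single step using the gradient averaged over the whole epoch. Gradient accumulation is often used to emulate a larger effective batch size when memory is limited.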
Thanks a lot, I think I get it now.