How to implement accumulated gradients in PyTorch (i.e. iter_size in a Caffe prototxt)

@Hengck @smth Hi, I have a quick question. As mentioned here,

loss += criterion(outputs, Variable(labels.cuda()))

this builds the graph again and again inside the loop, which may increase memory usage. So should I just write

loss = criterion(outputs, Variable(labels.cuda()))

This will also accumulate the gradients, right? I am confused about which one to use, “=” or “+=”. I just want the effect of “iter_size” in Caffe so I can train large models with small mini-batches. Thanks.
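
For reference, here is a minimal sketch of the pattern I have in mind (the model, the random data, and the iter_size value are just placeholders I made up, and it runs on CPU for simplicity): each mini-batch uses “=” to get a fresh loss, backward() accumulates into the parameters’ .grad, and optimizer.step() is called once every iter_size mini-batches. Is this the right way to get Caffe’s iter_size behavior?

```python
import torch
import torch.nn as nn
from torch.autograd import Variable
from torch.utils.data import DataLoader, TensorDataset

# Placeholder model, loss, optimizer, and random data just to illustrate the pattern.
net = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)

data = TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
loader = DataLoader(data, batch_size=8)

iter_size = 4  # accumulate gradients over 4 mini-batches, like Caffe's iter_size

optimizer.zero_grad()
for i, (inputs, labels) in enumerate(loader):
    outputs = net(Variable(inputs))
    # "=" each time: every mini-batch builds its own graph, which is freed by backward().
    # Dividing by iter_size averages the accumulated gradient over the mini-batches.
    loss = criterion(outputs, Variable(labels)) / iter_size
    loss.backward()              # gradients are summed into param.grad across iterations

    if (i + 1) % iter_size == 0:
        optimizer.step()         # one parameter update per iter_size mini-batches
        optimizer.zero_grad()    # reset the accumulated gradients
```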