About the backward() method

Procedure_1:
loss1 = criterion(pred, target)
loss1.backward()
loss2 = criterion(pred, target)
loss2.backward()
optimizer.step()

Procedure_2:
loss1 = criterion(pred, target)
loss1.backward()
optimizer.step()
optimizer.zero_grad()
loss2 = criterion(pred, target)
loss2.backward()
optimizer.step()
optimizer.zero_grad()

Are these two procedures equivalent, i.e. will they yield the same gradient update?

I am trying to switch from EfficientNet-B0 to B5 for a project, but my GPU can only handle a batch size of 3 on B5. With batch size 3, the loss does not converge as well as it did with B0, so I am thinking about gradient accumulation. Is the above-mentioned way the right way to do it?
Also, what are the other ways to mitigate the batch size issue? I have been trying to install apex for the last 3 hours but keep failing! :'(

This would yield the same results if you are using an optimizer without any running estimates.
E.g. this example returns the same updated values for SGD, but will fail for e.g. Adam:


import torch
import torch.nn as nn

torch.manual_seed(2809)
model = nn.Linear(10, 1, bias=False)
w0 = model.weight.clone()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

data = torch.randn(1, 10)
target = torch.randn(1, 1)

# 1) accumulate the gradients of both losses, then take a single step
output = model(data)
loss1 = criterion(output, target)
loss1.backward(retain_graph=True)
loss2 = criterion(output, target)
loss2.backward()
optimizer.step()

print(w0 - model.weight)
optimizer.zero_grad()

# 2) step and zero out the gradients after each backward call
torch.manual_seed(2809)
model = nn.Linear(10, 1, bias=False)
# make sure weight is equal
print((w0 == model.weight).all())
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

output = model(data)
loss1 = criterion(output, target)
loss1.backward(retain_graph=True)
optimizer.step()
optimizer.zero_grad()
loss2 = criterion(output, target)
loss2.backward()
optimizer.step()
optimizer.zero_grad()

print(w0 - model.weight)
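The mismatch for Adam comes from its running estimates: in the second approach optimizer.step() runs twice, so the exponential moving averages of the gradient and its square are updated twice, and the resulting parameter update no longer matches the single accumulated one.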

Have a look at this post for some possible workflows.

What error are you getting while trying to install apex?

Hi,
Is there a way to get the same result with an Adam optimizer before and after gradient accumulation?
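
For what it's worth, it should work as long as optimizer.step() runs only once per effective batch (as in the first approach above) and each micro-batch loss is scaled down by the number of accumulation steps. A minimal sketch (the batch split and sizes here are made up for illustration):

import torch
import torch.nn as nn

torch.manual_seed(2809)
model = nn.Linear(10, 1, bias=False)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

data = torch.randn(4, 10)
target = torch.randn(4, 1)

# a) one large batch, single Adam step
loss = criterion(model(data), target)
loss.backward()
optimizer.step()
w_large = model.weight.clone()

# b) gradient accumulation over 4 micro-batches, single Adam step
torch.manual_seed(2809)
model2 = nn.Linear(10, 1, bias=False)
optimizer2 = torch.optim.Adam(model2.parameters(), lr=1e-3)
for i in range(4):
    # scale each micro-batch loss by 1/num_micro_batches to match the mean
    micro_loss = criterion(model2(data[i:i+1]), target[i:i+1]) / 4
    micro_loss.backward()  # gradients accumulate in .grad
optimizer2.step()

print(torch.allclose(w_large, model2.weight))  # True (up to fp rounding)

Since step() is only called once, Adam's running estimates see the same accumulated gradient in both cases, so the updates match up to floating-point rounding.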

The apex error I am getting:

Compiling cuda extensions with
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:04_Central_Daylight_Time_2018
Cuda compilation tools, release 10.0, V10.0.130
from C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0/bin

multi_tensor_sgd_kernel.cu
c:/users/rafi/anaconda3/envs/myenv2/lib/site-packages/torch/include\c10/util/flat_hash_map.h(1379): error C3203: 'templated_iterator': unspecialized class template can't be used as a template argument for template parameter '_Ty1', expected a real type
c:/users/rafi/anaconda3/envs/myenv2/lib/site-packages/torch/include\c10/util/flat_hash_map.h(1437): note: see reference to class template instantiation 'ska::flat_hash_map<K,V,H,E,A>' being compiled
c:/users/rafi/anaconda3/envs/myenv2/lib/site-packages/torch/include\c10/util/flat_hash_map.h(1383): error C3203: 'templated_iterator': unspecialized class template can't be used as a template argument for template parameter '_Ty1', expected a real type
c:/users/rafi/anaconda3/envs/myenv2/lib/site-packages/torch/include\c10/util/flat_hash_map.h(1391): error C3203: 'templated_iterator': unspecialized class template can't be used as a template argument for template parameter '_Ty1', expected a real type
c:/users/rafi/anaconda3/envs/myenv2/lib/site-packages/torch/include\c10/util/flat_hash_map.h(1473): error C3203: 'templated_iterator': unspecialized class template can't be used as a template argument for template parameter '_Ty1', expected a real type
c:/users/rafi/anaconda3/envs/myenv2/lib/site-packages/torch/include\c10/util/flat_hash_map.h(1510): note: see reference to class template instantiation 'ska::flat_hash_set<T,H,E,A>' being compiled
c:/users/rafi/anaconda3/envs/myenv2/lib/site-packages/torch/include\c10/util/flat_hash_map.h(1478): error C3203: 'templated_iterator': unspecialized class template can't be used as a template argument for template parameter '_Ty1', expected a real type
c:/users/rafi/anaconda3/envs/myenv2/lib/site-packages/torch/include\c10/util/flat_hash_map.h(1482): error C3203: 'templated_iterator': unspecialized class template can't be used as a template argument for template parameter '_Ty1', expected a real type
c:/users/rafi/anaconda3/envs/myenv2/lib/site-packages/torch/include\c10/util/flat_hash_map.h(1486): error C3203: 'templated_iterator': unspecialized class template can't be used as a template argument for template parameter '_Ty1', expected a real type
c:/users/rafi/anaconda3/envs/myenv2/lib/site-packages/torch/include\c10/util/flat_hash_map.h(1490): error C3203: 'templated_iterator': unspecialized class template can't be used as a template argument for template parameter '_Ty1', expected a real type
error: command 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\nvcc.exe' failed with exit status 2
Running setup.py install for apex ... error

If you would like to simulate a larger batch size, the first approach should be the valid one.
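
As a rough template (a sketch with made-up names like data_loader and accumulation_steps, assuming a loss with mean reduction), the accumulation loop could look like this:

accumulation_steps = 4  # e.g. batch size 3 * 4 steps ~ effective batch size 12

optimizer.zero_grad()
for i, (data, target) in enumerate(data_loader):
    output = model(data)
    # scale the loss so the accumulated gradient matches the larger batch
    loss = criterion(output, target) / accumulation_steps
    loss.backward()  # gradients accumulate in the .grad attributes
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()

Without the division by accumulation_steps, the accumulated gradient would correspond to the sum rather than the mean of the micro-batch losses.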

The build error might be related to an older Visual Studio version, as described here.