Procedure 1:
loss1 = crit(pred, target)
loss1.backward()
loss2 = crit(pred, target)
loss2.backward()
optim.step()

Procedure 2:
loss1 = crit(pred, target)
loss1.backward()
optim.step()
optim.zero_grad()
loss2 = crit(pred, target)
loss2.backward()
optim.step()
optim.zero_grad()
Are these two procedures the same in nature, i.e. will they yield the same gradient update?
I am trying to switch from EfficientNet-B0 to EfficientNet-B5 for a project, but my GPU can only handle a batch size of 3 on B5. If I run with batch size 3, the loss does not converge as well as it did with B0, so I am thinking about gradient accumulation. Is the above-mentioned way the right way to do it?
Also, what other ways are there to mitigate the batch size issue? I have been trying to install apex for the last 3 hours but failed! :'(
This would yield the same results if you are using an optimizer without any running estimates. E.g. this example returns the same updated values for SGD, but will fail for e.g. Adam:
import torch
import torch.nn as nn

torch.manual_seed(2809)
model = nn.Linear(10, 1, bias=False)
w0 = model.weight.clone()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

data = torch.randn(1, 10)
target = torch.randn(1, 1)

# 1) accumulate both gradients, then apply a single update
output = model(data)
loss1 = criterion(output, target)
loss1.backward(retain_graph=True)  # keep the graph alive for the second backward
loss2 = criterion(output, target)
loss2.backward()
optimizer.step()
print(w0 - model.weight)  # effective update of procedure 1
optimizer.zero_grad()

# 2) apply an update after each backward pass
torch.manual_seed(2809)
model = nn.Linear(10, 1, bias=False)
# make sure the weight init is equal to the first run
print((w0 == model.weight).all())
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

output = model(data)
loss1 = criterion(output, target)
loss1.backward(retain_graph=True)
optimizer.step()
optimizer.zero_grad()
# note: reuses the output computed before the parameter update
loss2 = criterion(output, target)
loss2.backward()
optimizer.step()
optimizer.zero_grad()
print(w0 - model.weight)  # effective update of procedure 2
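As a follow-up to the Adam remark above, here is a minimal sketch (not from the original example; the updates() helper is purely illustrative) comparing the two procedures with Adam. Since Adam updates its running first and second moment estimates on every step() call, one accumulated step and two separate steps are no longer equivalent:

import torch
import torch.nn as nn

def updates(accumulate):
    # identical init and data in both runs thanks to the fixed seed
    torch.manual_seed(2809)
    model = nn.Linear(10, 1, bias=False)
    w0 = model.weight.clone()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    criterion = nn.MSELoss()
    data = torch.randn(1, 10)
    target = torch.randn(1, 1)

    output = model(data)
    loss1 = criterion(output, target)
    loss1.backward(retain_graph=True)
    if not accumulate:
        # step between the two backward passes, as in procedure 2
        optimizer.step()
        optimizer.zero_grad()
    loss2 = criterion(output, target)
    loss2.backward()
    optimizer.step()
    return w0 - model.weight

# unlike SGD, the effective updates differ
print(torch.allclose(updates(accumulate=True), updates(accumulate=False)))  # False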
Have a look at this post for some possible workflows.
What error are you getting while trying to install apex?
Hi,
Is there a way to get the same result with an Adam optimizer before and after gradient accumulation?
The apex error I am getting:
…
Compiling cuda extensions with
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:04_Central_Daylight_Time_2018
Cuda compilation tools, release 10.0, V10.0.130
from C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0/bin
…
multi_tensor_sgd_kernel.cu
c:/users/rafi/anaconda3/envs/myenv2/lib/site-packages/torch/include\c10/util/flat_hash_map.h(1379): error C3203: 'templated_iterator': unspecialized class template can't be used as a template argument for template parameter '_Ty1', expected a real type
c:/users/rafi/anaconda3/envs/myenv2/lib/site-packages/torch/include\c10/util/flat_hash_map.h(1437): note: see reference to class template instantiation 'ska::flat_hash_map<K,V,H,E,A>' being compiled
c:/users/rafi/anaconda3/envs/myenv2/lib/site-packages/torch/include\c10/util/flat_hash_map.h(1383): error C3203: 'templated_iterator': unspecialized class template can't be used as a template argument for template parameter '_Ty1', expected a real type
c:/users/rafi/anaconda3/envs/myenv2/lib/site-packages/torch/include\c10/util/flat_hash_map.h(1391): error C3203: 'templated_iterator': unspecialized class template can't be used as a template argument for template parameter '_Ty1', expected a real type
c:/users/rafi/anaconda3/envs/myenv2/lib/site-packages/torch/include\c10/util/flat_hash_map.h(1473): error C3203: 'templated_iterator': unspecialized class template can't be used as a template argument for template parameter '_Ty1', expected a real type
c:/users/rafi/anaconda3/envs/myenv2/lib/site-packages/torch/include\c10/util/flat_hash_map.h(1510): note: see reference to class template instantiation 'ska::flat_hash_set<T,H,E,A>' being compiled
c:/users/rafi/anaconda3/envs/myenv2/lib/site-packages/torch/include\c10/util/flat_hash_map.h(1478): error C3203: 'templated_iterator': unspecialized class template can't be used as a template argument for template parameter '_Ty1', expected a real type
c:/users/rafi/anaconda3/envs/myenv2/lib/site-packages/torch/include\c10/util/flat_hash_map.h(1482): error C3203: 'templated_iterator': unspecialized class template can't be used as a template argument for template parameter '_Ty1', expected a real type
c:/users/rafi/anaconda3/envs/myenv2/lib/site-packages/torch/include\c10/util/flat_hash_map.h(1486): error C3203: 'templated_iterator': unspecialized class template can't be used as a template argument for template parameter '_Ty1', expected a real type
c:/users/rafi/anaconda3/envs/myenv2/lib/site-packages/torch/include\c10/util/flat_hash_map.h(1490): error C3203: 'templated_iterator': unspecialized class template can't be used as a template argument for template parameter '_Ty1', expected a real type
error: command 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\nvcc.exe' failed with exit status 2
Running setup.py install for apex … error
If you would like to simulate a larger batch size, the first approach should be the valid one.
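For reference, a minimal sketch of that first approach inside a training loop could look like the following (all names here are illustrative placeholders, not from this thread). Scaling each loss by the number of accumulation steps keeps the accumulated gradient comparable to the average over the larger effective batch:

import torch
import torch.nn as nn

# illustrative stand-ins; replace with the real model, data, and optimizer
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loader = [(torch.randn(3, 10), torch.randn(3, 1)) for _ in range(16)]

accum_steps = 8  # effective batch size = 3 * 8 = 24

optimizer.zero_grad()
for i, (data, target) in enumerate(loader):
    output = model(data)
    # scale so the summed gradients approximate the average over the
    # larger effective batch
    loss = criterion(output, target) / accum_steps
    loss.backward()  # gradients are summed into each param's .grad
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()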
The build error might be related to an older VS version as described here.