I am trying to write code for a simple objective: I take the usual PyTorch gradients, make a copy of them, and add some noise to the copy. For each batch, I check the loss for the original gradients and the loss for the new gradients, and I pick whichever gradients give the lower loss value. While I alter the gradients, I do not wish to alter the optimiser momentum parameters learnt via optimiser.step(). Can you answer the following two questions?
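To make the objective concrete, here is a minimal sketch of the noise and selection steps I have in mind (`noisy_candidate` and `select_grads` are illustrative names of my own, not from my actual code):

```python
import torch

def noisy_candidate(grads, eta):
    """Add uniform noise in [-eta*|g|, eta*|g|] to each gradient tensor."""
    return [g + (2 * torch.rand_like(g) - 1) * eta * g.abs() for g in grads]

def select_grads(loss_orig, grads_orig, loss_noisy, grads_noisy):
    """Keep whichever gradient set achieved the lower batch loss."""
    if loss_noisy < loss_orig:
        return loss_noisy, grads_noisy
    return loss_orig, grads_orig
```

The intended per-batch use would be something like `select_grads(loss.item(), orig_grads, loss_new.item(), noisy_grads)`.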

How do I reference all gradients at once (instead of by layer name, like model.conv1.weight.grad), perhaps with a list comprehension, and assign values to them?
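I am guessing at something along these lines, but I am not sure it is right (sketched on a toy `nn.Sequential` rather than my real conv model):

```python
import torch
import torch.nn as nn

# Toy stand-in for the real model (hypothetical architecture).
model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))

# One forward/backward pass so that every .grad is populated.
out = model(torch.randn(8, 4))
out.sum().backward()

# Reference all gradients at once via a list comprehension.
grads = [p.grad for p in model.parameters() if p.grad is not None]

# Assign new values in place (keeps the same tensors the optimiser sees).
with torch.no_grad():
    for p in model.parameters():
        if p.grad is not None:
            p.grad.add_(0.01)
```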

What mistake am I making in my code? Do I need to zero the gradients somehow, e.g. with grad.zero_()? If so, I think I would need requires_grad = True. Is a deep copy of the gradients a solution? And if I deep-copy the model instead, how do I assign the original optimiser parameters to the new model?
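As a possible direction for the deep-copy idea, is something like this the right approach? It copies only the gradients rather than the whole model (sketched on a toy `nn.Linear`):

```python
import torch
import torch.nn as nn

# Toy model for illustration (my real model has conv layers instead).
model = nn.Linear(4, 2)
out = model(torch.randn(8, 4))
out.sum().backward()

# Deep-copy just the gradients: detach().clone() gives independent tensors.
saved_grads = [p.grad.detach().clone() for p in model.parameters()]

# Perturb the live gradients in place (placeholder for my noise step).
for p in model.parameters():
    p.grad.add_(torch.randn_like(p.grad))

# Restore the originals in place; the Parameter objects (and hence the
# optimiser's momentum buffers keyed on them) are never replaced.
for p, g in zip(model.parameters(), saved_grads):
    p.grad.copy_(g)
```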
P.S.: For reference, I have also included the relevant function:
def train_epoch(eta, model, train_loader, criterion):
    model.train()
    running_loss = 0.0
    predictions = []
    ground_truth = []
    loss_den = 1
    start_time = time.time()
    optimiser = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
    for batch_idx, (data, target) in enumerate(train_loader):
        data = data.to(device)
        target = target.to(device)
        # previous model
        outputs = model(data.float())
        _, predicted = torch.max(outputs.data, 1)
        total_predictions = target.size(0)
        correct_predictions = (predicted == target).sum().item()
        acc = (correct_predictions / total_predictions) * 100.0
        loss = criterion(outputs, target)
        loss.backward()
        optimiser.step()
        # convGrad is the set of old gradients
        conv1grad = model.conv1.weight.grad
        conv2grad = model.conv2.weight.grad
        conv3grad = model.conv3.weight.grad
        noisyGrad1 = eta * np.abs(conv1grad.detach().cpu().numpy())
        noisyGrad2 = eta * np.abs(conv2grad.detach().cpu().numpy())
        noisyGrad3 = eta * np.abs(conv3grad.detach().cpu().numpy())
        newGrad1 = conv1grad + torch.from_numpy(np.random.uniform(noisyGrad1, noisyGrad1))
        newGrad2 = conv2grad + torch.from_numpy(np.random.uniform(noisyGrad2, noisyGrad2))
        newGrad3 = conv3grad + torch.from_numpy(np.random.uniform(noisyGrad3, noisyGrad3))
        model.conv1.weight.grad = nn.Parameter(torch.from_numpy(newGrad1.detach().numpy()).float())
        model.conv2.weight.grad = nn.Parameter(torch.from_numpy(newGrad2.detach().numpy()).float())
        model.conv3.weight.grad = nn.Parameter(torch.from_numpy(newGrad3.detach().numpy()).float())
        # The new loss value for the new gradients is computed
        outputs = model(data.float())
        _, predicted = torch.max(outputs.data, 1)
        total_predictions = target.size(0)
        correct_predictions = (predicted == target).sum().item()
        acc_new = (correct_predictions / total_predictions) * 100.0
        loss_new = criterion(outputs, target)
        loss_den += 1
        # calculating confusion matrix
        predictions += list(predicted.detach().cpu().numpy())
        ground_truth += list(target.detach().cpu().numpy())
        if loss_new.item() > loss.item():
            model.conv1.weight.grad = conv1grad
            model.conv2.weight.grad = conv2grad
            model.conv3.weight.grad = conv3grad
            running_loss += loss.item()
        else:
            running_loss += loss_new.item()
    end_time = time.time()
    running_loss /= loss_den
    print('Training Loss: ', running_loss, 'Time: ', end_time - start_time, 's')
    return running_loss, model