I am trying to write code for a simple objective: I take the usual PyTorch gradients, make a copy of them, and add some noise to the copy. For each batch, I check the loss for the original gradients and the loss for the new gradients, and I pick whichever gradients give the lower loss value. While I alter the gradients, I do not wish to alter the optimiser momentum parameters learnt via optimiser.step(). Can you answer the following two questions?
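To make the objective concrete, here is a minimal sketch of the noise and selection steps I have in mind (`noisy_candidate` and `select_grads` are illustrative names of my own, not from my actual code):

```python
import torch

def noisy_candidate(grads, eta):
    """Add uniform noise in [-eta*|g|, eta*|g|] to each gradient tensor."""
    return [g + (2 * torch.rand_like(g) - 1) * eta * g.abs() for g in grads]

def select_grads(loss_orig, grads_orig, loss_noisy, grads_noisy):
    """Keep whichever gradient set achieved the lower batch loss."""
    if loss_noisy < loss_orig:
        return loss_noisy, grads_noisy
    return loss_orig, grads_orig
```

The intended per-batch use would be something like `select_grads(loss.item(), orig_grads, loss_new.item(), noisy_grads)`.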

How do I reference all gradients at once (instead of by layer name, like model.conv1.weight.grad), perhaps with a list comprehension, and assign values to them?
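I am guessing at something along these lines, but I am not sure it is right (sketched on a toy `nn.Sequential` rather than my real conv model):

```python
import torch
import torch.nn as nn

# Toy stand-in for the real model (hypothetical architecture).
model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))

# One forward/backward pass so that every .grad is populated.
out = model(torch.randn(8, 4))
out.sum().backward()

# Reference all gradients at once via a list comprehension.
grads = [p.grad for p in model.parameters() if p.grad is not None]

# Assign new values in place (keeps the same tensors the optimiser sees).
with torch.no_grad():
    for p in model.parameters():
        if p.grad is not None:
            p.grad.add_(0.01)
```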

What mistake am I making in my code? Do I need to zero the gradients somehow, e.g. with grad.zero_()? If so, I think I would need requires_grad = True. Is a deep copy of the gradients a solution? And if I deep-copy the model instead, how do I assign the original optimiser parameters to the new model?
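As a possible direction for the deep-copy idea, is something like this the right approach? It copies only the gradients rather than the whole model (sketched on a toy `nn.Linear`):

```python
import torch
import torch.nn as nn

# Toy model for illustration (my real model has conv layers instead).
model = nn.Linear(4, 2)
out = model(torch.randn(8, 4))
out.sum().backward()

# Deep-copy just the gradients: detach().clone() gives independent tensors.
saved_grads = [p.grad.detach().clone() for p in model.parameters()]

# Perturb the live gradients in place (placeholder for my noise step).
for p in model.parameters():
    p.grad.add_(torch.randn_like(p.grad))

# Restore the originals in place; the Parameter objects (and hence the
# optimiser's momentum buffers keyed on them) are never replaced.
for p, g in zip(model.parameters(), saved_grads):
    p.grad.copy_(g)
```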
P.S.: For reference, I have also included the relevant function:
def train_epoch(eta, model, train_loader, criterion):
    model.train()
    running_loss = 0.0
    predictions = []
    ground_truth = []
    loss_den = 1
    start_time = time.time()
    optimiser = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
    for batch_idx, (data, target) in enumerate(train_loader):
        data = data.to(device)
        target = target.to(device)
        # previous model
        outputs = model(data.float())
        _, predicted = torch.max(outputs.data, 1)
        total_predictions = target.size(0)
        correct_predictions = (predicted == target).sum().item()
        acc = (correct_predictions / total_predictions) * 100.0
        loss = criterion(outputs, target)
        loss.backward()
        optimiser.step()
        # convGrad is the set of old gradients
        conv1grad = model.conv1.weight.grad
        conv2grad = model.conv2.weight.grad
        conv3grad = model.conv3.weight.grad
        noisyGrad1 = eta * np.abs(conv1grad.detach().cpu().numpy())
        noisyGrad2 = eta * np.abs(conv2grad.detach().cpu().numpy())
        noisyGrad3 = eta * np.abs(conv3grad.detach().cpu().numpy())
        newGrad1 = conv1grad + torch.from_numpy(np.random.uniform(noisyGrad1, noisyGrad1))
        newGrad2 = conv2grad + torch.from_numpy(np.random.uniform(noisyGrad2, noisyGrad2))
        newGrad3 = conv3grad + torch.from_numpy(np.random.uniform(noisyGrad3, noisyGrad3))
        model.conv1.weight.grad = nn.Parameter(torch.from_numpy(newGrad1.detach().numpy()).float())
        model.conv2.weight.grad = nn.Parameter(torch.from_numpy(newGrad2.detach().numpy()).float())
        model.conv3.weight.grad = nn.Parameter(torch.from_numpy(newGrad3.detach().numpy()).float())
        # The new loss value for the new gradients is computed
        outputs = model(data.float())
        _, predicted = torch.max(outputs.data, 1)
        total_predictions = target.size(0)
        correct_predictions = (predicted == target).sum().item()
        acc_new = (correct_predictions / total_predictions) * 100.0
        loss_new = criterion(outputs, target)
        loss_den += 1
        # calculating confusion matrix
        predictions += list(predicted.detach().cpu().numpy())
        ground_truth += list(target.detach().cpu().numpy())
        if loss_new.item() > loss.item():
            model.conv1.weight.grad = conv1grad
            model.conv2.weight.grad = conv2grad
            model.conv3.weight.grad = conv3grad
            running_loss += loss.item()
        else:
            running_loss += loss_new.item()
    end_time = time.time()
    running_loss /= loss_den
    print('Training Loss: ', running_loss, 'Time: ', end_time - start_time, 's')
    return running_loss, model