Hello all,
I’m trying to train a federated learning scheme with 25 different workers, so I need to get the gradient vector of each model inside my train method. Here is the code:
def train(model, device, train_loader, optimizer, epoch, num_epochs, worker):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data.float())
        loss = criterion(output, torch.max(target, 1)[1])
        loss.retain_grad()
        loss.backward()
        a = loss.grad
        print(a)
        optimizer.step()
        print('Epoch %d/%d, Loss: %.4f' % (epoch+1, num_epochs, loss.data))
        worker.train_loss.append(loss.data)
I am using cross-entropy loss with the Adam optimizer. As output, I always get 1 as the gradient of each network (each loss belongs to an independent network). When I remove loss.retain_grad(), the gradient value is always None instead:
tensor(1., device='cuda:0')
Epoch 1/10, Loss: 2.3039
tensor(1., device='cuda:0')
Epoch 1/10, Loss: 2.3212
tensor(1., device='cuda:0')
Epoch 1/10, Loss: 2.3052
tensor(1., device='cuda:0')
Epoch 1/10, Loss: 2.3079
tensor(1., device='cuda:0')
Epoch 1/10, Loss: 2.3136
tensor(1., device='cuda:0')
Epoch 1/10, Loss: 2.2929
tensor(1., device='cuda:0')
Epoch 1/10, Loss: 2.3002
tensor(1., device='cuda:0')
Epoch 1/10, Loss: 2.3018
tensor(1., device='cuda:0')
Epoch 1/10, Loss: 2.2962
tensor(1., device='cuda:0')
Epoch 1/10, Loss: 2.3137
tensor(1., device='cuda:0')
Epoch 1/10, Loss: 2.3000
tensor(1., device='cuda:0')
Epoch 1/10, Loss: 2.3146
tensor(1., device='cuda:0')
Epoch 1/10, Loss: 2.3238
tensor(1., device='cuda:0')
Epoch 1/10, Loss: 2.3346
tensor(1., device='cuda:0')
Epoch 1/10, Loss: 2.3222
tensor(1., device='cuda:0')
Epoch 1/10, Loss: 2.3005
tensor(1., device='cuda:0')
Epoch 1/10, Loss: 2.3209
tensor(1., device='cuda:0')
Epoch 1/10, Loss: 2.2871
tensor(1., device='cuda:0')
Epoch 1/10, Loss: 2.3237
tensor(1., device='cuda:0')
Epoch 1/10, Loss: 2.2833
tensor(1., device='cuda:0')
Epoch 1/10, Loss: 2.3195
tensor(1., device='cuda:0')
Epoch 1/10, Loss: 2.3187
tensor(1., device='cuda:0')
Epoch 1/10, Loss: 2.3118
tensor(1., device='cuda:0')
Epoch 1/10, Loss: 2.3188
tensor(1., device='cuda:0')
Epoch 1/10, Loss: 2.3252
What I actually want is a gradient vector whose size matches my number of weights (7850 for my model), not this scalar 1. What should I do? Thanks in advance.
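For reference, here is roughly what I think I need, as an untested sketch: after backward(), each parameter tensor should carry its own .grad, and concatenating the flattened gradients would give one vector. The nn.Linear(784, 10) model here is just a hypothetical stand-in chosen so the parameter count matches the 7850 weights I mentioned (784*10 weights + 10 biases).

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model with 784*10 + 10 = 7850 parameters.
model = nn.Linear(784, 10)
criterion = nn.CrossEntropyLoss()

# Dummy batch just to produce gradients.
data = torch.randn(8, 784)
target = torch.randint(0, 10, (8,))

loss = criterion(model(data), target)
loss.backward()

# After backward(), every parameter holds its gradient in .grad;
# flatten and concatenate them into a single gradient vector.
grad_vector = torch.cat([p.grad.detach().flatten() for p in model.parameters()])
print(grad_vector.shape)  # torch.Size([7850])
```

Is iterating over model.parameters() like this the right way to get the full gradient vector for each worker?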