Cost function backward error

Is it possible to forward a batch of images, let's say 64 images, through a network and then backward image by image? Here is my code:

def train(epoch):
    global steps
    global s
    global optimizer
    epochLoss = 0
    for index, (images, labels) in enumerate(trainLoader):
        if s in steps:
            learning_rate = learning_rate * 0.1
            optimizer = optim.SGD(net.parameters(), lr=learning_rate, momentum=momentum, weight_decay=decay)
        if cuda:
            images = images.cuda()
        images = V(images)
        optimizer.zero_grad()
        output = net(images).cpu()  # 64*95*7*7
        loss = 0
        for ind in range(images.size()[0]):    # images.size()[0] = 64
            target = V(jsonToTensor(labels[ind]))
            cost = criterion(output[ind,:,:,:].unsqueeze(0), target)
            loss += cost.data[0]
            cost.backward(retain_variables=True)     # <---- Error occurs here!
        epochLoss += loss
        optimizer.step()
        print("(%d,%d) -> Current Batch Loss: %f" % (epoch, index, loss))
        s = s + 1
    losses.append(epochLoss)

In the above code, criterion is my customized cost function, which takes two tensors as input. I have tried the above code but received an error like this:

RuntimeError: inconsistent tensor size at /py/conda-bld/pytorch_1490979338030/work/torch/lib/TH/generic/THTensorMath.c:827

Could you please tell me what the problem is? How can I solve it?

Yes, you should be able to do that. It looks like one of your sizes doesn't match up, but it's hard to tell where from your snippet. Can you post a link to a full working example?

Here’s a simple snippet showing multiple calls to backward:

import torch

a = torch.autograd.Variable(torch.randn(5, 5), requires_grad=True)
b = torch.autograd.Variable(torch.randn(5, 5), requires_grad=True)
c = a @ b

for i in range(5):
    # retain_variables=True keeps the intermediate buffers so the
    # graph can be backed through again on the next iteration
    cost = c[i, :].sum()
    cost.backward(retain_variables=True)

Thanks for your response @colesbury! Actually, I forward 64 images at one time, so the size of the output is 64×95×7×7. When I want to backward, I don’t backward the whole batch at once (because of the complexity of my cost function, it is a little hard to parallelize), but rather image by image. In other words, from my point of view, what the backward function expects is a gradient tensor of size 64×95×7×7, the same size as the output, so I think I should build such a tensor. At least, that is my belief. Am I right?

So in the backward function, I try to make a tensor with the above size and call backward with it!
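
Something like this is roughly what I have in mind (just a sketch: the 64×95×7×7 shape is from my setup, output here is a dummy leaf Variable instead of my network’s real output, and the ones() fill is only a placeholder for the gradient my customized cost function would actually produce):

import torch
from torch.autograd import Variable

# dummy stand-in for the network output: batch of 64, each 95x7x7
output = Variable(torch.randn(64, 95, 7, 7), requires_grad=True)

# build a gradient tensor with the same size as the output,
# filling in the slice for one image at a time
grad = torch.zeros(64, 95, 7, 7)
for ind in range(64):
    # placeholder: in my real code this would be the gradient of my
    # customized cost for image `ind` with respect to output[ind]
    grad[ind] = torch.ones(95, 7, 7)

# backward once, passing a gradient tensor the same size as the output
output.backward(grad)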
