Backpropagation in end-to-end training

I’m confused about backpropagation in my end-to-end experiments. I replaced the fc layer of ResNet50 with a CNN classifier. The input to the CNN is a 100x2048 feature map; each row of the feature map is generated by ResNet50, and the rows are simply concatenated, so several ResNet50 outputs form one input to the CNN classifier. Is it possible to backpropagate gradients from the CNN classifier to ResNet50 when the loss function is based on the CNN classifier's prediction?

Hi,

If you don’t change the settings of the resnet50 from torchvision, its weights will require gradients, so when you call .backward() on your loss, all the .grad fields in the resnet will be populated and you can perform optimizer steps on it.
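A quick way to check this (a minimal sketch, assuming a stock torchvision resnet50):

import torch
import torchvision

# By default, all parameters of a torchvision resnet50 require gradients
model = torchvision.models.resnet50()
print(all(p.requires_grad for p in model.parameters()))  # True

# After a backward pass through the model, the .grad fields are populated
x = torch.randn(2, 3, 224, 224)
out = model(x)
out.sum().backward()
print(model.layer4[0].conv1.weight.grad is not None)  # True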

import numpy as np
import torch
import torch.nn.functional as F

def finetuneTrain(model, CNNModel, device, train_loader, optimizer, epoch):
    model.train()
    correct = 0
    for (data, target) in train_loader:
        target = target.to(device)
        features = []
        for volume_count in range(len(data)):
            volume = torch.Tensor(data[volume_count])
            volume = volume.to(device)
            # Run the resnet; .detach() and .numpy() convert the output to a numpy array
            feature = model(volume).cpu().detach().numpy()
            # Zero-pad each volume's features to a fixed 100 rows
            tmp = np.zeros((100, feature.shape[1]), dtype=np.float32)
            volume_size = feature.shape[0]
            tmp[:volume_size, :] = feature
            features.append(tmp)

        optimizer.zero_grad()
        features = torch.Tensor(features).to(device)
        output = CNNModel(features)
        loss = F.cross_entropy(output, target)
        loss.backward()
        # Inspect gradients on the later resnet layers
        for name, param in model.named_parameters():
            if 'layer4' in name or 'layer3' in name:
                print(param.grad)
        optimizer.step()

Here is my training function, but the printed gradients are all None, so I think the backpropagation failed.

You are explicitly calling .detach() on the output of the model, so you explicitly break the autograd history: no gradient will be computed beyond the detach operation. You will need to remove it (and also remove the use of numpy arrays, as we cannot differentiate through numpy operations; you need to work with Tensors only).
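For example, the inner loop could keep everything as Tensors (a minimal sketch reusing the names from your function; F.pad here is just one way to do the zero-padding):

features = []
for volume_count in range(len(data)):
    volume = torch.Tensor(data[volume_count]).to(device)
    # Keep the resnet output as a Tensor: no .detach(), no numpy
    feature = model(volume)
    # Zero-pad to 100 rows with a differentiable op so gradients
    # can still flow back through `feature` into the resnet
    padded = F.pad(feature, (0, 0, 0, 100 - feature.shape[0]))
    features.append(padded)

# torch.stack preserves the autograd history, unlike torch.Tensor(list)
features = torch.stack(features)

With this change, every operation between the resnet output and the loss is a differentiable Tensor operation, so loss.backward() will populate the .grad fields of the resnet parameters as well.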

I just followed PyCharm's suggestion and didn't understand .detach() well. I'll try your suggestion.