For example, here is a snippet of code written for an older version of PyTorch:
import torch.nn as nn

# classifier is the classifier submodule of a torchvision pre-trained model
fc6 = nn.Conv2d(512, 4096, kernel_size=7)
fc6.weight.data.copy_(classifier[0].weight.data.view(4096, 512, 7, 7))
fc6.bias.data.copy_(classifier[0].bias.data)
Is there a better way to write this in version 0.4?
Operations on .data are hidden from autograd. In this case, if you used weight and bias in a graph before this segment and do not go through .data, .detach(), or with torch.no_grad(), autograd may complain about tensors needed for gradient computation being modified in-place.
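A minimal sketch of that complaint, assuming a toy tensor w rather than the conv weights above: .detach() shares the version counter with the original tensor, so an in-place modification of the detached view is noticed by autograd and raises at backward time, whereas the same modification through .data would go unnoticed.

```python
import torch

w = torch.ones(3, requires_grad=True)
y = w * w            # multiplication saves w for the backward pass

# .detach() shares w's version counter, so autograd notices
# the in-place modification when backward later needs the saved w
w.detach().add_(1.0)

try:
    y.sum().backward()
except RuntimeError as e:
    print("autograd complained:", e)

# modifying w.data here instead would be invisible to autograd
# and could silently produce wrong gradients rather than an error
```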
If I understood the migration guide correctly, we can simply replace .data with .detach() when the operation is not in-place, and wrap in-place operations in torch.no_grad(). Like in this example: Detach and .data
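Under that reading, the snippet from the question might be rewritten for 0.4+ roughly as follows. This is a sketch: the nn.Sequential here is a stand-in for the real torchvision classifier, assuming classifier[0] is a Linear(512 * 7 * 7, 4096) layer as in VGG-style models.

```python
import torch
import torch.nn as nn

# stand-in for the classifier of a torchvision pre-trained model
# (assumption: classifier[0] is nn.Linear(512 * 7 * 7, 4096), as in VGG)
classifier = nn.Sequential(nn.Linear(512 * 7 * 7, 4096))

fc6 = nn.Conv2d(512, 4096, kernel_size=7)
# copy_ is in-place, so wrap it in torch.no_grad() instead of using .data
with torch.no_grad():
    fc6.weight.copy_(classifier[0].weight.view(4096, 512, 7, 7))
    fc6.bias.copy_(classifier[0].bias)
```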
What happens if I have second_loss(func(x.detach()), func2(x.detach()), y)? Then it would only matter whether x.detach(), being a function whose implementation I do not know, has overhead or not.
Given that x.data is a property that makes function calls in the background to re-wrap the underlying tensor in a new variable, I would not know the overhead of x.data either (but I believe both are similar, after looking at the implementation).
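One way to check this for yourself is a rough micro-benchmark: both x.data and x.detach() return a new tensor that shares x's storage, so the cost being measured is just the re-wrapping. Absolute numbers are not quoted here since they depend on the PyTorch build and hardware.

```python
import timeit
import torch

x = torch.randn(1000, requires_grad=True)

# both produce a view sharing x's storage, with requires_grad=False
assert x.data.data_ptr() == x.data_ptr() and not x.data.requires_grad
assert x.detach().data_ptr() == x.data_ptr() and not x.detach().requires_grad

# rough timing of the re-wrap itself
print(".data    :", timeit.timeit(lambda: x.data, number=100_000))
print(".detach():", timeit.timeit(lambda: x.detach(), number=100_000))
```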