In v0.4, is there any reason to still use .data?

(Wenliang Dai) #1

For example, here is a code snippet written for an older version of PyTorch:

    # classifier is the classifier of a torchvision pre-trained model
    fc6 = nn.Conv2d(512, 4096, kernel_size=7)
    fc6.weight.data.copy_(classifier[0].weight.data.view(4096, 512, 7, 7))
    fc6.bias.data.copy_(classifier[0].bias.data)

Is there a better way to write this in version 0.4?

Thanks in advance!

(Jerome R) #2

You simply need to remove the .data

(Simon Wang) #3

Operations on .data are hidden from autograd. In this case, if you used weight and bias in a graph before this segment and don't use .data, .detach(), or with torch.no_grad(), autograd may complain about necessary tensors being modified in place.
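A minimal sketch of the in-place copy from the original question, written with torch.no_grad() instead of .data. The fc layer is a hypothetical stand-in for classifier[0], and the sizes are scaled down from 512 and 4096 so the sketch runs quickly; both are assumptions, not part of the original snippet.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for classifier[0] of a pre-trained model, scaled
# down from the thread's 512 -> 4096 so the sketch runs quickly.
in_ch, out_ch, k = 8, 16, 7
fc = nn.Linear(in_ch * k * k, out_ch)
fc6 = nn.Conv2d(in_ch, out_ch, kernel_size=k)

# Copy in place without recording the operation in autograd.
with torch.no_grad():
    fc6.weight.copy_(fc.weight.view(out_ch, in_ch, k, k))
    fc6.bias.copy_(fc.bias)

# On a k x k input the conv layer should now match the linear layer.
x = torch.randn(2, in_ch, k, k)
same = torch.allclose(fc6(x).flatten(1), fc(x.flatten(1)), atol=1e-4)
```

The parameters remain leaf tensors with requires_grad=True afterwards, so fc6 can still be trained as usual.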

(Artyom) #4

If I understood the migration guide correctly, we can simply replace .data with .detach() if the operation was not in-place, and use torch.no_grad() if it was. As in this example: Detach and .data
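The practical difference between the two shows up when the detached tensor is modified in place. The sketch below follows the pattern described in the migration guide; the variable names are illustrative.

```python
import torch

# With .detach(): the in-place change is visible to autograd,
# so backward() refuses to compute a wrong gradient.
a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out = a.sigmoid()
c = out.detach()
c.zero_()  # also zeroes `out`, since the two share storage
try:
    out.sum().backward()
    detach_raised = False
except RuntimeError:
    detach_raised = True

# With .data: the same change is hidden from autograd,
# so backward() runs and silently uses the zeroed values.
b = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out2 = b.sigmoid()
out2.data.zero_()
out2.sum().backward()
data_grad = b.grad  # gradient computed from the modified output
```

Sigmoid's backward pass reuses its own output, which is why zeroing that output corrupts the gradient in the .data case.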

(Simon Wang) #5

Yes, you can generally do that, unless you are doing some hacks that you want hidden from autograd :wink:

(dashesy) #6

I have a layer that is forward-only, and using .data was the simplest way to implement it.

    second_loss(func(x.data), y)

I could use the no_grad context, but that would not have been as elegant, because I do want grad for second_loss, just not for func.

(Artyom) #7

Why not use x.detach()?
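A rough sketch of what that could look like. Here encoder, head, and the bodies of func and second_loss are hypothetical, and the sketch assumes that "grad for second_loss" means second_loss has trainable parameters of its own.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
encoder = nn.Linear(4, 4)  # hypothetical producer of x
head = nn.Linear(4, 1)     # hypothetical trainable part of second_loss

def func(t):
    # Hypothetical forward-only transform.
    return torch.tanh(t)

def second_loss(pred, y):
    # Hypothetical loss with its own parameters (head).
    return ((head(pred) - y) ** 2).mean()

x = encoder(torch.randn(3, 4))
y = torch.randn(3, 1)

# detach() cuts the graph at x: nothing upstream of x receives gradients,
# while everything computed after the cut is still tracked.
loss = second_loss(func(x.detach()), y)
loss.backward()

head_has_grad = head.weight.grad is not None        # second_loss still trains
encoder_has_grad = encoder.weight.grad is not None  # blocked by detach()
```

Wrapping the whole call in torch.no_grad() would instead stop head from receiving gradients as well, which is the inelegance mentioned above.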

(dashesy) #8

What happens if I have second_loss(func(x.detach()), func2(x.detach()), y)? Then it would only matter whether x.detach(), a function whose implementation I do not know, has overhead or not.

(Thomas V) #9

Given that .data is a property that does function calls in the background to re-wrap the underlying stuff in a new variable, I would not know the overhead of either (but believe both are similar, after looking at the implementation).
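One way to check that neither call copies the underlying memory is to compare data pointers. This is only a sketch: it shows that both return views on the same storage, and it does not measure the per-call overhead.

```python
import torch

x = torch.randn(1000, 1000, requires_grad=True)
d1 = x.detach()
d2 = x.data

# Both results point at the same memory as x, so no data is copied.
shares_storage = d1.data_ptr() == x.data_ptr() == d2.data_ptr()

# Neither result is tracked by autograd.
tracks_grad = d1.requires_grad or d2.requires_grad
```

If the number of calls is still a concern, the detached tensor can be created once and passed to both func and func2.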

The migration guide fairly clearly advises using .detach() rather than .data.

Best regards