conv.weight.data vs. conv.weight

Hello,

I am confused about when to use conv.weight.data vs. conv.weight. For example, the following code uses:

nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

but I also see in many places:

nn.init.kaiming_normal_(m.weight.data, mode='fan_out', nonlinearity='relu')

Which one should I use? I am using PyTorch 1.3.1.

To investigate, I checked the source code on GitHub for v1.3.1 (https://github.com/pytorch/pytorch/blob/v1.3.1/torch/nn/init.py#L353):

fan = _calculate_correct_fan(tensor, mode)
gain = calculate_gain(nonlinearity, a)
std = gain / math.sqrt(fan)
with torch.no_grad():
    return tensor.normal_(0, std)

Since the function accepts a tensor, I expected .data not to be needed, and passing it should throw a runtime error. But surprisingly, no error occurs. Hence I am confused about which one to use (especially for v1.3.1).

Thank you

Don’t use the .data attribute, as it might yield unwanted side effects.
While you will be able to manipulate the underlying data without raising an error, Autograd won’t be able to track these operations, and you might run into a variety of issues later (we had quite a few of these issues already here in the forum :wink: ). No error is raised in your example because .data is itself a plain tensor, so the init function happily accepts it. Also note that the nn.init functions already run under torch.no_grad() internally, as the source you quoted shows, so passing m.weight directly is both safe and sufficient.

This attribute is also being removed step by step, as seen in this PR by @albanD.
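
Here is a minimal sketch of the kind of silent bug .data can cause: mutating a tensor through .data between the forward and the backward pass goes unnoticed by Autograd, so the backward pass uses the new values instead of the ones actually seen in the forward:

import torch

x = torch.ones(3, requires_grad=True)
y = (x * x).sum()  # dy/dx = 2 * x, i.e. 2. at x = 1.
x.data.fill_(10.)  # Autograd does not see this in-place change
y.backward()
print(x.grad)      # tensor([20., 20., 20.]) instead of the expected 2s

Doing the same without .data (i.e. x.fill_(10.)) would instead fail loudly with the "leaf Variable that requires grad has been used in an in-place operation" error, which is Autograd protecting you from exactly this kind of bug.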

Thank you for the explanation, but then how do I do the following operation:

conv_shuffle.weight.copy_(kernel)

Here conv_shuffle is an instance of nn.Conv2d, and I want to explicitly set its weights from kernel.

However, this results in the following error:

conv_shuffle.weight.copy_(kernel)
RuntimeError: a leaf Variable that requires grad has been used in an in-place operation.

but it is fixed by using the following:

conv_shuffle.weight.data.copy_(kernel)

So, now that .data is discouraged, what alternative do I have?

Thank you

You could explicitly wrap this manipulation in a torch.no_grad() block:

import torch
import torch.nn as nn

conv = nn.Conv2d(3, 3, 3, 1, 1)
with torch.no_grad():
    conv.weight.copy_(torch.rand_like(conv.weight))
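
Applied to the conv_shuffle example above, a minimal sketch could look like this (the layer shape and the random kernel are just stand-ins for illustration; use your real kernel with the same shape as conv_shuffle.weight):

import torch
import torch.nn as nn

conv_shuffle = nn.Conv2d(3, 3, 3, 1, 1)         # assumed shape, for illustration only
kernel = torch.randn_like(conv_shuffle.weight)  # stand-in for your actual kernel

with torch.no_grad():
    conv_shuffle.weight.copy_(kernel)           # allowed here, since Autograd is disabled

Inside the no_grad() block Autograd is disabled, which is why the in-place copy_() on a leaf tensor that requires grad no longer raises the RuntimeError, and the parameter keeps requires_grad=True for training afterwards.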