Tensor and tensor.data

Hi, I am trying to reproduce the DCGAN tutorial code.
In the code, there is a weights_init() function that initializes the model's weights:

m.weight.data.normal_(0.0, 0.02)

As far as I know, in 0.4.0 Variable was merged into Tensor.
So I removed .data, since the weight is already a Tensor:

m.weight.normal_(0.0, 0.02)

Then I got an error below.

RuntimeError: Leaf variable was used in an in-place operation

It seems like I am still using a Variable underneath the Tensor.

  • Could anyone help to understand current relationship of Tensor and Variable?
  • Why m.weight.normal_(0.0, 0.02) without .data doesn’t work?

Thanks.


A leaf variable is a variable that you created directly and that is not the result of an operation.
So in your case the parameters of your model are leaf variables and shouldn't be modified in-place.
You can check it with:

m.weight.is_leaf

Using .data is still the way to go to initialize your model. @tom has a nice explanation in this post.

.data wasn’t removed in the latest version and still has similar semantics. Have a look at the Migration Guide.
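As a small sketch of the difference (the layer sizes here are arbitrary): an in-place op directly on a parameter raises, while going through .data works:

```python
import torch
import torch.nn as nn

m = nn.Linear(4, 2)
print(m.weight.is_leaf)  # True: parameters are leaf tensors

try:
    # in-place op on a leaf tensor that requires grad
    m.weight.normal_(0.0, 0.02)
except RuntimeError as e:
    print(e)

# works: .data bypasses autograd's tracking
m.weight.data.normal_(0.0, 0.02)
```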


So, if I understand correctly, both

    self.linear = torch.nn.Linear(num_x, num_y)
    self.linear.weight.data.zero_()
    self.linear.bias.data.zero_()

and

    self.linear = torch.nn.Linear(num_x, num_y)
    tmp = self.linear.weight.detach()
    tmp.zero_()

    tmp = self.linear.bias.detach()
    tmp.zero_()

would do the same thing in PyTorch 0.4, but detach() (the second approach) is generally recommended, since modifying .data can in some cases lead to weird results (e.g., changing the .data of leaf variables during backpropagation and then computing the gradient incorrectly), whereas detach() would produce a "safer" error instead?

Just curious: what's a use case for .data now that Variables have been deprecated? Is it purely kept for backwards-compatibility reasons?

Thanks for the help!
So the key is that PyTorch somehow blocks in-place operations on a leaf variable, and .data seems to be a way to make it a non-leaf variable.

Is there a special reason for the stricter blocking in the leaf variable case?
I read that in-place operations are discouraged, but it's not clear which of the reasons applies to the leaf variable case.


on a leaf variable, and .data seems to be a way to make it a non-leaf variable.

It would still be a leaf variable, but as far as I understand, .data allows you to work around the fact that it is a leaf variable and perform an in-place modification during a forward pass nonetheless, which can be dangerous (since users may be unaware that in certain cases this leads to "unintended" gradient computations).
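As a sketch of that danger, along the lines of the sigmoid example in the migration guide: an in-place change through .data is invisible to autograd and silently corrupts the gradient, while the same change through detach() bumps the version counter and makes backward() raise an error:

```python
import torch

# Dangerous: .data hides the in-place change from autograd
a = torch.ones(2, 3, requires_grad=True)
out = a.sigmoid()
out.data.zero_()          # not seen by autograd's version counter
out.sum().backward()      # uses the zeroed values saved for backward
print(a.grad)             # all zeros: silently wrong gradient

# Safer: detach() shares the version counter, so autograd catches it
b = torch.ones(2, 3, requires_grad=True)
out2 = b.sigmoid()
out2.detach().zero_()     # same storage, version counter IS bumped
try:
    out2.sum().backward()
except RuntimeError as e:
    print(e)              # modified-by-an-inplace-operation error
```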

Is there a special reason for the stricter blocking in the leaf variable case?
I read that in-place operations are discouraged, but it's not clear which of the reasons applies to the leaf variable case.

Have a look at the “What about .data?” section in the PyTorch 0.4 migration guide: https://pytorch.org/2018/04/22/0_4_0-migration-guide.html

Thanks for correcting my wrong wording.
Yes it will definitely stay as a leaf node.

About the migration part: I read it and understand that .data can be dangerous, so detach() is recommended when an in-place operation is necessary.

But when I read it, I thought that part was about the normal inner-node case, and that there might be a more critical issue in the leaf-node case, since PyTorch does not give the strict "leaf variable was used …" error in the normal case.

Now it seems I made too many assumptions.

Thanks for help!

Now that you pinged me: I prefer to initialize inside a with torch.no_grad(): block. 🙂
But yeah, lots of places use .data to modify parameters (initialization, optimizers).

Best regards

Thomas


Could you post param init code with

with torch.no_grad():

?

Most of the stock init functions use it these days, see torch/nn/init.py.

Best regards

Thomas

Can you give a detailed example? Do you mean something like the following?

with torch.no_grad():
    m.weight.normal_(0.0, 0.02)

Is this right?

Yeah, that's the way to go. Previously, the underlying .data was used to initialize the weights, but it seems everything has been merged into the syntax in your example.