Bug in autograd module?

Hello.

Today I was running some experiments with a variational autoencoder and noticed something quite surprising that I did not expect. I hope someone can help me.

I was sampling from the posterior distribution and wanted to save an image of the latent space. The latent image (which comes from a Gaussian distribution) is normalized to fit in the range 0-1. I use this code:

mean_p, logvar_p = variatonal.forward(x_test)                  # encode params
z_lat = sampler([mean_p, logvar_p], 'gaussian', mean_p.shape)  # sample from gaussian distribution
z_ = normalize(z_lat)                                          # normalize latent code
save_image(z_)                                                 # save latent code
mean, logvar = reconstruction.sample(z_lat)                    # decoder parameters
x_ = sampler([mean, logvar], decoder_type, mean.shape)         # sample from decoder
x_ = normalize(x_)
save_image(x_)

Normalize is a function that does this (assume batch=100)

def normalize(x_):
    min_val_aux, ind = torch.min(x_.data, 1)
    max_val_aux, ind = torch.max(x_.data, 1)
    min_val = torch.zeros(100, 1).cuda()
    max_val = torch.zeros(100, 1).cuda()
    min_val[:, 0] = min_val_aux
    max_val[:, 0] = max_val_aux
    a, b = x_.shape
    min_val = min_val.expand(100, b)
    max_val = max_val.expand(100, b)

    x_.data = (x_.data - min_val) / (max_val - min_val)
    return x_.data.cpu()

What is incredible is that the function normalize is modifying the content of z_lat! So when decoding z_lat I observed mode collapse. At first I thought it was because a fully-connected VAE encoder is not expressive enough, but then I realized that if I remove the call z_=normalize(z_lat) everything works fine. Moreover, if normalize acts directly on z_lat.data instead of z_lat everything works too, so it seems to be something related to autograd. Shouldn't a call to a Python function pass the data instead of a pointer to a variable? It seems that autograd passes variable references to functions instead of actually copying the data.

I am impressed by this fact.
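
Here is a minimal repro of what I am seeing (just a sketch with plain tensors on a recent PyTorch version; the helper name and shapes are only for illustration):

import torch

def normalize_like_mine(t):
    # assigning to .data swaps the data of the caller's tensor in place
    t.data = (t.data - t.data.min()) / (t.data.max() - t.data.min())
    return t.data

z = torch.randn(4, 3)
before = z.clone()
normalize_like_mine(z)
print(torch.equal(z, before))  # False: z itself was modified by the call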

Hi,

The thing is that you should not use .data anymore. It is the old API and is really bug-prone, as you can see.
If the goal is to get a tensor with the same content but without the autograd graph, you should use .detach(). It will return a tensor with the same content and requires_grad=False.
If the goal is to make some changes on a tensor that are not tracked by autograd, then you should do these ops within a with torch.no_grad(): block. It will allow you to do any op on a tensor as if it did not require any gradient, and these ops will not be recorded by the autograd engine.
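
For example (a minimal sketch of both options with the current API; the names are just for illustration):

import torch

x = torch.randn(5, requires_grad=True)

# .detach() returns a tensor sharing the same data but cut from the graph
x_det = x.detach()
print(x_det.requires_grad)  # False

# ops performed under no_grad() are not recorded by the autograd engine
with torch.no_grad():
    y = (x - x.min()) / (x.max() - x.min())
print(y.requires_grad)  # False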

Well, this is old code (I wrote it about 9 months ago and dug it up now because I remembered that mode collapse and wanted to compare with the new version). Actually, in that code z_lat is a Variable, and if I pass .data everything works. However, if normalize() receives the torch Variable itself, then this is what happens.

So I understand that in the current release .data is also problematic… Are you planning to remove it in future releases? It is a bit confusing. Can one copy a tensor using detach().data? Should that work?

Well, at least I have seen that VAEs can suffer from mode collapse when linearly transforming the latent space… One can always learn something interesting, even from a bug :smiley:

Thanks for your reply

Anyway, I think that when this kind of bug appears (this was in version 0.3.0), either a warning should be shown when using .data or it should be removed immediately. These bugs are very difficult to detect for people who have been using PyTorch for a long time.

Even when I search for information related to .data, there are no posts, no docs, and no general information talking about this behaviour…

thanks

I think .data is still used inside the optimizers (and maybe other internals) which would result in tons of warnings.

Well, something should be done: at least an initial warning when importing torch, or just removing .data from the API, or something.

Or put it here https://pytorch.org/blog/pytorch-0_4_0-migration-guide/

The problem is that this bug appears at least in version 0.3.0.

As you mentioned the migration guide: it is explicitly mentioned there, as there is a section “what about .data ?” :wink:

But in general I agree with you.
@albanD isn’t it possible to replace all .data calls with .detach() in the internals?

@jmaronas This is not a bug!
This is the intended behavior of .data. Here you have a given variable x_; when you do x_.data=, you modify what x_ contains. It is an in-place change to x_.
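
A tiny sketch of what this means (the names are just for illustration):

import torch

x_ = torch.zeros(3)
alias = x_                 # another reference to the same tensor object
x_.data = torch.ones(3)    # in-place: swaps what the object contains
print(alias)               # tensor([1., 1., 1.]) - visible through every reference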

@justusschock Either .detach() or a with torch.no_grad() block, depending on what the .data is used for. Yes, that should be done if someone has some free time :slight_smile:

1 Like

Well, I agree with you 50%. Here is why.

If you have code that modifies .data, you expect to modify what x_.data contains; that is true. The problem is, as far as I know, that when you pass an argument to a Python function the input argument is not modified.

a = 1
def function(a):
    return a + 1
print(function(a))
print(a)

a is still 1. In autograd this does not happen: if a is an autograd Variable, its content gets modified; if a is not a Variable, it does not.

I think this is not the typical behaviour of Python code, and that is why I call it a “bug”. In my example I understood that the variable inside normalize(), x_, is independent of the argument passed when calling normalize(); I expected that changes made to it inside the function would not affect the variable in the main program. Even with the following function, where the parameter is renamed to y_,

def normalize(y_):
    min_val_aux, ind = torch.min(y_.data, 1)
    max_val_aux, ind = torch.max(y_.data, 1)
    min_val = torch.zeros(100, 1).cuda()
    max_val = torch.zeros(100, 1).cuda()
    min_val[:, 0] = min_val_aux
    max_val[:, 0] = max_val_aux
    a, b = y_.shape
    min_val = min_val.expand(100, b)
    max_val = max_val.expand(100, b)

    y_.data = (y_.data - min_val) / (max_val - min_val)
    return y_.data.cpu()

the “bug” would persist.

Hope it is clearer.

Yes, but it does not say anything related to what I am describing in this thread.

I disagree with you here; it works that way only for numbers and strings in Python. Other objects have the same behavior as Variables here.
In your case you pass an object and modify one of its fields (.data in this case); it’s like working with a list (or dict) and modifying one of its elements:

a = [1]
def function(a):
    a[0] = 2
    return a
print(function(a))
print(a)

Actually, any Python object with a field like .data will always have this behavior.

Yes, totally agree. But one cannot expect this to happen with torch.autograd and not with torch.Tensor, as both have very similar functionality (apart from automatic differentiation). It would be nice to include this in the documentation or the migration guide, that is what I mean.

But one cannot expect this to happen with torch.autograd and not with torch.Tensor.

You mean that if you pass a torch.Tensor to that function it won’t have the same behavior? I would expect it to (or to just crash on old PyTorch versions where .data did not exist on torch.Tensor).
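
For instance (a quick sketch), a plain torch.Tensor passed to a function gets modified by in-place ops in exactly the same way:

import torch

def set_first(t):
    t[0] = 42.0    # in-place indexing assignment
    return t

a = torch.zeros(3)
set_first(a)
print(a)           # tensor([42., 0., 0.]) - the caller's tensor was changed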

For the migration guide, isn’t what @justusschock linked here about migration for 0.4.0 what you’re looking for?

I mean that if I pass .data to normalize, everything works fine; the modifications are not applied to the variable in the calling code:

x = Variable(torch.zeros(10, 10).normal_(0, 1))
new_x = normalize(x.data)
print(new_x.sum() == x.data.sum())  # returns False

###############
###############
x = Variable(torch.zeros(10, 10).normal_(0, 1))
new_x = normalize(x)
print(new_x.sum() == x.data.sum())  # returns True

In this example the normalize function is different: in the second case it is the one I posted above, and in the first case it is this one:

def normalize(y_):
    min_val_aux, ind = torch.min(y_, 1)
    max_val_aux, ind = torch.max(y_, 1)
    min_val = torch.zeros(100, 1).cuda()
    max_val = torch.zeros(100, 1).cuda()
    min_val[:, 0] = min_val_aux
    max_val[:, 0] = max_val_aux
    a, b = y_.shape
    min_val = min_val.expand(100, b)
    max_val = max_val.expand(100, b)

    y_ = (y_ - min_val) / (max_val - min_val)
    return y_.cpu()

i.e., with .data removed from y_.

Which is the same as giving a[0] to the function in my list example instead of a: it won’t change the content of a anymore, because it never had the object a to begin with.

I’m not sure I understand what you were expecting to happen, other than the following:

  • If you change input.data (or input[0] in my list example), then you modify your input object, so this change will be seen everywhere you use this object.
  • If you pass the content of input.data (or input[0] in my list example) and then just use it for some computation (not modifying any object in place), then you don’t modify any existing object, so no existing objects will be changed (see the small sketch after this list).
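
In list terms, the two cases look like this (a tiny sketch):

def mutate(lst):
    lst[0] = 99        # in-place: the caller's list is changed
    return lst

def compute(value):
    return value + 1   # only uses the value: nothing existing is modified

a = [1, 2, 3]
mutate(a)
print(a)               # [99, 2, 3]
compute(a[0])
print(a)               # still [99, 2, 3]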

Well, what I mean is that both torch.autograd Variables and torch.Tensors are objects. As far as I understand, the .data of an autograd Variable is a torch.Tensor.

What I would expect is: if modifying a torch.autograd Variable makes the change visible to the whole program, i.e. implicitly, why are torch.Tensor modifications not? I would expect the same behavior for both of them, as both are class instances.

In your example, if you have a=[[1,2,3,4]] and you pass a[0], i.e. a list object, the same holds, because the object inside the list is itself an object that supports implicit modification (as you clearly stated before):

a = [[1, 2, 3]]
def function(a):
    a[0] = 2
    return a
function(a[0])  # a is now [[2, 2, 3]]

I do not see why for torch.autograd the implicit modification is preserved, while for a torch.Tensor it is not.

Ah, ok,
The confusion is that y_ = xxx in your second example is not an in-place change of the tensor! a[0] = 2 is an in-place change of the list, but if you were doing a = 2 it would not change the original list; it just reassigns the variable named “a” or “y_” to a different Python object. If you were doing y_.copy_(xxx) or y_[0] = xxx, then the change would be visible as well.

The code below covers all the cases for lists and tensors.
You will see that in each case, in-place modifications modify what was given as input.

import torch

def inplace_change(a):
    print("doing inplace change")
    a[0] = 2
    return a
def inplace_tensor_change(a):
    print("doing inplace change on a.data")
    # This is bad, never use .data in proper code !
    a.data = torch.rand(5)
    return a
def out_of_place_change(a):
    print("doing out of place change")
    a = 2
    return a

a=[[1, 2, 3]]
print("working with ", a)
print("Giving a[0]")
print(inplace_change(a[0]))
print(a)
print("")
a=[[1, 2, 3]]
print("working with ", a)
print("Giving a")
print(inplace_change(a))
print(a)
print("")
a=torch.Tensor([1, 2, 3])
print("working with ", a)
print("Giving a")
print(inplace_change(a))
print(a)
print("")
a=torch.Tensor([1, 2, 3])
print("working with ", a)
print("Giving a")
print(inplace_tensor_change(a))
print(a)
print("")
a=torch.Tensor([1, 2, 3])
print("working with ", a)
print("Giving a.data")
print(inplace_change(a.data))
print(a)
print("")


a=[[1, 2, 3]]
print("working with ", a)
print("Giving a[0]")
print(out_of_place_change(a[0]))
print(a)
print("")
a=[[1, 2, 3]]
print("working with ", a)
print("Giving a")
print(out_of_place_change(a))
print(a)
print("")
a=torch.Tensor([1, 2, 3])
print("working with ", a)
print("Giving a")
print(out_of_place_change(a))
print(a)
print("")
a=torch.Tensor([1, 2, 3])
print("working with ", a)
print("Giving a.data")
print(out_of_place_change(a.data))
print(a)
print("")
1 Like

After experimenting a bit, I see that my problem was thinking that arguments are copied instead of passed by reference. The thing is that

x=x+1

creates a new object and rebinds the name, while

x.data=x+1

assigns to an attribute of the existing object. Not taking into account that arguments are passed by reference instead of copied in a function call is what led to my error. In fact, the key mistake was not coding it like this:

def normalize(x_):
    min_val_aux, ind = torch.min(x_.data, 1)
    max_val_aux, ind = torch.max(x_.data, 1)
    min_val = torch.zeros(100, 1).cuda()
    max_val = torch.zeros(100, 1).cuda()
    min_val[:, 0] = min_val_aux
    max_val[:, 0] = max_val_aux
    a, b = x_.shape
    min_val = min_val.expand(100, b)
    max_val = max_val.expand(100, b)

    x_ = (x_.data - min_val) / (max_val - min_val)  # changed x_.data = to x_ =
    return x_.cpu()
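
A quick check with id() made the difference clear to me (just a sketch):

import torch

x = torch.zeros(3)
print(id(x))
x = x + 1        # builds a new tensor and rebinds the name x
print(id(x))     # different id: the original object is untouched

y = torch.zeros(3)
print(id(y))
y.data = y + 1   # modifies the existing object through its .data attribute
print(id(y))     # same id: every other reference to y sees the change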

Entirely my mistake. Normally when I code in Python I forget about memory management; I only activate that part of my brain when coding C or C++. That, together with the fact that in previous versions of PyTorch combining computations on Variables and tensors (for example adding Gaussian noise) came down to accessing .data a lot of the time, is why I did not notice the problem when coding this function.

Thanks @albanD for your time.

1 Like

No problem !
In Python, only numbers, strings and booleans (I might be missing things here, like functions, but I’m not sure) behave as if they were passed by value; everything else is effectively passed by reference.

1 Like