While dealing with the in-place operation problem of autograd, I ran into something like the code below.
```python
def __init__(self, *args):
    self.device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
    self.X = torch.ones(200).long().to(self.device)

def forward(self, *args1):
    print(self.X._version)  # prints out 1, not 0
    print(torch.ones(1).long().to(self.device)._version)  # prints out 0
```
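For context, here is a minimal sketch (a toy tensor, not the original model) of how `_version` behaves: in-place operations bump the counter, while out-of-place operations return a fresh tensor whose counter starts at 0.

```python
import torch

t = torch.ones(3)
v0 = t._version        # freshly created tensor -> 0
t.add_(1)              # in-place op bumps the version counter
v1 = t._version        # -> 1

u = t.add(1)           # out-of-place: returns a brand-new tensor
v2 = u._version        # new tensor starts at 0 again

print(v0, v1, v2)      # 0 1 0
```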
1. Why is this happening?
2. Is it good practice (or something desirable) to keep `torch.Tensor._version` unchanged during the forward pass (say, if it starts at 1, it stays at 1 until it meets
Why do you worry about the version being changed?
I would say it was most likely changed during weight initialization.
@albanD You mean that (`_version` changing during the forward pass) hardly ever happens in most situations, right?
Actually, I want to use something similar to `tf.placeholder` in my model for performance reasons, and slicing and indexing to change its values are messing things up.

Reason for using a placeholder even with PyTorch:
The problem I’m dealing with is a seq2seq-based language model, but it attaches random variables to each token and uses their intermediate vector representations at every timestep, so I cannot just run the RNN with `packed_sequence` or anything similar.
Thus, I’m running it token by token, but with minibatches of examples (batches of token-by-token steps). So my model.py looks something like below:
```python
self.predicted_tokens_placeholder = torch.zeros(batchsize, maxlen, vocabsize).to(device)

def forward(self, *args):
    # updating the placeholder with the predicted tokens
    ph = self.predicted_tokens_placeholder.clone()
    ph[:, tstep] = b_tokens_predicted
    # Other self.* tensors (with requires_grad=True) are used similarly,
    # even without .clone(), for carrying intermediate representations.
    # Indexing self.* tensors like the example above happens a lot.
    # Oops, those are in-place ops --> the autograd problem occurs.
```
And I found out (just today) that to make autograd work correctly, I need to avoid in-place operations. That is why I wonder whether `torch.Tensor._version` stays intact during the forward pass in general.
`_version` changes whenever you do in-place operations on a Tensor.
But there is a good reason for that: changing that Tensor’s value could lead to wrong results from autograd.
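A minimal sketch of the failure mode described here (a toy example, not the original model): `exp()` saves its output for the backward pass, so modifying that output in-place makes the saved version and the current version disagree, and autograd raises an error.

```python
import torch

a = torch.ones(3, requires_grad=True)
b = a.exp()       # exp() saves its output b to compute the backward pass
b.add_(1)         # in-place op bumps b._version past the saved version

caught = None
try:
    b.sum().backward()
except RuntimeError as e:
    # autograd detects that b was modified after being saved
    caught = e
print(caught)
```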
Why can’t you append the predicted tokens to a python list and then `cat` them at the end?
Thx. I think I wrote the code without considering autograd’s in-place restrictions (it now looks more like numpy or tf than torch). I hope I don’t need to redesign the code, but can just replace the in-place ops with torch-supported equivalents for now.
Yes it is a bit tricky.
To be clearer, the change I propose is the following; I hope this is easy enough to do in your code.

```python
# First version: preallocate a big tensor and fill it in-place
placeholder = torch.zeros(1000, 5, 5)
for idx in range(1000):
    some_tensor = torch.rand(5, 5)
    placeholder[idx] = some_tensor

# Second version: append to a python list and stack at the end
results = []
for idx in range(1000):
    some_tensor = torch.rand(5, 5)
    results.append(some_tensor)
return torch.stack(results, 0)
```
Note that because of the custom allocator used by PyTorch, the second one won’t be slower by any significant amount compared to the first one, even if you create the big tensor only once and reuse it in the first case!
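As a quick sanity check (with hypothetical toy shapes, not the seq2seq model), the append-and-stack pattern keeps the autograd graph intact, so gradients flow back through every step:

```python
import torch

x = torch.ones(4, requires_grad=True)

results = []
for step in range(3):
    # any differentiable per-step computation works here
    results.append(x * (step + 1))

out = torch.stack(results, 0)   # shape (3, 4), still attached to the graph
out.sum().backward()

print(x.grad)   # each element receives 1 + 2 + 3 = 6
```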
Thx @albanD, I didn’t know there was a custom allocator working behind the scenes.
In my case, under the assumption that code executes line by line even for GPU computation, I just used `placeholder.data[:, position] = some_tensor` instead of `placeholder[:, position] = some_tensor`, which let me circumvent the autograd error.
I know I need to check whether autograd works as expected, but I learned a new thing from your comments! Thx a lot!
Do not use `.data`! It breaks autograd!
What `.data` was used for before is now replaced by
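To illustrate why `.data` is dangerous (a sketch, assuming writes through `.data` bypass the version counter as in the workaround above): the in-place write no longer raises an error, but the gradient is silently computed from the modified value.

```python
import torch

a = torch.ones(3, requires_grad=True)
b = a.exp()          # the backward of exp() uses the saved output b
b.data.add_(1)       # writing through .data skips the version counter

b.sum().backward()   # no error is raised...

# ...but the gradient uses the modified output: exp(1) + 1 per element,
# instead of the correct exp(1)
print(a.grad)
```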
OMG @albanD Thx a lot! I will follow that. Thank you so much again!