Terminate called after throwing an instance of 'c10::IndexError' what(): index 1701270418 is out of bounds for dimension 0 with size 500

Ramansh_Sharma · October 23, 2023, 4:49pm

Hi,

I am trying to train a model in which I update only some indices of a PyTorch tensor with the evaluations of a network. For this, I am using the following:

output_tensor.index_put_({torch::from_blob(indices.data(), {indices.size()}, torch::kInt32)}, output_tensor.index({torch::from_blob(indices.data(), {indices.size()}, torch::kInt32)}) + networks->forward(points_tensor));

Here output_tensor is a tensor of shape Nx1, and indices is a std::vector<int> holding the indices of output_tensor to which I want to add the network prediction. I keep getting this error:

terminate called after throwing an instance of 'c10::IndexError' what(): index 1701270418 is out of bounds for dimension 0 with size 500

The very large out-of-bounds index changes to a different large integer every time I run the code. Can anyone help me debug this?

Thank you.
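One way to narrow down an error like the one above is to check the host-side indices before they are wrapped in a tensor. The following is a minimal sketch using only the C++ standard library. `first_out_of_range` is a hypothetical helper, not a LibTorch function, and it is stricter than PyTorch indexing because it treats negative indices as invalid, whereas PyTorch accepts negative indices that wrap around.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of LibTorch): returns the position of the
// first entry of `indices` that lies outside [0, dim_size). If every entry
// is in range, it returns indices.size(), similar to an "end" sentinel.
std::size_t first_out_of_range(const std::vector<int>& indices,
                               std::int64_t dim_size) {
    for (std::size_t i = 0; i < indices.size(); ++i) {
        const std::int64_t idx = indices[i];
        if (idx < 0 || idx >= dim_size) {
            return i;  // position of the offending entry
        }
    }
    return indices.size();  // all entries are valid
}
```

Calling it as `first_out_of_range(indices, 500)` right before the index_put_ call would show whether the vector itself already holds a bad value. If it returns `indices.size()`, the garbage value is more likely appearing later, for example when the wrapped memory is read again after it has been released.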

Ramansh_Sharma · October 23, 2023, 5:16pm

Hi @ptrblck. Do you know what might be causing this error?

Thank you.

ptrblck · October 23, 2023, 5:18pm

The underlying data has most likely been freed, so the tensor contains garbage. Call .clone() on the tensor before indices is deleted to make sure the values are valid.
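The lifetime issue described here can be illustrated without LibTorch. The sketch below is a standard-library analogy, not LibTorch code: `BorrowedView`, `borrow`, and `owned_copy` are hypothetical names. They stand in for a tensor created with torch::from_blob, which wraps existing memory without taking ownership of it, and for .clone(), which copies the values into storage that the result owns.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for a from_blob-style tensor: it only stores a pointer into
// memory owned by someone else and never copies or frees that memory.
struct BorrowedView {
    const int* data;
    std::size_t size;
};

// Wrap the vector's buffer without copying (analogous to from_blob).
BorrowedView borrow(const std::vector<int>& v) {
    return BorrowedView{v.data(), v.size()};
}

// Copy the viewed values into a new vector that owns its storage
// (analogous to clone()). This must run while the source is still alive.
std::vector<int> owned_copy(const BorrowedView& view) {
    return std::vector<int>(view.data, view.data + view.size);
}
```

For example, after `auto view = borrow(indices); auto copy = owned_copy(view);`, writing `indices[0] = 42` changes `view.data[0]` but leaves `copy[0]` untouched. Once `indices` is destroyed or reallocates its buffer, only `copy` remains safe to read, which is why the copy has to be made while the source vector still exists.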

Ramansh_Sharma · October 23, 2023, 5:36pm

Thank you so much for your reply @ptrblck. I tried cloning the tensor but the error persists. This is what I changed the code to:

output_tensor.index_put_({torch::from_blob(indices.data(), {indices.size()}, torch::kInt32).clone()}, output_tensor.index({torch::from_blob(indices.data(), {indices.size()}, torch::kInt32).clone()}) + networks->forward(points_tensor));

It is worth mentioning that the forward pass of this code runs without any problems. Only the backward pass fails. Could it be that the in-place index_put_ operation is not backward compatible? In that case, should I do something like this instead?

output_tensor.index(torch::from_blob(indices.data(), {indices.size()}, torch::kInt32)) = output_tensor.index(torch::from_blob(indices.data(), {indices.size()}, torch::kInt32)) + network->forward(points_tensor);

ptrblck · October 23, 2023, 5:39pm

“Backward compatible” refers to software versioning, but I assume you are asking whether the backward pass is supported? If so, index_put_ won’t break the backward pass, and the error still sounds as if a tensor or its data is going out of scope. You could write a quick check in Python, which should also work in the backward pass.

Ramansh_Sharma · October 23, 2023, 5:40pm

Ah yes, sorry, I meant the backward pass. I will try writing a quick check in Python. Thank you.