loss = F.binary_cross_entropy_with_logits(input=pred_logits[0:1, 0:1], target=torch.tensor([[1.]]), reduction='none')

I get RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [1, 1]], which is output 0 of AsStridedBackward0, is at version 1; expected version 0 instead.

But if I change it to

loss = F.binary_cross_entropy_with_logits(input=pred_logits.clone()[0:1, 0:1], target=torch.tensor([[1.]]), reduction='none')

I get no errors. Is it a pytorch bug? After all, slicing just creates a new Tensor, so where is the inplace operation?

Slicing doesn’t really create an entirely new tensor.

In a technical sense, slicing creates a new tensor in that a new pytorch Tensor python object is created, but this Tensor is a wrapper object
that contains the same underlying data as the original Tensor. We say
that slicing returns a view into the original Tensor.

You can use python’s id() function to see that a new Tensor is created,
but then use .storage().data_ptr() to see that both Tensors reference
the same underlying data. Furthermore, modifying the sliced Tensor
modifies the shared data so those modifications are reflected in the
original Tensor.

Here is an illustration of these points:

>>> import torch
>>> torch.__version__
'1.13.0'
>>> t = torch.arange (5.)
>>> t
tensor([0., 1., 2., 3., 4.])
>>> u = t[1:2]
>>> u
tensor([1.])
>>> id (t)
1803859016768
>>> id (u)
1803859016448
>>> t.storage().data_ptr()
1803813400832
>>> u.storage().data_ptr()
1803813400832
>>> u[0] = 666.
>>> t
tensor([ 0., 666., 2., 3., 4.])
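By contrast, .clone() copies the data into fresh storage, which is presumably why your clone() version avoids the error: whatever in-place operation is bumping the version counter no longer touches the tensor that autograd saved. A short sketch continuing the session above:

```python
import torch

t = torch.arange(5.)
c = t.clone()[1:2]  # a slice of a copy, not a view into t

# clone() allocates new storage, so the two data pointers differ
print(t.storage().data_ptr() == c.storage().data_ptr())  # False

# modifying the copy's slice leaves t untouched
c[0] = 666.
print(t)  # tensor([0., 1., 2., 3., 4.])
```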

The inplace operation could be hiding almost anywhere. Try running both your
forward and backward pass inside of a with torch.autograd.detect_anomaly():
block and see if the additional information this provides points you to the
problematic inplace operation.
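Here is a minimal sketch of what that looks like (the in-place a += 1 is a stand-in for whatever op is modifying your real tensor):

```python
import torch

x = torch.ones(1, 1, requires_grad=True)

# detect_anomaly() records a traceback for every forward op, so when the
# backward pass fails, pytorch also reports which forward operation saved
# the tensor that was later modified in place
with torch.autograd.detect_anomaly():
    a = x * 2
    b = a ** 2   # pow saves a (at version 0) for its backward pass
    a += 1       # in-place op bumps a's version from 0 to 1
    try:
        b.sum().backward()
    except RuntimeError as err:
        print(err)  # "... modified by an inplace operation ..."
```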