PyTorch (>= 1.3) raises an error in backward(), perhaps for a loss computed from paired outputs

I am trying to use the code for IIC (Invariant Information Clustering) : https://github.com/xu-ji/IIC

It basically calls the network twice: once on the original input and once on a modified (noisy) input.

x_out = net(sample) # Softmax output for original sample
x_tf_out = net(noisy_sample) # Softmax output for noisy sample

loss = IIC_loss(x_out, x_tf_out)

The loss then encourages the original sample and the noisy sample to be assigned to the same class. It is computed by the IIC loss:

def IIC_loss(x_out, x_tf_out, EPS=sys.float_info.epsilon):
    # has had softmax applied
    _, k = x_out.size()
    p_i_j = compute_joint(x_out, x_tf_out)
    assert (p_i_j.size() == (k, k))

    p_i = p_i_j.sum(dim=1).view(k, 1).expand(k, k)
    p_j = p_i_j.sum(dim=0).view(1, k).expand(k, k)  # but should be same, symmetric

    # avoid NaN losses. Effect will get cancelled out by p_i_j tiny anyway
    p_i_j[(p_i_j < EPS).data] = EPS
    p_j[(p_j < EPS).data] = EPS
    p_i[(p_i < EPS).data] = EPS

    loss = - p_i_j * (torch.log(p_i_j)
                      - torch.log(p_j)
                      - torch.log(p_i))

    loss = loss.sum()

    return loss

def compute_joint(x_out, x_tf_out):
    # produces variable that requires grad (since args require grad)

    bn, k = x_out.size()
    assert (x_tf_out.size(0) == bn and x_tf_out.size(1) == k)

    p_i_j = x_out.unsqueeze(2) * x_tf_out.unsqueeze(1)  # bn, k, k
    p_i_j = p_i_j.sum(dim=0)  # k, k
    p_i_j = (p_i_j + p_i_j.t()) / 2.  # symmetrise
    p_i_j = p_i_j / p_i_j.sum()  # normalise

    return p_i_j

When I now call backward() on the loss, it leads to the following error :

loss.backward()
optimizer.step()

in backward
#allow_unreachable=True) # allow_unreachable flag
RuntimeError: unsupported operation: more than one element of the written-to tensor refers to a single memory location. Please clone() the tensor before performing the operation.

What changes to the code are needed for newer versions of PyTorch? The code ran perfectly on PyTorch 1.0.

Hi,

Yes, we added more checks for operations that were potentially invalid.
Could you run your code in anomaly mode by calling torch.autograd.set_detect_anomaly(True) at the beginning of your script? That will tell us which operation is causing the error.
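
For reference, a minimal sketch of enabling anomaly mode around a backward pass. The toy tensor and entropy-style loss here are placeholders for illustration, not the IIC code; only torch.autograd.set_detect_anomaly comes from the suggestion above.

```python
import torch

# Enable anomaly detection before the forward pass. If an op fails in
# backward, autograd also reports the traceback of the forward op that
# created it. This is a debugging aid and slows training down.
torch.autograd.set_detect_anomaly(True)

# Placeholder forward pass (not the IIC loss): entropy of a softmax output.
x = torch.randn(4, 3, requires_grad=True)
y = torch.softmax(x, dim=1)
loss = -(y * torch.log(y)).sum()

loss.backward()

# Switch it off again once the offending op has been found.
torch.autograd.set_detect_anomaly(False)
```

With the real training script, the extra traceback should point at the line inside IIC_loss whose backward fails.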

Hi albanD,

I figured out the issue yesterday. The in-place operations inside the function IIC_loss caused the problem.
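
A likely reason, sketched below under the assumption that the expanded marginals are what triggers the check: p_i and p_j are built with expand(), which returns a view with stride 0 along the expanded dimension, so several elements share one memory location. The names row_sums and in_place_failed are made up for this illustration.

```python
import torch

k = 3
row_sums = torch.arange(float(k))        # stand-in for p_i_j.sum(dim=1)
p_i = row_sums.view(k, 1).expand(k, k)   # same construction as in IIC_loss

# Stride 0 along a dimension means every element along that dimension
# aliases the same memory location.
expanded_stride = p_i.stride()           # (1, 0)

# clone() copies the view into its own storage, so no stride is 0 any more.
cloned_stride = p_i.clone().stride()

# Writing in place to the expanded view is what newer PyTorch objects to.
# Recent versions are expected to raise the RuntimeError from the question
# here; the result is recorded rather than assumed.
try:
    p_i.add_(1.0)
    in_place_failed = False
except RuntimeError:
    in_place_failed = True
```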

Instead of the original lines,
p_i_j[(p_i_j < EPS).data] = EPS
p_j[(p_j < EPS).data] = EPS
p_i[(p_i < EPS).data] = EPS

The correct version for new PyTorch (>= 1.3) should be as follows:
p_i_j = torch.where(p_i_j < EPS, torch.tensor([EPS], device=p_i_j.device), p_i_j)
p_j = torch.where(p_j < EPS, torch.tensor([EPS], device=p_j.device), p_j)
p_i = torch.where(p_i < EPS, torch.tensor([EPS], device=p_i.device), p_i)
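
Putting it together, here is a self-contained sketch of the loss with the out-of-place replacement, followed by a small backward() check on random data. The helper floor_at is a hypothetical name introduced here, and it uses torch.full_like instead of torch.tensor([EPS], ...) so the fill value always matches the dtype and device of the input. The random logits stand in for real network outputs.

```python
import sys
import torch

EPS = sys.float_info.epsilon


def compute_joint(x_out, x_tf_out):
    # Joint distribution over cluster pairs: (bn, k, k) summed to (k, k).
    p_i_j = (x_out.unsqueeze(2) * x_tf_out.unsqueeze(1)).sum(dim=0)
    p_i_j = (p_i_j + p_i_j.t()) / 2.0   # symmetrise
    return p_i_j / p_i_j.sum()          # normalise


def floor_at(p, eps):
    # Hypothetical helper: out-of-place version of p[p < eps] = eps.
    return torch.where(p < eps, torch.full_like(p, eps), p)


def iic_loss(x_out, x_tf_out, eps=EPS):
    _, k = x_out.size()
    p_i_j = compute_joint(x_out, x_tf_out)
    p_i = p_i_j.sum(dim=1).view(k, 1).expand(k, k)
    p_j = p_i_j.sum(dim=0).view(1, k).expand(k, k)

    # torch.where returns new tensors, so the expanded views are never
    # written to in place.
    p_i_j = floor_at(p_i_j, eps)
    p_i = floor_at(p_i, eps)
    p_j = floor_at(p_j, eps)

    loss = -p_i_j * (torch.log(p_i_j) - torch.log(p_j) - torch.log(p_i))
    return loss.sum()


# Toy check with random "network outputs" (placeholders, not real data).
torch.manual_seed(0)
logits = torch.randn(8, 5, requires_grad=True)
x_out = torch.softmax(logits, dim=1)
x_tf_out = torch.softmax(logits + 0.1 * torch.randn(8, 5), dim=1)

loss = iic_loss(x_out, x_tf_out)
loss.backward()
```

torch.clamp(p, min=eps) should give the same values out of place, if a shorter form is preferred.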
