PyTorch (>= 1.3) showing an error, perhaps for loss computed from paired outputs

Sudipto · February 5, 2020, 11:38pm

I am trying to use the code for IIC (Invariant Information Clustering) : https://github.com/xu-ji/IIC

It calls the network twice: once on the original input and once on a modified (noisy) input.

x_out = net(sample) # Softmax output for original sample
x_tf_out = net(noisy_sample) # Softmax output for noisy sample

loss = IIC_loss(x_out, x_tf_out)

The loss then encourages the original sample and the noisy sample to be assigned to the same class. It is computed by the IIC loss:

import sys
import torch

def IIC_loss(x_out, x_tf_out, EPS=sys.float_info.epsilon):
    # x_out and x_tf_out have already had softmax applied
    _, k = x_out.size()
    p_i_j = compute_joint(x_out, x_tf_out)
    assert (p_i_j.size() == (k, k))

    p_i = p_i_j.sum(dim=1).view(k, 1).expand(k, k)
    p_j = p_i_j.sum(dim=0).view(1, k).expand(k, k)  # but should be same, symmetric

    # avoid NaN losses. Effect will get cancelled out by p_i_j tiny anyway
    p_i_j[(p_i_j < EPS).data] = EPS
    p_j[(p_j < EPS).data] = EPS
    p_i[(p_i < EPS).data] = EPS

    loss = - p_i_j * (torch.log(p_i_j)
                      - torch.log(p_j)
                      - torch.log(p_i))

    loss = loss.sum()

    return loss

def compute_joint(x_out, x_tf_out):
    # produces variable that requires grad (since args require grad)

    bn, k = x_out.size()
    assert (x_tf_out.size(0) == bn and x_tf_out.size(1) == k)

    p_i_j = x_out.unsqueeze(2) * x_tf_out.unsqueeze(1)  # bn, k, k
    p_i_j = p_i_j.sum(dim=0)  # k, k
    p_i_j = (p_i_j + p_i_j.t()) / 2.  # symmetrise
    p_i_j = p_i_j / p_i_j.sum()  # normalise

    return p_i_j

When I now call backward() on the loss, I get the following error:

loss.backward()
optimizer.step()

in backward
#allow_unreachable=True) # allow_unreachable flag
RuntimeError: unsupported operation: more than one element of the written-to tensor refers to a single memory location. Please clone() the tensor before performing the operation.

What changes does the code need for newer versions of PyTorch? It ran without errors on PyTorch 1.0.

albanD · February 6, 2020, 2:56pm

Hi,

Yes, we added more checks for operations that were potentially invalid.
Could you run your code in anomaly mode by calling torch.autograd.set_detect_anomaly(True) at the beginning of your script? That will show us which operation is causing the error.
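
For anyone unfamiliar with anomaly mode, here is a minimal sketch of how it behaves. It is not the IIC code: the function name backward_with_anomaly_check is made up for illustration, and the failure is a deliberately provoked NaN in the backward pass of torch.sqrt. With anomaly mode on, autograd raises a RuntimeError naming the offending backward function and also prints a traceback of the forward call that created it.

```python
import torch

def backward_with_anomaly_check(x):
    """Run a tiny forward/backward pass under anomaly mode.

    Returns the error message if autograd reports a problem, else None.
    (Hypothetical helper for illustration only.)
    """
    # set_detect_anomaly can be called as a function at the top of a script,
    # or used as a context manager as done here.
    with torch.autograd.set_detect_anomaly(True):
        y = torch.sqrt(x)        # derivative of sqrt is infinite at x == 0
        loss = (y * 0.0).sum()   # upstream gradient 0 times inf gives nan in backward
        try:
            loss.backward()
        except RuntimeError as err:
            return str(err)
    return None

bad = backward_with_anomaly_check(torch.zeros(1, requires_grad=True))
good = backward_with_anomaly_check(torch.full((1,), 4.0, requires_grad=True))
print(bad)   # error message mentioning nan values
print(good)  # None
```

Anomaly mode slows training down noticeably, so it is usually switched on only while debugging.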

Sudipto · February 7, 2020, 12:10am

Hi albanD,

I figured out the issue yesterday. The in-place operations inside the function IIC_loss were causing the problem.
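
A likely reason the in-place writes are rejected: p_i and p_j are built with expand(), which returns a view where several elements alias the same memory location, which is exactly what the error message complains about. The small sketch below uses a stand-in tensor (marginal is a made-up name, not from the IIC code) to show the aliasing through the strides, and that clone() removes it.

```python
import torch

k = 3
marginal = torch.arange(1.0, k + 1)           # stand-in for p_i_j.sum(dim=1)
expanded = marginal.view(k, 1).expand(k, k)   # same construction as p_i in IIC_loss

print(expanded.stride())   # (1, 0): stride 0 means every column reads the same element

owned = expanded.clone()   # clone() gives each element its own memory location
print(owned.stride())      # (3, 1)

owned[owned < 2.0] = 2.0   # an in-place masked write is fine on the clone
print(owned.min().item())  # 2.0
```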

Instead of the original lines,
p_i_j[(p_i_j < EPS).data] = EPS
p_j[(p_j < EPS).data] = EPS
p_i[(p_i < EPS).data] = EPS

The version that works with newer PyTorch (>= 1.3) is as follows:
p_i_j = torch.where(p_i_j < EPS, torch.tensor([EPS], device=p_i_j.device), p_i_j)
p_j = torch.where(p_j < EPS, torch.tensor([EPS], device=p_j.device), p_j)
p_i = torch.where(p_i < EPS, torch.tensor([EPS], device=p_i.device), p_i)
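
For completeness, here is a self-contained sketch of the same out-of-place idea that can be run without the IIC repository. The helper name floor_at_eps and the random toy inputs are assumptions for illustration; the joint and marginals are built the same way as in the functions quoted earlier in the thread.

```python
import sys
import torch

EPS = sys.float_info.epsilon

def floor_at_eps(p, eps=EPS):
    """Return a new tensor with entries below eps replaced by eps.

    Hypothetical helper: nothing is written in place, so it is safe
    on expanded views such as p_i and p_j.
    """
    return torch.where(p < eps, torch.full_like(p, eps), p)

torch.manual_seed(0)
k = 4
logits = torch.randn(16, k, requires_grad=True)   # toy stand-in for network outputs
x_out = torch.softmax(logits, dim=1)
x_tf_out = torch.softmax(logits + 0.1 * torch.randn(16, k), dim=1)

# Joint distribution, symmetrised and normalised as in compute_joint
p_i_j = (x_out.unsqueeze(2) * x_tf_out.unsqueeze(1)).sum(dim=0)
p_i_j = (p_i_j + p_i_j.t()) / 2.0
p_i_j = p_i_j / p_i_j.sum()

# Marginals as expanded views (several elements alias one memory location)
p_i = p_i_j.sum(dim=1).view(k, 1).expand(k, k)
p_j = p_i_j.sum(dim=0).view(1, k).expand(k, k)

# Out-of-place flooring instead of masked in-place assignment
p_i_j, p_i, p_j = floor_at_eps(p_i_j), floor_at_eps(p_i), floor_at_eps(p_j)

loss = -(p_i_j * (torch.log(p_i_j) - torch.log(p_j) - torch.log(p_i))).sum()
loss.backward()
print(loss.item(), logits.grad.shape)
```

torch.clamp(p, min=EPS) is another out-of-place option that would probably work here too, though I have not compared the two on the full IIC training code.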