I think the authors of the repository will give you a better answer, but based on the code it seems the backward method was reimplemented for the negated cdist method.
From the docs of cdist:
Computes batched the p-norm distance between each pair of the two collections of row vectors.
I’m not familiar with the implementation of the repository and would recommend creating an issue there.
Let’s go through all operations separately in the calls:
torch.cdist(a, b, p) calculates the p-norm distance between each pair of the two collections of row vectors, as explained above
.squeeze() will remove all dimensions of the result tensor where tensor.size(dim) == 1
.transpose(0, 1) will permute dim0 and dim1, i.e. it’ll “swap” these dimensions
torch.unsqueeze(tensor, dim) will add a new dimension specified by dim
expand() will manipulate the metadata to create a view with the new shape (no copy of the data)
permute is similar to transpose, but for multiple dimensions (all of these are combined in the example below)
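A short example combining these operations (the shapes are made up, purely for illustration):

import torch

a = torch.randn(1, 3, 5)
b = torch.randn(1, 4, 5)

d = torch.cdist(a, b, p=2)         # pairwise 2-norm distances -> [1, 3, 4]
d = d.squeeze()                    # removes the size-1 batch dim -> [3, 4]
d = d.transpose(0, 1)              # swaps dim0 and dim1 -> [4, 3]

x = torch.unsqueeze(d, 2)          # adds a new dimension at index 2 -> [4, 3, 1]
x = x.expand(4, 3, 7)              # view that repeats data, no copy -> [4, 3, 7]
x = x.permute(1, 0, 2)             # reorders multiple dims at once -> [3, 4, 7]
print(x.shape, x.is_contiguous())  # torch.Size([3, 4, 7]) False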
I was told that _temp1 = torch.unsqueeze(X, 2).expand(X.shape[0], X.shape[1], W.shape[0]).permute(1, 0, 2) seems to use a lot of GPU memory during the intermediate calculation.
But how can I modify this line to use less GPU memory?
All mentioned operations manipulate the metadata (shape, stride) of the tensor and will not use more memory in this particular line of code.
However, the result will be a non-contiguous tensor, and the next _temp1.contiguous() call (either manually or in a function) will trigger the copy and use more memory.
You cannot avoid this copy if a method needs a contiguous tensor to operate on.
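To illustrate (a minimal sketch with made-up shapes): a permuted tensor is only a view into the same storage, and it is the contiguous() call that allocates new memory:

import torch

t = torch.randn(32, 64, 128)
v = t.permute(2, 0, 1)                # view: new strides, same storage
print(v.is_contiguous())              # False
print(v.data_ptr() == t.data_ptr())   # True, no extra memory used yet

c = v.contiguous()                    # this triggers the actual copy
print(c.data_ptr() == t.data_ptr())   # False, new allocation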
You could alternatively:
1. decrease the tensor's dimensions, then increase them again, or
2. increase the tensor's dimensions and then use a depth-wise convolution (suggestions 1 and 2 are sketched after this list), or
3. use mixed-precision training
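A rough, generic sketch of suggestions 1 and 2 (hypothetical channel sizes, not adapted to the cdist code above): a 1x1 "bottleneck" convolution reduces the channel dimension before the expensive operation and restores it afterwards, while a depth-wise convolution (groups=in_channels) processes each channel independently and is much cheaper than a dense convolution:

import torch
import torch.nn as nn

in_channels, reduced = 64, 16
block = nn.Sequential(
    nn.Conv2d(in_channels, reduced, kernel_size=1),    # reduce dims (suggestion 1)
    nn.Conv2d(reduced, reduced, kernel_size=3,
              padding=1, groups=reduced),              # depth-wise conv (suggestion 2)
    nn.Conv2d(reduced, in_channels, kernel_size=1),    # restore dims
)
x = torch.randn(1, in_channels, 32, 32)
print(block(x).shape)  # torch.Size([1, 64, 32, 32])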
I know I can use https://github.com/NVIDIA/apex for mixed-precision training.
However, I am not sure how to modify the code for suggestions 1 and 2 above.
So, are you implying that the torch.cdist() function itself contains a contiguous() call?
I tried to replace torch.cdist() with fast_cdist() as shown below. However, I still get a GPU out-of-memory error.
import torch

def fast_cdist(x1, x2):
    adjustment = x1.mean(-2, keepdim=True)
    x1 = x1 - adjustment
    x2 = x2 - adjustment  # x1 and x2 should be identical in all dims except -2 at this point
    # Compute the squared distance matrix using the quadratic expansion,
    # but be clever and do it with a single matmul call
    x1_norm = x1.pow(2).sum(dim=-1, keepdim=True)
    x1_pad = torch.ones_like(x1_norm)
    x2_norm = x2.pow(2).sum(dim=-1, keepdim=True)
    x2_pad = torch.ones_like(x2_norm)
    x1_ = torch.cat([-2. * x1, x1_norm, x1_pad], dim=-1)
    x2_ = torch.cat([x2, x2_pad, x2_norm], dim=-1)
    res = x1_.matmul(x2_.transpose(-2, -1))
    # Clamp small negative values (numerical error) before taking the square root
    res.clamp_min_(1e-30).sqrt_()
    return res
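For reference, a quick sanity check of fast_cdist against torch.cdist on small random inputs (the shapes are made up):

x1 = torch.randn(4, 100, 16)
x2 = torch.randn(4, 250, 16)   # same dims as x1 except dim -2
res = fast_cdist(x1, x2)
ref = torch.cdist(x1, x2, p=2)
print(torch.allclose(res, ref, atol=1e-3))  # expected: True (up to floating-point error)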
Probably yes, and these lines are probably the right ones.
In your code, other methods such as torch.cat will create contiguous tensors, as seen here:
a = torch.randn(1, 1).expand(10, 10)
print(a.is_contiguous())
> False
b = torch.randn(10, 10)
print(b.is_contiguous())
> True
c = torch.cat((a, b), dim=1)
print(c.is_contiguous())
> True
The main issue is that your data is too large for the applied operations, as at least some of them work on contiguous tensors, which causes the memory increase.
For mixed-precision training, I would recommend installing the nightly and using native amp, as described here.
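A minimal native amp sketch (the model, optimizer, and data loader are hypothetical; only the amp calls matter here):

import torch
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()
for imgs, targets in loader:              # hypothetical DataLoader
    optimizer.zero_grad()
    with autocast():                      # forward pass runs in mixed precision
        loss, outputs = model(imgs, targets)
    scaler.scale(loss).backward()         # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)                # unscales gradients, then calls optimizer.step()
    scaler.update()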
As for using the PyTorch native amp library, I get the following runtime error with train.py:
Traceback (most recent call last):
File "train.py", line 112, in <module>
loss, outputs = model(imgs, targets)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 563, in __call__
result = self.forward(*input, **kwargs)
File "/home/rog/Downloads/PyTorch-YOLOv3/models.py", line 266, in forward
x, layer_loss = module[0](x, targets, img_dim)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 563, in __call__
result = self.forward(*input, **kwargs)
File "/home/rog/Downloads/PyTorch-YOLOv3/models.py", line 203, in forward
loss_conf_obj = self.bce_loss(pred_conf[obj_mask], tconf[obj_mask])
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 563, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/loss.py", line 520, in forward
return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 2417, in binary_cross_entropy
input, target, weight, reduction_enum)
RuntimeError: torch.nn.functional.binary_cross_entropy and torch.nn.BCELoss are unsafe to autocast.
Many models use a sigmoid layer right before the binary cross entropy layer.
In this case, combine the two layers using torch.nn.functional.binary_cross_entropy_with_logits
or torch.nn.BCEWithLogitsLoss. binary_cross_entropy_with_logits and BCEWithLogits are
safe to autocast.
As the error message states, you should replace the usage of sigmoid + nn.BCELoss with logits + nn.BCEWithLogitsLoss, as the former approach is unsafe for autocasting.
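A minimal sketch of the replacement (the shapes are made up):

import torch
import torch.nn as nn

logits = torch.randn(8, 1)   # raw model output, no sigmoid applied
target = torch.rand(8, 1)

# unsafe under autocast: sigmoid + nn.BCELoss
# loss = nn.BCELoss()(torch.sigmoid(logits), target)

# safe under autocast: pass the raw logits to nn.BCEWithLogitsLoss,
# which applies the sigmoid internally in a numerically stable way
criterion = nn.BCEWithLogitsLoss()
loss = criterion(logits, target)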
I am also getting this error. The output of my model uses F.sigmoid, and when I compute the loss I use BCELoss. If I decide to use autocast, should I just remove the sigmoid and use BCEWithLogitsLoss?