Initially I tried to compute a histogram between my outputs and targets - but I hit a wall that made me doubt all my autograd knowledge.
Check this custom loss:
import torch
import torch.nn as nn


class DoSthLoss(nn.Module):
    def __init__(self):
        super(DoSthLoss, self).__init__()

    def do_something(self, tensor, *args):
        # Creating this fresh tensor is what seems to trigger the OOM
        tensor = torch.ones_like(tensor, device=tensor.device, requires_grad=True) * torch.rand(1).item()
        return tensor

    def forward(self, outputs: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        """Computes the Do Something loss."""
        outputs = self.do_something(outputs)
        targets = self.do_something(targets)
        loss = (targets - outputs) ** 2
        # Sum over spatial dimensions (D, H, W for 3D, or H, W for 2D)
        loss = loss.view(loss.size(0), -1).sum(dim=-1)
        return loss.mean()
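Here is roughly how I call it (shapes invented for a minimal repro):

criterion = DoSthLoss()
outputs = torch.randn(2, 1, 16, 64, 64, device="cuda", requires_grad=True)
targets = torch.randn(2, 1, 16, 64, 64, device="cuda")
loss = criterion(outputs, targets)
loss.backward()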
After playing a Lego game with my code, I simplified my loss function down to this point, and it still hits a CUDA out-of-memory error whenever I create a new tensor (in-place operations made no difference; I tried). My question is: why? And how do I properly initialize new/temporary tensors inside a custom loss function?
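For reference, here is the kind of in-place variant I tried (an illustrative sketch, not my exact code); it still OOMs, because the fresh allocation happens either way:

import torch

def do_something(tensor):
    out = torch.ones_like(tensor)   # the new CUDA allocation still happens here
    out.mul_(torch.rand(1).item())  # in-place scaling avoids a second buffer
    out.requires_grad_(True)
    return out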
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1024.00 MiB. GPU 0 has a total capacity of 15.66 GiB of which 851.56 MiB is free. Process 3171 has 249.95 MiB memory in use. Including non-PyTorch memory, this process has 13.75 GiB memory in use. Of the allocated memory 12.73 GiB is allocated by PyTorch, and 754.21 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Oops, sorry - I forgot to include it, but it's the very classic CUDA OOM error.
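For anyone who hits the same traceback: the allocator option suggested in the message can be set as an environment variable when launching the script, or from Python before the first CUDA allocation. A minimal sketch:

import os

# Must be set before the first CUDA allocation (safest: before importing torch)
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch

x = torch.empty(1024, device="cuda")  # the allocator now uses expandable segments

This only mitigates fragmentation, though; it doesn't change how much memory the loss itself allocates.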
Yes, I know what it means. Maybe I wasn't clear about my problem: I don't have a memory problem if I don't create a new tensor inside the function, and I can use the plain MSE loss perfectly fine on the same data.
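For comparison, this is the working baseline (shapes invented, same as in my first post):

import torch

criterion = torch.nn.MSELoss()
outputs = torch.randn(2, 1, 16, 64, 64, device="cuda", requires_grad=True)
targets = torch.randn(2, 1, 16, 64, 64, device="cuda")
loss = criterion(outputs, targets)  # no extra temporaries, no OOM
loss.backward()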
If your custom method allocates a tensor that won't fit onto your device, the OOM error is expected. I don't fully understand why you are not expecting this behavior, or how it relates to the function being custom.
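You can verify that the allocation itself is the cost by checking the allocator stats around it. A minimal sketch (tensor size made up; adjust it to your GPU):

import torch

device = torch.device("cuda")

print(torch.cuda.memory_allocated(device) / 1024**2, "MiB before")
tmp = torch.ones(1024, 1024, 256, device=device)  # ~1 GiB of float32
print(torch.cuda.memory_allocated(device) / 1024**2, "MiB after")

del tmp
torch.cuda.empty_cache()  # return the cached block to the driver
print(torch.cuda.memory_allocated(device) / 1024**2, "MiB after free")

Every tensor you create inside the loss is allocated on top of the model's activations in each iteration, so if you are already close to the memory limit, that extra allocation is exactly what tips it over.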