Clamping leads to CUDA out of memory but .data.clamp works. Why?

I have a standard dataloader which loads images.
To every image I want to add a static tensor.
I then want to clamp the result to the range [0, 1].
This new image is used to train a model.

The following code roughly shows the important steps.
(Everything is on the GPU.)

static_tensor = torch.load(path)
for img, label in dataloader:
  img = img.cuda()
  addition_tensor = img + static_tensor
  clamped_tensor = addition_tensor.clamp(0, 1)
  output = model(clamped_tensor)
  loss = criterion(output, label)

This causes out-of-memory errors.

But if I change the clamping to

clamped_tensor = addition_tensor.data.clamp(0, 1)

it no longer creates this error.

What is the reason behind this?

Edit: I noticed that my static tensor has requires_grad=True.

Am I correct to assume that autograd therefore records every operation involving this tensor, together with the model's operations, and that this graph gets bigger in every iteration of the for loop?


Your static tensor should not have requires_grad=True, I guess.
When it does, every op applied to it is tracked by autograd, and that increases memory usage because we need to save some values for the backward pass.
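To make the point above concrete, here is a minimal CPU-only sketch. The shapes and the random `img` batch are stand-ins chosen for illustration, not the original data. It shows that the clamped result is tracked by autograd only while the static tensor requires gradients.

```python
import torch

# Stand-ins for illustration: a fake image batch and a static tensor
# that requires gradients, as described in the question.
img = torch.rand(2, 3, 4, 4)
static_tensor = torch.rand(3, 4, 4, requires_grad=True)

# Ops on a tensor that requires grad are recorded for the backward pass.
tracked = (img + static_tensor).clamp(0, 1)
print(tracked.requires_grad, tracked.grad_fn is not None)  # True True

# Detaching the static tensor once, before the loop, avoids the tracking.
static_plain = static_tensor.detach()
untracked = (img + static_plain).clamp(0, 1)
print(untracked.requires_grad, untracked.grad_fn)  # False None
```

The values are identical in both cases; only the autograd bookkeeping differs.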

Note that you should never use .data in general!
Here you can use .detach(), which behaves similarly but avoids weird side effects: in-place changes made through .data are invisible to autograd and can silently give wrong gradients.
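One such side effect can be sketched as follows. The helper name `grad_after_zeroing` is made up for this illustration. An in-place change made through `.data` goes unnoticed by autograd and yields a wrong gradient, while the same change made through `.detach()` is caught when `backward()` runs.

```python
import torch

def grad_after_zeroing(use_data):
    # Hypothetical helper, for illustration only.
    a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
    out = a.sigmoid()      # sigmoid's backward reuses the values of `out`
    alias = out.data if use_data else out.detach()
    alias.zero_()          # in-place change to memory shared with `out`
    out.sum().backward()
    return a.grad

# With .data the modification is not tracked: the gradient is silently wrong.
print(grad_after_zeroing(use_data=True))  # tensor([0., 0., 0.])

# With .detach() autograd notices the modification and raises an error.
try:
    grad_after_zeroing(use_data=False)
except RuntimeError as err:
    print("detach() caught it:", type(err).__name__)
```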

Hi, thank you for your advice!
I have another question about clamping: what if the static tensor does have requires_grad=True and clamping the tensor causes errors like “CUDA error: an illegal memory access was encountered”?
Do you have any ideas about this? Thank you! I’m sorry if this is an unrelated issue!

Could you update to the latest PyTorch release and check if you are still hitting this issue?
If so, could you post a minimal executable code snippet to reproduce the issue, please?

Thank you for your reply!
Here is the code:

import torch
import torch.multiprocessing as mp
import torch.nn.functional as F
from torch.cuda.amp import autocast, GradScaler

def task(rank, *args):
    # mp.spawn passes the process index as the first argument
    for epoch in range(0, 100):
        trainer = Trainer()

class Trainer():
    def training(self, model, dataloader):
        train_loss = 0.0
        for i, (images, target) in enumerate(dataloader):
            with autocast():
                output = model(images)
                loss = criterion(output, target)
                train_loss += loss.item()

def criterion(predict, target):
    # accumulate on the same device as the predictions
    loss_0 = torch.tensor(0.0, device=predict.device)
    n, c, h, w = predict.size()
    for i in range(n):
        # per-pixel loss of sample i (per-sample indexing assumed), flattened
        loss = F.nll_loss(predict[i:i+1], target[i:i+1], reduction='none').view(-1)
        loss = loss.clamp(min=0.0, max=1000.0) #RuntimeError: CUDA error: an illegal memory access was encountered
        # keep the hardest 10% of pixels; averaging them is an assumption
        loss_hard, _ = loss.topk(int(0.1 * h * w))
        loss_0 += loss_hard.mean()
    return loss_0 / n

if __name__ == '__main__':
    mp.spawn(task, args=args, nprocs=nprocs)

My PyTorch version is 1.7 and my CUDA version is 11.0. The reason I haven’t updated to the latest release is that the loss function used to work fine; the error only started to appear, seemingly at random, during training once I switched to another, larger dataset.
So is this about the scale of the dataset? If not, maybe other parts of my code are wrong.
Thank you!