RuntimeError: set_sizes_and_strides is not allowed on a Tensor created from .data or .detach()

I have GAN-style code like the following:

self.optimizer_generator.zero_grad()
fake_high_resolution = self.generator(low_resolution)

score_real = self.discriminator(high_resolution)
score_fake = self.discriminator(fake_high_resolution)

# calculate generator_loss

generator_loss.backward()
self.optimizer_generator.step()

self.optimizer_discriminator.zero_grad()

score_real = self.discriminator(high_resolution)
score_fake = self.discriminator(fake_high_resolution.detach())

# calculate discriminator_loss

discriminator_loss.backward()
self.optimizer_discriminator.step()
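For readers less familiar with this alternating update, here is a minimal, self-contained sketch that runs on CPU. It assumes tiny hypothetical linear models in place of the real generator and discriminator and a plain `nn.BCEWithLogitsLoss` criterion (the original loss is not shown). It illustrates why `.detach()` is used in the discriminator step: the discriminator loss must not backpropagate into the generator.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the real networks and loss
generator = nn.Linear(4, 8)
discriminator = nn.Linear(8, 1)
optimizer_generator = torch.optim.Adam(generator.parameters(), lr=1e-3)
optimizer_discriminator = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()

low_resolution = torch.randn(16, 4)
high_resolution = torch.randn(16, 8)
real_labels = torch.ones(16, 1)
fake_labels = torch.zeros(16, 1)

# Generator step: gradients flow through D into G, but only G's optimizer steps
optimizer_generator.zero_grad()
fake_high_resolution = generator(low_resolution)
generator_loss = criterion(discriminator(fake_high_resolution), real_labels)
generator_loss.backward()
optimizer_generator.step()

# Discriminator step: detach() cuts the graph, so no gradient reaches G
optimizer_discriminator.zero_grad()
optimizer_generator.zero_grad(set_to_none=True)
score_real = discriminator(high_resolution)
score_fake = discriminator(fake_high_resolution.detach())
discriminator_loss = (criterion(score_real, real_labels)
                      + criterion(score_fake, fake_labels)) / 2
discriminator_loss.backward()
optimizer_discriminator.step()

# G's gradients stay unset after the discriminator backward pass
generator_grads = [p.grad for p in generator.parameters()]
```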

Where does this error come from, and why? I based my code on code that runs correctly.
Environment:

PyTorch version: 1.8.0
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect

Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: 
GPU 0: GeForce RTX 3090

Nvidia driver version: 460.67
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] pytorch-lightning==1.2.5
[pip3] torch==1.8.0
[pip3] torchelastic==0.2.2
[pip3] torchmetrics==0.2.0
[pip3] torchtext==0.9.0
[pip3] torchvision==0.9.0
[conda] blas                      1.0                         mkl  
[conda] cudatoolkit               11.1.74              h6bb024c_0    nvidia
[conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
[conda] mkl                       2020.2                      256  
[conda] mkl-service               2.3.0            py38he904b0f_0  
[conda] mkl_fft                   1.3.0            py38h54f3939_0  
[conda] mkl_random                1.1.1            py38h0573a6f_0  
[conda] numpy                     1.19.2           py38h54aff64_0  
[conda] numpy-base                1.19.2           py38hfa32c7d_0  
[conda] pytorch                   1.8.0           py3.8_cuda11.1_cudnn8.0.5_0    pytorch
[conda] pytorch-lightning         1.2.5                    pypi_0    pypi
[conda] torchelastic              0.2.2                    pypi_0    pypi
[conda] torchmetrics              0.2.0                    pypi_0    pypi
[conda] torchtext                 0.9.0                      py38    pytorch
[conda] torchvision               0.9.0                py38_cu111    pytorch

Could you post the complete stack trace for this error, which would also point to the line of code, which raises the issue, please?

OK, this is the complete stack trace:

Traceback (most recent call last):
  File "/ghome/luoxin/projects/multi-scale-liif/run.py", line 39, in <module>
    main()
  File "/opt/conda/lib/python3.8/site-packages/hydra/main.py", line 33, in decorated_main
    _run_hydra(
  File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 364, in _run_hydra
    run_and_report(
  File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 215, in run_and_report
    raise ex
  File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 212, in run_and_report
    return func()
  File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 365, in <lambda>
    lambda: hydra.run(
  File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 109, in run
    return run_job(
  File "/opt/conda/lib/python3.8/site-packages/hydra/core/utils.py", line 129, in run_job
    ret.return_value = task_function(task_cfg)
  File "/ghome/luoxin/projects/multi-scale-liif/run.py", line 33, in main
    return train(config)
  File "/ghome/luoxin/projects/multi-scale-liif/src/train.py", line 78, in train
    trainer.fit(model=model, datamodule=datamodule)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 499, in fit
    self.dispatch()
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 546, in dispatch
    self.accelerator.start_training(self)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 73, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 114, in start_training
    self._results = trainer.run_train()
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 637, in run_train
    self.train_loop.run_training_epoch()
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 493, in run_training_epoch
    batch_output = self.run_training_batch(batch, batch_idx, dataloader_idx)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 658, in run_training_batch
    self._curr_step_result = self.training_step(
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 293, in training_step
    training_step_output = self.trainer.accelerator.training_step(args)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 156, in training_step
    return self.training_type_plugin.training_step(*args)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 125, in training_step
    return self.lightning_module.training_step(*args, **kwargs)
  File "/ghome/luoxin/projects/multi-scale-liif/src/lightning_modules/liif.py", line 466, in training_step
    score_fake = self.D(fake_image.detach()).mean(dim=(1, 2, 3))
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/ghome/luoxin/projects/multi-scale-liif/src/architectures/discriminator.py", line 27, in forward
    return self.model(input)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/container.py", line 119, in forward
    input = module(input)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 399, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 395, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: set_sizes_and_strides is not allowed on a Tensor created from .data or .detach().
If your intent is to change the metadata of a Tensor (such as sizes / strides / storage / storage_offset)
without autograd tracking the change, remove the .data / .detach() call and wrap the change in a `with torch.no_grad():` block.
For example, change:
    x.data.set_(y)
to:
    with torch.no_grad():
        x.set_(y)

As you can see, the error happens when running `score_fake = self.D(fake_image.detach()).mean(dim=(1, 2, 3))`.

That’s quite a weird error. Could you post the model definition of self.D as well as the shape of fake_image?

This is my D; it's quite simple:

import torch.nn as nn


class PatchDiscriminator(nn.Module):
    """Defines a PatchGAN discriminator"""

    def __init__(self, input_nc=3, ndf=64):
        """Construct a PatchGAN discriminator
        Parameters:
            input_nc (int)  -- the number of channels in input images
            ndf (int)       -- the number of filters in the first conv layer
        """
        super(PatchDiscriminator, self).__init__()
        sequence = []
        # Two stride-2 convolutions downsample the input roughly 4x in each spatial dimension
        sequence += [nn.Conv2d(input_nc, ndf, kernel_size=4, stride=2, padding=1), nn.LeakyReLU(0.2)]

        sequence += [nn.Conv2d(ndf, ndf*2, kernel_size=4, stride=2, padding=1), nn.LeakyReLU(0.2)]
        # Stride-1 convolutions keep the spatial size
        sequence += [nn.Conv2d(ndf*2, ndf*4, kernel_size=3, stride=1, padding=1), nn.LeakyReLU(0.2)]
        sequence += [nn.Conv2d(ndf*4, ndf*8, kernel_size=3, stride=1, padding=1), nn.LeakyReLU(0.2)]

        # Final 1-channel map: one realness score per patch
        sequence += [nn.Conv2d(ndf*8, 1, kernel_size=3, stride=1, padding=1)]
        self.model = nn.Sequential(*sequence)

    def forward(self, input):
        """Standard forward."""
        return self.model(input)

The shape of fake_image varies in my case: I have three different groups of fake images, which may be the reason. See the code below:

        real_labels = torch.ones((hr.size(0), 1)).type_as(hr)
        fake_labels = torch.zeros((hr.size(0), 1)).type_as(hr)

        ##########################
        #   training generator   #
        ##########################
        optimizer_generator.zero_grad()

        adversarial_loss = 0
        fake_image_group = [lr, pred1, pred2]
        for fake_image in fake_image_group:
            score_real = self.D(hr).mean(dim=(1, 2, 3))
            score_fake = self.D(fake_image).mean(dim=(1, 2, 3))

            discriminator_rf = score_real.unsqueeze(dim=1) - score_fake.mean()
            discriminator_fr = score_fake.unsqueeze(dim=1) - score_real.mean()

            adversarial_loss_rf = self.adversarial_criterion(discriminator_rf, fake_labels)
            adversarial_loss_fr = self.adversarial_criterion(discriminator_fr, real_labels)
            adversarial_loss += (adversarial_loss_fr + adversarial_loss_rf) / 2

        adversarial_loss /= len(fake_image_group)
        
        perceptual_loss = self.perception_criterion(hr, pred1) + self.perception_criterion(hr, pred2) #+ self.perception_criterion(hr, pred3)
        content_loss = self.content_criterion(pred1, hr) + self.content_criterion(pred2, hr) #+ self.content_criterion(pred3, hr)

        generator_loss = adversarial_loss * self.hparams.adversarial_loss_factor + \
                            perceptual_loss * self.hparams.perceptual_loss_factor + \
                            content_loss * self.hparams.content_loss_factor

        self.manual_backward(generator_loss)
        optimizer_generator.step()

        ##########################
        # training discriminator #
        ##########################

        optimizer_discriminator.zero_grad()

        adversarial_loss = 0
        for fake_image in fake_image_group:
            score_real = self.D(hr).mean(dim=(1, 2, 3))
            score_fake = self.D(fake_image.detach()).mean(dim=(1, 2, 3))

            discriminator_rf = score_real.unsqueeze(dim=1) - score_fake.mean()
            discriminator_fr = score_fake.unsqueeze(dim=1) - score_real.mean()

            adversarial_loss_rf = self.adversarial_criterion(discriminator_rf, real_labels)
            adversarial_loss_fr = self.adversarial_criterion(discriminator_fr, fake_labels)
            adversarial_loss += (adversarial_loss_fr + adversarial_loss_rf) / 2

        adversarial_loss /= len(fake_image_group)

        self.manual_backward(adversarial_loss)
        optimizer_discriminator.step()

pred1 and pred2 have the same shape, while lr has its own shape.

Specifically, lr has shape [batch_size, 3, uniform(48, 96), uniform(48, 96)], and pred1 and pred2 have shape [batch_size, 3, 96, 96].

Your standalone model works and I’m unable to reproduce the issue in the forward pass:

import torch

model = PatchDiscriminator()
x = torch.randn(1, 3, 96, 96, requires_grad=True)
out = model(x)
x = torch.randn(1, 3, 96, 96, requires_grad=True)
out = model(x.detach())  # same call pattern as the failing line; no error here
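A slightly closer attempt to mirror the setup described earlier might look like the sketch below. It feeds a variable-sized `lr` batch alongside 96x96 predictions, and it assumes, hypothetically, that the predictions come out of the generator as channels-last features permuted back to NCHW (the real generator is not shown). It uses CUDA when available, otherwise CPU; on CPU or a recent PyTorch release it is not expected to raise the error, so treat it only as a starting point for a reproduction.

```python
import random

import torch
import torch.nn as nn


class PatchDiscriminator(nn.Module):
    """PatchGAN discriminator, same layers as the one posted above."""

    def __init__(self, input_nc=3, ndf=64):
        super().__init__()
        self.model = nn.Sequential(
            nn.Conv2d(input_nc, ndf, kernel_size=4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(ndf, ndf * 2, kernel_size=4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(ndf * 2, ndf * 4, kernel_size=3, stride=1, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(ndf * 4, ndf * 8, kernel_size=3, stride=1, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(ndf * 8, 1, kernel_size=3, stride=1, padding=1),
        )

    def forward(self, input):
        return self.model(input)


device = "cuda" if torch.cuda.is_available() else "cpu"
model = PatchDiscriminator().to(device)
batch_size = 2

# lr gets a random spatial size in [48, 96]; pred1/pred2 are 96x96
h, w = random.randint(48, 96), random.randint(48, 96)
lr = torch.randn(batch_size, 3, h, w, device=device)
# Hypothetical generator outputs: (N, H, W, C) features permuted to (N, C, H, W)
pred1 = torch.randn(batch_size, 96, 96, 3, device=device, requires_grad=True).permute(0, 3, 1, 2)
pred2 = torch.randn(batch_size, 96, 96, 3, device=device, requires_grad=True).permute(0, 3, 1, 2)

# Same call pattern as the failing line in the traceback
scores = [model(fake_image.detach()).mean(dim=(1, 2, 3)) for fake_image in [lr, pred1, pred2]]
print([s.shape for s in scores])
```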

Any news on this? The error seems to happen in the backward pass in certain environments.

No news, as I’m unable to reproduce it as described in the last post.

This seems to be related to how the cuDNN conv2d implementation resizes its input, specifically when your generator uses channels as the last dimension (e.g., transformers).
@Jiatao_GU Using detach().contiguous() instead seems to fix the problem, although it adds the overhead of the contiguous() call.
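The hypothesis and workaround above can be illustrated with a small sketch. It assumes, hypothetically, that the generator computes per-pixel features in (N, H, W, C) order and permutes them back to (N, C, H, W) before they reach the discriminator. The point is that `.detach()` keeps the permuted, channels-last strides, while `.detach().contiguous()` produces the standard NCHW layout. This runs on CPU and does not itself trigger the cuDNN error.

```python
import torch

# Hypothetical generator output: (N, H, W, C) features permuted to (N, C, H, W)
features_nhwc = torch.randn(2, 96, 96, 3, requires_grad=True)
fake_image = features_nhwc.permute(0, 3, 1, 2)

detached = fake_image.detach()            # shares storage and strides with fake_image
fixed = fake_image.detach().contiguous()  # new buffer in standard NCHW layout

print(fake_image.shape)      # torch.Size([2, 3, 96, 96])
print(fake_image.stride())   # (27648, 1, 288, 3)
print(detached.stride())     # (27648, 1, 288, 3): detach() keeps the strides
print(detached.is_contiguous())                                   # False
print(detached.is_contiguous(memory_format=torch.channels_last))  # True
print(fixed.stride())        # (27648, 9216, 96, 1)
print(fixed.is_contiguous()) # True
```

In the training step, the workaround then reads `score_fake = self.D(fake_image.detach().contiguous()).mean(dim=(1, 2, 3))`.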


This helped. What’s the overhead? Would it be better to just copy the tensor with torch.no_grad()?

Thank you.

@adeandrade That depends on your use case. Copying under torch.no_grad() would break the graph, unless that is what you intend. I'm not quite sure, though; you can try it out and check whether all layers still get their gradients.
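On the question of copying under `torch.no_grad()`: in the discriminator step the graph is meant to be cut anyway (that is what `.detach()` already does), so a copy made under `no_grad` should serve the same purpose. The sketch below assumes a hypothetical permuted generator output. One caveat it shows: a plain `clone()` defaults to `torch.preserve_format` and keeps the channels-last strides, so `memory_format=torch.contiguous_format` is needed to get the NCHW layout. Whether any of these variants avoids the original cuDNN error in PyTorch 1.8 was not tested here.

```python
import torch

# Hypothetical generator output: non-contiguous and part of the autograd graph
features_nhwc = torch.randn(2, 96, 96, 3, requires_grad=True)
fake_image = features_nhwc.permute(0, 3, 1, 2)

# Option 1: detach, then make a contiguous copy
a = fake_image.detach().contiguous()

with torch.no_grad():
    # Option 2: copy under no_grad, forcing the standard NCHW layout
    b = fake_image.clone(memory_format=torch.contiguous_format)
    # A plain clone() preserves the permuted (channels-last) strides
    c = fake_image.clone()

print(a.requires_grad, b.requires_grad, c.requires_grad)  # False False False
print(a.is_contiguous(), b.is_contiguous(), c.is_contiguous())  # True True False
print(torch.equal(a, b))  # True
```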

The overhead of contiguous() comes from changing the tensor's memory layout. Operations such as view and permute only adjust the strides and offset used to index the tensor, not the way its data is stored. When the tensor is not already contiguous, contiguous() copies the data into a new, densely packed buffer, so its cost grows with the size of the tensor. The underlying reason for requiring a contiguous input is perhaps related to the vectorized C/C++ implementation.
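A quick way to see where the cost comes from: `permute` returns a view that shares storage with its input, while `contiguous()` allocates and copies only when the tensor is not already contiguous. A small sketch with arbitrary shapes:

```python
import torch

x = torch.randn(8, 96, 96, 3)  # contiguous (N, H, W, C) buffer
y = x.permute(0, 3, 1, 2)      # view: new strides, same storage, no copy

print(y.data_ptr() == x.data_ptr())  # True: permute only changes metadata

z = y.contiguous()                   # y is not contiguous, so this copies
print(z.data_ptr() == x.data_ptr())  # False: data moved to a new buffer

# Calling contiguous() on an already contiguous tensor does not copy again
print(z.contiguous().data_ptr() == z.data_ptr())  # True
```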
