Error during test with BigGAN

I am trying to use the BigGAN model, but I get the error below during testing, when I generate images using the pretrained weights. How can I solve this problem? (It is not related to memory.)

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_34/2818109655.py in <module>
     12                # real_img = imgs.to(device)
     13                # bs = imgs.size(0)
---> 14                 fake_imgs = pretrained_G(imgs.cuda())
     15 
     16 

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

/tmp/ipykernel_34/2782497050.py in forward(self, z)
     66         # Feed through generator blocks
     67         for idx, g_block in enumerate(self.g_blocks):
---> 68             h = g_block(h)
     69            # h = g_block[1](h)
     70 

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

/tmp/ipykernel_34/2262133493.py in forward(self, x)
     31 
     32         # Compute dot product attention with query (theta) and key (phi) matrices
---> 33         beta = F.softmax(torch.bmm(theta.transpose(1, 2), phi), dim=-1)
     34 
     35         # Compute scaled dot product attention with value (g) and attention (beta) matrices

RuntimeError: CUDA out of memory. Tried to allocate 6.00 GiB (GPU 0; 15.90 GiB total capacity; 6.89 GiB already allocated; 2.16 GiB free; 12.68 GiB reserved in total by PyTorch)

Your GPU is running out of memory, so you would need to reduce the memory usage, e.g. by reducing the batch size.

I’m not sure how to understand that statement, as the error clearly points towards an OOM issue.
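
For reference, here is a rough back-of-the-envelope sketch (with hypothetical feature-map sizes, not values taken from your model) of why the single torch.bmm inside the self-attention block shown in the traceback can request several GiB at once: the attention matrix has shape (B, H*W, H*W), so its size grows quadratically with the spatial resolution.

    import torch

    # Hypothetical sizes for illustration only: the attention matrix produced by
    # torch.bmm(theta.transpose(1, 2), phi) has shape (B, H*W, H*W).
    def attention_matrix_gib(batch_size, height, width, dtype=torch.float32):
        n = height * width
        bytes_per_elem = torch.finfo(dtype).bits // 8
        return batch_size * n * n * bytes_per_elem / 1024 ** 3

    print(attention_matrix_gib(1, 128, 128))  # ~1.0 GiB for one 128x128 feature map
    print(attention_matrix_gib(6, 128, 128))  # ~6.0 GiB for a batch of 6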

Because the model works during training, memory should not be the problem. I also set batch_size to 1 and I still get the same error.

If the model is working during training, I assume it’s failing during validation?
If so, your memory usage might still be too high once you start the validation loop, and you should check if unnecessary tensors are kept alive (e.g. the last loss tensor, which could still be attached to the computation graph).
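
As a minimal sketch of what "keeping the loss alive" means (toy model and data, not your code): accumulating the raw loss tensor keeps its whole computation graph and the stored activations in memory, while .item() only stores a Python float.

    import torch
    import torch.nn as nn

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = nn.Linear(10, 1).to(device)
    criterion = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

    running_loss = 0.0
    for _ in range(5):
        x = torch.randn(4, 10, device=device)
        y = torch.randn(4, 1, device=device)

        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

        # BAD:  running_loss += loss   # keeps the graph and its activations alive
        running_loss += loss.item()    # GOOD: detaches and stores a plain float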

The validation works like the code below.
In validation I leave out backward(), optimizer.step(), and optimizer.zero_grad(), which I use in training. Is that correct?

   
    valid_loss_d = 0.0
    valid_loss_g = 0.0

    G.eval()
    D.eval()
    d_val_losses, g_val_losses = [], []

    # with torch.no_grad():
    for i, (data, _, _) in enumerate(valid_dataloader):
        data = data.to(device, dtype=torch.float32)

        z = torch.cuda.FloatTensor(np.random.normal(0, 1, (data.shape[0], latent_dim))).to(device)

        fake_imgs = G(z)

        real_validity = D(data)
        fake_validity = D(fake_imgs.detach())

        # Gradient penalty
        gradient_penalty = compute_gradient_penalty(D, data.data, fake_imgs.data)

        # Adversarial loss
        val_d_loss = -torch.mean(real_validity) + torch.mean(fake_validity) + lambda_gp * gradient_penalty

        if i % n_critic == 0:
            # -----------------
            #  Train Generator
            # -----------------

            fake_imgs = G(z)
            fake_validity = D(fake_imgs)

            val_g_loss = -torch.mean(fake_validity)

            d_val_losses.append(val_d_loss.item())
            g_val_losses.append(val_g_loss.item())

            valid_loss_d += val_d_loss.item()
            valid_loss_g += val_g_loss.item()

    print(f"[G_Train_Loss: {g_running_loss / len(train_dataloader)}] "
          f"[D_Train_Loss: {d_running_loss / len(train_dataloader)}]"
          f"[G_Val_Loss: {valid_loss_g / len(valid_dataloader)}]"
          f"[D_Val_Loss: {valid_loss_d / len(valid_dataloader)}]")

    val_total_g_losses.append(valid_loss_g / len(valid_dataloader))
    val_total_d_losses.append(valid_loss_d / len(valid_dataloader))

    if min_valid_loss_g > valid_loss_g:
        print(f'G_Val_Loss_Decreased({min_valid_loss_g:.6f}--->{valid_loss_g:.6f})\t Saving The Model')
        min_valid_loss_g = valid_loss_g

        torch.save(G.state_dict(), f"./generator.pth")

    elif min_valid_loss_d > valid_loss_d:
        print(f'D_Val_Loss_Decreased({min_valid_loss_d:.6f}--->{valid_loss_d:.6f})\t Saving The Model')
        min_valid_loss_d = valid_loss_d
        torch.save(D.state_dict(), f"./discriminator.pth")

    else:
        print('None')

Yes, during validation you wouldn’t update the model, as this would be a data leak.
It’s still unclear in which setup you are hitting the OOM.
In any case, wrap the validation loop in with torch.no_grad() and check the memory usage by adding print(torch.cuda.memory_summary()) to the validation code, to see how large the actual memory requirement is and which line of code causes the OOM error.
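
A minimal sketch of that suggestion, reusing G, D, valid_dataloader, device, and latent_dim from your snippet (assumed to be defined). Note that compute_gradient_penalty relies on autograd, so it is left out here; if you need it during validation, it has to run outside of the no_grad() block.

    import torch

    G.eval()
    D.eval()

    with torch.no_grad():
        for i, (data, _, _) in enumerate(valid_dataloader):
            data = data.to(device, dtype=torch.float32)
            # torch.randn draws from N(0, 1), like np.random.normal(0, 1, ...)
            z = torch.randn(data.size(0), latent_dim, device=device)

            fake_imgs = G(z)
            real_validity = D(data)
            fake_validity = D(fake_imgs)

            val_d_loss = -torch.mean(real_validity) + torch.mean(fake_validity)
            val_g_loss = -torch.mean(fake_validity)

            if i == 0:
                # Prints allocated/reserved memory stats to locate the peak usage
                print(torch.cuda.memory_summary())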

The problem occurs during testing, while trying to generate fake images from the test set: