RESOLVED: the problem was a shape mismatch. disc()'s first layer (fc1) expects input of size [32, 784], but by the time of the failing call the input tensor (adv_ex) was still in image shape rather than flattened to [32, 784], because of the earlier reshape for target().
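For anyone hitting the same thing, here is a minimal sketch of the fix in context (assuming, as in my model, that disc's first layer is an nn.Linear(784, ...), so it needs flattened [batch, 784] input; I also reuse disc_real/disc_fake in the loss rather than calling disc twice):

        ### Train Discriminator: max log(D(x)) + log(1 - D(G(z)))
        disc_real = disc(real).view(-1)
        # adv_ex was reshaped to [-1, 1, 28, 28] for target() above, so
        # flatten it back to [batch, 784] before the fully connected disc
        disc_fake = disc(adv_ex.view(-1, 784)).view(-1)
        lossD = -torch.mean(torch.log(disc_real) + torch.log(1. - disc_fake))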
I'm receiving a runtime error and I'm not sure why the indicated line is causing it.
Here is the code block:
for epoch in range(num_epochs):
    for batch_idx, (real, labels) in enumerate(loader):
        # get a fixed input batch to display gen output
        if batch_idx == 0:
            if epoch == 0:
                fixed_input = real.view(-1, 784).to(device)
        adv_ex = real.clone().reshape(-1, 784).to(device)  # [32, 784] adv-ex copy of the batch, flattened
        real = real.view(-1, 784).to(device)  # [32, 784] real batch, flattened
        labels = labels.to(device)  # size() [32], 32 labels in batch

        # perturb each image in adv_ex
        tmp_adv_ex = []
        for idx, item in enumerate(adv_ex):
            purturbation = gen(adv_ex[idx])
            tmp_adv_ex.append(adv_ex[idx] + purturbation)
        adv_ex = torch.cat(tmp_adv_ex, dim=0)
        adv_ex = real.clone().reshape(-1, 784).to(device)

        ### Train Generator: min log(1 - D(G(z))) <-> max log(D(G(z)))
        output = disc(adv_ex).view(-1)
        lossG = torch.mean(torch.log(1. - output))  # loss for gen's desired disc prediction
        adv_ex = adv_ex.reshape(-1, 1, 28, 28)
        f_pred = target(adv_ex)
        f_loss = CE_loss(f_pred, labels)  # add loss for gen's desired target (f) prediction
        loss_G_Final = f_loss + lossG  # can change the weight of this loss term later
        opt_gen.zero_grad()
        loss_G_Final.backward()
        opt_gen.step()

        ### Train Discriminator: max log(D(x)) + log(1 - D(G(z)))
        disc_real = disc(real).view(-1)
        disc_fake = disc(adv_ex).view(-1)
        lossD = -torch.mean(torch.log(disc(real)) + torch.log(1. - disc(adv_ex)))
        # can decide later how much that loss term weighs
        opt_disc.zero_grad()
        lossD.backward(retain_graph=True)
        opt_disc.step()
This is the error traceback:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/tmp/ipykernel_25258/819410292.py in <module>
47 ### Train Discriminator: max log(D(x)) + log(1 - D(G(z)))
48 disc_real = disc(real).view(-1)
---> 49 disc_fake = disc(adv_ex).view(-1)
50 lossD = -torch.mean(torch.log(disc(real)) + torch.log(1. - disc(adv_ex)))
51 # can decide later how much that loss term weighs
~/.conda/envs/mypytorch19/lib/python3.9/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1049 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1050 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051 return forward_call(*input, **kwargs)
1052 # Do not call functions when jit is used
1053 full_backward_hooks, non_full_backward_hooks = [], []
~/AdvGan9-4/models.py in forward(self, x)
28
29 def forward(self, x):
---> 30 x = self.fc1(x)
31 x = nn.ReLU()(x)
32 x = self.fc2(x)
~/.conda/envs/mypytorch19/lib/python3.9/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1049 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1050 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051 return forward_call(*input, **kwargs)
1052 # Do not call functions when jit is used
1053 full_backward_hooks, non_full_backward_hooks = [], []
~/.conda/envs/mypytorch19/lib/python3.9/site-packages/torch/nn/modules/linear.py in forward(self, input)
94
95 def forward(self, input: Tensor) -> Tensor:
---> 96 return F.linear(input, self.weight, self.bias)
97
98 def extra_repr(self) -> str:
~/.conda/envs/mypytorch19/lib/python3.9/site-packages/torch/nn/functional.py in linear(input, weight, bias)
1845 if has_torch_function_variadic(input, weight):
1846 return handle_torch_function(linear, (input, weight), input, weight, bias=bias)
-> 1847 return torch._C._nn.linear(input, weight, bias)
1848
1849
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
** On entry to SGEMM parameter number 10 had an illegal value
How should I interpret this error, and how should I modify line 49 to resolve it?
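For anyone interpreting a similar error: CUBLAS_STATUS_INVALID_VALUE here just means the matrix multiply inside F.linear was handed incompatible dimensions; running the same code on CPU typically raises a much clearer message (along the lines of "mat1 and mat2 shapes cannot be multiplied"), which is a quick way to localize shape bugs. A small sanity check that could go just before line 49 (a debugging sketch, not part of the original code):

        # a layer built as nn.Linear(784, ...) needs 784 features in the
        # last dimension; here adv_ex is still [32, 1, 28, 28]
        print(adv_ex.shape)
        assert adv_ex.shape[-1] == 784, "flatten adv_ex before calling disc()"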