THCudaCheck FAIL

Hi guys, I was getting the error THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=383 error=11 : invalid argument. After doing some research on the forum (e.g., the thread "A error when using GPU"), I tried:

  1. Running the script with CUDA_LAUNCH_BLOCKING=1 python args

  2. Setting torch.backends.cudnn.benchmark from True to False

  3. Upgrading the packages with pip install -U

All three attempts throw the same error; even the first one shows no further details, and the script gets stuck before reaching the training loop.

I am running on Google Cloud Platform with accelerator="type=nvidia-tesla-t4,count=2". Any help is appreciated. Thank you.
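For reference, here is a quick snippet to dump the environment details that are usually relevant for CUDA errors (PyTorch version, CUDA build version, and the detected GPUs):

```python
import torch

# Print the environment details relevant to CUDA errors.
print("PyTorch:", torch.__version__)
print("CUDA (build):", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        print(f"GPU {i}:", torch.cuda.get_device_name(i))
```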

Could you post the complete stack trace using option 1?
This should point to the line of code that throws this error.
Also, is your code running fine on the CPU?

hi ptrblck, thanks for the reply. Sorry, what do you mean by “complete stack trace”? The full error is THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=383 error=11 : invalid argument. The script simply stops after throwing this error.

I have the code of

if torch.cuda.device_count() > 1: 
    model = torch.nn.DataParallel(model)

to “put” my model on the 2 GPUs. Other than that, it works fine on my CPU.

edit: it works on accelerator="type=nvidia-tesla-p4,count=2", i.e., no error thrown. Apparently it only fails for t4.

Thanks for the information.
By stack trace I mean the output before the THCudaCheck error.
Do you get any lines of code?

Is the code working on a single T4 or does it also crash?
Would it be possible to get a reproducible code snippet, so that we could have a look?
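One way to test on a single T4 is to make only one GPU visible to the process via CUDA_VISIBLE_DEVICES (here script.py is a placeholder for your actual training script):

```shell
# Expose only GPU 0 so DataParallel sees a single device;
# keep CUDA_LAUNCH_BLOCKING=1 for a synchronous stack trace.
# "script.py" is a placeholder for the actual training script.
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=1 python3 script.py
```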

There is no output, error, or code before the THCudaCheck error. This is all I get:

(env) me@my-instance:~/folder/sub_folder$ CUDA_LAUNCH_BLOCKING=1 python3 
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=383 error=11 : invalid argument

I did not try it on a single T4.

Here is a toy example that also produces the error:

import torch
import torch.nn as nn
import torch.nn.functional as F
import pretrainedmodels
from easydict import EasyDict as edict

class ToyNet(nn.Module):

    def __init__(self, n_classes, model_name='resnet50', use_fc=False, fc_dim=512, dropout=0.0, loss_module='softmax'):
        super(ToyNet, self).__init__()        

        self.backbone = getattr(pretrainedmodels, model_name)(num_classes=1000)

        trained_kernel = self.backbone.conv1.weight

        new_conv = nn.Conv2d(6, 64, kernel_size=7, stride=2, padding=3, bias=False)

        with torch.no_grad():
            new_conv.weight[:,:] = torch.stack([torch.mean(trained_kernel, 1)]*6, dim=1)

        self.backbone.conv1 = new_conv

        final_in_features = self.backbone.last_linear.in_features

        self.backbone = nn.Sequential(*list(self.backbone.children())[:-2])

        self.pooling = nn.AdaptiveAvgPool2d(1)

        self.use_fc = use_fc
        if use_fc:
            self.dropout = nn.Dropout(p=dropout)
            self.fc = nn.Linear(final_in_features, fc_dim)
            self.bn = nn.BatchNorm1d(fc_dim)  # attribute name reconstructed; used in _init_params and extract_feat
            final_in_features = fc_dim

        self.loss_module = loss_module
        self.final = nn.Linear(final_in_features, n_classes)  # layer name reconstructed; used in forward

    def _init_params(self):
        nn.init.constant_(self.fc.bias, 0)
        nn.init.constant_(self.bn.weight, 1)  # reconstructed: BatchNorm affine parameters
        nn.init.constant_(self.bn.bias, 0)

    def forward(self, x, label):
        feature = self.extract_feat(x)
        logits = self.final(feature)
        return logits

    def extract_feat(self, x):
        batch_size = x.shape[0]
        x = self.backbone(x)
        x = self.pooling(x).view(batch_size, -1)

        if self.use_fc:
            x = self.dropout(x)
            x = self.fc(x)
            x = self.bn(x)

        return x

def get_model(config):
    n_classes = config.model.num_classes
    model_name = config.model.arch
    use_fc = config.model.use_fc
    fc_dim = config.model.fc_dim
    dropout = config.model.dropout
    loss_module = config.model.loss_module
    net = ToyNet(n_classes, model_name, use_fc, fc_dim, dropout, loss_module)

    return net

if __name__ == "__main__":
    cfg = edict()
    cfg.model = edict()
    cfg.model.arch = 'resnet18'
    cfg.model.dropout = 0
    cfg.model.loss_module = 'softmax' 
    cfg.model.use_fc = False
    cfg.model.fc_dim = 512
    cfg.model.image_size = 224 # resize
    cfg.model.num_classes = 1200
    cfg.model.pretrained = True
    cfg.model.lr = 3e-4  # attribute name is a guess; two settings were merged into one line

    model = get_model(cfg)

    if torch.cuda.is_available() and torch.cuda.device_count() > 1: 
        model = torch.nn.DataParallel(model)

    if torch.cuda.is_available(): 
        model = model.cuda()

    input_ = torch.randn((8, 6, 224, 224))
    label_ = torch.randn((8, 6))
    print(model(input_, label_))
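For what it's worth, the conv1 trick above (averaging the pretrained 3-channel kernel and repeating it across 6 input channels) can be checked in isolation on the CPU; the random old_conv here just stands in for the pretrained layer:

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained 3-channel first conv (random weights here).
old_conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)

# Build a 6-channel conv and fill it with the channel-averaged kernel.
new_conv = nn.Conv2d(6, 64, kernel_size=7, stride=2, padding=3, bias=False)
with torch.no_grad():
    mean_kernel = old_conv.weight.mean(dim=1)               # (64, 7, 7)
    new_conv.weight[:, :] = torch.stack([mean_kernel] * 6, dim=1)

print(new_conv.weight.shape)  # torch.Size([64, 6, 7, 7])
# Every input channel now carries the same averaged kernel:
print(torch.allclose(new_conv.weight[:, 0], new_conv.weight[:, 5]))  # True
```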

Here is the instance I created:

export IMAGE_FAMILY="pytorch-latest-gpu"
export ZONE="us-central1-b"
export INSTANCE_NAME="my-instance"
export INSTANCE_TYPE="n1-highmem-8"

gcloud compute instances create $INSTANCE_NAME \
        --zone=$ZONE \
        --image-family=$IMAGE_FAMILY \
        --image-project=deeplearning-platform-release \
        --maintenance-policy=TERMINATE \
        --accelerator="type=nvidia-tesla-t4,count=2" \
        --machine-type=$INSTANCE_TYPE \
        --boot-disk-size=200GB \
        --metadata="install-nvidia-driver=True"
Thank you, once again!