THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/generated/../THCReduceAll.cuh line=339 error=59 : device-side assert triggered

Hello,

I got the following error:

/opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THCUNN/ClassNLLCriterion.cu:101: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [16,0,0] Assertion `t >= 0 && t < n_classes` failed.
[... the same assertion is repeated for threads [17,0,0] through [31,0,0] ...]
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/generated/../THCReduceAll.cuh line=339 error=59 : device-side assert triggered

My data are as follows:

print(train_data.shape)
(9586, 2480)
print(test_data.shape)
(3734, 2480)
classes [0...100] (101 classes)

The bug occurs when calling the loss function:

    def loss(self, y, y_target, l2_regularization):
        print('loss1')
        loss = nn.CrossEntropyLoss()(y, y_target)
        print('loss2')
        l2_loss = 0.0
        print('loss3')
        for param in self.parameters():  # the exception is raised in this loop
            try:

                print('loss4')
                data = param * param
                print('loss5')
                l2_loss += data.sum()
                print('loss6')
                loss += 0.5 * l2_regularization * l2_loss
                print('loss7')
                return loss
            except:
                import ipdb;ipdb.set_trace()

The bug occurs at `for param in self.parameters():` in the loss function, where

self.parameters
<bound method Module.parameters of Graph_ConvNet_LeNet5(
  (cl1): Linear(in_features=25, out_features=32)
  (cl2): Linear(in_features=800, out_features=64)
  (fc1): Linear(in_features=10112, out_features=512)
  (fc2): Linear(in_features=512, out_features=10)
self.parameters()
<generator object Module.parameters at 0x7f40f5ccee60>

Here is my code (imports shown for completeness; lmax_L, rescale_L, dtypeFloat, dtypeLong, L and lmax are defined elsewhere in the notebook):

import collections
import time

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

class my_sparse_mm(torch.autograd.Function):
    """
    Implementation of a new autograd function for sparse variables,
    called "my_sparse_mm", by subclassing torch.autograd.Function
    and implementing the forward and backward passes.
    """
    def forward(self, W, x):  # W is SPARSE
        self.save_for_backward(W, x)
        y = torch.mm(W, x)
        return y
    def backward(self, grad_output):
        W, x = self.saved_tensors
        grad_input = grad_output.clone()
        grad_input_dL_dW = torch.mm(grad_input, x.t())
        grad_input_dL_dx = torch.mm(W.t(), grad_input)
        return grad_input_dL_dW, grad_input_dL_dx
class Graph_ConvNet_LeNet5(nn.Module):
    def __init__(self, net_parameters):
        print('Graph ConvNet: LeNet5')
        super(Graph_ConvNet_LeNet5, self).__init__()
        # parameters
        D, CL1_F, CL1_K, CL2_F, CL2_K, FC1_F, FC2_F = net_parameters
        FC1Fin = CL2_F * (D // 16)
        # graph CL1
        self.cl1 = nn.Linear(CL1_K, CL1_F)
        Fin = CL1_K;
        Fout = CL1_F;
        scale = np.sqrt(2.0 / (Fin + Fout))
        self.cl1.weight.data.uniform_(-scale, scale)
        self.cl1.bias.data.fill_(0.0)
        self.CL1_K = CL1_K;
        self.CL1_F = CL1_F;
        # graph CL2
        self.cl2 = nn.Linear(CL2_K * CL1_F, CL2_F)
        Fin = CL2_K * CL1_F;
        Fout = CL2_F;
        scale = np.sqrt(2.0 / (Fin + Fout))
        self.cl2.weight.data.uniform_(-scale, scale)
        self.cl2.bias.data.fill_(0.0)
        self.CL2_K = CL2_K;
        self.CL2_F = CL2_F;
        # FC1
        self.fc1 = nn.Linear(FC1Fin, FC1_F)
        Fin = FC1Fin;
        Fout = FC1_F;
        scale = np.sqrt(2.0 / (Fin + Fout))
        self.fc1.weight.data.uniform_(-scale, scale)
        self.fc1.bias.data.fill_(0.0)
        self.FC1Fin = FC1Fin
        # FC2
        self.fc2 = nn.Linear(FC1_F, FC2_F)
        Fin = FC1_F;
        Fout = FC2_F;
        scale = np.sqrt(2.0 / (Fin + Fout))
        self.fc2.weight.data.uniform_(-scale, scale)
        self.fc2.bias.data.fill_(0.0)
        # nb of parameters
        nb_param = CL1_K * CL1_F + CL1_F  # CL1
        nb_param += CL2_K * CL1_F * CL2_F + CL2_F  # CL2
        nb_param += FC1Fin * FC1_F + FC1_F  # FC1
        nb_param += FC1_F * FC2_F + FC2_F  # FC2
        print('nb of parameters=', nb_param, '\n')
    def init_weights(self, W, Fin, Fout):
        scale = np.sqrt(2.0 / (Fin + Fout))
        W.uniform_(-scale, scale)
        return W
    def graph_conv_cheby(self, x, cl, L, lmax, Fout, K):
        # parameters
        # B = batch size
        # V = nb vertices
        # Fin = nb input features
        # Fout = nb output features
        # K = Chebyshev order & support size
        B, V, Fin = x.size();
        B, V, Fin = int(B), int(V), int(Fin)
        # rescale Laplacian
        lmax = lmax_L(L)
        L = rescale_L(L, lmax)
        # convert the scipy sparse matrix L to a PyTorch sparse tensor
        L = L.tocoo()
        indices = np.column_stack((L.row, L.col)).T
        indices = indices.astype(np.int64)
        indices = torch.from_numpy(indices)
        indices = indices.type(torch.LongTensor)
        L_data = L.data.astype(np.float32)
        L_data = torch.from_numpy(L_data)
        L_data = L_data.type(torch.FloatTensor)
        L = torch.sparse.FloatTensor(indices, L_data, torch.Size(L.shape))
        L = Variable(L, requires_grad=False)
        if torch.cuda.is_available():
            L = L.cuda()
        # transform to Chebyshev basis
        x0 = x.permute(1, 2, 0).contiguous()  # V x Fin x B
        x0 = x0.view([V, Fin * B])  # V x Fin*B
        x = x0.unsqueeze(0)  # 1 x V x Fin*B
        def concat(x, x_):
            x_ = x_.unsqueeze(0)  # 1 x V x Fin*B
            return torch.cat((x, x_), 0)  # K x V x Fin*B
        if K > 1:
            x1 = my_sparse_mm()(L, x0)  # V x Fin*B
            x = torch.cat((x, x1.unsqueeze(0)), 0)  # 2 x V x Fin*B
        for k in range(2, K):
            x2 = 2 * my_sparse_mm()(L, x1) - x0
            x = torch.cat((x, x2.unsqueeze(0)), 0)  # M x Fin*B
            x0, x1 = x1, x2
        x = x.view([K, V, Fin, B])  # K x V x Fin x B
        x = x.permute(3, 1, 2, 0).contiguous()  # B x V x Fin x K
        x = x.view([B * V, Fin * K])  # B*V x Fin*K
        # Compose linearly Fin features to get Fout features
        x = cl(x)  # B*V x Fout
        x = x.view([B, V, Fout])  # B x V x Fout
        return x
    # Max pooling of size p. Must be a power of 2.
    def graph_max_pool(self, x, p):
        if p > 1:
            x = x.permute(0, 2, 1).contiguous()  # x = B x F x V
            x = nn.MaxPool1d(p)(x)  # B x F x V/p
            x = x.permute(0, 2, 1).contiguous()  # x = B x V/p x F
            return x
        else:
            return x
    def forward(self, x, d, L, lmax):
        # graph CL1
        x = x.unsqueeze(2)  # B x V x Fin=1
        x = self.graph_conv_cheby(x, self.cl1, L[0], lmax[0], self.CL1_F, self.CL1_K)
        x = F.relu(x)
        x = self.graph_max_pool(x, 4)
        # graph CL2
        x = self.graph_conv_cheby(x, self.cl2, L[2], lmax[2], self.CL2_F, self.CL2_K)
        x = F.relu(x)
        x = self.graph_max_pool(x, 4)
        # FC1
        x = x.view(-1, self.FC1Fin)
        x = self.fc1(x)
        x = F.relu(x)
        x = nn.Dropout(d)(x)
        # FC2
        x = self.fc2(x)
        return x
    def loss(self, y, y_target, l2_regularization):
        print('loss1')
        loss = nn.CrossEntropyLoss()(y, y_target)
        print('loss2')
        l2_loss = 0.0
        print('loss3')
        for param in self.parameters():  # the exception is raised in this loop
            try:

                print('loss4')
                data = param * param
                print('loss5')
                l2_loss += data.sum()
                print('loss6')
                loss += 0.5 * l2_regularization * l2_loss
                print('loss7')
                return loss
            except:
                import ipdb;ipdb.set_trace()



    def update(self, lr):
        update = torch.optim.SGD(self.parameters(), lr=lr, momentum=0.9)
        return update
    def update_learning_rate(self, optimizer, lr):
        for param_group in optimizer.param_groups:
            param_group['lr'] = lr
        return optimizer
    def evaluation(self, y_predicted, test_l):
        _, class_predicted = torch.max(y_predicted.data, 1)
        return 100.0 * (class_predicted == test_l).sum() / y_predicted.size(0)

Let's run it:

try:
    del net
    print('Delete existing network\n')
except NameError:
    print('No existing network to delete\n')

# network parameters
D = train_data.shape[1]
CL1_F = 32
CL1_K = 25
CL2_F = 64
CL2_K = 25
FC1_F = 512
FC2_F = 10
net_parameters = [D, CL1_F, CL1_K, CL2_F, CL2_K, FC1_F, FC2_F]

# instantiate the object net of the class
net = Graph_ConvNet_LeNet5(net_parameters)
if torch.cuda.is_available():
    net.cuda()
print(net)

# Weights
L_net = list(net.parameters())

# learning parameters
learning_rate = 0.05
dropout_value = 0.5
l2_regularization = 5e-4
batch_size = 100
num_epochs = 2
train_size = train_data.shape[0]
nb_iter = int(num_epochs * train_size) // batch_size
print('num_epochs=', num_epochs, ', train_size=', train_size, ', nb_iter=', nb_iter)

# Optimizer
global_lr = learning_rate
global_step = 0
decay = 0.95
decay_steps = train_size
lr = learning_rate
optimizer = net.update(lr)

# loop over epochs
indices = collections.deque()
e=[]
training_loss=[]
accurary_train=[]
accuracy_test=[]
for epoch in range(num_epochs):  # loop over the dataset multiple times
    try:
        # reshuffle
        indices.extend(np.random.permutation(train_size))  # rand permutation
        # reset time
        t_start = time.time()
        # extract batches
        running_loss = 0.0
        running_accuray = 0
        running_total = 0
        while len(indices) >= batch_size:
            # extract batches
            print('1')
            batch_idx = [indices.popleft() for i in range(batch_size)]
            print('2')
            train_x, train_y = train_data[batch_idx, :], train_labels[batch_idx]
            print('3')
            train_x = Variable(torch.FloatTensor(train_x).type(dtypeFloat), requires_grad=False)
            print('4')
            train_y = train_y.astype(np.int64)
            print('5')
            train_y = torch.LongTensor(train_y).type(dtypeLong)
            print('6')
            train_y = Variable(train_y, requires_grad=False)
            # Forward
            print('7')
            y = net.forward(train_x, dropout_value, L, lmax)
            print(y)
            print(train_y)
            print('8')
            loss = net.loss(y, train_y, l2_regularization)
            print('9')
            loss_train = loss.data[0]
            # Accuracy
            print('10')
            acc_train = net.evaluation(y, train_y.data)
            # backward
            loss.backward()
            # Update
            global_step += batch_size  # to update learning rate
            optimizer.step()
            optimizer.zero_grad()
            # loss, accuracy
            running_loss += loss_train
            running_accuray += acc_train
            running_total += 1
            # print
            if not running_total % 100:  # print every x mini-batches
                print('epoch= %d, i= %4d, loss(batch)= %.4f, accuracy(batch)= %.2f' % (
                    epoch + 1, running_total, loss_train, acc_train))
        # print
        t_stop = time.time() - t_start
        e.append(epoch)
        training_loss.append(running_loss / running_total)
        accurary_train.append(running_accuray / running_total)
        print('epoch= %d, loss(train)= %.3f, accuracy(train)= %.3f, time= %.3f, lr= %.5f' %
              (epoch + 1, running_loss / running_total, running_accuray / running_total, t_stop, lr))
        # update learning rate
        lr = global_lr * pow(decay, float(global_step // decay_steps))
        optimizer = net.update_learning_rate(optimizer, lr)
        # Test set
        running_accuray_test = 0
        running_total_test = 0
        indices_test = collections.deque()
        indices_test.extend(range(test_data.shape[0]))
        t_start_test = time.time()
        while len(indices_test) >= batch_size:
            print('hi')
            batch_idx_test = [indices_test.popleft() for i in range(batch_size)]
            print('hi2')
            test_x, test_y = test_data[batch_idx_test, :], test_labels[batch_idx_test]
            print('hi3')
            test_x = Variable(torch.FloatTensor(test_x).type(dtypeFloat), requires_grad=False)
            print('hi4')
            y = net.forward(test_x, 0.0, L, lmax)
            print('hi5')
            test_y = test_y.astype(np.int64)
            test_y = torch.LongTensor(test_y).type(dtypeLong)
            test_y = Variable(test_y, requires_grad=False)
            acc_test = net.evaluation(y, test_y.data)
            running_accuray_test += acc_test
            running_total_test += 1
        t_stop_test = time.time() - t_start_test
        accuracy_test.append(running_accuray_test / running_total_test)
        print('  accuracy(test) = %.3f %%, time= %.3f' % (running_accuray_test / running_total_test, t_stop_test))
        e.append(epoch)
        training_loss.append(running_loss / running_total)
        accurary_train.append(running_accuray / running_total)
    except:
        import ipdb;ipdb.set_trace()

Actually, the bug occurs here:

 print('8')
 loss = net.loss(y, train_y, l2_regularization)  # within this function

Thank you for your help.

I think it's more likely that the error comes from the nn.CrossEntropyLoss call rather than from the parameter loop.

Can you print the sizes of y and y_target, as well as y_target.min() and y_target.max()? Maybe y_target has some values outside of the class index range.
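For example, right before the nn.CrossEntropyLoss call (a minimal sketch using the names from the loss function above):

    print('y:', y.size(), 'y_target:', y_target.size())
    print('target min/max:', y_target.min(), y_target.max())
    # y should be [batch, n_classes] logits and y_target should be [batch]
    # with integer class indices in [0, n_classes - 1]

Also keep in mind that device-side asserts are reported asynchronously, so the Python line the failure points to (here the for param in self.parameters(): loop) is usually not the real culprit; running with CUDA_LAUNCH_BLOCKING=1, or once on the CPU, gives a more accurate location.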


Thank you for your answer, it really helped. The bug is that I have 90 classes but I had set the last FC layer to 10 outputs rather than 90.
Thanks again.
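For anyone else hitting this: the fix is simply to make the last layer's output size match the number of classes. A minimal sketch against the network parameters used above (assuming train_labels is a zero-based integer label array, as in the post):

    num_classes = int(train_labels.max()) + 1   # 90 in this case

    FC2_F = num_classes  # one logit per class in the final layer
    net_parameters = [D, CL1_F, CL1_K, CL2_F, CL2_K, FC1_F, FC2_F]
    net = Graph_ConvNet_LeNet5(net_parameters)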


This was gold! Exactly what I was looking for.

The same thing is happening in my case: my classes are in the range 0 to 12, but target.min() and target.max() are 0 and 442.
The target shape is [8, 1, 128, 128],
the output shape is [8, 13, 128, 128].
Thanks if you can help.

How did you create the target?
Did you use any built-in methods, e.g. ImageFolder?
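Depending on how the masks were created (for example, loaded as RGB/palette images instead of single-channel index maps), the raw pixel values can end up far outside the class range, which would explain values like 442 with only 13 classes. A quick way to inspect one batch (a minimal sketch, assuming target is the [8, 1, 128, 128] mask tensor mentioned above):

    import torch

    print(torch.unique(target))        # should only contain values in [0, 12]
    print(target.min(), target.max())  # quick range check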

I am working on https://github.com/mit-han-lab/proxylessnas/tree/e9cb66de5612531576d9194226477467b1b2d885, but when I evaluate it on the ImageNet dataset using the command below:

python eval.py --path 'your path to imagenet' --arch proxyless_cpu # pytorch ImageNet

it gives me the following error:

/home/gashwin1/tools/compiled_tools/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:106: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [30,0,0] Assertion t >= 0 && t < n_classes failed.
/home/gashwin1/tools/compiled_tools/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:106: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [31,0,0] Assertion t >= 0 && t < n_classes failed.
Test: 0%| | 0/391 [01:55<?, ?it/s]

Traceback (most recent call last):
File "eval.py", line 152, in
losses.update(loss.item(), _input.size(0))
RuntimeError: CUDA error: device-side assert triggered

Any thoughts, please?

The error message points to an out-of-bounds indexing in nn.NLLLoss or nn.CrossEntropyLoss (which calls into nn.NLLLoss).
Usually this happens if your target doesn't contain class indices in the range [0, nb_classes-1].

Could you check the min and max values of your target tensors and make sure they are in the right range?

Where do I check the min and max values of the target tensors for the GitHub repo I linked in my question? I am analysing this code and trying to run it on the ImageNet dataset.
Can you please let me know in which file the min and max values would be checked, and what the expected range would be for this code?
Your answer would be appreciated.

I'm not sure which script you are using, but check the target values right before calculating the loss:

print(target.min(), target.max())
loss = criterion(output, target)

The expected range is posted in the previous answer as [0, nb_classes-1], i.e. if you are dealing with 5 classes, the target tensor should contain the values [0, 1, 2, 3, 4].
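As a small self-contained illustration of that rule (made-up tensors, not taken from the repo above):

    import torch
    import torch.nn as nn

    nb_classes = 5
    output = torch.randn(8, nb_classes)           # logits for a batch of 8 samples
    target = torch.randint(0, nb_classes, (8,))   # valid targets: integers in [0, 4]

    assert target.min() >= 0 and target.max() < nb_classes, "target out of range"
    loss = nn.CrossEntropyLoss()(output, target)
    print(loss.item())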