RuntimeError: CUDA error: device-side assert triggered `cumdist[size - 1] > static_cast<scalar_t>(0)` failed

Dongwon_Ryu · May 14, 2021, 2:37pm

Hi all

I am getting RuntimeError: CUDA error: device-side assert triggered when I try to sample. The code is:

softmax_tmpl_distribution = self.softmax(tmpl_distribution)
tmpl_token_id = softmax_tmpl_distribution.multinomial(num_samples=self.agent_tmpl_n)

tmpl_token_id_random_size = torch.ones([self.batch_size, self.agent_tmpl_n])
tmpl_token_id_random = F.softmax(tmpl_token_id_random_size, dim=-1).multinomial(num_samples=1).to(device)
tmpl_token_id = tmpl_token_id.gather(-1, tmpl_token_id_random)
# 1st error
# tmpl_token_id_random = torch.randint(self.agent_tmpl_n, (self.batch_size, 1))
# tmpl_token_id_random = tmpl_token_id_random.to(device) # error
# tmpl_token_id = tmpl_token_id.gather(-1, tmpl_token_id_random)
# 2nd error
# tmpl_token_id_random = torch.randint(self.agent_tmpl_n, (self.batch_size, 1))
# tmpl_token_id = tmpl_token_id.cpu() # error
# tmpl_token_id = tmpl_token_id.gather(-1, tmpl_token_id_random)
# tmpl_token_id = tmpl_token_id.to(device)
# 3rd error
# tmpl_token_id_random = torch.randint(self.agent_tmpl_n, (self.batch_size,))
# tmpl_token_id_random = tmpl_token_id_random.unsqueeze(-1).to(device) # error
# tmpl_token_id = tmpl_token_id.gather(-1, tmpl_token_id_random)
# errors are: /pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:87:
# int at::native::<unnamed>::binarySearchForMultinomial(scalar_t *, scalar_t *, int, scalar_t)
# [with scalar_t = float]: block: [2,0,0], thread: [0,0,0]
# Assertion `cumdist[size - 1] > static_cast<scalar_t>(0)` failed.

where tmpl_distribution is just the output of the network. I apply softmax to it and sample from it with multinomial to get tmpl_token_id. This works. However, when I try to sample uniformly from tmpl_token_id using any of the methods above, or random.sample (not shown), it throws an error:

/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:87: int at::native::<unnamed>::binarySearchForMultinomial(scalar_t *, scalar_t *, int, scalar_t) [with scalar_t = float]: block: [6,0,0], thread: [0,0,0] Assertion `cumdist[size - 1] > static_cast<scalar_t>(0)` failed.
Traceback (most recent call last):
  File "run.py", line 98, in <module>
    trainer.train(params['steps'])
  File "/projects/da33/dkRyu/main/phd-main-github/trainer/trainer.py", line 224, in train
    obs_reps, scores, graph_state_reps, graph_mask_tt
  File "/home/dryu0002/da33/dkRyu/conda-envs/main/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/projects/da33/dkRyu/main/phd-main-github/models/models.py", line 340, in forward
    tmpl_token_id = self.random_selecting_tmpl(tmpl_distribution)
  File "/projects/da33/dkRyu/main/phd-main-github/models/models.py", line 511, in random_selecting_tmpl
    tmpl_token_id_random = F.softmax(tmpl_token_id_random_size, dim=-1).multinomial(num_samples=1).to(device)
RuntimeError: CUDA error: device-side assert triggered
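
For reference, the uniform pick of one sampled id per row can be sketched as below. This is a minimal sketch under assumptions: batch_size, agent_tmpl_n and the id values are hypothetical stand-ins, and the index tensor is created directly on the same device as tmpl_token_id. It probably does not address the assertion itself, which names the multinomial kernel; CUDA errors are reported asynchronously, so the Python traceback may point at a later line than the call that actually failed.

```python
import torch

# Hypothetical sizes, standing in for self.batch_size and self.agent_tmpl_n.
batch_size, agent_tmpl_n = 4, 3

# Stand-in for the ids returned by the first multinomial call.
tmpl_token_id = torch.arange(batch_size * agent_tmpl_n).reshape(batch_size, agent_tmpl_n)

# One uniformly random column index per row, created on the same device.
pick = torch.randint(agent_tmpl_n, (batch_size, 1), device=tmpl_token_id.device)

# Select the chosen id in each row; the result has shape (batch_size, 1).
chosen = tmpl_token_id.gather(-1, pick)
print(chosen.shape)
```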

Thank you in advance

ptrblck · May 14, 2021, 8:26pm

Based on the error message, MultinomialKernel fails with:

Assertion `cumdist[size - 1] > static_cast<scalar_t>(0)` failed.

so it seems that invalid (negative) values are passed to this kernel.

Dongwon_Ryu · May 15, 2021, 6:44am

Thank you for your quick reply. I examined further and found this.

import torch
import torch.nn.functional as F

tensor = [
    -39.4640, -30.4600, -29.8240, -28.6050, -28.4820, -34.7070, -26.9110,
    -28.2490, -32.5890, -27.0400, -34.9490,  -1.7063, -36.6630, -46.3450,
    -37.5270, -38.0160, -33.5740, -40.9670, -30.6230, -35.1960, -39.3370,
    -35.7420, -38.7020, -60.9540, -35.6590, -38.3470, -26.1920, -34.4350,
    -27.2520, -43.7620, -34.3750, -44.2680, -29.7320, -36.3130, -31.2850,
    -32.4330, -38.2280, -38.1470, -31.9970, -34.7630, -31.4100, -26.1840,
    -47.5360, -19.4740, -35.3710, -19.8780, -28.0600, -35.7290, -39.2160,
    -33.4030, -31.3240,  -0.5209, -27.5570, -23.1630, -33.9700, -44.1090,
     -7.2233, -37.1100, -38.3130, -10.4190,  -5.7243, -31.1580, -51.2760,
    -13.0130, -22.0170, -17.6730, -43.7560, -23.4110, -33.3230, -31.4360,
    -46.2880, -28.1670, -25.4950, -34.3870, -31.4010, -32.2750, -27.2390,
    -39.0870, -29.7090, -39.4410, -40.9310, -32.0670, -37.1720, -38.8740,
    -47.4230, -37.0320, -29.7280,   6.5632, -33.0120, -28.7500, -37.2510,
    -46.9050, -25.2030, -39.9600, -29.4190, -28.0210, -32.1230, -23.1150,
    -41.5070,   2.2157, -33.7980, -33.1460, -44.1950, -31.1630, -37.2860,
    -42.8840,  -0.5351, -39.7500, -27.8490, -47.7960, -31.5340, -39.1790,
    -37.6900, -33.9880, -37.7890, -34.6530, -42.7190, -53.4900, -32.6800,
    -29.5050, -36.9310, -32.2680, -37.3720, -35.4500, -45.3900,  -1.6217,
     14.6630, -35.6150, -24.4610, -33.8140, -36.1280, -34.7040, -17.1700,
      0.3916, -33.1330, -41.6470, -30.9100, -38.7520, -42.8320, -45.4720,
    -35.4720, -28.6920, -34.7470,  -4.6244, -46.5550, -35.8260, -30.0120,
    -27.1230, -41.3560, -44.6070, -29.4610, -39.9710, -35.2720, -37.3260,
    -38.0640, -32.6790, -29.2660, -25.4990, -33.5540, -22.8300, -37.6420,
    -43.2870, -19.7370, -41.0410, -32.5190, -35.9680, -35.0390, -39.8550,
    -28.5200, -33.2970,  -3.3740, -34.7050, -31.8580, -39.1770, -52.6580,
    -43.4300, -24.9220, -30.2970, -26.0870, -31.1790, -29.3620, -40.6420,
     -3.0459, -30.5280, -32.6050, -30.7650, -33.7710, -24.9610, -30.7920,
    -33.4430, -40.1660, -45.0060, -38.5810, -40.9300, -26.7960, -32.1680,
    -37.3440, -43.3280, -24.4050, -27.9040, -32.3530, -31.2500, -23.1490,
     -3.8144, -41.0400, -33.4050, -41.3210, -38.0230, -40.0120, -37.9040,
    -31.1950, -38.1710, -26.5500, -17.9750, -25.8590, -37.0900, -29.6660,
    -17.0330, -29.1300, -33.1770, -32.3660, -37.6070,  -1.7284,  -1.6887,
    111.6700,  -5.1498,  -4.6433, -14.5410, -44.0680, -24.2630,   4.9073,
     -1.3849, -27.9320, -34.2830, -37.2280
]

tensor = torch.tensor(tensor)
dist = F.softmax(tensor, dim=-1)
print(dist)
print(dist.multinomial(1))
print(dist.multinomial(2))
print(dist.multinomial(3))

When I draw 3 samples with multinomial, there is an error:

tensor([0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        7.4269e-43, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 1.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00])
tensor([224])
tensor([224,   0])
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-251-36a133b1a74b> in <module>
     44 print(dist.multinomial(1))
     45 print(dist.multinomial(2))
---> 46 print(dist.multinomial(3))
     47 

RuntimeError: invalid multinomial distribution (with replacement=False, not enough non-negative category to sample)

I assume it is because the probabilities are extremely small, but they are not negative. Is this a bug? If not, how do I avoid it?
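
The underflow can be checked by hand. Softmax divides exp(logit - max_logit) by a sum that is very close to 1 here, so each probability is roughly exp(logit - 111.67). The sketch below uses only the Python standard library; to_float32 is a hypothetical helper that rounds to float32 via struct, and it ignores the normalizing sum, so treat the numbers as approximate.

```python
import math
import struct

def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

max_logit = 111.67  # largest logit in the tensor above

for logit in (14.663, 6.5632, -60.954):
    p64 = math.exp(logit - max_logit)   # unnormalized softmax term in float64
    p32 = to_float32(p64)               # same value rounded to float32
    print(f"logit={logit:>8}: float64={p64:.4e}  float32={p32:.4e}")
```

Only the term for 14.663 survives in float32, as a subnormal number near the 7.4269e-43 in the printed output. The other terms fall below the smallest positive float32 (about 1.4e-45) and round to exactly zero, although they are still positive in float64.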

Thank you in advance

ptrblck · May 15, 2021, 7:01am

Your input yields only two probabilities != 0 for the logits tensor([ 14.6630, 111.6700]):

out = F.softmax(tensor, dim=0)
print(out[out!=0.])
> tensor([7.4269e-43, 1.0000e+00])

which causes the issue.
Since the range of the logits is large, [-60.9540, 111.6700], I’m unsure if you can avoid this issue using float32, so you might need to convert the logits to float64 by calling tensor.double() or tensor.to(torch.float64).
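
A minimal sketch of the float64 suggestion, using a four-element stand-in for the logits rather than the full tensor (the values are taken from the post above; the variable names are illustrative). It counts how many probabilities stay above zero in each dtype, then samples without replacement from the float64 distribution.

```python
import torch
import torch.nn.functional as F

# Small stand-in: the largest logit, two mid-range ones, and the smallest.
logits = torch.tensor([111.67, 14.663, 6.5632, -60.954])

p32 = F.softmax(logits, dim=-1)            # float32: tiny terms underflow to 0
p64 = F.softmax(logits.double(), dim=-1)   # float64: every term stays > 0

print("non-zero probabilities, float32:", int((p32 > 0).sum()))
print("non-zero probabilities, float64:", int((p64 > 0).sum()))

# With float64 there are enough non-zero categories to draw 3 distinct ids.
sample = p64.multinomial(num_samples=3, replacement=False)
print(sample)
```

float64 only moves the limit: a gap of roughly 745 or more between the largest logit and another logit would underflow to zero in float64 as well.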

Dongwon_Ryu · May 15, 2021, 9:47am

Cheers. I got it now. Thank you for your help

mbugert · January 18, 2022, 8:49am

A minor correction, and answer to the overall question:

This assertion failure

/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:89: binarySearchForMultinomial: block: [0,3,0], thread: [64,0,0] Assertion `cumdist[size - 1] > static_cast<scalar_t>(0)` failed.

happens when all input weights to torch.multinomial are zero (source).

When passing a negative weight value, the assertion failure looks like this (source):

/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:39: renormRowsL1: block: [0,0,0], thread: [464,0,0] Assertion `!THCNumerics<scalar_t>::lt(val, zero)` failed.
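
Both conditions can be checked on the Python side before calling torch.multinomial, which gives a readable error instead of a device-side assert. The helper below is a hypothetical sketch, not part of PyTorch: check_multinomial_input and its messages are made-up names, and the checks force a device-to-host sync, so it is meant for debugging rather than hot loops.

```python
import torch

def check_multinomial_input(weights: torch.Tensor, num_samples: int,
                            replacement: bool = False) -> None:
    """Raise ValueError if `weights` look unsafe to pass to torch.multinomial.

    Hypothetical debugging helper; expects a 1-D or 2-D weight tensor.
    """
    if not torch.isfinite(weights).all():
        raise ValueError("weights contain inf or nan")
    if (weights < 0).any():
        raise ValueError("weights contain negative values")
    # Number of categories with non-zero weight in each row.
    nonzero_per_row = (weights > 0).sum(dim=-1)
    if (nonzero_per_row == 0).any():
        raise ValueError("at least one row has all-zero weights")
    if not replacement and (nonzero_per_row < num_samples).any():
        raise ValueError(
            f"fewer than {num_samples} non-zero weights in some row; "
            "cannot sample that many without replacement"
        )

# Example: only two non-zero weights, so asking for 3 distinct samples fails.
probs = torch.tensor([0.0, 7.4269e-43, 1.0, 0.0])
check_multinomial_input(probs, num_samples=2)      # passes silently
try:
    check_multinomial_input(probs, num_samples=3)
except ValueError as err:
    print(err)
```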