GPU integer multiplication overflow

Hi, I have a simple model below that runs on the CPU and returns the expected output. The moment I move this model to CUDA, I get an integer multiplication overflow error. The model only embeds a set of integers, so it’s not clear to me where this is coming from.

import torch
import numpy

device = (torch.device('cuda') if torch.cuda.is_available()
          else torch.device('cpu'))

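class SmallModel(torch.nn.Module):
    def __init__(self, skill_num, emb_size, max_seq_length):
        super().__init__()  # must run before submodules are assigned
        self.skill_num = skill_num
        self.emb_size = emb_size
        self.max_seq_length = max_seq_length
        self.skill_embeddings=torch.nn.Embedding(self.skill_num, self.emb_size)
        self.embd_pos = torch.nn.Embedding(self.max_seq_length , self.emb_size)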

    def forward(self, x, y):
        query = self.skill_embeddings(x) # shape bs X seq_len X emb_size 
#         mask_labels = y * (y > -1).long()
#         key = self.inter_embeddings(x+mask_labels*self.skill_num)
#         values = self.inter_embeddings(x+mask_labels*self.skill_num)
#         pos = self.embd_pos(torch.arange(x.shape[1]))
#         key = key+pos 
#         query = query+pos 
        return query

# create some mock data, 5 students with 10 seq length

# set model params

# init model
test_mod=SmallModel(skill_num=skill_num, emb_size=emb_size, max_seq_length=max_seq_length).to(device)

# run 
#query=test_mod(input, output)  # with cpu
query=test_mod(, # with gpu

Error message

RuntimeError                              Traceback (most recent call last)
RuntimeError: numel: integer multiplication overflow

Could you check the number of elements in your input and if this would overflow int32?
If so, you might need to use long as the dtype.
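
As a rough way to run this check, the sketch below compares the element count against the int32 limit and inspects the dtype. The tensor `x`, its (5, 10) shape, and the vocabulary size of 6 are assumptions standing in for the real input, which is not shown in the post.

```python
import torch

# Hypothetical stand-in for the real input: 5 sequences of length 10,
# with indices drawn from an assumed vocabulary of 6 skills.
x = torch.randint(0, 6, (5, 10))

int32_max = torch.iinfo(torch.int32).max  # largest value an int32 can hold
n = x.numel()                             # total number of elements in x

print(f"numel={n}, dtype={x.dtype}, fits in int32: {n <= int32_max}")

# Embedding lookups expect integer indices; .long() converts to int64.
x_long = x.long()
```

With shapes this small the element count is nowhere near the int32 limit, so a genuine overflow from the input size would not be expected.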

Thank you for the response.

torch.numel(input) gives 50, which is what I expect for a tensor of shape (5, 10). I tried converting input into a long tensor as below, but I still get the same error.


And to be clear, this only happens on CUDA; it works fine on the CPU.
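
One way to narrow down a CPU-versus-CUDA difference like this is to run the same embedding on both devices and compare the results. This is only a sketch: the sizes (6 skills, embedding size 12, input shape (5, 10)) are assumed values, and the GPU branch runs only if CUDA is available.

```python
import copy
import torch

torch.manual_seed(0)

# Hypothetical sizes: 6 skills, embedding size 12, input of shape (5, 10).
emb = torch.nn.Embedding(6, 12)
x = torch.randint(0, 6, (5, 10), dtype=torch.long)

out_cpu = emb(x)  # reference result computed on the CPU

if torch.cuda.is_available():
    # Copy the module so the CPU version stays untouched, then move the
    # copy and the indices to the GPU (both must be on the same device).
    emb_gpu = copy.deepcopy(emb).to("cuda")
    out_gpu = emb_gpu(x.to("cuda"))
    same = torch.allclose(out_cpu, out_gpu.cpu())
    print("CPU and GPU outputs match:", same)
else:
    print("CUDA not available; CPU output shape:", tuple(out_cpu.shape))
```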

I cannot reproduce the issue using your code on my GPU with 2.1.0.dev20230605+cu121 and based on the provided shapes the error should never be raised.

Good to know. I wonder if it’s a version issue.


I doubt it, as I also cannot reproduce it in 1.13.1+cu117.

OK, this is extremely bizarre. I reduced everything to the bare bones and made the input completely explicit. The embedding layer on its own, outside of the model, returns what it should. The model itself runs, and if I call query.shape I get the right output shape. The moment I evaluate query to see the output, I get the same integer overflow error. This is on Vertex AI, Python version 3.7.12.

Is this possibly because the output (query) is on the GPU and displaying it results in an error? I don’t have a clue what else could be going on with something this simple.

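import torch
import numpy

# same device selection as in the first snippet
device = (torch.device('cuda') if torch.cuda.is_available()
          else torch.device('cpu'))

class SmallModel(torch.nn.Module):
    def __init__(self, skill_num, emb_size):
        super().__init__()  # must run before submodules are assigned
        self.skill_num = skill_num
        self.emb_size = emb_size
        self.skill_embeddings=torch.nn.Embedding(self.skill_num, self.emb_size)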
    def forward(self, x):
        query = self.skill_embeddings(x) 
        return query

test_mod=SmallModel(skill_num=6, emb_size=12).to(device)
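
The explicit input did not survive in the snippet above. A minimal sketch of the same experiment, assuming a small hand-written index tensor `x` (hypothetical, with every value below skill_num=6), separates reading metadata such as `query.shape` from actually reading the values, which requires the data to be available on the host:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class SmallModel(torch.nn.Module):
    def __init__(self, skill_num, emb_size):
        super().__init__()
        self.skill_embeddings = torch.nn.Embedding(skill_num, emb_size)

    def forward(self, x):
        return self.skill_embeddings(x)

test_mod = SmallModel(skill_num=6, emb_size=12).to(device)

# Hypothetical explicit input: 2 sequences of 3 skill ids, each in [0, 6).
x = torch.tensor([[0, 1, 2], [3, 4, 5]], dtype=torch.long, device=device)

query = test_mod(x)

# Metadata only: no tensor values are read here.
print(query.shape, query.dtype, query.device)

# Reading the values: the data is copied to host memory first.
values = query.detach().cpu()
print(values)
```

If the first print succeeds and the second step fails, the problem shows up when the values are read back, not in the shape bookkeeping of the forward pass.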


Follow up'cpu') gets rid of the error

I don’t know what Vertex AI is, but I’m just printing the CUDATensor and get the valid result.
For the sake of completeness: the error is raised here and I don’t know how it can fail in your setup.

Thanks for the help. Vertex AI is just Google’s cloud computing platform. To close this out: I restarted the kernel and added this line before setting the device, and the problem seems to be gone.

os.environ['CUDA_LAUNCH_BLOCKING'] = "1"
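
For anyone trying the same workaround, here is a minimal sketch of where the line would go, assuming the variable needs to be in the environment before the first CUDA call in the process (hence the kernel restart). `CUDA_LAUNCH_BLOCKING=1` makes kernel launches synchronous, so errors are reported at the call that caused them. That it made this particular error go away is only what was observed above, not something the sketch explains.

```python
import os

# Set before any CUDA work happens in this process; setting it after
# CUDA has been initialized may have no effect.
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"

import torch

device = (torch.device('cuda') if torch.cuda.is_available()
          else torch.device('cpu'))
print(device)
```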