Matrix multiplication for large sparse matrices that do not fit into GPU memory

I am trying to do a matrix multiplication on data from a large dataframe, but the result matrix cannot be created. The following statement fails:

scores = * freq_adjustment.unsqueeze(0), diagnoses.permute(1, 0))

The output in the console is “RuntimeError: CUDA out of memory. Tried to allocate 266.31 GiB (GPU 0; 7.79 GiB total capacity; 2.48 GiB already allocated; 3.52 GiB free; 2.49 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF”

I have 8 GB of GPU memory. How can I do this efficiently on my GPU?

Hi Nasim,

What you can do is reduce the data type of the tensor from your current type to a lower-precision one (for example from float64 to int8, if the values in your sparse matrices can be represented by int8 without any issues) and split the tensor into multiple smaller pieces that each fit into memory.
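To see how far each of the two suggestions gets you, it can help to do the arithmetic first. The sketch below is only an estimate: the row count `n` and the 3 GiB budget are hypothetical placeholders, not values taken from the post, and the helper names are made up for illustration. Note that an 8x reduction of a 266.31 GiB allocation would still leave about 33 GiB, so a smaller dtype alone is unlikely to be enough and chunking is still needed.

```python
GIB = 2 ** 30  # bytes per GiB, the unit used in the CUDA error message


def dense_gib(rows, cols, bytes_per_element):
    """Memory needed for a dense rows x cols matrix, in GiB."""
    return rows * cols * bytes_per_element / GIB


def max_chunk_rows(cols, bytes_per_element, budget_gib):
    """Largest number of result rows whose dense block fits in budget_gib."""
    return int(budget_gib * GIB // (cols * bytes_per_element))


# Hypothetical size: an n x n score matrix (n is an assumption, not from the post).
n = 200_000
for name, size in [("float64", 8), ("float32", 4), ("int8", 1)]:
    print(f"{name}: {dense_gib(n, n, size):.2f} GiB for the full result")

# Rows of a float32 result that fit into an assumed 3 GiB of free GPU memory.
print("rows per chunk:", max_chunk_rows(n, 4, budget_gib=3.0))
```

The second helper gives an upper bound for the chunk size passed to torch.split; in practice you would stay well below it, because the input chunks and temporary buffers also need room.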

I have already reduced the data type to int8 and split the tensor into chunks. It returns a tuple, so how do I pass it to

splitted_diagnoses = torch.split(diagnoses, 1000)


You split along the rows (and, if needed, the columns), following the way a matrix multiplication is carried out block by block. The block products can be computed one after another, and then you concatenate the resulting matrices together.
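A minimal sketch of that idea, under a few assumptions: the statement in the question is incomplete, so the left operand is assumed here to be `diagnoses * freq_adjustment.unsqueeze(0)`; the function name `chunked_scores` is hypothetical; and the demo uses small float32 tensors rather than int8. The tuple returned by torch.split is simply iterated over, one row block at a time.

```python
import torch


def chunked_scores(left, right, chunk_rows=1000, device=None):
    """Compute left @ right.T one row block at a time.

    Only one (chunk_rows x right.shape[0]) block of the result is on
    `device` at any moment; each finished block is moved to CPU memory.
    """
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    # The right operand stays on the device and is reused for every block.
    right_t = right.to(device).permute(1, 0)
    blocks = []
    for left_chunk in torch.split(left, chunk_rows):  # tuple of row blocks
        block = torch.mm(left_chunk.to(device), right_t)
        blocks.append(block.cpu())
    # Concatenate the row blocks back into the full result on the CPU.
    return torch.cat(blocks, dim=0)


# Small stand-in data; the real tensors come from the dataframe.
torch.manual_seed(0)
diagnoses = torch.randint(0, 2, (2500, 64)).float()
freq_adjustment = torch.rand(64)

scores = chunked_scores(
    diagnoses * freq_adjustment.unsqueeze(0), diagnoses, chunk_rows=1000
)
print(scores.shape)
```

Concatenating at the end only works if the full result fits into host RAM. If it does not, each block would have to be reduced (for example with torch.topk) or written to disk inside the loop instead of being collected in a list.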