How to deal with high-dim sparse tensors in torch?

SakuraiJJ · April 1, 2024, 3:48pm

I am looking for a way to handle high-dimensional sparse tensors.

I downloaded this data from the UCI repository. It is a tabular dataset of size 67557 × 42. Each entry takes one of three categorical values (o, ×, b).

import numpy as np
import torch
import pandas as pd
from ucimlrepo import fetch_ucirepo
dataset = fetch_ucirepo(id=26)

X = dataset.data.features
X

I want to convert this data into a sparse tensor as follows. First, I replace the categories (o, ×, b) with the integer codes 0, 1, 2.

X = X.astype('category')
cat_columns = X.select_dtypes(['category']).columns
X[cat_columns] = X[cat_columns].apply(lambda x: x.cat.codes)
X

Now X is an integer matrix. Next, I try to convert it into a 42-dimensional sparse tensor.
Each row of the table can be regarded as a 42-dimensional coordinate at which the tensor has the value 1. The tensor is 0 at all other indices.
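
To make the intended structure concrete, here is a minimal pure-Python sketch on a made-up 4 × 3 toy table (not the UCI data). The names toy_rows and entries are only for illustration. Each row becomes a coordinate key, and repeated rows add up, which, as far as I understand, is also what happens to duplicate indices when a COO tensor is coalesced.

```python
from collections import Counter

# Hypothetical toy table: 4 rows, 3 integer-coded columns (codes 0, 1, 2).
toy_rows = [
    (0, 2, 1),
    (1, 1, 0),
    (0, 2, 1),  # duplicate of the first row
    (2, 0, 0),
]

# Each row is a coordinate; the stored value counts how often it occurs.
entries = Counter(toy_rows)

# Coordinates that never occur are implicitly 0 (a Counter returns 0 for them
# without storing anything).
print(entries[(0, 2, 1)])
print(entries[(2, 2, 2)])
print(len(entries))
```

Only the coordinates that actually occur are stored, so the memory used grows with the number of distinct rows, not with the number of dimensions.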

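X_np = np.transpose(X.to_numpy())  # shape (D, N): one column of indices per table row
(D, N) = np.shape(X_np)
X_np = X_np.astype(np.int32)
X_th = torch.from_numpy(X_np)
# No size is passed, so the tensor shape is inferred from the indices
X_torch = torch.sparse_coo_tensor(X_th, np.ones(N) )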

However, I got the following error:

RuntimeError
Traceback (most recent call last)
in
----> 1 X_torch = torch.sparse_coo_tensor(X_th, np.ones(N) )
RuntimeError: numel: integer multiplication overflow

When the table has fewer features, torch.sparse_coo_tensor works when used this way. For example, after just changing fetch_ucirepo(id=26) to fetch_ucirepo(id=101) (the TicTacToe dataset), the same code works.
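
For comparison, here is a quick back-of-the-envelope check of how many elements the corresponding dense shape would have in each case. It assumes the inferred size is 3 along every dimension (that is, every column uses all three codes), which I have not verified column by column. The helper dense_numel is just an illustrative name.

```python
# Largest value representable by a signed 64-bit integer.
INT64_MAX = 2**63 - 1


def dense_numel(num_dims, size_per_dim=3):
    """Element count of a dense tensor whose dimensions all have the same size."""
    return size_per_dim ** num_dims


numel_9 = dense_numel(9)    # table with 9 features
numel_42 = dense_numel(42)  # table with 42 features

# Python integers do not overflow, so the comparison itself is safe.
print(numel_9, numel_9 <= INT64_MAX)
print(numel_42, numel_42 <= INT64_MAX)
```

Under that assumption the 9-feature count is tiny, while the 42-feature count is above the signed 64-bit limit, which looks consistent with the "numel: integer multiplication overflow" message.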

I am confused because the amount of data actually stored in a sparse tensor is not large, even when the tensor has many dimensions. So why does this method fail? What is the alternative for handling high-dimensional sparse tensors?

Note: The use of the UCI dataset above is for illustrative purposes, not because I want to turn table data into a tensor. I am looking for a way to handle high-dimensional sparse tensors in Torch.

HalveLuve1022 · March 11, 2025, 6:56am

Hi there. It has been almost a year, and I have now run into a similar problem. I would like to build a torch.sparse_coo_tensor from a dictionary with a massive number of entries, and it always throws the error numel: integer multiplication overflow. Did you manage to solve the problem?