"sizes must be non-negative" But aren't they? Indexes in sparse tensors

kchalk · December 12, 2018, 3:47am

Can anyone tell me what mistake I’m making here? My error message is:

     22         # Load data and get label
     23         X = torch.sparse.FloatTensor(
---> 24             torch.LongTensor([self.i[ID]]),
     25             torch.FloatTensor(self.v[ID]),
     26             torch.Size([self.vecSize])

RuntimeError: sizes must be non-negative

I did of course check that none of the values in self.i[ID] are negative…
This happens when iterating over my data generator at Epoch [1/5], Step [33/2814]:

    for local_batch, local_labels in training_generator:
       #a batch training step

Below is the definition of my dataset. The main challenge is converting from PySpark sparse vectors to tensors.

import torch
from torch.utils import data

class Dataset(data.Dataset):
  def __init__(self, list_IDs, labels, indices,values,vecSize):

        #all inputs except list_IDs are dictionaries keyed to ID

        'Initialization'
        self.labels = labels
        self.list_IDs = list_IDs
        self.i= indices
        self.v =values
        self.vecSize=vecSize

  def __len__(self):
        return len(self.list_IDs)

  def __getitem__(self, index):
        # Select sample
        ID = self.list_IDs[index]

        # Load data and get label
        X = torch.sparse.FloatTensor(
            torch.LongTensor([self.i[ID]]),
            torch.FloatTensor(self.v[ID]), 
            torch.Size([self.vecSize])
        ).to_dense()
        i = self.labels[ID]

        return X, torch.LongTensor([i])
    
training_set = Dataset(partition['train'], labels_train,i_train,v_train,inputSize)
training_generator = data.DataLoader(training_set, **params)

In the spirit of citation, this code is heavily influenced by: https://stanford.edu/~shervine/blog/pytorch-how-to-generate-data-parallel

rasbt · December 12, 2018, 4:00am

I don’t think this error refers to the values in the tensor but rather to the tensor sizes, i.e., size in terms of .size(). E.g., see

>>> torch.zeros(1, 2, 3)
tensor([[[0., 0., 0.],
         [0., 0., 0.]]])
>>> torch.zeros(1, 2, 3, 0) 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: sizes must be non-negative

kchalk · December 12, 2018, 4:06am

Thanks rasbt. Your explanation makes sense. In the line it’s complaining about, I’m not setting any sizes except for the size of the sparse vector, which is always ~8000. It’s set once and none of the other data loaders are upset. So how do I interpret this in the context of my code?

X = torch.sparse.FloatTensor(
    torch.LongTensor([self.i[ID]]),
    torch.FloatTensor(self.v[ID]),
    torch.Size([self.vecSize])
).to_dense()
i = self.labels[ID]

rasbt · December 12, 2018, 4:21am

Not sure, but it could be a similar issue related to empty tensors, like

i = torch.LongTensor([[], [], []])
v = torch.FloatTensor([3,      4,      5    ])
torch.sparse.FloatTensor(i.t(), v, torch.Size([2, 3])).to_dense()

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-34-cc45613f2ec1> in <module>()
----> 1 i = torch.LongTensor([[], [], []])
      2 v = torch.FloatTensor([3,      4,      5    ])
      3 torch.sparse.FloatTensor(i.t(), v, torch.Size([2, 3])).to_dense()

RuntimeError: sizes must be non-negative

kchalk · December 12, 2018, 4:29am

Oh, that could happen in my data. It’s somewhat unlikely (it would mean that none of at least 100 words in a Reddit post were in the vocabulary), but definitely possible. I’ll look for that.

Is there a good way to find out which datum is causing the issue (other than knowing the problem and searching the input)? I haven’t found many examples using dataloaders. I think I’ve answered my own question and realized that I really should be working in a better debugging environment, but if you have other suggestions I’d love to hear them.
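
One way to locate such a datum without waiting for the loader to fail is to scan the index dictionary up front. This is a minimal sketch, assuming the indices are stored in a dict keyed by ID as in the Dataset above; find_empty_ids and the toy data are hypothetical names used only for illustration:

```python
def find_empty_ids(list_ids, indices):
    """Return the IDs whose index list is missing or empty."""
    empty = []
    for sample_id in list_ids:
        # A sample with no indices would produce an empty sparse tensor.
        if len(indices.get(sample_id, ())) == 0:
            empty.append(sample_id)
    return empty


# Toy data standing in for the real index dict (an assumption for this sketch).
toy_indices = {"a": [0, 5], "b": [], "c": [3]}
print(find_empty_ids(["a", "b", "c"], toy_indices))  # prints ['b']
```

Running this over partition['train'] before training would list every ID worth inspecting.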

rasbt · December 12, 2018, 4:32am

While I know I should use debuggers more often, I would simply try to print the last datum since it should still be in memory if you are in an interactive environment.

Alternatively, you could wrap the code in a try/except block and print the relevant info in the except branch if the error occurs.

E.g., something like

try:
    ...  # the code that raises the error
except RuntimeError:
    print(...)  # print the necessary info here
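
The try/except idea can be sketched as a small wrapper that reports which ID failed and then re-raises the error. It assumes the tensor construction is passed in as a callable; load_with_report and build are illustrative names, not part of the original code or of PyTorch:

```python
def load_with_report(sample_id, build):
    """Call build(sample_id); on RuntimeError, print the ID and re-raise."""
    try:
        return build(sample_id)
    except RuntimeError as err:
        # Report the failing sample so it can be inspected afterwards.
        print(f"failed on ID {sample_id!r}: {err}")
        raise
```

Inside __getitem__, the statements that build X would play the role of build. Re-raising keeps the original traceback while still showing the offending ID.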

kchalk · December 12, 2018, 5:00am

@rasbt Huh… yeah I got a try/except into the data loader and found an empty index list. That was it. Thank you so much for the unsticking. Grad school and finals are hard and help is very much appreciated.
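
For anyone hitting the same error: one possible guard, not taken from the thread, is to return an all-zero vector when a sample has no indices. The sketch below assumes one-dimensional sparse vectors and swaps in torch.sparse_coo_tensor for torch.sparse.FloatTensor; to_dense_vector is a hypothetical helper name:

```python
import torch


def to_dense_vector(indices, values, vec_size):
    """Build a dense 1-D float tensor from sparse indices and values."""
    if len(indices) == 0:
        # No active entries: skip the sparse constructor entirely.
        return torch.zeros(vec_size)
    i = torch.tensor([list(indices)], dtype=torch.long)  # shape (1, nnz)
    v = torch.tensor(list(values), dtype=torch.float)    # shape (nnz,)
    return torch.sparse_coo_tensor(i, v, (vec_size,)).to_dense()
```

Whether an all-zero input is meaningful for training depends on the model, so dropping such samples from list_IDs may be the better choice in some cases.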