Torch Dataset setting tensor to 0

Raspirat · December 7, 2023, 10:32am

I have the following Dataset:

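import torch

# Assumption: `device` is a module-level global that the original snippet does not show.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class MidiDataset(torch.utils.data.Dataset):
    def __init__(self, ndata, tokenizer, batch_size, block_size):
        self.ndata = ndata
        self.tokenizer = tokenizer
        # Encode once up front; the result lives on `device` (cuda:0 in the output below).
        self.ndata_enc = tokenizer.encode(ndata.type(torch.float32), device)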
        print("-------NDATAENC0-----")
        print(self.ndata_enc[0])
        self.batch_size = batch_size
        self.block_size = block_size

    def __len__(self):
        return len(self.ndata_enc)

    def __getitem__(self, index):
        print("-------GETITEM_NDATAENC-----")
        print(self.ndata_enc)
        return self.ndata_enc[index]

__init__ works fine and prints

-------NDATAENC0-----
tensor([4752., 4752., 4779., 4779., 4807., 4817., 4831., 4846., 4855., 4872.,
        4884., 4884., 4912., 4912., 4912., 4912., 4884., 4985., 4998., 4998.,
        5025., 5037., 5057., 5057., 5082., 5082., 5110., 5110., 5057., 5057.,
        5057., 5057., 5200., 5213., 5229., 5244., 5261., 5274., 5291., 4872.,
        4831., 5343., 4884., 5382., 5401., 5401., 5401., 5401., 4752., 4752.,
        5510., 5510., 4779., 4779., 26., 37., 49., 49., 4884., 5382.,
        5401., 5401., 5401., 5401.], device='cuda:0')

as it should be. But in __getitem__, every value in self.ndata_enc is 0:

-------GETITEM_NDATAENC-----
tensor([[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]], device='cuda:0')

Does anyone know why this is the case and how to fix it?
Thank you in advance. Any help is appreciated!

Raspirat · December 11, 2023, 3:13pm

I found out that this only happens when the device is cuda. Why could this be?
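
Since the zeros only show up on cuda, one thing worth trying is to keep the encoded tensor on the CPU inside the Dataset and move each batch to the GPU in the training loop. The cause is not confirmed anywhere in this thread, so this is only a minimal sketch of that arrangement. CpuMidiDataset and the `encoded` stand-in tensor are hypothetical names; the real tokenizer.encode call is not shown here and is replaced by dummy data.

```python
import torch
from torch.utils.data import DataLoader, Dataset

# Assumption: fall back to CPU so the sketch also runs on a machine without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class CpuMidiDataset(Dataset):
    """Hypothetical variant of MidiDataset that stores its data on the CPU."""

    def __init__(self, ndata_enc):
        # Keep a private CPU copy so nothing else can overwrite the storage.
        self.ndata_enc = ndata_enc.detach().cpu().clone()

    def __len__(self):
        return len(self.ndata_enc)

    def __getitem__(self, index):
        return self.ndata_enc[index]


# Stand-in for the output of tokenizer.encode(...); dummy values, not real MIDI tokens.
encoded = torch.arange(1, 25, dtype=torch.float32).reshape(6, 4)

dataset = CpuMidiDataset(encoded)
loader = DataLoader(dataset, batch_size=2, shuffle=False, num_workers=0)

batches = []
for batch in loader:
    # Move the batch to the GPU here, in the training loop, not in the Dataset.
    batch = batch.to(device)
    batches.append(batch)
```

If the values survive with this layout, the problem is likely tied to holding a CUDA tensor inside the Dataset rather than to the tokenizer itself.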