I have the following Dataset:
import torch

class MidiDataset(torch.utils.data.Dataset):
    def __init__(self, ndata, tokenizer, batch_size, block_size):
        self.ndata = ndata
        self.tokenizer = tokenizer
        # `device` is not a parameter here; it comes from the enclosing (module) scope
        self.ndata_enc = tokenizer.encode(ndata.type(torch.float32), device)
        print("-------NDATAENC0-----")
        print(self.ndata_enc[0])
        self.batch_size = batch_size
        self.block_size = block_size

    def __len__(self):
        return len(self.ndata_enc)

    def __getitem__(self, index):
        # Debug output: dump the whole encoded tensor on every access
        print("-------GETITEM_NDATAENC-----")
        print(self.ndata_enc)
        return self.ndata_enc[index]
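For comparison, a minimal sketch of the same encode-once, index-later pattern, stripped down so it runs on the CPU. IdentityTokenizer is a hypothetical stand-in for the real tokenizer (it only moves its input to the requested device), MiniMidiDataset is a hypothetical reduced version of the class above (batch_size and block_size are dropped because __getitem__ does not use them), and the data is made up. Under these assumptions the encoded values come back unchanged through a DataLoader, which can serve as a baseline when swapping the real tokenizer and data back in.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class IdentityTokenizer:
    """Hypothetical stand-in for the real tokenizer: returns its input on `device`."""
    def encode(self, x, device):
        return x.to(device)

class MiniMidiDataset(Dataset):
    """Hypothetical reduced version of MidiDataset with an explicit device argument."""
    def __init__(self, ndata, tokenizer, device):
        # Encode once up front, as the original __init__ does
        self.ndata_enc = tokenizer.encode(ndata.type(torch.float32), device)

    def __len__(self):
        return len(self.ndata_enc)

    def __getitem__(self, index):
        # Return one encoded sequence (one row of the 2-D tensor)
        return self.ndata_enc[index]

# Made-up data: 3 sequences of 4 tokens, all non-zero
ndata = torch.arange(1, 13).reshape(3, 4)
ds = MiniMidiDataset(ndata, IdentityTokenizer(), device="cpu")
loader = DataLoader(ds, batch_size=2, shuffle=False)
batches = list(loader)
for batch in batches:
    print(batch)
```

The default collate function stacks the rows returned by __getitem__, so the first batch has shape (2, 4) and the second (1, 4).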
The __init__ method works fine and prints:
-------NDATAENC0-----
tensor([4752., 4752., 4779., 4779., 4807., 4817., 4831., 4846., 4855., 4872.,
        4884., 4884., 4912., 4912., 4912., 4912., 4884., 4985., 4998., 4998.,
        5025., 5037., 5057., 5057., 5082., 5082., 5110., 5110., 5057., 5057.,
        5057., 5057., 5200., 5213., 5229., 5244., 5261., 5274., 5291., 4872.,
        4831., 5343., 4884., 5382., 5401., 5401., 5401., 5401., 4752., 4752.,
        5510., 5510., 4779., 4779.,   26.,   37.,   49.,   49., 4884., 5382.,
        5401., 5401., 5401., 5401.], device='cuda:0')
This is the expected output. In __getitem__, however, every value printed from self.ndata_enc is 0:
-------GETITEM_NDATAENC-----
tensor([[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]], device='cuda:0')
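PyTorch summarizes large tensors when printing (by default once a tensor has more than 1000 elements, showing only the first and last 3 entries along each dimension), so the printout above covers only the corners of self.ndata_enc. One way to check whether the whole tensor really changed between __init__ and __getitem__ is to keep a CPU copy at construction time and compare against it later. SnapshotCheck below is a hypothetical helper, not a PyTorch API; in the dataset it would be created in __init__ and its report(...) called in __getitem__. The example simulates an in-place overwrite with zero_() purely for illustration.

```python
import torch

class SnapshotCheck:
    """Hypothetical helper: keeps a CPU copy of a tensor and reports later changes."""
    def __init__(self, t: torch.Tensor):
        # clone() gives independent storage, so later in-place edits to `t` don't affect it
        self.reference = t.detach().cpu().clone()

    def report(self, t: torch.Tensor) -> dict:
        current = t.detach().cpu()
        return {
            "unchanged": torch.equal(current, self.reference),
            "nonzero_now": int(torch.count_nonzero(current)),
            "nonzero_at_init": int(torch.count_nonzero(self.reference)),
        }

enc = torch.tensor([[4752., 4752., 4779.], [26., 37., 49.]])
check = SnapshotCheck(enc)
enc.zero_()  # simulate some later code overwriting the tensor in place
print(check.report(enc))
```

Unlike the printed summary, count_nonzero and torch.equal look at every element, so they distinguish "all zeros" from "zeros at the printed corners".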
Does anyone know why this is the case and how to fix it?
Thank you in advance. Any help is appreciated!