Hey everybody, I have these two lists, inputs_ids and labels_ids; each value is a token id that represents one token:
batch = 1
inputs_ids:
[[30, 26, 22, 21, 9, 1, 21, 26, 23],
[30, 26, 22, 0, 9, 14, 32, 2, 4, 0],
[30, 26, 22, 6, 24, 9, 1, 6, 24, 26, 31],
[30, 25, 12, 18, 3, 9, 27, 8],
[5, 12, 28, 20, 9, 15, 14, 28, 11],
[5, 12, 28, 10, 9, 19, 14, 28, 11],
[10, 29, 20, 9, 17, 16, 13]]
labels_ids:
[[26, 22, 21, 9, 1, 21, 26, 23, 9],
[26, 22, 0, 9, 14, 32, 2, 4, 0, 9],
[26, 22, 6, 24, 9, 1, 6, 24, 26, 31, 9],
[25, 12, 18, 3, 9, 27, 8, 9],
[12, 28, 20, 9, 15, 14, 28, 11, 9],
[12, 28, 10, 9, 19, 14, 28, 11, 9],
[29, 20, 9, 17, 16, 13, 9]]
I tried to build a dataset class and then a dataloader object:
from typing import List

from torch.utils.data import DataLoader, Dataset

class TokenDataset(Dataset):
    def __init__(self, inputs_ids: List, labels_ids: List) -> None:
        self.inputs_ids = inputs_ids
        self.labels_ids = labels_ids

    def __len__(self):
        return len(self.labels_ids)

    def __getitem__(self, idx):
        # Returns plain Python lists, not tensors
        input = self.inputs_ids[idx]
        label = self.labels_ids[idx]
        return input, label

dataset_dclass = TokenDataset(inputs_ids, labels_ids)
dataloader_dclass = DataLoader(dataset=dataset_dclass, batch_size=batch)
The only problem is that every value in each input and label is converted to its own one-element tensor, and I can't understand why.
dataset_dclass[0]:
- Output:
([30, 26, 22, 21, 9, 1, 21, 26, 23], [26, 22, 21, 9, 1, 21, 26, 23, 9])
next(iter(dataloader_dclass)):
- Output:
[[tensor([30]), tensor([26]), tensor([22]), tensor([21]), tensor([9]), tensor([1]), tensor([21]), tensor([26]), tensor([23])],
 [tensor([26]), tensor([22]), tensor([21]), tensor([9]), tensor([1]), tensor([21]), tensor([26]), tensor([23]), tensor([9])]]
- I expect to get this as the output of next(iter(dataloader_dclass)):
[ tensor([30, 26, 22, 21, 9, 1, 21, 26, 23]) , tensor([26, 22, 21, 9, 1, 21, 26, 23, 9]) ]
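
For what it's worth, the behaviour above seems to come from the DataLoader's default collate function: when a sample is a plain Python list, it batches the list position by position, so each integer ends up in its own tensor of length batch_size. Below is a minimal sketch of one possible workaround, not a definitive fix. It returns tensors from __getitem__ and, for batches larger than 1, pads the sequences in a custom collate_fn. The names TokenTensorDataset, pad_collate, PAD_ID and LABEL_PAD are made up for this illustration, PAD_ID = 33 is an assumed padding id (chosen only because 0 is already a real token in this data), and only two of the rows are used to keep it short.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset

# Two rows taken from the lists in the question
inputs_ids = [[30, 26, 22, 21, 9, 1, 21, 26, 23],
              [30, 25, 12, 18, 3, 9, 27, 8]]
labels_ids = [[26, 22, 21, 9, 1, 21, 26, 23, 9],
              [25, 12, 18, 3, 9, 27, 8, 9]]

PAD_ID = 33       # assumption: an id that is not used by any real token
LABEL_PAD = -100  # assumption: matches the default ignore_index of CrossEntropyLoss


class TokenTensorDataset(Dataset):
    """Hypothetical variant of TokenDataset that returns tensors, not lists."""

    def __init__(self, inputs_ids, labels_ids):
        self.inputs_ids = inputs_ids
        self.labels_ids = labels_ids

    def __len__(self):
        return len(self.labels_ids)

    def __getitem__(self, idx):
        # Converting here means the collate step sees whole sequences
        return (torch.tensor(self.inputs_ids[idx], dtype=torch.long),
                torch.tensor(self.labels_ids[idx], dtype=torch.long))


def pad_collate(batch):
    """Pad variable-length (input, label) pairs to the longest one in the batch."""
    inputs, labels = zip(*batch)
    return (pad_sequence(list(inputs), batch_first=True, padding_value=PAD_ID),
            pad_sequence(list(labels), batch_first=True, padding_value=LABEL_PAD))


dataset = TokenTensorDataset(inputs_ids, labels_ids)

# batch_size=1: default collate stacks the tensors, adding a batch dimension
single = next(iter(DataLoader(dataset, batch_size=1)))
print(single[0].shape)

# batch_size=None: automatic batching is disabled, samples come back as 1-D tensors
flat = next(iter(DataLoader(dataset, batch_size=None)))
print(flat[0].shape)

# batch_size=2: sequences have different lengths, so they need padding
padded_inputs, padded_labels = next(
    iter(DataLoader(dataset, batch_size=2, collate_fn=pad_collate)))
print(padded_inputs.shape, padded_labels.shape)
```

With batch_size=1 the default collate keeps a leading batch dimension, so each tensor has shape (1, 9) rather than (9,); batch_size=None is the variant that should return the flat tensors I was expecting. The padding part only matters once the batch size is larger than 1, because the rows have different lengths and cannot be stacked as they are.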