Is there a way to use the DataLoader
machinery with unlabeled data?
Yes, DataLoader
doesn’t impose any constraints on the number of outputs your Dataset returns,
as seen here:
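import torch
from torch.utils.data import Dataset, DataLoader


class MyDataset(Dataset):
    def __init__(self):
        self.data = torch.randn(100, 1)

    def __getitem__(self, index):
        x = self.data[index]
        return x  # a single output, no label

    def __len__(self):
        return len(self.data)


# The guard is needed for num_workers > 0 on platforms that spawn worker processes
if __name__ == "__main__":
    dataset = MyDataset()
    loader = DataLoader(
        dataset,
        batch_size=5,
        num_workers=2,
        shuffle=True
    )

    for data in loader:
        print(data.shape)  # torch.Size([5, 1])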
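To make the point about outputs concrete, here is a minimal sketch (the class names UnlabeledDataset and LabeledDataset are hypothetical) contrasting a dataset whose items are a single tensor with one whose items are (input, target) tuples. With PyTorch's default collate function, the first yields one stacked tensor per batch, and the second yields a batch that unpacks into inputs and targets.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class UnlabeledDataset(Dataset):
    """Each item is a single tensor: no label is returned."""

    def __init__(self, n=100):
        self.data = torch.randn(n, 1)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return len(self.data)


class LabeledDataset(UnlabeledDataset):
    """Same data, but each item is an (input, target) tuple."""

    def __init__(self, n=100):
        super().__init__(n)
        self.targets = torch.randint(0, 2, (n,))

    def __getitem__(self, index):
        return self.data[index], self.targets[index]


# Default collation stacks single tensors into one batch tensor...
unlabeled_batch = next(iter(DataLoader(UnlabeledDataset(), batch_size=5)))
print(unlabeled_batch.shape)  # torch.Size([5, 1])

# ...and collates tuples field by field, so the batch unpacks into two tensors.
inputs, targets = next(iter(DataLoader(LabeledDataset(), batch_size=5)))
print(inputs.shape, targets.shape)  # torch.Size([5, 1]) torch.Size([5])
```

The training loop only differs in how it unpacks each batch; the DataLoader itself is configured the same way in both cases.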
Hello, I am working with DataLoader for the first time and have run into some problems.
I have my data in a JSON file like {"doc_id_1": [sentence1, sentence2, …], …} and want to create a DataLoader from this data to then compute sentence embeddings. The problem is that there are no labels (doc_id is not a true label; it is just the ID of the document).
I have defined the class straightforwardly, like in the example above:
import json
from torch.utils.data import Dataset, DataLoader


class DocDataset(Dataset):
    def __init__(self, json_file):
        with open(json_file) as f:
            self.data = json.load(f)  # a dict keyed by doc_id strings

    def __getitem__(self, index):
        x = self.data[index]
        return x

    def __len__(self):
        return len(self.data)


json_file = 'dataset.json'
dataset = DocDataset(json_file)
loader = DataLoader(dataset, batch_size=5, num_workers=2, shuffle=True)
I suspect the problem is in the __getitem__ function, because I get a KeyError.
Thank you in advance!
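The KeyError most likely comes from json.load returning a dict keyed by doc_id strings, while the DataLoader's sampler requests integer indices 0 to len(dataset) - 1, so self.data[0] looks up a key that does not exist. A minimal sketch, assuming you want one item per sentence: flatten the dict into a list once in __init__ so that integer indexing works. The class name SentenceDataset and the sample data are hypothetical.

```python
import json
import os
import tempfile

from torch.utils.data import Dataset, DataLoader


class SentenceDataset(Dataset):
    """Flattens {"doc_id": [sentence, ...]} into integer-indexable items."""

    def __init__(self, json_file):
        with open(json_file, encoding="utf-8") as f:
            docs = json.load(f)  # dict keyed by doc_id strings
        # One (doc_id, sentence) pair per sentence, so that integer indices
        # 0 .. len(self) - 1 are valid.
        self.items = [
            (doc_id, sentence)
            for doc_id, sentences in docs.items()
            for sentence in sentences
        ]

    def __getitem__(self, index):
        # Return only the sentence; its doc_id stays in self.items[index][0]
        return self.items[index][1]

    def __len__(self):
        return len(self.items)


# Hypothetical sample data written to a temporary file for the demo.
sample = {"doc_1": ["First sentence.", "Second sentence."], "doc_2": ["Third sentence."]}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False, encoding="utf-8") as f:
    json.dump(sample, f)
    path = f.name

dataset = SentenceDataset(path)
loader = DataLoader(dataset, batch_size=2, shuffle=False)
batches = list(loader)  # default collation returns each batch of strings as a list
print(batches)  # [['First sentence.', 'Second sentence.'], ['Third sentence.']]
os.remove(path)
```

Flattening up front keeps __getitem__ a constant-time list lookup; if you need the document grouping later, the doc_id for each index is still stored alongside the sentence.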
Hello again, I think I have solved the KeyError problem:
def __getitem__(self, index):
    for i, sent_list in enumerate(self.data.values()):
        x = list(self.data.values())[i]
        return x
I can’t get data.shape as in your example, since I’m working with lists. And according to the next error I got, the items must not be lists, so I must still have defined something wrong. Could you please tell me where I went wrong?
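The loop in this __getitem__ never uses index and returns on its first iteration, so every index yields the first document. If one item per document is what you want, a minimal sketch is to keep the doc_ids in a fixed-order list and index into it. Because documents have different numbers of sentences, PyTorch's default collate function will try to combine the lists element-wise and, as far as I know, raises an error when their lengths differ; a custom collate_fn that returns the batch unchanged sidesteps that. The constructor here takes the already-loaded dict rather than a file path, and the names keep_as_list and the sample data are hypothetical.

```python
from torch.utils.data import Dataset, DataLoader


class DocDataset(Dataset):
    """One item per document: the list of that document's sentences."""

    def __init__(self, docs):
        self.docs = docs
        self.doc_ids = list(docs)  # fixed order, so position -> doc_id

    def __getitem__(self, index):
        # Use `index` directly instead of looping over all documents.
        return self.docs[self.doc_ids[index]]

    def __len__(self):
        return len(self.doc_ids)


def keep_as_list(batch):
    """Return the batch unchanged: a list of per-document sentence lists."""
    return batch


docs = {"doc_1": ["A.", "B.", "C."], "doc_2": ["D."]}  # hypothetical data
loader = DataLoader(DocDataset(docs), batch_size=2, collate_fn=keep_as_list)
for batch in loader:
    print(batch)  # [['A.', 'B.', 'C.'], ['D.']]
```

keep_as_list is a module-level function rather than a lambda so that it can be pickled if you set num_workers > 0 on a platform that spawns worker processes.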
from sentence_transformers import losses

# `model` is a SentenceTransformer instance created earlier
train_dataloader = DataLoader(dataset, batch_size=5, num_workers=2, shuffle=True)
train_loss = losses.CosineSimilarityLoss(model)
# Fine-tune the model
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100)
AttributeError: 'list' object has no attribute 'texts'
I’m not sure where the new error is raised, as I cannot see any usage of the texts
attribute in the code you posted. Could you check which function is trying to access this attribute and make sure it’s receiving the right object?
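One likely candidate, hedged because it depends on the installed version: in the sentence-transformers 2.x source I am aware of, model.fit replaces each DataLoader's collate_fn with the model's own smart_batching_collate, which reads example.texts and example.label from every item and so expects InputExample objects rather than plain lists. Note also that CosineSimilarityLoss trains on sentence pairs with a similarity score as the label, so it needs labeled data; if the goal is only to compute embeddings for unlabeled sentences, calling model.encode on them may be all that is needed. The sketch below does not use the library; Example and collate_reading_texts are hypothetical stand-ins that reproduce the same failure mode.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    """Hypothetical stand-in for an example object that carries `texts`."""
    texts: List[str]
    label: float = 0.0


def collate_reading_texts(batch):
    """Hypothetical collate function that reads `.texts` and `.label`
    from every item in the batch."""
    texts = [example.texts for example in batch]
    labels = [example.label for example in batch]
    return texts, labels


# Plain lists of sentences reproduce the error from the traceback.
error_message = None
try:
    collate_reading_texts([["A.", "B."], ["C."]])
except AttributeError as err:
    error_message = str(err)
print(error_message)  # 'list' object has no attribute 'texts'

# Wrapping each item in an object with a `texts` field avoids it.
texts, labels = collate_reading_texts([Example(["A.", "B."]), Example(["C."])])
print(texts)  # [['A.', 'B.'], ['C.']]
```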