lst = [1, 3, 4, 5, 6, 7, 8, 9, 10]
d = SquadDataset(encodings=lst)
for _ in range(10):
    e = next(iter(d))
I get only a long list of zeros.
Shouldn’t idx be a random number?
Shouldn’t the __getitem__ method call __call__ before generating the number? Using a debugger, I realized that this is not the case: it does not call __call__ but __getitem__ directly.
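A likely explanation for the zeros: a Dataset does not define __iter__, so iter(d) falls back to Python's legacy sequence protocol, which calls __getitem__(0), __getitem__(1), ... directly (no __call__ involved) until an IndexError is raised. A fresh iterator therefore always starts at index 0. The sketch below uses a hypothetical IndexDataset as a stand-in, since the definition of SquadDataset is not shown here; it assumes __getitem__ simply returns the index it receives.

```python
class IndexDataset:
    """Hypothetical stand-in for SquadDataset: returns the index it is asked for."""

    def __init__(self, encodings):
        self.encodings = encodings

    def __len__(self):
        return len(self.encodings)

    def __getitem__(self, idx):
        # Raising IndexError is what tells the legacy iteration protocol to stop.
        if idx >= len(self.encodings):
            raise IndexError(idx)
        return idx


d = IndexDataset(encodings=[1, 3, 4, 5, 6, 7, 8, 9, 10])

# No __iter__ is defined, so iter(d) calls __getitem__(0), __getitem__(1), ...
# A new iterator is created on every loop step, so next() always yields index 0.
fresh = [next(iter(d)) for _ in range(10)]

# Creating the iterator once and advancing it walks through every index.
it = iter(d)
walked = [next(it) for _ in range(len(d))]

print(fresh)   # ten zeros
print(walked)  # indices 0 through 8
```

The index passed to __getitem__ is only random when something random supplies it, e.g. a shuffling sampler inside a DataLoader; plain iteration over the dataset is always sequential.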
No, your approach won’t solve the actual issue of recreating the iterator, as shown in my example, and you are still returning the same samples defined by the batch_size, i.e. you are missing the last samples.
If you reduce the batch_size to 1, you will see the same issue:
data = DataLoader(SquadDataset(encodings=lst), batch_size=1)
for _ in range(10):
    e = next(iter(data))
# 0
# 0
# 0
# 0
# 0
# 0
# 0
# 0
# 0
# 0
Your solution of using a DataLoader would only work if batch_size >= num_samples, i.e. if all samples are returned in a single next(iter(loader)) call. In your example you never receive the last list entry (10), and you can use my code snippet to see that recreating the iterator inside the loop is what causes the issue.
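To make the difference concrete, here is a minimal sketch contrasting the two patterns. IndexDataset is a hypothetical stand-in for SquadDataset that returns encodings[idx]; batch_size=2 is chosen only for illustration. Calling next(iter(loader)) repeatedly keeps returning the first batch, while creating the iterator once per epoch (which a plain for loop over the loader does) visits every entry, including the last one (10).

```python
from torch.utils.data import DataLoader, Dataset


class IndexDataset(Dataset):
    """Hypothetical stand-in for SquadDataset: returns the stored entry at idx."""

    def __init__(self, encodings):
        self.encodings = encodings

    def __len__(self):
        return len(self.encodings)

    def __getitem__(self, idx):
        return self.encodings[idx]


lst = [1, 3, 4, 5, 6, 7, 8, 9, 10]
loader = DataLoader(IndexDataset(encodings=lst), batch_size=2)

# Anti-pattern: iter(loader) builds a new iterator on every call,
# so next() always returns the first batch and the rest are never seen.
repeated = [next(iter(loader)).tolist() for _ in range(len(loader))]

# Create the iterator once per epoch, outside the sampling loop.
# `for batch in loader` calls iter(loader) exactly once per epoch.
per_epoch = []
for epoch in range(2):
    seen = []
    for batch in loader:
        seen.extend(batch.tolist())
    per_epoch.append(seen)

print(repeated)   # [1, 3] five times
print(per_epoch)  # the full list in each epoch, including 10
```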
The __len__ method is used to initialize the sampler used in a DataLoader. I’m currently unsure whether you are trolling or just not interested in understanding why recreating the iterator inside a nested loop will not sample from the entire dataset. I’ve already posted executable code snippets which you can copy/paste and run. That being said, feel free to stick to your approach. For other users: please don’t use it, as it will miss samples, and make sure to create the iterator outside the sampling loop.
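As a sketch of how __len__ feeds the sampler, the example below assumes shuffling is wanted (the thread does not say so), which makes the DataLoader use a RandomSampler internally. The sampler takes its length from the dataset's __len__ and yields a permutation of range(len(dataset)), so one pass over an iterator created once covers every sample exactly once. IndexDataset is again a hypothetical stand-in for SquadDataset, and batch_size=4 and the seed are arbitrary.

```python
import torch
from torch.utils.data import DataLoader, Dataset, RandomSampler


class IndexDataset(Dataset):
    """Hypothetical stand-in for SquadDataset: returns the stored entry at idx."""

    def __init__(self, encodings):
        self.encodings = encodings

    def __len__(self):
        return len(self.encodings)

    def __getitem__(self, idx):
        return self.encodings[idx]


lst = [1, 3, 4, 5, 6, 7, 8, 9, 10]
ds = IndexDataset(encodings=lst)

# The sampler's length comes from ds.__len__ ...
sampler = RandomSampler(ds)
num_indices = len(sampler)

# ... and it yields a permutation of range(len(ds)): every index exactly once.
indices = list(sampler)

# shuffle=True makes the DataLoader build a RandomSampler internally.
torch.manual_seed(0)
loader = DataLoader(ds, batch_size=4, shuffle=True)
it = iter(loader)  # created once, outside the sampling loop
seen = []
for _ in range(len(loader)):  # ceil(9 / 4) == 3 batches
    seen.extend(next(it).tolist())

print(num_indices)                        # 9
print(sorted(indices) == list(range(9)))  # True
print(sorted(seen) == lst)                # True: shuffled, but nothing missing
```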