I have a dataset whose labels range from 0 to 39, and I wrap it with torch.utils.data.DataLoader. If I set num_workers to 0, everything works fine. However, if I set it to 2, then at some epoch the label batch it loads (a 1-D byte tensor) contains values larger than 39, apparently 255. What causes this problem? Any help? (P.S. My dataset is an .h5 file.)
Maybe your CPU only has one worker.
Hi, what do you mean by one worker? I have run another project on the same machine with 2 workers, and it was fine. By the way, the code works fine with 2 workers until some random epoch of training, when it outputs labels with value 255 and stops my training. I guess the code here may cause the problem. Any ideas?
Below is my code.
    from __future__ import print_function
    import os
    import os.path
    import errno
    import sys
    import json

    import h5py
    import numpy as np
    import torch
    import torch.utils.data as data
    from IPython.core.debugger import Tracer

    debug_here = Tracer()


    class Modelnet40_V12_Dataset(data.Dataset):
        def __init__(self, data_dir, image_size=224, train=True):
            self.image_size = image_size
            self.data_dir = data_dir
            self.train = train
            file_path = os.path.join(self.data_dir, 'modelnet40.h5')
            # The file is opened once here, in the main process.
            self.modelnet40_data = h5py.File(file_path)
            if self.train:
                self.train_data = self.modelnet40_data['train']['data']
                self.train_labels = self.modelnet40_data['train']['label']
            else:
                self.test_data = self.modelnet40_data['test']['data']
                self.test_labels = self.modelnet40_data['test']['label']

        def __getitem__(self, index):
            if self.train:
                shape_12v, label = self.train_data[index], self.train_labels[index]
            else:
                shape_12v, label = self.test_data[index], self.test_labels[index]
            return shape_12v, label

        def __len__(self):
            # len() must return an int, so use the first dimension of the shape.
            if self.train:
                return self.train_data.shape[0]
            else:
                return self.test_data.shape[0]


    if __name__ == '__main__':
        print('test')
        train_dataset = Modelnet40_V12_Dataset(data_dir='path/data', train=True)
        print(len(train_dataset))
        test_dataset = Modelnet40_V12_Dataset(data_dir='path/data', train=False)
        print(len(test_dataset))
        train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=8,
                                                   shuffle=True, num_workers=2)
        total = 0
        # debug_here()
        # check when to cause labels error
        for epoch in range(200):
            print('epoch', epoch)
            for i, (input_v, labels) in enumerate(train_loader):
                total = total + labels.size(0)
                # labels can be 255, what is the problem??
                if labels.max() > 40:
                    debug_here()
                    print('error')
                if labels.min() < 1:
                    debug_here()
                    print('error')
                labels.sub_(1)  # minus 1 in place
                if labels.max() >= 40:
                    debug_here()
                    print('error')
                if labels.min() < 0:
                    debug_here()
                    print('error')
        print(total)
Can someone give some help?
If your data is a numpy.array, you can try this:
self.train_data = torch.from_numpy(self.modelnet40_data['train']['data'].value)
Yes, my data is indeed a NumPy array. I will try it and see if it works.
Thank you very much, it solves my problem.
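The in-memory approach suggested above can be sketched roughly as follows. This is only an illustration: the helper name load_split_into_memory is hypothetical, and the 'train' / 'data' / 'label' layout is assumed to match the file described in the question. It reads with dataset[()] rather than .value, because the .value property has been removed in h5py 3.

```python
import h5py
import torch


def load_split_into_memory(file_path, split='train'):
    """Read one split fully into RAM and return (data, labels) as tensors.

    Hypothetical helper; assumes the file has <split>/data and <split>/label.
    """
    with h5py.File(file_path, 'r') as f:
        # dataset[()] reads the whole dataset into a NumPy array.
        data = torch.from_numpy(f[split]['data'][()])
        labels = torch.from_numpy(f[split]['label'][()])
    # The file is closed here; worker processes only ever see plain tensors.
    return data, labels
```

Because the file is closed before any worker starts, no HDF5 handle is shared between processes, but the whole split has to fit in memory.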
Hi, this only partly solves my problem, because this method loads all the data into memory. When the dataset (in an .h5 file) is big, it is impractical to load all of it into memory, isn't it? And the problem still exists.
Yes, this method loads all the data into memory. If the data is large, I guess you can do it this way (I haven't tried this):
    def __getitem__(self, index):
        if self.train:
            shape_12v, label = (self.modelnet40_data['train']['data'][index],
                                self.modelnet40_data['train']['label'][index])
I don’t know if it works; let me know if you try it.
Hi, I tried this one, but it still does not work. I suspect this issue is related to multi-thread synchronization in the DataLoader class.
Sorry, I don’t know how to solve this one.
I have been seeing similar problems with DataLoader when num_workers is greater than 1. My per-sample label is a [1, 0, …, 0] array. When loading a batch of samples, most of the labels are OK, but I can get something like [70, 250, …, 90] in one row. This problem does not exist when num_workers=1.
Any solution or suggestions?
I have also run into similar problems. Can anyone figure out how to solve it? Thanks a lot!
This is always the case if you are using Windows (at least on my computer).
Try it from the command line, not from Jupyter.
Thanks. But I use PyTorch on Linux (Arch Linux), and the PyTorch version is 0.2-post2.
From the command line or from Jupyter/PyCharm?
Hi @QuantScientist, I run my code from the command line.
Can you share your full source code so that I can try it and see that it works on my system?
I have the same problem! How did you solve it?
Besides, there seem to be few answers about this.
This might be related to those two issues. What version of PyTorch are you using? Perhaps updating to 0.4.1 might help.
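A workaround often suggested for HDF5-backed datasets, though nobody in this thread has confirmed it for this case, is to avoid opening the file in __init__ and instead open it lazily on first access, so that each DataLoader worker process gets its own h5py handle instead of sharing one inherited from the parent. A minimal sketch, assuming the same 'train' / 'data' / 'label' layout as the code in the question (the class name LazyH5Dataset is hypothetical):

```python
import h5py
import numpy as np
import torch
from torch.utils.data import Dataset


class LazyH5Dataset(Dataset):
    """Hypothetical dataset that opens the HDF5 file lazily, once per process."""

    def __init__(self, file_path, split='train'):
        self.file_path = file_path
        self.split = split
        self._file = None  # opened on first __getitem__ call
        # Read only the length here, then close the handle again so that
        # no open file object is carried over into the worker processes.
        with h5py.File(file_path, 'r') as f:
            self._length = f[split]['label'].shape[0]

    def __getitem__(self, index):
        if self._file is None:
            # Each worker reaches this branch once and opens its own handle.
            self._file = h5py.File(self.file_path, 'r')
        group = self._file[self.split]
        data = torch.from_numpy(np.asarray(group['data'][index]))
        label = int(group['label'][index])  # assumes a 1-D label dataset
        return data, label

    def __len__(self):
        return self._length
```

It would be used like the original class, for example torch.utils.data.DataLoader(LazyH5Dataset('path/data/modelnet40.h5'), batch_size=8, shuffle=True, num_workers=2). Only one sample at a time is read from disk, so this also avoids loading the whole file into memory.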