Inheriting from data.Dataset to write my own dataset reading class

Today, while working with my own dataset, which uses one-hot labels, something strange happened. My snippet looks like this:
```python
import h5py
import numpy as np
import torch
from torch.utils import data

class LCZDataset(data.Dataset):
    def __init__(self, path):
        self.path = path
        self.file = h5py.File(self.path, "r")
        self.sen1 = self.file["sen1"]
        self.sen2 = self.file["sen2"]
        self.label = self.file["label"]

    def __getitem__(self, index):
        # Read one sample from each sensor and move channels first (HWC -> CHW).
        index_sen1 = torch.tensor(np.array(self.sen1[index])).permute(2, 0, 1)
        index_sen2 = torch.tensor(np.array(self.sen2[index])).permute(2, 0, 1)

        # Decode the one-hot label vector into a single class index.
        label = int(np.argwhere(self.label[index] == 1))

        # Stack the two sensors along the channel dimension.
        sen_tensor = torch.cat((index_sen1, index_sen2), 0)

        return sen_tensor, label

    def __len__(self):
        return self.sen1.shape[0]
```

Then it errors with "only size-1 arrays can be converted to Python scalars". I tried printing self.label[index], and what it printed out was not one-hot. However, the labels in my dataset are definitely in one-hot form. Has anyone met the same problem? Do you know why? Thank you.
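(As an aside: this TypeError is what NumPy raises when int() is called on an array holding more than one element. A minimal sketch with made-up label values:)

```python
import numpy as np

label = np.array([0, 1, 0, 1])       # hypothetical label with two 1s
positions = np.argwhere(label == 1)  # array([[1], [3]]) -- size 2, not 1
int(positions)                       # TypeError: only size-1 arrays can be converted to Python scalars
```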

Could you post the stack trace with the line of code where the error is thrown?

Thank you for your reply. The stack trace looks like this.

It looks like np.argwhere returns more than one position.
Could you check that each label is really one-hot encoded?
Probably at some index you have more than a single 1 in your label.
Printing the index when you run into the error again might help to locate the problematic label; a scan like the sketch below would also find it.
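A quick way to run that scan up front, assuming the dataset layout from the snippet above ("training.h5" is a placeholder filename):

```python
import h5py
import numpy as np

# Scan every label and report any index that is not strictly one-hot.
with h5py.File("training.h5", "r") as f:  # placeholder filename
    labels = f["label"]
    for i in range(labels.shape[0]):
        ones = np.flatnonzero(labels[i] == 1)
        if ones.size != 1:
            print(f"index {i}: found {ones.size} ones in {labels[i]}")
```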

My data labels are definitely in one-hot form. I finally found the problem. Previously, num_workers in data.DataLoader was set to 8. Then I unintentionally changed it to 1, and everything became right: the labels print normally and training runs normally. So, could you please tell me what caused my problem? Why is num_workers = 1 all right?

h5py (and the underlying HDF5 library) does not work well with Python multiprocessing: a file handle opened in the parent process cannot be safely shared across the forked DataLoader workers. It is a known issue.

This thread has some discussion and possible workarounds: HDF5 Multi Threaded Alternative
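One common workaround, sketched here under the assumption that the dataset layout matches the snippet above (it is one option, not the only one): don't open the HDF5 file in `__init__`, but open it lazily in `__getitem__`, so each DataLoader worker gets its own file handle.

```python
import h5py
import numpy as np
import torch
from torch.utils import data

class LCZDataset(data.Dataset):
    def __init__(self, path):
        self.path = path
        self.file = None  # opened lazily, once per worker process
        # Open briefly just to read the dataset length, then close again.
        with h5py.File(self.path, "r") as f:
            self.length = f["sen1"].shape[0]

    def __getitem__(self, index):
        # First access in this process: open a process-local file handle.
        if self.file is None:
            self.file = h5py.File(self.path, "r")
        sen1 = torch.tensor(np.array(self.file["sen1"][index])).permute(2, 0, 1)
        sen2 = torch.tensor(np.array(self.file["sen2"][index])).permute(2, 0, 1)
        label = int(np.argwhere(self.file["label"][index] == 1))
        return torch.cat((sen1, sen2), 0), label

    def __len__(self):
        return self.length
```

Because `__init__` runs in the main process before the DataLoader forks its workers, every worker would otherwise inherit the same HDF5 handle; opening the file inside `__getitem__` gives each worker its own handle, so num_workers > 1 no longer corrupts the reads.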


Thank you very much.