Inheriting from data.Dataset to write my own dataset reading class

Today, while working with my own dataset, which uses one-hot labels, something strange happened. My snippet looks like this:
```python
import h5py
import numpy as np
import torch
from torch.utils import data

class LCZDataset(data.Dataset):
    def __init__(self, path):
        self.path = path
        self.file = h5py.File(self.path, "r")
        self.sen1 = self.file["sen1"]
        self.sen2 = self.file["sen2"]
        self.label = self.file["label"]

    def __getitem__(self, index):
        # Read one sample from each sensor and move channels first (HWC -> CHW).
        index_sen1 = torch.tensor(np.array(self.sen1[index])).permute(2, 0, 1)
        index_sen2 = torch.tensor(np.array(self.sen2[index])).permute(2, 0, 1)

        # Decode the one-hot label vector into a single class index.
        label = int(np.argwhere(self.label[index] == 1))

        # Stack the two sensors along the channel dimension.
        sen_tensor = torch.cat((index_sen1, index_sen2), 0)

        return sen_tensor, label

    def __len__(self):
        return self.sen1.shape[0]
```

Then it errors with "only size-1 arrays can be converted to Python scalars". I tried printing self.label[index], and what it printed out was not one-hot. However, the labels in my dataset are definitely in one-hot form. Has anyone met the same problem? Do you know why? Thank you.
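(As an aside: this TypeError is what NumPy raises when int() is called on an array holding more than one element. A minimal sketch with made-up label values:)

```python
import numpy as np

label = np.array([0, 1, 0, 1])       # hypothetical label with two 1s
positions = np.argwhere(label == 1)  # array([[1], [3]]) -- size 2, not 1
int(positions)                       # TypeError: only size-1 arrays can be converted to Python scalars
```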

Could you post the stack trace with the line of code where the error is thrown?

Thank you for your reply. The stack trace looks like this.

It looks like np.argwhere returns more than one position.
Could you check that each label is really one-hot encoded?
Probably at some index you have more than a single 1 in your label.
Printing the index when you run into the error again might help to locate the problematic label; a scan like the sketch below would also find it.
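A quick way to run that scan up front, assuming the dataset layout from the snippet above ("training.h5" is a placeholder filename):

```python
import h5py
import numpy as np

# Scan every label and report any index that is not strictly one-hot.
with h5py.File("training.h5", "r") as f:  # placeholder filename
    labels = f["label"]
    for i in range(labels.shape[0]):
        ones = np.flatnonzero(labels[i] == 1)
        if ones.size != 1:
            print(f"index {i}: found {ones.size} ones in {labels[i]}")
```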

My data labels are definitely in one-hot form. I finally found the problem. Previously, num_workers in data.DataLoader was set to 8. Then I unintentionally changed it to 1, and everything became right: the labels print normally and training runs normally. So, could you please tell me what caused my problem? Why is num_workers = 1 all right?

h5py (and the underlying HDF5 library) does not work well with Python multiprocessing: a file handle opened in the parent process cannot be safely shared across the forked DataLoader workers. It is a known issue.

This thread has some discussion and possible workarounds: HDF5 Multi Threaded Alternative
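One common workaround, sketched here under the assumption that the dataset layout matches the snippet above (it is one option, not the only one): don't open the HDF5 file in `__init__`, but open it lazily in `__getitem__`, so each DataLoader worker gets its own file handle.

```python
import h5py
import numpy as np
import torch
from torch.utils import data

class LCZDataset(data.Dataset):
    def __init__(self, path):
        self.path = path
        self.file = None  # opened lazily, once per worker process
        # Open briefly just to read the dataset length, then close again.
        with h5py.File(self.path, "r") as f:
            self.length = f["sen1"].shape[0]

    def __getitem__(self, index):
        # First access in this process: open a process-local file handle.
        if self.file is None:
            self.file = h5py.File(self.path, "r")
        sen1 = torch.tensor(np.array(self.file["sen1"][index])).permute(2, 0, 1)
        sen2 = torch.tensor(np.array(self.file["sen2"][index])).permute(2, 0, 1)
        label = int(np.argwhere(self.file["label"][index] == 1))
        return torch.cat((sen1, sen2), 0), label

    def __len__(self):
        return self.length
```

Because `__init__` runs in the main process before the DataLoader forks its workers, every worker would otherwise inherit the same HDF5 handle; opening the file inside `__getitem__` gives each worker its own handle, so num_workers > 1 no longer corrupts the reads.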


Thank you very much.