Dataset class seems to work properly, but next(iter(DataLoader)) throws an error

Hello,

Could anybody help me understand why I get the following error:

ValueError: cannot reshape array of size 262144 into shape (1,1024,1024)

The Dataset class looks as follows:

import numpy as np
import pandas as pd
from torch.utils.data import Dataset


class MyDataset(Dataset):
    def __init__(self, paths, L, n, n_cut, transforms_=None):
        self.n = n
        self.n_cut = n_cut
        self.L = L
        self.transforms = transforms_
        self.dir_data = paths[0]
        self.dir_labels1 = paths[1]
        self.dir_labels2 = paths[2]

    def __len__(self):
        return self.L

    def __getitem__(self, i):
        data_ = pd.read_csv(self.dir_data, delimiter=",", skiprows=i*4, nrows=4)
        labels1 = pd.read_csv(self.dir_labels1, skiprows=1+self.n*i, nrows=self.n)
        labels2 = pd.read_csv(self.dir_labels2, skiprows=1+self.n*i, nrows=self.n)
        labels = np.array([
            *np.asarray(labels1.iloc[:, 3]).astype(np.float).reshape(self.n,),
            *np.asarray(labels1.iloc[:, 4]).astype(np.float).reshape(self.n,),
            *np.asarray(labels2.iloc[:, 3]).astype(np.float).reshape(self.n,),
            *np.asarray(labels2.iloc[:, 4]).astype(np.float).reshape(self.n,)
        ])

        data = np.asarray(data_).astype(np.float).reshape(1, self.n, self.n)

        if self.transforms:
            data = self.transforms(data).numpy().astype(np.float).reshape(1, self.n, self.n)

        y = labels
        return (data, y)

Then I create the dataset as follows:

import os

images_dataset_file = os.path.join("data", "data.csv")
labels1_dataset_file = os.path.join("data", "labels1.csv")
labels2_dataset_file = os.path.join("data", "labels2.csv")

paths_ = [images_dataset_file, labels1_dataset_file, labels2_dataset_file]

n = 1024
n_data = 10000

data_all = MyDataset(paths_, n_data, n, n, None)

To test whether the data is retrieved properly, I plot a sample as follows:

import matplotlib.pyplot as plt

sample_number = 1

figure, ax1 = plt.subplots()
ax1.plot(data_all.__getitem__(sample_number)[1][0:n])
ax2 = ax1.twinx()
ax2.plot(data_all.__getitem__(sample_number)[1][n:2*n])

figure, ax1 = plt.subplots()
ax1.plot(data_all.__getitem__(sample_number)[1][2*n:3*n])
ax2 = ax1.twinx()
ax2.plot(data_all.__getitem__(sample_number)[1][3*n:4*n])

plt.figure()
grid = data_all.__getitem__(sample_number)[0][:].reshape((n, n)).T
plt.imshow(grid, interpolation='nearest')

This seems to produce correct plots.

Finally, I want to find the mean and std of the images as follows:

MyLoader = DataLoader(data_all, batch_size=len(data_all), num_workers=0)
data = next(iter(MyLoader)) # This line gives the error stated above
mean_value = data[0].mean()
std_value = data[0].std()

and I get the error above, but I am not sure where it comes from.

The full error message:

MyLoader = DataLoader(data_all, batch_size=len(data_all), num_workers=0)
data = next(iter(MyLoader))
Traceback (most recent call last):

  File "<ipython-input-4758-6b21b2bd7609>", line 2, in <module>
    data = next(iter(MyLoader))

  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 363, in __next__
    data = self._next_data()

  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 403, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration

  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]

  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]

  File "...", line 1565, in __getitem__
    data = np.asarray( data_).astype(np.float).reshape(1,self.n,self.n)

ValueError: cannot reshape array of size 262144 into shape (1,1024,1024)
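
For context, I only need one global mean and standard deviation, so if loading all 10000 images in a single batch turns out to be part of the problem, I assume I could also accumulate the statistics over smaller batches, roughly like this (an untested sketch; the batch size of 32 is arbitrary). Since the error is raised inside __getitem__, though, I would expect it to appear with any batch size:

from torch.utils.data import DataLoader

loader = DataLoader(data_all, batch_size=32, num_workers=0)

# Accumulate the sum and sum of squares over all pixels, then derive mean/std.
total_sum = 0.0
total_sq_sum = 0.0
total_count = 0

for batch_data, _ in loader:
    batch_data = batch_data.float()
    total_sum += batch_data.sum().item()
    total_sq_sum += (batch_data ** 2).sum().item()
    total_count += batch_data.numel()

mean_value = total_sum / total_count
std_value = (total_sq_sum / total_count - mean_value ** 2) ** 0.5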

Could it be that not all samples have the same number of values in the “grid”? e.g.,

data_all.__getitem__(sample_number)[0][:].reshape((n, n))

would work if the grid has 1024*1024 elements, but would fail if a sample has only 512*512, since you have set n to the fixed value of 1024 here.
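
One way to check would be to read each sample's raw grid with the same read_csv call you use in __getitem__ and compare its size against n*n before any reshape, roughly like this (untested, and it will be slow for 10000 samples):

import numpy as np
import pandas as pd

# Report every sample whose raw grid does not contain n*n values.
for i in range(len(data_all)):
    raw = pd.read_csv(data_all.dir_data, delimiter=",", skiprows=i*4, nrows=4)
    size = np.asarray(raw).size
    if size != n * n:
        print(f"sample {i}: {size} values instead of {n * n}")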

I doubt it; I tried getting several different subsets of the dataset, and the error persists.