Hi all,
I’m trying to train a NN with multiple input channels. Each channel is approximately 6 GB. Training works when I use only 5 channels but fails with 8 channels because it hits the memory limit. The code for my dataset is the following:
n_features = len(forcings)
f_data = xa.open_dataarray(path+forcings[0]+'.nc')
self.x_d = np.zeros((
    f_data.shape[0], n_features, f_data.shape[1], f_data.shape[2]))
for i, forcing in enumerate(tqdm(forcings, desc='Forcings')):
    f_data = xa.open_dataarray(path+forcing+'.nc').data
    print(forcing, ' loaded')
    f_data[np.isnan(f_data)] = 0
    self.x_d[:, i] = f_data
    del f_data
self.x_d, self.y_d = torch.from_numpy(self.x_d.astype('float32')), \
    torch.from_numpy((self.y_d.astype('float32')))
And getitem is simple:
def __getitem__(self, index):
    return self.x_d[index], self.x_s, self.y_d[index]
Is it possible to do ‘lazy’ loading across the different channels? And is there any way to speed up the I/O? For the first four channels it takes 30 s to load one channel, and the time increases to 15 min for the remaining channels.
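To make clear what I mean by ‘lazy’ loading, here is a minimal sketch of the kind of dataset I have in mind. It is not tested on my real data and rests on assumptions: each channel would first be converted once to a .npy file of shape (time, lat, lon), and the class name LazyChannelDataset and its arguments are hypothetical. It also leaves out the static input self.x_s for brevity.

```python
import numpy as np


class LazyChannelDataset:
    """Map-style dataset that reads one sample at a time from disk.

    Assumption: every channel is stored as its own .npy file with shape
    (time, lat, lon). The class and argument names are hypothetical.
    """

    def __init__(self, channel_paths, targets):
        # mmap_mode='r' maps each file instead of reading it into RAM.
        self.channels = [np.load(p, mmap_mode='r') for p in channel_paths]
        self.targets = targets

    def __len__(self):
        return self.channels[0].shape[0]

    def __getitem__(self, index):
        # Only the requested time step of each channel is read here.
        x = np.stack([np.asarray(ch[index]) for ch in self.channels])
        x = x.astype(np.float32)
        # Same NaN handling as in my current code, applied per sample.
        x[np.isnan(x)] = 0.0
        y = np.asarray(self.targets[index], dtype=np.float32)
        return x, y
```

As far as I understand, an object like this with __len__ and __getitem__ could be passed to torch.utils.data.DataLoader, whose default collation should turn the NumPy arrays into tensors, so the full (time, channel, lat, lon) array would never have to sit in memory at once.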
Thanks!