DataLoader re-initializing the Dataset on every batch

I am trying to load data in my Dataset's `__getitem__` function rather than in `__init__`, because the data is very large and cannot all be loaded into memory at once. Since the index supplied by the DataLoader keeps increasing, I keep a record of the length of the previously loaded part of the data, but this counter, which is initialized in `__init__`, resets to 0 every time a new batch is loaded. Is there a way to avoid the `__init__` function of the Dataset being called again?

```python
	def __init__(self, data_path, graph_args={}, train_val_test='train'):
		# train_val_test: one of ('train', 'val', 'test')
		self.data_path = data_path
		self.path_list = sorted(glob.glob(os.path.join(self.data_path, '*.txt')))
		self.all_mean_xy = []
		self.all_feature = []
		self.file_idx = 0      # next file to load
		self.prev = 0          # global index of the first sample in the current chunk
		self.feature_num = 0   # total number of samples loaded so far
		#total_num = len(self.all_feature)
		# equally choose validation set

	def __getitem__(self, idx):
		# C = 11: [frame_id, object_id, object_type, position_x, position_y, position_z,
		#          object_length, object_width, object_height, heading] + [mask]
		if idx < self.feature_num:
			# idx falls inside the chunk that is already loaded
			now_feature = self.all_feature[idx - self.prev].copy()
		else:
			# load the next file lazily
			path = self.path_list[self.file_idx]
			self.file_idx += 1
			self.all_feature, self.all_adjacency, self.all_mean_xy = generate_data(path)
			self.prev = self.feature_num
			self.feature_num = self.feature_num + len(self.all_feature)
			now_feature = self.all_feature[idx - self.prev].copy()
```
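The counter-based pattern above only works for strictly sequential indices. A variant that keeps the lazy loading but also survives out-of-order indices is to record each file's sample count once up front and map a global index to a (file, offset) pair with `bisect`. A minimal torch-free sketch, with the per-file loading stubbed out by plain lists and all names hypothetical:

```python
import bisect

class ChunkedDataset:
    """Loads one chunk (file) at a time; maps a global idx to (file, offset)."""

    def __init__(self, chunks):
        # `chunks` stands in for per-file sample counts that __init__ would
        # gather cheaply (e.g. by counting lines), without loading features.
        self.chunks = chunks
        self.cum = []                      # cumulative sample counts per file
        total = 0
        for c in chunks:
            total += len(c)
            self.cum.append(total)
        self.cache_idx = None              # which chunk is currently in memory
        self.cache = None

    def __len__(self):
        return self.cum[-1] if self.cum else 0

    def __getitem__(self, idx):
        # find which file contains this global index
        file_idx = bisect.bisect_right(self.cum, idx)
        if file_idx != self.cache_idx:     # load the needed chunk on demand
            self.cache = self.chunks[file_idx]  # stand-in for generate_data(path)
            self.cache_idx = file_idx
        start = self.cum[file_idx - 1] if file_idx > 0 else 0
        return self.cache[idx - start]
```

Because the lookup depends only on the precomputed cumulative lengths, it tolerates shuffled samplers, at the cost of re-loading a chunk whenever consecutive indices jump between files.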

Be careful when trying to manipulate the Dataset in a DataLoader if you are using multiple workers.
Each worker will use a clone of the Dataset, so changes to the internal state of the Dataset will not be reflected.

Could this be the case for your issue?
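This cloning is ordinary process-copy semantics: with `num_workers > 0`, each worker process operates on its own copy of the Dataset, so mutations made there never reach the object in the main process. A torch-free sketch of the same effect, using `copy.deepcopy` to stand in for the copy each worker receives (names hypothetical):

```python
import copy

class LazyDataset:
    """Stand-in for a Dataset that mutates internal bookkeeping in __getitem__."""

    def __init__(self):
        self.feature_num = 0      # counter updated lazily, like self.prev above

    def __getitem__(self, idx):
        self.feature_num += 1     # mutation of internal state
        return idx

# Main process builds the dataset once.
ds = LazyDataset()

# Each worker effectively receives a copy of the dataset at startup;
# deepcopy stands in for that here. Its mutations stay in the copy.
worker_copy = copy.deepcopy(ds)
_ = worker_copy[0]
_ = worker_copy[1]

print(worker_copy.feature_num)   # advanced in the "worker"
print(ds.feature_num)            # parent's dataset never saw the updates
```

This is why the counter appears to "reset": each worker starts from a fresh copy whose counter still holds the value set in `__init__`.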


The 1.7 release and the current master will have reset functionality, see

Thank you so much, setting num_workers to 0 solved the issue. But I guess I'm restricting the capabilities of the machine by loading the data sequentially.
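`num_workers=0` avoids the cloning problem but serializes all loading. One way to keep multiple workers is an `IterableDataset` where each worker iterates over a disjoint slice of the file list; inside `__iter__`, the worker id and count come from `torch.utils.data.get_worker_info()`. The sharding itself is plain Python; a minimal sketch, with `load` standing in for a per-file loader like `generate_data` (names hypothetical):

```python
def shard(paths, worker_id, num_workers):
    """Round-robin split of the file list so each worker reads disjoint files."""
    return paths[worker_id::num_workers]

def iter_samples(paths, worker_id=0, num_workers=1, load=None):
    # `load(path)` returns that file's samples; each worker only touches
    # its own shard, so no state ever needs to be shared across workers.
    for path in shard(paths, worker_id, num_workers):
        for sample in load(path):
            yield sample
```

Each worker then loads its own files in parallel with the others, and because no worker's state needs to be seen by any other, the clone semantics stop mattering.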