I have two issues that I believe are related.
My dataloader is configured with numerous workers, but once I created a custom random sampler class, loading to the GPU became single-threaded: watching nvidia-smi -l, the extra worker processes are never spawned. When I load tensors sequentially by idx, the multiple workers are spawned and perform well. I am using a connection pool to provide database connections to the multiple workers.
How do I use a custom random sampler with numerous workers in my dataloader?
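For reference, here is a minimal sketch of what I understand a custom random sampler compatible with multiple workers to look like (the class and dataset names are illustrative, not my real code). The main process draws indices from the sampler and dispatches them to the workers; only the indices cross the process boundary:

```python
import torch
from torch.utils.data import Dataset, Sampler, DataLoader

class ToyDataset(Dataset):
    """Stand-in dataset: item idx is the tensor [idx]."""
    def __init__(self, n):
        self.n = n
    def __len__(self):
        return self.n
    def __getitem__(self, idx):
        return torch.tensor([idx], dtype=torch.float32)

class ShuffleSampler(Sampler):
    """Yields every dataset index once, in a fresh random order each epoch."""
    def __init__(self, data_source):
        self.data_source = data_source
    def __iter__(self):
        # The main process iterates this and sends indices to the workers,
        # so it only needs to produce plain ints.
        return iter(torch.randperm(len(self.data_source)).tolist())
    def __len__(self):
        return len(self.data_source)

dataset = ToyDataset(8)
loader = DataLoader(dataset, batch_size=4,
                    sampler=ShuffleSampler(dataset), num_workers=2)
batches = list(loader)
```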
My dataloader works for a single batch, but when I place the loader inside an epoch loop I get "CUDA error: initialization error" from the call label_stack = torch.stack(label_list).to('cuda') in my __getitem__ method.
Since __getitem__ plays a role in both of these issues, I suspect the problem is in that method, which I have posted below.
Is the issue because I return an entire batch of tensors from my getitem method?
How do I avoid this runtime error to allow for multiple epochs of training?
def __getitem__(self, idx):
    query = """SELECT ls.taxonomic_id, it.tensor
               FROM genomics.tensors2 AS it
               INNER JOIN genomics.labeled_sequences AS ls
                   ON ls.accession_number = it.accession_number
               WHERE (%s) <= it.index
                   AND CARDINALITY(tensor) = 89
               LIMIT (%s) OFFSET (%s)"""
    shuffle_query = """SELECT ls.taxonomic_id, it.tensor
                       FROM genomics.tensors2 AS it
                       INNER JOIN genomics.labeled_sequences AS ls
                           ON ls.accession_number = it.accession_number
                       WHERE (%s) <= it.index
                           AND CARDINALITY(tensor) = 89
                       LIMIT (%s)"""
    batch_size = 500
    query_data = (idx, batch_size, batch_size)
    shuffle_query_data = (idx, batch_size)
    results = None
    conn = self.conn_pool.getconn()
    try:
        conn.set_session(readonly=True, autocommit=True)
        cursor = conn.cursor()
        cursor.execute(shuffle_query, shuffle_query_data)
        results = cursor.fetchall()
        self.conn_pool.putconn(conn)
        print(idx)
    except Error as conn_pool_error:
        print('Multithreaded __getitem__ query error')
        print(conn_pool_error)

    label_list = []
    sequence_list = []
    for (i, result) in enumerate(results):
        if result is not None:
            # create_batch_stack_element returns a (label, sequence) pair
            result = self.create_batch_stack_element(result)
            if result is not None:
                label_list.append(result[0])
                sequence_list.append(result[1])

    label_stack = torch.stack(label_list).to('cuda')
    sequence_stack = torch.stack(sequence_list).to('cuda')

    return (label_stack, sequence_stack)
E: I think I may have realized the error of my ways and am going to attempt to write a custom collate_fn that creates the batches instead of performing this in __getitem__.
E2: The problem for the second issue was the .to('cuda') call. I moved it out of __getitem__ and into the training loop, applied to the labels and sequences, and I now get a single thread loading to the GPU across 2 epochs with correct batch sizes.
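In other words, the pattern that fixed the epoch-loop crash is to keep worker output on the CPU and do the device transfer in the main process. A sketch with a toy stand-in loader (the real loader is the database-backed one above):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Toy stand-ins for the real dataset and loader.
labels = torch.arange(8)
sequences = torch.randn(8, 89)
loader = DataLoader(TensorDataset(labels, sequences), batch_size=4)

for epoch in range(2):
    for label_stack, sequence_stack in loader:
        # The CUDA transfer happens here in the main process, after the
        # workers have produced plain CPU tensors, so no CUDA context is
        # ever created inside a forked worker.
        label_stack = label_stack.to(device)
        sequence_stack = sequence_stack.to(device)
```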
E3: The problem for the first issue was solved by using torch.multiprocessing to launch multiple train processes, each of which creates its own single-process dataloader, instead of performing the multiprocessing at the dataloader level.
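A rough sketch of that process-per-trainer layout, using torch.multiprocessing with the fork start method (train() and its file-based result are illustrative; in the real code each process would also open its own database connection, and CUDA must only be touched inside train(), after the fork):

```python
import os
import tempfile
import torch
import torch.multiprocessing as mp
from torch.utils.data import DataLoader, TensorDataset

def train(rank, out_path):
    # Each process builds its own single-process DataLoader, so nothing
    # stateful (CUDA context, DB connection) is shared across processes.
    data = TensorDataset(torch.arange(8).float())
    loader = DataLoader(data, batch_size=4)
    total = sum(batch[0].sum().item() for batch in loader)
    with open(out_path, 'w') as f:
        f.write(str(total))

ctx = mp.get_context('fork')
out_dir = tempfile.mkdtemp()
procs = []
for rank in range(2):
    p = ctx.Process(target=train,
                    args=(rank, os.path.join(out_dir, f'rank{rank}.txt')))
    p.start()
    procs.append(p)
for p in procs:
    p.join()
```

torch.multiprocessing.spawn is the other common way to launch the processes; it uses the spawn start method and requires the entry point to sit under an if __name__ == '__main__': guard.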