I have two issues that I believe are related.
First One:
My DataLoader is configured with multiple workers, but once I introduced a custom random sampler class, loading to the GPU became single-threaded: watching nvidia-smi -l, no additional worker processes are ever spawned to feed the GPU. When tensors are loaded sequentially by idx, the multiple workers are spawned and perform well. I am using a connection pool to provide database connections to the workers.
How do I use a custom random sampler with multiple workers in my DataLoader?
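For context, my sampler follows the standard PyTorch pattern; here is a minimal sketch, not my exact code (the class name and the data_len attribute are placeholders):

import torch
from torch.utils.data import Sampler

class RandomOffsetSampler(Sampler):
    """Yields dataset indices in shuffled order. Placeholder sketch:
    the real sampler draws random offsets for the database queries."""
    def __init__(self, data_len):
        self.data_len = data_len

    def __iter__(self):
        # Random permutation of 0..data_len-1, yielded one index at a time
        return iter(torch.randperm(self.data_len).tolist())

    def __len__(self):
        return self.data_len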
Second One:
My DataLoader works for a single batch, but when I place the loader inside an epoch loop I get CUDA error: initialization error from the call label_stack = torch.stack(label_list).to('cuda') in the __getitem__ method.
Since __getitem__ plays a role in both of these issues, I figure the problem is in my __getitem__ method, which I have posted below.
Is the issue that I return an entire batch of tensors from my __getitem__ method?
How do I avoid this runtime error so that I can train for multiple epochs?
def __getitem__(self, idx):
    # torch and psycopg2's Error are imported at module scope.
    # This first query is currently unused; only shuffle_query runs below.
    query = """SELECT ls.taxonomic_id, it.tensor
               FROM genomics.tensors2 AS it
               INNER JOIN genomics.labeled_sequences AS ls
               ON ls.accession_number = it.accession_number
               WHERE (%s) <= it.index
               AND CARDINALITY(tensor) = 89
               LIMIT (%s) OFFSET (%s)"""

    shuffle_query = """SELECT ls.taxonomic_id, it.tensor
                       FROM genomics.tensors2 AS it
                       INNER JOIN genomics.labeled_sequences AS ls
                       ON ls.accession_number = it.accession_number
                       WHERE (%s) <= it.index
                       AND CARDINALITY(tensor) = 89
                       LIMIT (%s)"""

    batch_size = 500
    query_data = (idx, batch_size, batch_size)  # parameters for the unused query
    shuffle_query_data = (idx, batch_size)

    results = None
    conn = self.conn_pool.getconn()
    try:
        conn.set_session(readonly=True, autocommit=True)
        cursor = conn.cursor()
        cursor.execute(shuffle_query, shuffle_query_data)
        results = cursor.fetchall()
        print(idx)
    except Error as conn_pool_error:
        print('Multithreaded __getitem__ query error')
        print(conn_pool_error)
    finally:
        # Return the connection to the pool even if the query fails.
        self.conn_pool.putconn(conn)

    label_list = []
    sequence_list = []
    for result in results:
        if result is not None:
            result = self.create_batch_stack_element(result)
            if result is not None:
                label_list.append(result[0])
                sequence_list.append(result[1])

    # This is the line that raises "CUDA error: initialization error"
    # once the loader runs inside the epoch loop (see E2 below).
    label_stack = torch.stack(label_list).to('cuda')
    sequence_stack = torch.stack(sequence_list).to('cuda')
    return (label_stack, sequence_stack)
E: I think I may have realized the error of my ways; I am going to attempt to write a custom collate_fn to create the batches instead of building them inside __getitem__.
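Roughly what I have in mind; a minimal sketch, assuming __getitem__ is changed to return one (label, sequence) pair of CPU tensors per index (stack_collate is a name I made up):

import torch
from torch.utils.data import DataLoader

def stack_collate(samples):
    # samples: list of (label, sequence) pairs, one per dataset item
    labels, sequences = zip(*samples)
    # Stack into batch tensors on the CPU; the GPU transfer belongs
    # in the training loop, not in the worker processes.
    return torch.stack(labels), torch.stack(sequences)

# Usage (dataset is my dataset instance):
# loader = DataLoader(dataset, batch_size=500, num_workers=4,
#                     collate_fn=stack_collate)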
E2: The problem for the second issue was the .to('cuda') call. I moved it out of __getitem__ and into the training loop, applying it to the labels and sequences there, and got a single thread loading to the GPU across 2 epochs with the correct batch sizes. (This makes sense in hindsight: DataLoader workers are forked subprocesses, and CUDA cannot be re-initialized in a forked subprocess, hence the initialization error.)
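So the loop now looks roughly like this (loader and num_epochs stand in for my actual setup):

for epoch in range(num_epochs):
    for label_stack, sequence_stack in loader:
        # The workers now return CPU tensors; the device transfer
        # happens here in the main process, where CUDA is initialized.
        labels = label_stack.to('cuda')
        sequences = sequence_stack.to('cuda')
        # ... forward pass, loss, backward, optimizer step ...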
E3: The first issue was solved by using torch.multiprocessing to launch multiple training processes, each driving its own single DataLoader, instead of trying to do the multiprocessing at the DataLoader level.
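A minimal sketch of that structure, with placeholder names (SequenceDataset and the loop body stand in for my actual code):

import torch.multiprocessing as mp
from torch.utils.data import DataLoader

def train(rank, num_epochs):
    # Each spawned process builds its own dataset and loader, so no
    # CUDA state or DB connection crosses a process boundary.
    dataset = SequenceDataset()  # placeholder for my dataset class
    loader = DataLoader(dataset, batch_size=None)  # __getitem__ already batches
    for epoch in range(num_epochs):
        for label_stack, sequence_stack in loader:
            labels = label_stack.to('cuda')
            sequences = sequence_stack.to('cuda')
            # ... training step ...

if __name__ == '__main__':
    mp.spawn(train, args=(2,), nprocs=4)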