Random Sampler prevents multiple workers from spawning

I have two issues that I believe are related.

First issue:

My DataLoader is specified to use multiple workers, but once I created a custom random sampler class, loading to the GPU became single threaded: when I watch nvidia-smi -l, no additional worker processes are ever spawned to load to the GPU. When I load tensors sequentially by idx, the multiple workers are spawned and perform well. I am using a connection pool to provide database connections to the workers.

How do I use a custom random sampler with multiple workers in my DataLoader?
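
For reference, the setup I am describing is roughly the following (a simplified sketch, not my exact code: the sampler class name, dataset, num_rows, and worker count are placeholders):

import torch
from torch.utils.data import Sampler, DataLoader

class RandomOffsetSampler(Sampler):
    # Yields one random batch offset per __getitem__ call.
    def __init__(self, num_rows, batch_size):
        self.num_rows = num_rows
        self.batch_size = batch_size

    def __len__(self):
        return self.num_rows // self.batch_size

    def __iter__(self):
        offsets = torch.randperm(len(self)) * self.batch_size
        return iter(offsets.tolist())

# Each __getitem__ call below already returns a full batch, so automatic
# batching is disabled and the sampler just supplies random offsets.
# loader = DataLoader(dataset, batch_size=None,
#                     sampler=RandomOffsetSampler(num_rows, 500),
#                     num_workers=4)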

Second issue:

My DataLoader works for a single batch, but when I place the loader inside an epoch loop I get a CUDA error: initialization error from the call label_stack = torch.stack(label_list).to('cuda') in my __getitem__ method.

Since __getitem__ plays a role in both of these issues, I suspect the problem lies in my __getitem__ method, which I have posted below.

Is the issue that I return an entire batch of tensors from my __getitem__ method?

How do I avoid this runtime error so that I can train for multiple epochs?
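
For context, the loop where the error appears looks roughly like this (loader, num_epochs, and train_step are placeholders for my actual objects):

for epoch in range(num_epochs):
    for label_stack, sequence_stack in loader:
        # Fetching a single batch on its own works; inside this epoch loop
        # the .to('cuda') in __getitem__ raises the initialization error.
        train_step(label_stack, sequence_stack)

And here is the __getitem__ method itself: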

def __getitem__(self, idx):

    query = """SELECT ls.taxonomic_id, it.tensor
                FROM genomics.tensors2 AS it
                INNER JOIN genomics.labeled_sequences AS ls
                ON ls.accession_number = it.accession_number
                WHERE (%s) <= it.index
                AND CARDINALITY(tensor) = 89
                LIMIT (%s) OFFSET (%s)"""

    shuffle_query = """
                       SELECT ls.taxonomic_id, it.tensor
                       FROM genomics.tensors2 AS it
                       INNER JOIN genomics.labeled_sequences AS ls
                       ON ls.accession_number = it.accession_number
                       WHERE (%s) <= it.index
                       AND CARDINALITY(tensor) = 89
                       LIMIT (%s)
                       """


    batch_size = 500

    query_data = (idx, batch_size, batch_size)
    shuffle_query_data = (idx, batch_size)

    result = None
    results = None

    # Borrow a connection from the pool, run the shuffle query, and return
    # the connection to the pool before building the batch tensors.
    conn = self.conn_pool.getconn()

    try:
        conn.set_session(readonly=True, autocommit=True)
        cursor = conn.cursor()
        cursor.execute(shuffle_query, shuffle_query_data)
        results = cursor.fetchall()
        self.conn_pool.putconn(conn)
        print(idx)
    except Error as conn_pool_error:
        print('Multithreaded __getitem__ query error')
        print(conn_pool_error)

    # Convert each returned row into (label_tensor, sequence_tensor) and
    # collect the pieces for stacking.
    label_list = []
    sequence_list = []

    for (i, result) in enumerate(results):

        if result is not None:
            result = self.create_batch_stack_element(result)

            if result is not None:
                label_list.append(result[0])
                sequence_list.append(result[1])

    # Stack the whole batch and move it to the GPU inside __getitem__;
    # this .to('cuda') is the call that raises the initialization error
    # in the epoch loop.
    label_stack = torch.stack(label_list).to('cuda')
    sequence_stack = torch.stack(sequence_list).to('cuda')

    #print('label_stack.size')
    #print(label_stack.size())
    #print('sequence_stack.size')
    #print(sequence_stack.size())

    return (label_stack, sequence_stack)

E: I think I may have realized the error of my ways and am going to attempt to write a custom collate_fn to create the batches instead of performing this in the __getitem__ method.
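
Roughly what I have in mind (an untested sketch: __getitem__ would return a single (label, sequence) pair and the collate_fn would do the stacking on the CPU):

import torch

def collate_batch(samples):
    # Stack individual (label, sequence) pairs into batch tensors on the CPU;
    # the move to the GPU happens later, in the training loop.
    labels, sequences = zip(*samples)
    return torch.stack(labels), torch.stack(sequences)

# loader = DataLoader(dataset, batch_size=500, sampler=sampler,
#                     num_workers=4, collate_fn=collate_batch)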

E2: The problem behind the second issue was the .to('cuda') call. I moved it out of __getitem__ and into the training loop, applied it to the labels and sequences there, and now I get a single thread loading to the GPU with 2 epochs and correct batch sizes.
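
In other words, the loader now yields CPU tensors and the transfer happens in the main process, roughly like this (names are illustrative):

for epoch in range(num_epochs):
    for label_stack, sequence_stack in loader:
        # Move the batch to the GPU here, in the main process,
        # instead of inside a worker's __getitem__.
        label_stack = label_stack.to('cuda')
        sequence_stack = sequence_stack.to('cuda')
        ...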

E3: The first issue was solved by using torch.multiprocessing to launch multiple training processes, each of which uses a single DataLoader, instead of performing the multiprocessing at the DataLoader level.
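
A sketch of what that looks like (the process count and the body of train are illustrative):

import torch.multiprocessing as mp

def train(rank, num_epochs):
    # Each process builds its own dataset (with its own DB connection pool)
    # and its own DataLoader, then runs its own epoch loop, moving each
    # batch to the GPU inside the loop.
    ...

if __name__ == '__main__':
    # One training process per desired loader "worker".
    mp.spawn(train, args=(2,), nprocs=4)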