Get the current batch ID during train()

Hello,

I’m facing a problem getting the current batch ID in PyTorch. I’m enumerating over a data_loader with a batch size of 16, so my dataset is divided into 1640 batches.

In one train() iteration, one batch is loaded and the loss is calculated. I would like to read the specific batch ID (e.g. 723) while enumerating over the data_loader in the for loop.

How do I get access to that specific value inside the data_loader?

The batch index can be created via:

for batch_idx, data in enumerate(loader):

which will sequentially increase batch_idx in the range of len(loader). Is this what you are looking for?
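
For example, as a minimal sketch (assuming model, criterion, optimizer, and a loader that yields (data, target) pairs already exist):

for batch_idx, (data, target) in enumerate(loader):
    optimizer.zero_grad()
    output = model(data)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()
    # batch_idx identifies the batch currently being trained on
    print(f"batch {batch_idx} of {len(loader)}: loss = {loss.item():.4f}")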

Thanks!
If shuffle=False, can I assume that batch_idx refers to the same batch of samples in every epoch?

Yes, keeping shuffle=False means that the fetched data remains the same for the same batch index across all epochs.

Alright, thanks!

Is it also possible to define my own list of indices in a specific order for training?

If you want to load the samples via Dataset.__getitem__ in a specific order, you could create a custom sampler and pass it to the DataLoader.
Take a look at these sampler implementations to see how they are written.
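
As a rough illustration (FixedOrderSampler is just a made-up name here, not one of the linked implementations), a sampler that yields the dataset indices in exactly the order you provide could look like this:

from torch.utils.data import Sampler, DataLoader

class FixedOrderSampler(Sampler):
    '''Yields the given dataset indices in exactly the given order.'''
    def __init__(self, indices):
        self.indices = indices

    def __iter__(self):
        return iter(self.indices)

    def __len__(self):
        return len(self.indices)

# hypothetical usage, assuming dataset already exists:
# sampler = FixedOrderSampler([5, 2, 9, 0, 7])
# loader = DataLoader(dataset, batch_size=16, sampler=sampler)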

Thanks for your reply, but I don’t want to shuffle the samples myself. I just want to compute a specific order of the loaded batches for the next epoch and tell the dataloader to use my own list of arranged batch IDs instead of the standard ascending order.
Like: epoch 1: batch IDs = [1, 2, 3, 4, 5]
epoch 2: batch IDs = [3, 1, 5, 2, 4]

I do not think there is a way to do this with PyTorch’s DataLoader directly, but you can always write a custom data loading function of your own that does the same. A simple way to do this is shown below:

import cv2
import numpy as np
import torch

'''The loadBatch function below expects three arguments: a list of image paths, a list of labels, and a start index. This index can be used to access the batches in the exact order you want; for example, index [0:batchSize] is batch 1, [batchSize:2*batchSize] is batch 2, and so on.'''
batchSize = 20  # assuming a sample batch size of 20

def loadBatch(imagePath, imageLabels, batchID):
    # batchID is the start index of the batch within the full list
    images = imagePath[batchID:batchID + batchSize]
    labels = imageLabels[batchID:batchID + batchSize]
    imageFiles = []
    for image in images:
        i = cv2.imread(image)  # read the image from disk
        # insert augmentation and resizing functions here to make sure all images have the same size
        i = np.transpose(i, (2, 0, 1))  # converting HWC to CHW
        imageFiles.append(i)
    imageFiles = torch.from_numpy(np.array(imageFiles))
    labels = torch.from_numpy(np.array(labels))
    return imageFiles, labels

The above function can now be used to load the batches in any order you want, given that you provide the batchID (the start index) properly. For example:
batch 1: start index = 0
batch 2: start index = 20, since we assumed a batch size of 20
batch 3: start index = 40, and so on
For the training part, say we wish to load the batches in the following order:
Epoch 1: batch 1, batch 3, batch 2, batch 4
Epoch 2: batch 3, batch 2, batch 4, batch 1
So we create a list of lists holding the starting indices of the batches in that order:

# you can create a custom order generator as per your own needs to build this list
orderOfBatch = [[0, 40, 20, 60], [40, 20, 60, 0]]
for epoch in range(2):
    batchIndices = orderOfBatch[epoch]
    for batchIndex in batchIndices:
        images, labels = loadBatch(imagePath, imageLabels, batchIndex)
        # training code goes here

While this is not the most efficient way, it can surely get the task done.

A custom sampler should also work, as you can directly pass the manually created indices to it and allow it to create the batches through the DataLoader :wink:
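
For illustration, a rough sketch of such a batch sampler could look like this (EpochOrderBatchSampler is just an illustrative name, not a built-in, and dataset is assumed to exist):

from torch.utils.data import Sampler, DataLoader

class EpochOrderBatchSampler(Sampler):
    '''Yields whole batches of sample indices in the order given by batch_order.'''
    def __init__(self, num_samples, batch_size, batch_order):
        # split the dataset indices into consecutive batches, like shuffle=False would
        all_indices = list(range(num_samples))
        self.batches = [all_indices[i:i + batch_size]
                        for i in range(0, num_samples, batch_size)]
        self.batch_order = batch_order  # e.g. [9, 27, 35, 33]

    def __iter__(self):
        for batch_id in self.batch_order:
            yield self.batches[batch_id]

    def __len__(self):
        return len(self.batch_order)

# hypothetical usage:
# sampler = EpochOrderBatchSampler(len(dataset), batch_size=16, batch_order=[9, 27, 35, 33])
# loader = DataLoader(dataset, batch_sampler=sampler)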

Thanks for your answer, but I don’t want to build the batches manually.
I want to do the following:
→ Create batches with shuffle=False; assume I end up with 64 batches.
→ After the batches are created, I run one epoch with them. This epoch loads the batches by index:

load(batch_1),
load(batch_2),
...
load(batch_64)

→ After one epoch I want to create a new order for loading the batches and train with that order in epoch 2, for example:

load(batch_9),
load(batch_27),
...
load(batch_33)

What I want to pass to the dataloader is a list of valid batch indices describing the order in which the batches are loaded sequentially:

batch_order_for_next_episode_to_load=[9,27,35,...,33]
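
Based on the batch-sampler idea sketched above, something like this is roughly what I am after (just a sketch; EpochOrderBatchSampler is the illustrative class from the earlier post, dataset is assumed to exist, and the batch orders are shortened placeholders):

from torch.utils.data import DataLoader

# each entry is my own ordering of batch IDs for one epoch
batch_order_per_epoch = [[0, 1, 2, 3], [2, 0, 3, 1]]
for epoch, batch_order in enumerate(batch_order_per_epoch):
    sampler = EpochOrderBatchSampler(len(dataset), batch_size=16, batch_order=batch_order)
    loader = DataLoader(dataset, batch_sampler=sampler)
    for batch_idx, batch in enumerate(loader):
        # training code for this batch goes here
        pass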