Move data to GPU, am I doing it correctly?

In the code below I’m moving all the data to the GPU.

# Data Loader for easy mini-batch return in training
train_loader = Data.DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True, pin_memory=True)

# Move data to GPU
batch_x = []
batch_y = []
for x, y in train_loader:
    x = x.view(-1, 28 * 28)
    batch_x.append(x.to(device))
    batch_y.append(y.to(device))  # was appending x again; Variable is deprecated

gpu_data = list(zip(batch_x, batch_y))  # list(), since a bare zip is exhausted after one pass

Probably there is a way I can accomplish it without looping through the batches. Can someone point it out?

Just doing

for x, y in train_loader:
    x = x.view(-1, 28 * 28).to(device)
    y = y.to(device)
    # Rest of training loop

is enough to move the data to the GPU and train with it. You don't need to collect the batches into a list and use them later; that defeats the whole point of batching, since all the data would sit in memory at once!
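A minimal, runnable sketch of that per-batch approach, using synthetic tensors as a stand-in for MNIST (the shapes, batch size, and tensor names are illustrative, not from the original code). With `pin_memory=True`, passing `non_blocking=True` to `.to()` lets the host-to-device copy overlap with computation:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Synthetic stand-in for MNIST-style data (shapes are placeholders).
images = torch.randn(256, 1, 28, 28)
labels = torch.randint(0, 10, (256,))
loader = DataLoader(
    TensorDataset(images, labels),
    batch_size=64,
    shuffle=True,
    pin_memory=torch.cuda.is_available(),  # pinning only helps for CUDA transfers
)

for x, y in loader:
    # Flatten and move each batch as it is consumed; with pinned memory,
    # non_blocking=True allows the copy to overlap with GPU compute.
    x = x.view(-1, 28 * 28).to(device, non_blocking=True)
    y = y.to(device, non_blocking=True)
    # ... rest of training loop (forward, loss, backward, step)
```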

1 Like

You are not correct when you say that “the whole point of batches will be ruined”. That’s not the case at all. There are many other situations/advantages of using batches.

In terms of performance, you would only move your batches to the GPU iteratively if your dataset does not fit into your GPU's memory. If it fits, you can just move everything before training, so you don't lose time waiting for the data to be moved at every batch iteration.
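A sketch of the move-everything-first idea, using made-up tensor names and sizes for illustration. After the single up-front transfer, each "batch" is just a slice of a GPU-resident tensor, so no further host-to-device copies happen during training:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical small dataset that fits in GPU memory.
features = torch.randn(1000, 784)
targets = torch.randint(0, 10, (1000,))

# One transfer before training instead of one per batch iteration.
features = features.to(device)
targets = targets.to(device)

# Batches are now cheap slices of tensors already on the device.
batch_size = 64
batch_x = features[:batch_size]
batch_y = targets[:batch_size]
```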

Moving data before training:

Autoencoder 0, mean loss: 0.049266365027106775, time: 5.8063098430633545
Classifier 0, mean loss: 0.6276001909743748, time: 5.8063098430633545
Autoencoder 1, mean loss: 0.02765635310459747, time: 5.40066180229187
Classifier 1, mean loss: 0.30130706248538836, time: 5.40066180229187
Autoencoder 2, mean loss: 0.025990864579706813, time: 5.43626651763916
Classifier 2, mean loss: 0.24578058288327412, time: 5.43626651763916

Moving one batch at a time:

Autoencoder 0, mean loss: 0.05069048443773408, time: 7.177410364151001
Classifier 0, mean loss: 0.5955254633638905, time: 7.177410364151001
Autoencoder 1, mean loss: 0.02769256935619723, time: 6.592367887496948
Classifier 1, mean loss: 0.2981912835336316, time: 6.592367887496948
Autoencoder 2, mean loss: 0.025816337348444504, time: 6.52787971496582
Classifier 2, mean loss: 0.2456918565480948, time: 6.52787971496582

As per my understanding, the point of batching is that when you cannot process your whole population (the dataset, in this case) at once, you take a population sample (a batch) and estimate parameters from it iteratively; the expectation of the sample estimate gives you the population parameters.

So you are perfectly right: if you can move the whole dataset onto the GPU, training will be faster and even better at estimating parameters. That is the best case, but not a practical one :slight_smile:

Also, when I said the whole point of batching is ruined, I should have phrased it as: ideally, you only need batches when you cannot fit your dataset onto the GPU in one go. If you can, that is the best option, and batching becomes unnecessary.

While the performance might increase at the cost of more memory usage on the GPU, I wouldn't argue that batching is completely "useless", as @witl would lose all random transformations.
Iterating the dataset once will apply the transformations (if used) once per sample and then move the result to the GPU.
Whether a performance (accuracy) drop would be seen depends on the use case, of course, but this should be kept in mind when using this "static" approach.

Aha! That's a good catch, but I have a follow-up question: if we have the capability to process the data all at once, can't we first apply all the transformations and then move the final data to the GPU in one go? Would that be any different from the batch approach?

You could also perform all transformations once and push the data to the GPU afterwards.
Note, however, that in your approach each sample would still be "randomly" transformed only once, since the data on the GPU wouldn't be transformed again.
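A small sketch of that caveat, with a made-up `random_transform` (additive noise) standing in for whatever augmentation would normally run inside the dataset. Applied up front, the "random" transform is frozen: every epoch sees the same augmented samples, unlike a `DataLoader` transform that re-randomizes each pass:

```python
import torch

def random_transform(x):
    # Hypothetical random augmentation: small additive Gaussian noise.
    return x + 0.1 * torch.randn_like(x)

data = torch.randn(100, 784)

# Apply the "random" transform exactly once per sample, up front.
transformed = torch.stack([random_transform(s) for s in data])

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
transformed = transformed.to(device)

# Every epoch now reuses the same frozen augmentation; the data on the
# GPU is never re-transformed, so the augmentation is no longer random
# across epochs.
```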