What is happening

I have the following code

device = "cuda:0"
model.to(device)
for epoch in range(20000):
    for data in dataloader:
        inputs = data.to(device)
        labels = data.to(device)

        optimizer.zero_grad()

        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()

        optimizer.step()

I am calling .to(device) on every iteration…
I would like to understand what PyTorch is doing internally.

  • In the next epoch, what is going to happen with X and y? Will PyTorch send them to the GPU again, or will it ignore them?
  • If I have data that is greater than my GPU memory, how does PyTorch handle that?
  • If my GPU memory is greater than my data, how can I send the data to the GPU just once?

I tried searching the forum but wasn’t able to find answers.
Thanks.

X and y are undefined, so it's unclear what these variables contain.
inputs and labels will be replaced with the new data (in fact, in your snippet, both hold the same data) and will reuse the same memory if possible.

If the data you try to move (or a single batch) is larger than your GPU memory, it will raise an out-of-memory error.
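
As a rough sketch (not from the original answer), you can see this by requesting an allocation that no single GPU can satisfy; the shape below is made up purely to exceed typical GPU memory, and the allocation is done directly on the GPU just to keep the snippet self-contained:

import torch

try:
    # ~256 GB of float32, far more than any current GPU has, so the allocation fails
    too_big = torch.empty(1024, 1024, 1024, 64, device="cuda")
except RuntimeError as e:
    # PyTorch raises a RuntimeError along the lines of
    # "CUDA out of memory. Tried to allocate ..."
    print(e)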

I don’t understand this question, since you are handling batches of data. If you want to move the entire dataset to the GPU, you can use .to(device) in the same way as is already done for batches.
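
For completeness, here is a minimal sketch of that idea; the tensor names and sizes are placeholders (not from the post above), and it assumes the whole dataset consists of plain tensors that fit into GPU memory:

import torch
from torch.utils.data import TensorDataset, DataLoader

# Placeholder data standing in for the real dataset
X = torch.randn(10000, 32)
y = torch.randint(0, 2, (10000,))

# One host-to-device copy for the entire dataset ...
X = X.to("cuda")
y = y.to("cuda")

# ... after which every batch drawn from it already lives on the GPU,
# so no per-iteration .to(device) call is needed in the training loop.
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=64)  # keep num_workers=0 for GPU tensors

for inputs, labels in loader:
    pass  # inputs.device and labels.device are already cuda:0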

Given the following scenario,

device = 'cuda'
for epoch in range(20000):
    for i, data in enumerate(dataloader):
        print(epoch)
        print(i)
        inputs, labels = data
        inputs = inputs.to(device)
        labels = labels.to(device)

When epoch equals 0 and i equals 0, inputs and labels will be copied from the CPU to the GPU.
When epoch equals 0 and i equals 1, inputs and labels will be copied from the CPU to the GPU.

When will the data that I sent to the GPU at epoch equals 0 and i equals 0 be removed from the GPU?

I’m still unsure if I’m misunderstanding the question, but the variables will be replaced and the previously used memory will be reused via the internal cache if possible:

import torch

data = torch.randn(1024, 1024, device="cpu")
print("allocated {:.3f}, in cache: {:.3f}".format(
    torch.cuda.memory_allocated() / 1024**2, torch.cuda.memory_reserved() / 1024**2))
# allocated 0.000, in cache: 0.000

for i in range(3):
    x = data.to("cuda")

    print("i: {}, allocated {:.3f}, in cache: {:.3f}".format(
        i, torch.cuda.memory_allocated() / 1024**2, torch.cuda.memory_reserved() / 1024**2))

# i: 0, allocated 4.000, in cache: 20.000
# i: 1, allocated 4.000, in cache: 20.000
# i: 2, allocated 4.000, in cache: 20.000

You are not increasing the allocated memory, since the previously used tensors are not referenced anymore (unless you explicitly keep a reference to them, of course).
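
The counter-case, sketched below (not part of the original example), might make this clearer: if you do keep references to the copies, e.g. by appending them to a list, the allocator cannot reuse the memory and the allocated amount grows with every copy:

import torch

data = torch.randn(1024, 1024, device="cpu")

kept = []
for i in range(3):
    kept.append(data.to("cuda"))  # each ~4 MB copy stays referenced in the list
    print("i: {}, allocated {:.3f}".format(
        i, torch.cuda.memory_allocated() / 1024**2))

# i: 0, allocated 4.000
# i: 1, allocated 8.000
# i: 2, allocated 12.000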


Exactly the answer that I was looking for.
Thanks for the example as well.
I appreciate your patience.
