I have the following code
device = "cuda:0"
model.to(device)
for epoch in range(20000):
for data in dataloader:
inputs = data.to(device)
labels = data.to(device)
optimizer.zero_grad()
outputs = model(inputs)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
I am calling .to(device) every iteration… I would like to understand what PyTorch is doing internally.
- In the next epoch, what is going to happen with X and y? Will PyTorch send them to the GPU again, or will it ignore them?
- If my data is larger than my GPU memory, how does PyTorch handle that?
- If my GPU memory is larger than my data, how can I send the data to the GPU only once?
I tried to search in the forum but wasn’t able to find answers.
Thanks.
X and y are undefined, so it's unclear what these variables contain.
inputs and labels will be replaced with the new data (in fact with the same data, since both are assigned from data) and will reuse the same memory if possible.
It will raise an out-of-memory error.
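For illustration, a minimal sketch (the tensor shape below is made up and deliberately far too large): an allocation or copy that does not fit raises an out-of-memory error, which recent PyTorch versions expose as torch.cuda.OutOfMemoryError, a subclass of RuntimeError (older versions raise a plain RuntimeError):

import torch

try:
    # Roughly 512 GB of float32 values, which will not fit on any current GPU.
    too_big = torch.empty(1024, 1024, 1024, 128, device="cuda")
except torch.cuda.OutOfMemoryError as e:
    print("CUDA out of memory:", e)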
I don't understand this question, since you are already handling batches of data. If you want to move the entire dataset to the GPU, you can use .to(device) in the same way as it's already done for the batches.
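A minimal sketch of that approach, assuming the whole dataset fits into GPU memory (the tensors and sizes below are placeholders, not from the original post):

import torch
from torch.utils.data import TensorDataset, DataLoader

device = "cuda:0"

# Placeholder tensors standing in for the real dataset.
all_inputs = torch.randn(10000, 32)
all_labels = torch.randint(0, 10, (10000,))

# Single host-to-device copy for the whole dataset.
all_inputs = all_inputs.to(device)
all_labels = all_labels.to(device)

dataset = TensorDataset(all_inputs, all_labels)
loader = DataLoader(dataset, batch_size=64, shuffle=True)

for inputs, labels in loader:
    # The batches are already on the GPU, so no .to(device) is needed here.
    pass

After this one-time copy, the training loop no longer performs any host-to-device transfers per iteration.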
Given the following scenario,
device = 'cuda'

for epoch in range(20000):
    for i, data in enumerate(dataloader):
        print(epoch)
        print(i)
        inputs, labels = data
        inputs = inputs.to(device)
        labels = labels.to(device)
When epoch equals 0 and i equals 0, inputs and labels will be copied from CPU to GPU. When epoch equals 0 and i equals 1, inputs and labels will be copied from CPU to GPU again. Will the data that I sent to the GPU when epoch equals 0 and i equals 0 then be removed from the GPU?
I’m still unsure if I’m misunderstanding the question, but the variables will be replaced and the previously used memory will be reused via the internal cache if possible:
import torch

data = torch.randn(1024, 1024, device="cpu")
print("allocated {:.3f}, in cache: {:.3f}".format(
    torch.cuda.memory_allocated() / 1024**2, torch.cuda.memory_reserved() / 1024**2))
# allocated 0.000, in cache: 0.000

for i in range(3):
    x = data.to("cuda")
    print("i: {}, allocated {:.3f}, in cache: {:.3f}".format(
        i, torch.cuda.memory_allocated() / 1024**2, torch.cuda.memory_reserved() / 1024**2))
# i: 0, allocated 4.000, in cache: 20.000
# i: 1, allocated 4.000, in cache: 20.000
# i: 2, allocated 4.000, in cache: 20.000
You are not increasing the allocated memory, since the previously used tensors are not referenced anymore (unless you explicitly keep a reference to them, of course).
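As a counter-example (a sketch, not from the original post): if each copy is kept alive, e.g. by appending it to a list, the allocated memory grows with every iteration instead of being reused:

import torch

data = torch.randn(1024, 1024, device="cpu")
kept = []
for i in range(3):
    kept.append(data.to("cuda"))  # keeping the reference prevents the memory from being reused
    print("i: {}, allocated {:.3f} MB".format(
        i, torch.cuda.memory_allocated() / 1024**2))
# The allocated memory now grows by roughly 4 MB per iteration.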
Exactly the answer that I was looking for.
Thanks for the example as well.
I appreciate your patience.