GPU memory consumption

Hi,

I have a question: I loaded the same data from a numpy array in two different ways, but one of them consumes more GPU memory than the other.

First method:

X_train = torch.from_numpy(X_train).cuda() # transform to torch tensor
y_train = torch.from_numpy(y_train).cuda()

X_val = torch.from_numpy(X_val).cuda()
y_val = torch.from_numpy(y_val).cuda()

X_train = X_train.view(-1, 3, 480, 480).float()
X_val = X_val.view(-1, 3, 480, 480).float()

Consumption = 12.4 / 15.0 GB

Second one:

X_train = torch.FloatTensor(X_train).cuda()
X_val = torch.FloatTensor(X_val).cuda()
y_train = torch.LongTensor(y_train).cuda()
y_val = torch.LongTensor(y_val).cuda()

X_train = X_train.view(-1, 3, 480, 480)
X_val = X_val.view(-1, 3, 480, 480)

Consumption = 9.5 / 15.0 GB
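
For reference, a quick way to inspect what actually lands on the GPU in each case (a minimal sketch using the tensors above):

print(X_train.dtype, X_train.element_size()) # dtype and bytes per element
print(y_train.dtype, y_train.element_size())
print(torch.cuda.memory_allocated() / 1024**2, "MiB") # memory currently allocated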

Could anyone help :thinking: ?

Also, after seeing this I'm interested in understanding more about the hardware, so could anyone recommend books/courses on hardware architecture?

Thanks :heart:

The second one also uses less RAM :thinking: , I don't know why.

The allocated and used memory will be the same, but the cache (and thus also the peak memory usage) might differ. In the first example you are potentially moving the float64 tensors to the GPU first and transforming them to float32 afterwards, while in the second example you are directly using the (deprecated) FloatTensor constructor and are thus transforming the data to float32 on the CPU.
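
The dtype difference is easy to verify (a quick sketch; NumPy's randn returns float64 by default):

import torch
import numpy as np

a = np.random.randn(4)            # float64 by default
print(torch.from_numpy(a).dtype)  # torch.float64 (dtype is preserved)
print(torch.FloatTensor(a).dtype) # torch.float32 (cast on the CPU)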
Here is a small example:

import torch
import numpy as np

print(torch.cuda.memory_allocated()/1024**2)
# 0.0
print(torch.cuda.memory_reserved()/1024**2)
# 0.0

X_train = np.random.randn(10, 3, 480, 480) # numpy defaults to float64
y_train = np.random.randn(10, 3, 480, 480)
X_val = np.random.randn(10, 3, 480, 480)
y_val = np.random.randn(10, 3, 480, 480)

X_train = torch.from_numpy(X_train).cuda() # transform to torch tensor
y_train = torch.from_numpy(y_train).cuda()

X_val = torch.from_numpy(X_val).cuda()
y_val = torch.from_numpy(y_val).cuda()

X_train = X_train.view(-1, 3, 480, 480).float() # float64 -> float32, converted on the GPU
X_val = X_val.view(-1, 3, 480, 480).float()

print(torch.cuda.memory_allocated()/1024**2)
# 158.203125
print(torch.cuda.memory_reserved()/1024**2)
# 244.0
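
The allocated 158.2 MiB matches a rough accounting of the tensor sizes (a sketch, assuming the shapes above); the higher reserved value comes from the freed float64 copies of X_train/X_val, which the caching allocator keeps around:

n = 10 * 3 * 480 * 480   # elements per tensor
mib = 1024 ** 2
f64 = n * 8 / mib        # ~52.7 MiB per float64 tensor
f32 = n * 4 / mib        # ~26.4 MiB per float32 tensor
# after .float(): y_train/y_val stay float64, X_train/X_val become float32
print(2 * f64 + 2 * f32) # ~158.2 MiB allocated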


And here is the second example:

import torch
import numpy as np

print(torch.cuda.memory_allocated()/1024**2)
# 0.0
print(torch.cuda.memory_reserved()/1024**2)
# 0.0

X_train = np.random.randn(10, 3, 480, 480)
y_train = np.random.randn(10, 3, 480, 480)
X_val = np.random.randn(10, 3, 480, 480)
y_val = np.random.randn(10, 3, 480, 480)


X_train = torch.FloatTensor(X_train).cuda() # (deprecated) constructor casts to float32 on the CPU
X_val = torch.FloatTensor(X_val).cuda()
y_train = torch.LongTensor(y_train).cuda() # casts to int64 on the CPU
y_val = torch.LongTensor(y_val).cuda()

X_train = X_train.view(-1, 3, 480, 480)
X_val = X_val.view(-1, 3, 480, 480)

print(torch.cuda.memory_allocated()/1024**2)
# 158.203125
print(torch.cuda.memory_reserved()/1024**2)
# 164.0
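
The same accounting works for the second example (again a sketch): the X tensors are float32 and the y tensors are int64, so the allocated memory is identical, but no float64 copies ever reach the GPU and the cache stays small:

n = 10 * 3 * 480 * 480
mib = 1024 ** 2
print(2 * (n * 4 / mib) + 2 * (n * 8 / mib)) # ~158.2 MiB (float32 X + int64 y)

If you want to keep the first approach, converting on the CPU before the transfer (torch.from_numpy(X_train).float().cuda()) or calling torch.cuda.empty_cache() after the conversion should avoid holding the freed float64 blocks on the GPU.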