Hello all,
I have a model and a dataset class. On a fresh boot, the system uses 1.8 GB of RAM with no process running. When the dataset class is initialized, it takes up an additional 2 to 2.5 GB because it stores some variables for later reference. I instantiate the model class and pass it to the training function, which looks as follows:
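import torch
import torch.nn as nn


def trainer(model, train_dataloader, val_dataloader, num_epochs):
    torch.backends.cudnn.benchmark = True
    model.cuda()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=0.00009)
    criterion = nn.CrossEntropyLoss().cuda()

    for epoch in range(num_epochs):
        model.train()
        epoch_loss_train = 0.0
        epoch_acc_train = 0
        for image, label in train_dataloader:
            optimizer.zero_grad()
            # Move the batch to the GPU; the model already lives there.
            image = image.cuda()
            label = label.cuda()
            output = model(image)
            loss = criterion(output, label)
            loss.backward()
            optimizer.step()
            # .item() returns a Python number, so no autograd graph is kept alive.
            epoch_loss_train += loss.item()
            epoch_acc_train += (output.argmax(dim=1) == label).sum().item()
            del image, label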
As seen above, the model is moved to the GPU, and the image and label returned by the dataloader are moved to the GPU as well. The dataloader runs on a single thread, and the system monitor reflects that.
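The system monitor reports machine-wide usage, so it cannot say which step of this process allocated the extra RAM. One way to narrow it down is to read the process's own resident set size (RSS) before and after each step. The sketch below is a stdlib-only assumption of how that could look: `current_rss_bytes` and `measure_rss` are hypothetical helper names, it reads `/proc/self/statm` on Linux, and it falls back to the *peak* RSS from `getrusage` elsewhere (POSIX only). The allocation inside the `with` block is a stand-in; in the real script you would wrap the dataset constructor, `model.cuda()`, and the first training iteration in separate blocks.

```python
import os
import resource  # POSIX only (Linux/macOS)
import sys
from contextlib import contextmanager


def current_rss_bytes():
    """Return this process's resident set size (RSS) in bytes.

    On Linux this reads /proc/self/statm, whose second field is the number
    of resident pages. Elsewhere it falls back to getrusage's *peak* RSS,
    which is only an upper bound on the current value.
    """
    try:
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])
        return resident_pages * os.sysconf("SC_PAGE_SIZE")
    except (OSError, ValueError, IndexError):
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        # ru_maxrss is reported in bytes on macOS but in kilobytes on Linux.
        return peak if sys.platform == "darwin" else peak * 1024


@contextmanager
def measure_rss(label, report):
    """Store in report[label] how much RSS changed while the block ran."""
    before = current_rss_bytes()
    try:
        yield
    finally:
        report[label] = current_rss_bytes() - before


report = {}
# Stand-in workload: 100 MiB that is actually written, so it becomes resident.
# Hypothetical real usage: wrap `dataset = ...`, `model.cuda()`, and the first
# batch of the training loop in their own measure_rss(...) blocks.
with measure_rss("stand-in allocation", report):
    blob = b"\x01" * (100 * 1024 * 1024)
for label, delta in report.items():
    print(f"{label}: {delta / 2**20:+.0f} MiB")
```

Comparing the per-step deltas shows whether the jump happens when the dataset is built, when CUDA is first touched, or during iterations.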
Once the training loop starts, around 2.8 GB of GPU memory is in use. However, RAM usage climbs to 7 GB. Where is this additional 2.5 GB coming from?
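Splitting the system-monitor totals into per-phase increments makes the unexplained part explicit: with the upper 2.5 GB dataset estimate, about 2.7 GB (7.0 - 1.8 - 2.5) appears only once training starts, or about 3.2 GB with the lower 2 GB estimate. A small sketch of that bookkeeping using the numbers above; `phase_increments` and the phase labels are hypothetical names chosen here, and the readings are machine-wide, so other processes also contribute to them.

```python
def phase_increments(snapshots):
    """Convert cumulative RAM readings into per-phase increases.

    snapshots: list of (phase_name, total_ram_mb) in chronological order.
    Returns a list of (phase_name, increase_since_previous_mb).
    """
    return [
        (name, cur_mb - prev_mb)
        for (_, prev_mb), (name, cur_mb) in zip(snapshots, snapshots[1:])
    ]


# System-wide totals from the post, in MB, using the upper 2.5 GB
# estimate for the dataset.
snapshots = [
    ("fresh boot", 1800),
    ("dataset initialized", 4300),
    ("training loop running", 7000),
]
for name, delta in phase_increments(snapshots):
    print(f"{name}: +{delta} MB")
# prints:
# dataset initialized: +2500 MB
# training loop running: +2700 MB
```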