Hi, I am new to PyTorch (and to ML and NNs in general), and I need some general guidance.
I have a basic question that I could not find a straight answer to anywhere.
In an ideal case, should CPU RAM usage increase with each mini-batch? To give numbers: the training set has ~6 million samples and the mini-batch size is 128, so one epoch is about 46,875 steps.
My RAM usage started at about 5% when the epoch began, and now, at the 23,000th training step, it has climbed to 60%. The increase looks linear in the number of mini-batches (roughly 2.5% per 1,000 mini-batches).
However, once the epoch ends, RAM usage comes back down. But why should it increase with each mini-batch in the first place? I don't see it.
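If it helps, this is the kind of per-step logging I can add to pin the numbers down (a minimal sketch using psutil, which I assume is installed; nothing in it is specific to my model):

import os, psutil

proc = psutil.Process(os.getpid())  # handle to this Python process

def log_rss(step, every=1000):
    # RSS (resident set size) = actual RAM currently held by the process
    if step % every == 0:
        print(f"step {step}: RSS = {proc.memory_info().rss / 1e9:.2f} GB")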
For context, I am running training on a GPU machine, and the flow is something like this:
model = mymodel(parameters)  # initialize NN model
model.to("cuda:0")
train_dataloader = CustomDataLoader(train_features, bs=128)

for epoch in range(n_epochs):
    steps = 0
    for X, y in train_dataloader:
        steps += 1
        if steps % 1000 == 0:
            gc.collect()  # I added this line to see if it makes a difference; it didn't.
        X.todevice("cuda:0")
        # X is actually a custom object that holds both tensor data and some
        # metadata. ".todevice" is a method that sends the tensors to CUDA but
        # doesn't touch the metadata. Can the problem lie here? It still does
        # not explain why RAM usage should increase with each mini-batch.
        y = y.to("cuda:0")  # note: Tensor.to is not in-place, so the result must be reassigned
        yhat = model(X)
        criterion = torch.nn.BCEWithLogitsLoss(weight=weight).to("cuda:0")
        loss = criterion(yhat, y.to(torch.float32))
        train_losses.append(loss)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
Please note the comment after X.todevice("cuda:0").
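One thing I am starting to suspect: train_losses.append(loss) stores the full loss tensor, and as far as I understand each such tensor keeps a reference to the autograd graph of its step, so the list would pin more and more host-side graph objects as the epoch goes on, and release them all when the list is cleared, which would match the drop at epoch end. If that is the cause, I believe the usual pattern is something like this (a sketch; only the append line differs from my loop):

train_losses.append(loss.item())  # store a plain Python float; .item() carries no graph
# or, to keep a tensor but drop the graph:
train_losses.append(loss.detach().cpu())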
The dataloader is custom as well:
class CustomDataLoader():
    def __init__(self, ds, bs):
        self.ds, self.bs = ds, bs
    def __iter__(self):
        for i in range(0, len(self.ds), self.bs):
            # 'key' is defined elsewhere in my code; it selects a field of the object
            indices = self.ds[key].data[i:i+self.bs].clone()
            batch, y = self.ds.generate_batch(indices)
            yield batch, y
train_features is not a Dataset. It is a custom object (a UserDict) with a method generate_batch which, given indices, produces a "subset" of the same object. By "subset" I mean an instance of the same class with smaller data but the same metadata.
You may wonder why I am doing it this way. Well, if I make train_features a Dataset with a __getitem__ that produces single data objects (again, with the same metadata as train_features) and go the usual route, I have to write a collate function that takes these single data objects and "stacks" them back up. That seems like an unnecessary round trip: I first extract single data objects and then put them back together. I can cut out the middleman and produce batches directly from the object itself.
Does PyTorch's factory DataLoader have something that I will miss by doing it this way?
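For comparison, the closest built-in equivalent I could come up with looks like this (a sketch, not my actual code: n_samples is a hypothetical stand-in for however the sample count is obtained, and batch_size=None disables PyTorch's automatic batching, so each Dataset item is already a full batch and no collate function is needed):

import torch
from torch.utils.data import DataLoader, Dataset

class BatchedFeatures(Dataset):
    # Each item of this Dataset is a whole mini-batch.
    def __init__(self, features, n_samples, bs):
        self.features, self.n, self.bs = features, n_samples, bs

    def __len__(self):
        return (self.n + self.bs - 1) // self.bs  # number of batches

    def __getitem__(self, b):
        lo = b * self.bs
        indices = torch.arange(lo, min(lo + self.bs, self.n))
        return self.features.generate_batch(indices)  # -> (batch, y)

loader = DataLoader(
    BatchedFeatures(train_features, n_samples, bs=128),
    batch_size=None,  # automatic batching off: items are already batches
    shuffle=True,     # shuffles the order of batches, not of samples
)

From the docs, what the factory DataLoader would add on top of my version seems to be mainly num_workers (building batches in worker processes), pin_memory, and prefetch_factor; I would like to confirm I am not missing anything else.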
Thank you for taking the time to read my question; I really appreciate your guidance. Let me know if you need further details.
Update: I just iterated over train_dataloader (and sent things to CUDA) without running any operations on them, and the RAM use didn't increase. So it is not the dataloader; it is probably the model or the training loop.
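To narrow it down further, my plan is to re-enable one stage of the loop at a time between runs and watch RSS (a sketch; the stages mirror my loop above):

for X, y in train_dataloader:
    X.todevice("cuda:0")
    y = y.to("cuda:0")
    yhat = model(X)                                 # run 1: + forward
    # loss = criterion(yhat, y.to(torch.float32))  # run 2: + loss
    # train_losses.append(loss)                    # run 3: + accumulation
    # loss.backward(); optimizer.step()            # run 4: + backward/step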