Hi, I am new to PyTorch (and to ML and NNs in general) and need some general guidance.
I have a basic question that I could not find a straight answer to anywhere.
In an ideal case, should CPU RAM usage be increasing with each mini-batch? To give numbers: the training set has ~6 million examples and the mini-batch size is 128.
RAM usage started at about 5% when the epoch began, and at training step 23,000 it has reached 60%. The increase looks linear in the number of mini-batches: roughly 2.5% per 1,000 mini-batches, which is consistent with the totals (23 × 2.5% ≈ 57.5%, matching the climb from 5% to 60%).
However, once the epoch ends, RAM usage comes back down. But why should it increase with each mini-batch in the first place? I don't see it.
For context, I am running training on a GPU machine, and the flow is roughly this:

```python
model = mymodel(parameters)  # initialize the nn model
model.to("cuda:0")
train_dataloader = CustomDataLoader(train_features, 128)

for epoch in range(n_epochs):
    steps = 0
    for X, y in train_dataloader:
        steps += 1
        if steps % 1000 == 0:
            gc.collect()  # I added this line to see if it makes a difference; it didn't.
        # X is actually a custom object holding both tensor data and some metadata.
        # .todevice is a method that sends the tensors to cuda but doesn't touch
        # the metadata. Can the problem lie here? It still would not explain why
        # RAM usage should increase with each mini-batch.
        X.todevice("cuda:0")
        y = y.to("cuda:0")
        yhat = model(X)
        criterion = torch.nn.BCEWithLogitsLoss(weight=weight).to("cuda:0")
        loss = criterion(yhat, y.to(torch.float32))
        train_losses.append(loss)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```
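One thing I notice while writing this up: I append the raw `loss` tensor to `train_losses` every step, and each of those tensors still references its autograd graph, so keeping all of them alive could plausibly explain linear growth. A minimal sketch of the change I plan to test (not something I have verified yet):

```python
# .item() (or .detach()) drops the reference to the autograd graph,
# so each mini-batch's graph can be freed after backward().
train_losses.append(loss.item())
```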
Please note the comment after `X.todevice("cuda:0")`.
The dataloader is custom as well:

```python
class CustomDataLoader():
    def __init__(self, ds, bs):
        self.ds, self.bs = ds, bs

    def __iter__(self):
        for i in range(0, len(self.ds), self.bs):
            indices = self.ds[key].data[i:i+self.bs].clone()
            batch, y = self.ds.generate_batch(indices)
            yield batch, y
```
train_features is not a Dataset. It is a custom object (a UserDict) with a method generate_batch which, given indices, returns a "subset" of the same object. By "subset" I mean an instance of the same class with less data but the same metadata.
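To make the shape of this concrete, here is a stripped-down sketch of the idea (the class name FeatureStore and the fields data/meta are illustrative stand-ins, not my real code):

```python
from collections import UserDict

import torch


class FeatureStore(UserDict):
    """Hypothetical stand-in for train_features: tensor data plus shared metadata."""

    def __init__(self, data, meta):
        super().__init__(data)  # e.g. {"x": feature_tensor, "y": label_tensor}
        self.meta = meta        # metadata shared by every subset

    def generate_batch(self, indices):
        # A "subset" is an instance of the same class: smaller data, same metadata.
        subset = FeatureStore({k: v[indices] for k, v in self.data.items()}, self.meta)
        return subset, subset["y"]
```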
You may wonder why I am doing it this way. If I instead make train_features a Dataset whose __getitem__ produces single data objects (again, with the same metadata as train_features) and go the usual route, I have to write a collate function that takes these single data objects and "stacks" them back up. That seems like an unnecessary round trip: first extract single data objects, then put them back together. I can cut out the middleman and produce batches directly from the object itself; the standard route I am avoiding is sketched below for reference.
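Roughly, the usual route would look like this (reusing the hypothetical FeatureStore from above; SingleItemDataset and collate are placeholder names):

```python
import torch
from torch.utils.data import DataLoader, Dataset


class SingleItemDataset(Dataset):
    """Wraps train_features so __getitem__ yields one item at a time."""

    def __init__(self, features):
        self.features = features

    def __len__(self):
        return len(self.features["y"])

    def __getitem__(self, i):
        # Extract a single data object (same metadata as the full object).
        item, _ = self.features.generate_batch(torch.tensor([i]))
        return item


def collate(items):
    # "Stack" the single objects back into one batch object.
    data = {k: torch.cat([it.data[k] for it in items]) for k in items[0].data}
    batch = FeatureStore(data, items[0].meta)
    return batch, batch["y"]


loader = DataLoader(SingleItemDataset(train_features), batch_size=128, collate_fn=collate)
```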
Does PyTorch's factory DataLoader have something that I will miss by doing it this way?
Thank you for taking the time to read my question; I really appreciate your guidance. Let me know if you need further details.
Update: I just iterated over train_dataloader (sending everything to cuda) without performing any operations on the batches, and RAM usage did not increase. So it is not the dataloader; it is probably the model/training step.
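Concretely, the test looked roughly like this:

```python
# Same loop as above, but with the model, loss, and optimizer steps removed.
# RAM stayed flat for the whole epoch.
for X, y in train_dataloader:
    X.todevice("cuda:0")
    y = y.to("cuda:0")
```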