I have a utility function that takes as input the class probabilities predicted by two image classification models, M1 and M2, and computes a modified version of the AUC metric. When the output of M1 is passed to the utility function, no error is thrown. However, when the output of M2 is passed, it fails with the error described below.
The Jupyter notebook reports 'kernel has been killed'. A little more digging shows that RAM fills up completely (GPU memory is 24 GB and less than 80% occupied) and the kernel is then killed. I have localized the line of code where the failure actually happens. The kernel is killed, i.e. RAM overflows (note that I am on a machine with 128 GB of RAM, PyTorch 2.1, and Ubuntu 20.04), when the following line executes:
score[~mask] += bias
The variables have the following types and shapes:
print(type(score), score.shape, type(mask), mask.shape)
<class 'torch.Tensor'> torch.Size([7280]) <class 'torch.Tensor'> torch.Size([7280, 278362])
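For scale, here is my quick back-of-envelope on the temporaries this single line can create (sizes only, nothing is allocated; the index-tensor estimate assumes boolean advanced indexing is lowered to integer indices via nonzero(), which is my understanding of how PyTorch implements it):

# Rough size estimates for the temporaries behind `score[~mask] += bias`.
n_rows, n_cols = 7280, 278362
n_elems = n_rows * n_cols                        # ~2.03e9 elements

# `~mask` materializes a second boolean tensor of the same shape.
print(f"~mask: {n_elems / 1024**3:.1f} GB")      # ~1.9 GB (1 byte per bool)

# Boolean indexing is (as far as I know) converted to integer indices:
# worst case, one int64 per selected element per mask dimension.
print(f"worst-case indices: {n_elems * 2 * 8 / 1024**3:.1f} GB")  # ~30.2 GB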
Also, the outputs from M1 and M2 are not the same, although they print identically. Both are dictionaries with class labels as keys and probabilities as values:
print(pred_M1['class_0'])
tensor(0.5000)
print(pred_M2['class_0'])
tensor(0.5000)
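For anyone reproducing this, the two dictionaries can be compared key by key at full precision with something like the following (the tolerance is an arbitrary choice on my part):

import torch

# Compare the two prediction dictionaries key by key at full precision.
for label in pred_M1:
    p1, p2 = pred_M1[label], pred_M2[label]
    if not torch.isclose(p1, p2, atol=1e-6):
        print(f"{label}: M1={p1.item():.16f}  M2={p2.item():.16f}")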
However, printing the underlying Python float shows the value is not exactly 0.5:
print(pred_M1['class_0'].item())
0.5000015497207642
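(The tensors above display as tensor(0.5000) only because PyTorch's default print precision is four decimals; raising it makes such differences visible without .item():)

import torch

# Raise the print precision so small differences show up in the repr itself.
torch.set_printoptions(precision=16)
print(torch.tensor(0.5000015497207642))  # tensor(0.5000015497207642)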
What could be causing RAM to fill up at this line? Any help in this regard would be highly appreciated.