Weird CUDA error: out of memory

Hello! I have a NN trained to predict the output of an equation. It is a very basic fully connected NN. It worked perfectly fine for weeks, but today I suddenly started getting `CUDA error: out of memory`. I reduced the batch size to 1 (it was initially working fine with a batch size of 1024) and I still get the error.

I am running this on a shared Linux machine (though I am the only user most of the time). I logged in and out several times and the error persists. I checked nvidia-smi and noticed that GPU usage was at 99-100%; later that value dropped to 0%, but I still get the error.

I am really new to AI and I don't know anything about GPUs. Does anyone have an idea what the problem could be? Thank you!
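(For anyone debugging the same symptom: besides nvidia-smi, you can ask PyTorch itself how much device memory it sees. A minimal sketch, assuming PyTorch >= 1.10 for `torch.cuda.mem_get_info`; it falls back gracefully when no GPU or no PyTorch is available:)

```python
# Query free/total GPU memory as PyTorch sees it.
# Guarded so it also runs on machines without PyTorch or without a GPU.
try:
    import torch
    cuda_ok = torch.cuda.is_available()
except ImportError:
    cuda_ok = False

def gpu_memory_report():
    """Return (free_bytes, total_bytes) for the current CUDA device, or None."""
    if not cuda_ok:
        return None
    free, total = torch.cuda.mem_get_info()
    return free, total
```

If this reports almost no free memory while nvidia-smi shows 0% utilization, a stale process is likely still holding the memory.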

Update: Here is a part of the code:

for filename in os.listdir(pathdir):

    for i in range(1):

        n_variables = np.loadtxt(pathdir+"/%s" %filename, dtype='str').shape[1]-1
        variables = np.loadtxt(pathdir+"/%s" %filename, usecols=(0,))

        if n_variables==0:
            print("Solved! ", variables[0])
        elif n_variables==1:
            variables = np.reshape(variables,(len(variables),1))
            for j in range(1,n_variables):
                v = np.loadtxt(pathdir+"/%s" %filename, usecols=(j,))
                variables = np.column_stack((variables,v))

        f_dependent = np.loadtxt(pathdir+"/%s" %filename, usecols=(n_variables,))
        f_dependent = np.reshape(f_dependent,(len(f_dependent),1))

        factors = torch.from_numpy(variables[0:100000])
        factors = factors.cuda()

It seems that the error appears in the last line, at factors = factors.cuda(). Factors is a torch tensor of size 4 x 100000. I made it size 4 x 1 and it still runs out of CUDA memory.
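(Side note: if an allocation of that size fails, it can help to print PyTorch's own view of device memory at the moment of failure. A sketch below; the `to_device` helper is illustrative, not part of the original code, and it degrades to CPU when no GPU is usable:)

```python
import numpy as np

try:
    import torch
    _use_cuda = torch.cuda.is_available()
except ImportError:
    torch = None
    _use_cuda = False

def to_device(array):
    """Move a NumPy array to the GPU if one is usable, else keep it on CPU."""
    if torch is None:
        return array  # no PyTorch installed; nothing to move
    tensor = torch.from_numpy(array)
    if not _use_cuda:
        return tensor
    try:
        return tensor.cuda()
    except RuntimeError:
        # CUDA OOM surfaces as a RuntimeError; memory_summary() shows
        # how much memory this process has allocated and reserved.
        print(torch.cuda.memory_summary())
        raise
```

A near-zero "allocated" figure in the summary alongside a failing tiny allocation again points to another process (possibly a crashed earlier run) holding the device memory.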

Could you share a screenshot of the output of nvidia-smi right before you try to run your code?
Does your machine have multiple GPUs, and do you specify in your code which one to use?
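If it does have multiple GPUs, one common fix is to pin the process to a single free device before CUDA is initialized. A sketch, assuming the GPU index `"0"` (pick whichever device nvidia-smi shows as free):

```python
import os

# Restrict this process to one physical GPU. Must be set before the first
# CUDA call; "0" is an example index.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

try:
    import torch
    # After masking, the chosen GPU is always visible as cuda:0.
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
except ImportError:
    device = "cpu"
```

Then move tensors with `factors.to(device)` instead of `factors.cuda()`, so the same code also runs on CPU.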