Weird CUDA error: out of memory

Hello! I have a NN trained to predict the output of an equation. It is a very basic fully connected NN. It worked perfectly fine for weeks, but today I suddenly started getting `CUDA error: out of memory`. I reduced the batch size to 1 (it was initially working fine with a batch size of 1024) and I still get the error.

I am running this on a shared Linux machine (though I am the only user most of the time). I logged in and out several times and the error persists. I checked nvidia-smi and noticed that GPU usage was at 99-100%; later that value dropped to 0%, but I still get the error.

I am really new to AI and I don't know anything about GPUs. Does anyone have an idea what the problem could be? Thank you!
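(For anyone debugging the same symptom: besides nvidia-smi, you can ask PyTorch itself how much device memory it sees. A minimal sketch, assuming PyTorch >= 1.10 for `torch.cuda.mem_get_info`; it falls back gracefully when no GPU or no PyTorch is available:)

```python
# Query free/total GPU memory as PyTorch sees it.
# Guarded so it also runs on machines without PyTorch or without a GPU.
try:
    import torch
    cuda_ok = torch.cuda.is_available()
except ImportError:
    cuda_ok = False

def gpu_memory_report():
    """Return (free_bytes, total_bytes) for the current CUDA device, or None."""
    if not cuda_ok:
        return None
    free, total = torch.cuda.mem_get_info()
    return free, total
```

If this reports almost no free memory while nvidia-smi shows 0% utilization, a stale process is likely still holding the memory.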

Update: Here is a part of the code:

for filename in os.listdir(pathdir):

    for i in range(1):

        n_variables = np.loadtxt(pathdir+"/%s" %filename, dtype='str').shape[1]-1
        variables = np.loadtxt(pathdir+"/%s" %filename, usecols=(0,))

        if n_variables==0:
            print("Solved! ", variables[0])
        elif n_variables==1:
            variables = np.reshape(variables,(len(variables),1))
            for j in range(1,n_variables):
                v = np.loadtxt(pathdir+"/%s" %filename, usecols=(j,))
                variables = np.column_stack((variables,v))

        f_dependent = np.loadtxt(pathdir+"/%s" %filename, usecols=(n_variables,))
        f_dependent = np.reshape(f_dependent,(len(f_dependent),1))

        factors = torch.from_numpy(variables[0:100000])
        factors = factors.cuda()

It seems that the error appears in the last line, at factors = factors.cuda(). Factors is a torch tensor of size 4 x 100000. I made it size 4 x 1 and it still runs out of CUDA memory.
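(Side note: if an allocation of that size fails, it can help to print PyTorch's own view of device memory at the moment of failure. A sketch below; the `to_device` helper is illustrative, not part of the original code, and it degrades to CPU when no GPU is usable:)

```python
import numpy as np

try:
    import torch
    _use_cuda = torch.cuda.is_available()
except ImportError:
    torch = None
    _use_cuda = False

def to_device(array):
    """Move a NumPy array to the GPU if one is usable, else keep it on CPU."""
    if torch is None:
        return array  # no PyTorch installed; nothing to move
    tensor = torch.from_numpy(array)
    if not _use_cuda:
        return tensor
    try:
        return tensor.cuda()
    except RuntimeError:
        # CUDA OOM surfaces as a RuntimeError; memory_summary() shows
        # how much memory this process has allocated and reserved.
        print(torch.cuda.memory_summary())
        raise
```

A near-zero "allocated" figure in the summary alongside a failing tiny allocation again points to another process (possibly a crashed earlier run) holding the device memory.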

Could you share a screenshot of the output of nvidia-smi right before you try to run your code?
Does your machine have multiple GPUs, and do you specify in your code which one to use?
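If it does have multiple GPUs, one common fix is to pin the process to a single free device before CUDA is initialized. A sketch, assuming the GPU index `"0"` (pick whichever device nvidia-smi shows as free):

```python
import os

# Restrict this process to one physical GPU. Must be set before the first
# CUDA call; "0" is an example index.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

try:
    import torch
    # After masking, the chosen GPU is always visible as cuda:0.
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
except ImportError:
    device = "cpu"
```

Then move tensors with `factors.to(device)` instead of `factors.cuda()`, so the same code also runs on CPU.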