Memory problems in torch.from_numpy()

Hi, this is my first post; I usually find that my PyTorch questions have already been answered here at some point. Please accept my apologies if this question has already been addressed.

In particular, I ran into a problem when converting variables from numpy to torch: using torch.from_numpy() on big arrays results in what appears to be a memory leak. Here is a very simple script that builds a dictionary where each key holds a 66x2 array. If I create one big array beforehand, convert it through torch.from_numpy(), and then slice it into the dictionary (Option 1), I get a MemoryError when trying to pickle the dictionary to a file. If instead I convert each 66x2 slice to torch "on the fly" (Option 2), dumping works without errors. Does anyone have a clue about why this might be happening?

Many thanks 🙂

import pickle, torch, numpy as np

######### ----------- Option 1 ------------ #################
keys = ['{:06d}'.format(k) for k in range(10000)]
db = {}
points = np.random.randn(len(keys), 66, 2).astype('float')
points = torch.from_numpy(points).float()  # convert the whole array once
for i, k in enumerate(keys):
    db[k] = points[i, :, :]  # each entry is a slice of the big tensor

pickle.dump(db, open('fromnumpy.pkl', 'wb'))

MemoryError

######## ----------- Option 2 --------------##################
keys = ['{:06d}'.format(k) for k in range(10000)]
db = {}
points = np.random.randn(len(keys), 66, 2).astype('float')

for i, k in enumerate(keys):
    db[k] = torch.from_numpy(points[i, :, :]).float()  # convert each slice individually

pickle.dump(db, open('fromnumpy.pkl', 'wb'))
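In case it helps anyone diagnose this, here is a small check I tried (just a sketch; my guess is that the difference comes down to storage sharing between slices, but I have not confirmed that this is what pickle trips over). Comparing storage().size() of one dictionary entry under each option shows whether each 66x2 tensor still references the full backing buffer:

import torch, numpy as np

keys = ['{:06d}'.format(k) for k in range(10000)]
points = np.random.randn(len(keys), 66, 2).astype('float')

# Option 1 style: slicing a big torch tensor produces a view
big = torch.from_numpy(points).float()
view = big[0, :, :]
print(view.storage().size())    # 1320000 (= 10000 * 66 * 2): the slice still references the full storage

# Option 2 style: .float() on a numpy slice copies into a fresh tensor
single = torch.from_numpy(points[0, :, :]).float()
print(single.storage().size())  # 132 (= 66 * 2): this tensor owns only its own data

If each entry in Option 1 really keeps the full 10000x66x2 storage alive, that could explain why pickling the dictionary blows up while Option 2 is fine. If that is the case, calling .clone() on each slice before storing it might behave like Option 2, though I have not verified this.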