I trained an autoencoder, and I now want to embed each image as the average of 100 embeddings, each computed from a random crop of the much larger original image. But when I try to embed these images, I run out of GPU memory…
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1524584710464/work/aten/src/THC/generic/THCStorage.cu:58
…but only after hitting ~80 samples.
Given that (1) I was able to train my model on this data, and (2) I am able to load the data and perform several dozen forward passes, I suspect the issue is that my program is holding on to memory rather than letting it go.
Following Soumith’s recommendations, I wrapped the code into functions so that local variables can be garbage collected, and I also explicitly call gc.collect(). My batch size is already just 1. Is there anything else I can change to make this program use less memory?
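Concretely, the pattern those recommendations describe looks roughly like this (a minimal sketch; the names do_one_step and loader are just illustrative):

import gc

def do_one_step(x, model):
    # Locals created inside the function become unreachable once it
    # returns, so the tensors they reference can be garbage collected.
    return model(x)

for x in loader:
    out = do_one_step(x, model)
    gc.collect()  # explicitly reclaim anything that is now unreachable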
Here are the two main functions:
import gc

import torch
from torch.utils.data import DataLoader
from torch.utils.data.sampler import SequentialSampler


def preembed_images(subdir, model):
    model.eval()
    model = cuda.ize(model)
    dataset = GTExImages()
    indices = list(range(len(dataset)))
    data_loader = DataLoader(dataset=dataset,
                             batch_size=1,
                             sampler=SequentialSampler(indices),
                             num_workers=4,
                             pin_memory=use_cuda)
    # One row per image for the final (crop-averaged) embeddings.
    Z = torch.Tensor(N_SAMPLES, D_EMBEDDINGS)
    for i, x in enumerate(data_loader):
        print('Embedded %s-th image.' % i)
        Z[i] = embed_one_image(x, model, dataset.subsample, cfg)
        gc.collect()
    torch.save(Z, '%s/embedded_images.pt' % subdir)
# ------------------------------------------------------------------------------
def embed_one_image(x, model, subsample, cfg):
    # Embed N_Z_PER_SAMPLE random crops of the image, then average them.
    Z = torch.Tensor(N_Z_PER_SAMPLE, D_EMBEDDINGS)
    for i in range(N_Z_PER_SAMPLE):
        xi = subsample(x.squeeze(0)).unsqueeze(0)  # crop, re-add batch dim
        zi = model(cuda.ize(xi))
        Z[i] = zi
    return Z.mean(dim=0)
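To check whether memory really is accumulating across iterations, I figure I can log the allocator state after each image. Here is a minimal sketch of what I mean (the helper name log_gpu_memory is mine; torch.cuda.memory_allocated() and torch.cuda.memory_cached() exist in the 0.4 build my traceback comes from, with memory_cached later renamed memory_reserved):

import torch

def log_gpu_memory(i):
    # Bytes currently held by live tensors in PyTorch's caching allocator.
    alloc_mb = torch.cuda.memory_allocated() / 1024 ** 2
    # Bytes reserved by the allocator, including cached-but-free blocks.
    cached_mb = torch.cuda.memory_cached() / 1024 ** 2
    print('iter %d: allocated=%.1f MB, cached=%.1f MB'
          % (i, alloc_mb, cached_mb))

Calling log_gpu_memory(i) right after the Z[i] = embed_one_image(...) line should show whether the allocated number grows monotonically with i, which would confirm that something is keeping references to GPU tensors alive.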