Hi,
I am extracting matrices (patches) from a tensor using select and narrow. I would like to know whether those operations return a view of the same underlying storage or a copy.
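To make the question concrete, here is the kind of check I have in mind (the shapes and offsets are made up for the example):

import torch

t = torch.randn(4, 64, 64)

# same select/narrow pattern I use to cut out a patch
patch = t.select(0, 1).narrow(0, 10, 25).narrow(1, 10, 25)

# if select/narrow return views, both tensors share the same storage
# and this prints True; if they return copies, it prints False
print(patch.storage().data_ptr() == t.storage().data_ptr())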
This question is related to a more complex process where:
1/ all patch positions are generated when my DataLoader is created
2/ each patch is extracted from the tensor only on each __getitem__ call
Generating the patch positions is not really heavy on memory, but extracting the patches is such an expensive operation that memory usage grows linearly and eventually blows through 32 GB of RAM.
I would like to understand what I am doing wrong. To give you some idea, here is the logic behind the code:
import torch.utils.data as data

# make_dataset and Loader are helpers defined elsewhere (not shown)

class PatchExtractor(data.Dataset):
    def __init__(self, root, patch_size, transform=None,
                 target_transform=None, loader=None):
        self.root = root
        self.patch_size = patch_size
        self.transform = transform
        self.target_transform = target_transform
        # extract all patch positions
        self.dataset = make_dataset(root, patch_size)
        if loader is None:
            self.loader = Loader()
        else:
            self.loader = loader

    def __getitem__(self, index):
        path, args, target = self.dataset[index]
        # img is a tensor returned by a succession of narrow and select
        img = self.loader.load(path, args, self.patch_size)
        if self.transform is not None:
            img = self.transform(img)
        if self.target_transform is not None:
            target = self.target_transform(target)
        return img, target

    def __len__(self):
        return len(self.dataset)
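For context, the dataset is then wrapped in a standard torch.utils.data.DataLoader; the arguments below are placeholders rather than my real settings:

import torch.utils.data

train_dataset = PatchExtractor(root='/path/to/volumes',  # placeholder path
                               patch_size=25)            # placeholder size

train_loader = torch.utils.data.DataLoader(train_dataset,
                                           batch_size=64,    # placeholder
                                           shuffle=True,
                                           num_workers=4,    # placeholder
                                           pin_memory=True)  # placeholder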
I don’t think I am making a mistake in the DataLoader. self.loader.load returns precisely this kind of result:
self.images[path].select(0, position[0]) \
.narrow(0, y-border_width, patch_size) \
.narrow(1, z-border_width, patch_size)
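For comparison, appending clone() (or contiguous()) to that chain would give me a real copy that owns only a patch-sized block of memory rather than a view into the full image; the sketch below just reuses the same variables:

self.images[path].select(0, position[0]) \
                 .narrow(0, y - border_width, patch_size) \
                 .narrow(1, z - border_width, patch_size) \
                 .clone()  # copies only the patch elements into new storage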
In my opinion, it seems that every time a patch is loaded and transferred to CUDA, it is not freed after use. I don’t store those values; I use the same train loop as in the imagenet example. The fact that a large chunk of my memory is freed right after an epoch ends (i.e. when testing starts) leads me to that observation.
def train(train_loader, model, criterion, optimizer, epoch):
    batch_time = AverageMeter()
    data_time = AverageMeter()
    losses = AverageMeter()
    top1 = AverageMeter()
    top5 = AverageMeter()

    # switch to train mode
    model.train()

    end = time.time()
    for i, (input, target) in enumerate(train_loader):
        # measure data loading time
        data_time.update(time.time() - end)

        target = target.cuda(async=True)
        input_25 = torch.autograd.Variable(input[0]).cuda()
        input_51 = torch.autograd.Variable(input[1]).cuda()
        input_75 = torch.autograd.Variable(input[2]).cuda()
        target_var = torch.autograd.Variable(target)

        # compute output
        output = model(patch25=input_25, patch51=input_51, patch75=input_75)
        loss = criterion(output, target_var)

        # debug loss value
        # print('raw loss is {loss.data[0]:.5f}\t'.format(loss=loss))

        # measure accuracy and record loss
        prec1, prec5 = accuracy(output.data, target, topk=(1, 5))
        losses.update(loss.data[0], input[0].size(0))
        top1.update(prec1[0], input[0].size(0))
        top5.update(prec5[0], input[0].size(0))

        # compute gradient and do SGD step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # measure elapsed time
        batch_time.update(time.time() - end)
        end = time.time()

        if i % args.print_freq == 0:
            print('Epoch: [{0}][{1}/{2}]\t'
                  'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t'
                  'Data {data_time.val:.3f} ({data_time.avg:.3f})\t'
                  'Loss {loss.val:.4f} ({loss.avg:.4f})\t'
                  'Prec@1 {top1.val:.3f} ({top1.avg:.3f})\t'
                  'Prec@5 {top5.val:.3f} ({top5.avg:.3f})'.format(
                      epoch, i, len(train_loader), batch_time=batch_time,
                      data_time=data_time, loss=losses, top1=top1, top5=top5))
Should I del my variables right after backprop?
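For example, something like this at the end of the loop body (just a sketch of what I mean, reusing the variable names from the loop above):

        optimizer.step()

        # free the GPU tensors explicitly before the next iteration?
        del input_25, input_51, input_75, target_var, output, loss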
Thank you for the feedback