I’m playing with torch.autograd.graph.saved_tensors_hooks to compress the tensors saved for backward. In the process I found that the two snippets below (after the sketch) appear to have different memory usage (assuming x.size() >> b.size()).
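For context, the hook pattern I mean is roughly the following; this is only a sketch, and the fp16 round-trip just stands in for a real compressor:

import torch

def pack(t):
    # called when autograd saves a tensor for the backward pass
    return t.to(torch.float16)

def unpack(t):
    # called when the backward pass needs the tensor again
    return t.to(torch.float32)

with torch.autograd.graph.saved_tensors_hooks(pack, unpack):
    x = torch.randn(8, 8, requires_grad=True)
    y = x.pow(2).sum()
y.backward()  # unpack() runs here to rebuild the saved tensor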
case a)
x = f(a)
y = torch.matmul(x.T, b)
z = torch.matmul(x, c)

case b)
x = f(a)
y = torch.matmul(b.T, x)
z = torch.matmul(x, c)
It looks like case b) keeps using the same x in the forward of both y and z (verified with id()), while case a) creates a new instance to save x.T (verified with id()). In that sense, can I say b) is more memory efficient?
Also, as a related question: judging by the Python id() of the tensors, PyTorch doesn’t seem to duplicate the same tensor between the forward and backward passes. Is comparing id() the right way to check whether two tensors point to the same memory allocation?
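To illustrate the question: x.T is a different Python object from x even though it is a view of the same storage, which is why I’m unsure id() is the right check. A small sketch of what I mean:

import torch

x = torch.randn(4, 4)
xt = x.T  # transpose returns a view

print(id(x) == id(xt))                # False: two distinct Python wrappers
print(x.data_ptr() == xt.data_ptr())  # True: both point at the same allocation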
Thanks @ptrblck, but does this hold true for the backward case? Doesn’t autograd make independent copies of b and b.T? Please correct me if I’m wrong, but if I am right, then
case a) stores: x, x.T, b, c
case b) stores: x, b.T, b, c
Since x.size() >> b.size(), wouldn’t case a) need more memory for the backward pass?
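One way to inspect what autograd actually saved for the matmul is via the _saved_* attributes on grad_fn (a sketch; these attribute names come from the autograd notes on saved tensors and may differ across PyTorch versions):

import torch

x = torch.randn(1024, 1024, requires_grad=True)
b = torch.randn(1024, 1, requires_grad=True)

y = torch.matmul(b.T, x)  # case b)
# MmBackward0 saves both matmul inputs; compare allocations, not id():
print(y.grad_fn._saved_self.data_ptr() == b.data_ptr())  # b.T is a view of b
print(y.grad_fn._saved_mat2.data_ptr() == x.data_ptr())  # x is saved as-is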
No, this should also not be the case, and you could profile the memory usage via torch.cuda.memory_summary() to compare both approaches:
import torch

print(torch.cuda.memory_summary())

x = torch.randn(1024, 1024).cuda().requires_grad_(True)
b = torch.randn(1024, 1).cuda().requires_grad_(True)
c = torch.randn(1024, 512).cuda().requires_grad_(True)

# case a) (uncomment to profile instead of case b)
# y = torch.matmul(x.T, b)
# print(torch.cuda.memory_summary())
# y.mean().backward()
# print(torch.cuda.memory_summary())
# z = torch.matmul(x, c)
# z.mean().backward()
# print(torch.cuda.memory_summary())

# case b)
y = torch.matmul(b.T, x)
print(torch.cuda.memory_summary())
y.mean().backward()
print(torch.cuda.memory_summary())
z = torch.matmul(x, c)
z.mean().backward()
print(torch.cuda.memory_summary())
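As a more compact alternative to reading the full summaries, the peak allocation of each case could also be compared directly (a sketch; requires a CUDA device):

import torch

def peak_bytes(case):
    torch.cuda.reset_peak_memory_stats()
    x = torch.randn(1024, 1024, device='cuda', requires_grad=True)
    b = torch.randn(1024, 1, device='cuda', requires_grad=True)
    y = torch.matmul(x.T, b) if case == 'a' else torch.matmul(b.T, x)
    y.mean().backward()
    return torch.cuda.max_memory_allocated()

print('case a)', peak_bytes('a'))
print('case b)', peak_bytes('b'))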