I'm playing with torch.autograd.graph.saved_tensors_hooks to compress the saved tensors. During the process I found that the following two code snippets appear to have different memory usage (assuming x is much larger than b, i.e. x.size() >> b.size()).
case a)
x = f(a)
y = matmul(x.T, b)
z = matmul(x, c)

case b)
x = f(a)
y = matmul(b.T, x)
z = matmul(x, c)
It looks like case b) keeps using the same x in the forward of both y and z (verified with id()), while case a) creates a new tensor object to save x.T (verified with id()). In that sense, can I say b) is more memory efficient?
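For reference, here is roughly how I'm inspecting what autograd saves in case a). This is a sketch with arbitrary shapes, and a simple elementwise op standing in for f; the pack hook just logs the Python id and data pointer of each tensor autograd saves for backward.

```python
import torch

packed = []  # (id, data_ptr) of every tensor autograd saves for backward

def pack(t):
    packed.append((id(t), t.data_ptr()))
    return t

def unpack(t):
    return t

a = torch.randn(16, 16, requires_grad=True)
b = torch.randn(16, 4, requires_grad=True)
c = torch.randn(16, 4, requires_grad=True)

with torch.autograd.graph.saved_tensors_hooks(pack, unpack):
    x = a * 2.0                # stand-in for x = f(a)
    y = torch.matmul(x.T, b)   # case a) ordering
    z = torch.matmul(x, c)

# Compare each saved tensor against x by object identity and by storage:
for obj_id, ptr in packed:
    print(obj_id == id(x), ptr == x.data_ptr())
```

With this I can see which saved tensors are the same Python object as x and which merely share its storage.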
As a related question, it looks like PyTorch doesn't duplicate the same tensor between forward and backward, judging by the Python id() of the tensors. Is id() the right way to check whether two tensors point to the same memory allocation?
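What makes me unsure is that two distinct Tensor objects can still alias the same storage, so id() and data_ptr() can disagree. A minimal example of what I mean (x.T here is just an illustrative view):

```python
import torch

x = torch.randn(4, 4)
v = x.T  # a view: a new Python Tensor object over the same storage

print(id(v) == id(x))                # False: different Python objects
print(v.data_ptr() == x.data_ptr())  # True: same underlying allocation
```

So should I be comparing data_ptr() (or the storage) rather than id() when checking for shared memory?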