It’s clear that in model training, the memory used by module parameters (e.g. BN or Conv layers) is not a big problem in a GCN or CNN. But when a single input sample is large (say 5 MB), memory usage becomes hard to bear because of the intermediate results that must be saved to compute gradients during backpropagation. I would like to know approximately how much memory is added after the input goes through a function such as view(), contiguous(), einsum(), stack(), mul(), and many other PyTorch functions. I searched the official docs but only found how to use these functions. Are there websites or docs that cover this, or is there a way to infer it from the docs myself? Thanks a lot for helping!
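From what I understand (my own summary, not something stated in the official docs): view() and permute() return views and allocate no new storage; contiguous() allocates a full copy only when the tensor is not already contiguous; stack(), mul(), and einsum() allocate a new output tensor, and einsum may also allocate temporaries for the contraction. A quick way to estimate the size of any single result is elements × bytes-per-element:

```python
# Rough per-tensor memory estimate: number of elements times dtype size.
# (Assumed dtype sizes: float32 = 4 bytes, float16 = 2, float64 = 8.)
from functools import reduce
from operator import mul

DTYPE_BYTES = {"float16": 2, "float32": 4, "float64": 8}

def tensor_bytes(shape, dtype="float32"):
    """Storage needed for one dense tensor of the given shape."""
    return reduce(mul, shape, 1) * DTYPE_BYTES[dtype]

# Hypothetical example shape (N, M, V_node, C, T) = (100, 2, 25, 3, 300):
print(tensor_bytes((100, 2, 25, 3, 300)) / 2**20, "MiB")
```

On GPU you can also just measure: call torch.cuda.memory_allocated() before and after the operation and take the difference.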
fv = fv.permute(0, 4, 3, 1, 2).contiguous().view(N, M * V_node * C, T)
fv = self.data_bn_v(fv)
If fv is 100 MB, how much memory will be used after these two lines?
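For this snippet, a back-of-envelope accounting (assuming fv is 100 MB of float32, permute() makes it non-contiguous, and data_bn_v is a BatchNorm1d) would be: permute() is a view with no new storage, contiguous() materializes a full 100 MB copy, view() on the now-contiguous result is free, and the BN layer writes a new 100 MB output while autograd keeps its input (the contiguous copy, already counted) saved for backward. So the peak extra usage is roughly 200 MB on top of the original tensor:

```python
# Back-of-envelope accounting for the two lines above, assuming
# fv is 100 MB of float32 and permute() made it non-contiguous.
fv_mb = 100

permute_cost    = 0      # permute returns a view: new strides, same storage
contiguous_cost = fv_mb  # copies into fresh contiguous storage
view_cost       = 0      # view on a contiguous tensor reuses its storage
bn_output_cost  = fv_mb  # BatchNorm writes a new output tensor
# Autograd also keeps the BN input (the contiguous copy) alive for
# backward, but that is the same 100 MB already counted above.

extra_mb = permute_cost + contiguous_cost + view_cost + bn_output_cost
print(f"~{extra_mb} MB extra on top of the original {fv_mb} MB")
```

Whether the original 100 MB of fv can be freed afterwards depends on whether anything else (including autograd) still holds a reference to it.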