Hi all,
I have a function that uses a for loop to modify some values in my tensor. However, after some debugging I found that the for loop causes the GPU to use a lot of memory. Any idea why the for loop uses so much memory? Or is there a way to vectorize the troublesome loop?
Many thanks!
import torch

def process_feature_map_2(dm):
    """dm should be an (N, C, D, D) tensor; in my use case D is 14, N is 4, C is 80.
    `a` and `b` are (N, 1, D, D) tensors.
    `c` has the same shape as `dm`.
    Let's say `dm` is a (1, 3, 2, 2) tensor, the last two dims of `b` are
    [[0, 1],
     [1, 2]]
    and `a` is
    [[1, 2],
     [3, 4]]
    This function will create `c` such that it is
    [[[1, 0],
      [0, 0]],
     [[0, 2],
      [3, 0]],
     [[0, 0],
      [0, 4]]]
    In plain English: I want to separate the values in `a` into different
    channels, where the channel indices are stored in `b`.
    """
    a = dm.sum(1, keepdim=True)     # per-pixel sum over channels
    b = dm.argmax(1, keepdim=True)  # per-pixel channel index
    c = torch.zeros(dm.shape, device=dm.device)
    for n in range(c.shape[0]):
        for i in range(c.shape[1]):
            # route the values of `a` whose index in `b` equals `i` into channel i
            c[n][i][b[n][0] == i] = a[n][b[n] == i]
    return c
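To make the intended behavior concrete, here is the docstring example as a runnable check. (The expected output happens to be a valid input to the function itself, since its channel sums give `a` and its channel argmaxes give `b`.)

# The (1, 3, 2, 2) example from the docstring: feeding the expected output
# back in should reproduce it, because its channel sums equal `a` and its
# channel argmaxes equal `b`.
expected = torch.tensor([[[[1., 0.],
                           [0., 0.]],
                          [[0., 2.],
                           [3., 0.]],
                          [[0., 0.],
                           [0., 4.]]]])
assert torch.equal(process_feature_map_2(expected), expected)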
I tried commenting out the for loop and running just the following single assignment, and it runs out of memory too :(
c[0][0][b[0][0] == 0] = a[0][b[0] == 0]
It turns out that indexing with a boolean array requires a lot of memory (https://github.com/pytorch/pytorch/issues/57515). Is there a solution for this?
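Would scatter_ along the channel dimension work here instead? A minimal sketch of what I have in mind (only checked on the toy example above; it relies on scatter_'s dim=1 semantics, c[n, b[n, 0, i, j], i, j] = a[n, 0, i, j], which should avoid both the Python loops and the boolean masks):

def process_feature_map_2_scatter(dm):
    # Same `a` and `b` as in the loop version.
    a = dm.sum(1, keepdim=True)
    b = dm.argmax(1, keepdim=True)  # int64 indices, as scatter_ expects
    c = torch.zeros_like(dm)
    # Scatter each value of `a` into the channel named by `b` at the same
    # spatial location: c[n, b[n, 0, i, j], i, j] = a[n, 0, i, j]
    c.scatter_(1, b, a)
    return c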