Why is a tensor not on the GPU?

I cannot understand why window_pool is not on the GPU. The entire module is on the GPU, so shouldn't window_pool be on the GPU as well? Is gradient information copied when I assign into a sliced tensor?
How do I get window_pool onto the GPU as well? Would it be correct to just call window_pool.to("cuda")?

def forward(self, input, lens, args):
    # Allocated with torch.zeros, so this tensor lives on the CPU by default
    window_pool = torch.zeros([lens.shape[0], args.mem_dim * len(self.windows)])
    convolved = conv_model(x)[0].transpose(0, 1)
    relu_convolved = F.relu(convolved)
    start = 0

    for i in range(lens.shape[0]):
        input_len = relu_convolved[start:start + lens[i] - window + 1].shape[0]
        input = relu_convolved[start:start + lens[i] - window + 1].permute(1, 0).unsqueeze(0)

        start += lens[i].data.cpu().numpy() - window + 1

        max_pool1d = F.max_pool1d(input, kernel_size=input_len)

        lo = (window - 1) * args.mem_dim
        hi = lo + args.mem_dim

        print("max_pool1d.device ", max_pool1d.device)  # GPU

        # Writing a GPU tensor into a CPU tensor slice
        window_pool[i][lo:hi] = max_pool1d

    print(window_pool.device)  # CPU

You can use either
window_pool = torch.zeros([lens.shape[0], args.mem_dim*len(self.windows)]).to("cuda")
or
window_pool = torch.zeros([lens.shape[0], args.mem_dim*len(self.windows)]).cuda()

But if you hard-code this, you cannot switch back to the CPU without changing the code.
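A more device-agnostic option (a minimal sketch; make_pool and the tensor names are illustrative, not from the original code) is to allocate window_pool on the same device as a tensor that is already where you want it, such as the convolution output:

```python
import torch

def make_pool(lens, feat_dim, reference):
    # Allocate on whatever device `reference` lives on (CPU or GPU),
    # so the same code runs unchanged in both settings.
    return torch.zeros(lens.shape[0], feat_dim, device=reference.device)

x = torch.randn(3, 5)            # stands in for relu_convolved
lens = torch.tensor([2, 3, 1])
pool = make_pool(lens, 4, x)
print(pool.device)               # cpu here; cuda:0 if x were on the GPU
```

With this, moving the module (and hence its activations) to the GPU automatically moves the freshly allocated pool as well.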

You could register it as a buffer to your module and then only expand it for batchsize.
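A minimal sketch of that buffer approach (the module and attribute names here are illustrative): a zero template registered with register_buffer is moved together with the module by .to()/.cuda(), and you expand it to the current batch size in forward:

```python
import torch
import torch.nn as nn

class Pooler(nn.Module):
    def __init__(self, feat_dim):
        super().__init__()
        # Buffers are moved by module.to(device) just like parameters,
        # but are not updated by the optimizer.
        self.register_buffer("pool_template", torch.zeros(1, feat_dim))

    def forward(self, batch_size):
        # Expand the template to the batch size; clone so that later
        # in-place writes do not alias the shared buffer.
        return self.pool_template.expand(batch_size, -1).clone()

m = Pooler(8)
out = m(4)
print(out.shape, out.device)   # torch.Size([4, 8]) on the module's device
```

Calling m.cuda() (or m.to("cpu")) then yields the pool on the right device with no code changes.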

This one solved the issue:
window_pool = torch.cuda.FloatTensor(lens.shape[0], args.mem_dim*len(self.windows)).fill_(0)