Expand/repeat function freezing PC

I am using the expand function in PyTorch to repeat tensors (CNN features of size 2048), but my code is somehow slow even though it runs on the GPU, and on top of that it freezes my whole PC (graphics become laggy). I have attached part of my forward method below. Here N = 36 (boxes per entry of the batch) and B = 16 (batch size), so b_i is 16x36x2048, as is b_j, while q_rnn is 16x512.

 def forward(self, box_feats,q_feats,box_coords):        
        enc2,_ = self.QRNN(q_feats.permute(1,0,2))
        q_rnn = enc2[-1]
        # add coordinates
        box_feats = torch.cat([box_feats, box_coords], dim=-1)
        N =  box_feats.size(1) # number of boxes
        B =  box_feats.size(0) #batch size

        qst = q_rnn.unsqueeze(1).expand(-1,N*N,-1)               
        b_i = box_feats.unsqueeze(2).expand(-1,-1,N,-1) 
        b_i = b_i.contiguous().view(B,N**2,-1)
        b_j = box_feats.unsqueeze(1).expand(-1,N,-1,-1) 
        b_j = b_j.contiguous().view(B,N**2,-1)      
        #print (b_j.size())               
        # concatenate all together
        b_full = torch.cat([b_i, b_j, qst], dim=-1)

        bg  = self.g1(b_full)
        bg  = F.relu(bg)
        bg  = self.g2(bg)
        bg  = F.relu(bg)        
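
For reference, here is a minimal self-contained sketch of the pairwise expand pattern above, with assumed sizes (2048 box features plus 4 coordinates → D = 2052, and an assumed question-encoding size Q = 512), run on CPU with random data. It shows how large the concatenated b_full tensor gets, since the N*N pairing multiplies the memory footprint:

```python
import torch

# Assumed sizes from the post: B = batch, N = boxes, D = 2048 feats + 4 coords, Q = question enc
B, N, D, Q = 16, 36, 2052, 512

box_feats = torch.randn(B, N, D)
q_rnn = torch.randn(B, Q)

# expand() creates views; contiguous() materializes the repeated copies
qst = q_rnn.unsqueeze(1).expand(-1, N * N, -1)
b_i = box_feats.unsqueeze(2).expand(-1, -1, N, -1).contiguous().view(B, N * N, -1)
b_j = box_feats.unsqueeze(1).expand(-1, N, -1, -1).contiguous().view(B, N * N, -1)
b_full = torch.cat([b_i, b_j, qst], dim=-1)

print(b_full.shape)  # torch.Size([16, 1296, 4616])
# Size of this one tensor in MiB (float32)
print(b_full.element_size() * b_full.nelement() / 2**20)
```

Note that b_full alone is roughly 365 MiB here, and intermediate results of the g1/g2 layers add more on top, so watching memory per tensor can help narrow down where the slowdown comes from.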

Are you sure the code runs on the GPU?
Could you check the .type() of the tensors?
If my machine becomes laggy, it's usually because I forgot to push the data onto the GPU, my RAM is full, and I'm using swap.
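
For instance, a quick check like this (a minimal sketch on a fresh CPU tensor) tells you whether a tensor actually lives on the GPU:

```python
import torch

x = torch.randn(2, 2)
print(x.is_cuda)   # False until you call x.cuda() or x.to('cuda')
print(x.type())    # 'torch.FloatTensor' on CPU, 'torch.cuda.FloatTensor' on GPU
```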

If that’s not the issue, could you post a small executable code snippet so that I can check it on my machine?

Sorry for the late reply… I checked my CPU, GPU, and memory/swap utilization. None of them are used significantly: the GPU uses only about 3 GB out of 12 GB, and RAM utilization is low too. I checked the variable types inside the forward function using [var].is_cuda and [var].type(); both show that they are definitely CUDA tensors.