Implementing a custom MSE-like loss function for training

I am trying to implement a custom loss function for training. It is similar to MSE, but I want to compute the loss only over a specific area of the predicted image rather than the whole image. To do this I use a mask ('wt_matrix' in the code) with the same dimensions as the image, containing 1 at locations that should contribute to the loss and 0 at locations that should be ignored. My implementation is as follows.

def build_loss(self, predicted_img, gt_data, wt_matrix):
    # Number of pixels that participate in the loss
    wt = np.sum(wt_matrix)
    # Binary mask: 1 where the loss is computed, 0 elsewhere
    wt_matrix = Variable(torch.from_numpy(wt_matrix), requires_grad=False)
    wt_matrix = wt_matrix.cuda()
    gt_data = gt_data.cuda()
    # Zero out the ignored locations in both tensors
    pred_img = predicted_img * wt_matrix
    gdata = wt_matrix * gt_data
    # Squared error over the masked region
    diff_1 = (pred_img - gdata) * (pred_img - gdata)
    wt = np.array([wt])
    wt = Variable(torch.from_numpy(wt), requires_grad=False)
    # Average over the selected pixels only
    loss = torch.sum(diff_1) / wt.cuda()
    return loss

I have imported Variable from torch.autograd, and I call backward() on the returned loss. Is everything correct in my implementation? It does not seem to work properly, and it takes too long to run.
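For comparison, here is a minimal sketch of the same masked MSE written with current PyTorch idioms: no Variable wrapper (it is deprecated and plain tensors now carry gradients), the pixel count computed from the mask tensor itself, and the mask intended to be converted and moved to the device once outside the training loop rather than on every call. The function name `masked_mse` and the toy shapes are my own illustration, not from the original code.

```python
import torch

def masked_mse(pred, target, mask):
    # mask: float tensor, 1.0 at pixels included in the loss, 0.0 elsewhere.
    # Convert/move it to the right device ONCE before training, not per call.
    diff = (pred - target) * mask
    # Mean squared error over the selected pixels only;
    # clamp guards against division by zero for an all-zero mask.
    return diff.pow(2).sum() / mask.sum().clamp(min=1.0)

# Toy example: loss is computed only over the top-left 2x2 corner.
pred = torch.ones(4, 4, requires_grad=True)
target = torch.zeros(4, 4)
mask = torch.zeros(4, 4)
mask[:2, :2] = 1.0

loss = masked_mse(pred, target, mask)  # (1-0)^2 averaged over 4 pixels -> 1.0
loss.backward()                        # gradient is zero at the masked-out pixels
```

On GPU you would call `mask = mask.to(device)` a single time up front; re-creating the tensor from NumPy and calling `.cuda()` inside the loss function forces a host-to-device copy on every iteration, which is a likely cause of the slowness.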