Implementing SASNet training loss functions

Apart from the lack of original source code for the training and loss functions, I am having trouble with a recursive part of the code. In the paper “To Choose or to Fuse? Scale Selection for Crowd Counting”, from this year, the authors use a three-part loss. For the last term, they recursively look for “hard pixels” (pixels where the prediction over-estimates the ground truth). To find them, they split the predicted map into 4 quarters, keep splitting each over-predicted quarter down to pixel level, and return the indices of those hard pixels.
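A single step of this quarter split can be sketched in NumPy as below. `overpredicted_quarters` is a hypothetical helper, not code from the paper; it assumes two 2-D arrays of the same shape and uses the same quarter order as `prasearch` (top-left, bottom-left, top-right, bottom-right). A quarter is kept when its predicted count exceeds its ground-truth count, so over-prediction is judged per region, not per pixel.

```python
import numpy as np

def overpredicted_quarters(est, gt):
    """Hypothetical helper: one step of the quarter split.

    est, gt: 2-D arrays of equal shape (predicted density map, ground truth).
    Returns (row_start, row_end, col_start, col_end) for each quarter whose
    predicted sum exceeds the ground-truth sum.
    """
    H, W = est.shape
    h2, w2 = H // 2, W // 2
    # Quarter order: top-left, bottom-left, top-right, bottom-right
    quarters = [(0, h2, 0, w2), (h2, H, 0, w2), (0, h2, w2, W), (h2, H, w2, W)]
    return [(r0, r1, c0, c1) for r0, r1, c0, c1 in quarters
            if est[r0:r1, c0:c1].sum() - gt[r0:r1, c0:c1].sum() > 0]
```

For a 4x4 map where only the bottom-right quarter predicts more people than the ground truth, the helper returns `[(2, 4, 2, 4)]`; the full search repeats this step inside every returned quarter.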
I quickly coded something that seems to work, but I often get this error:

```
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/ATen/native/cuda/Loss.cu:106: operator(): block: [0,0,0], thread: [31,0,0] Assertion `input_val >= zero && input_val <= one` failed.
Traceback (most recent call last):
...
idxs_hard =np.array(self.prasearch(density.detach().clone().cpu().numpy()[0,0,:,:],gt.detach().clone().cpu().numpy()[0,:,:],(0,0)))
....
```
I convert the predicted map and the ground truth to NumPy arrays because that seemed faster than working on the tensors directly (about 10 times faster, apparently).
Am I doing something wrong with the indices?
This is where I use them:

```python
praLoss2 = self.msesum(density[:,0,idxs_hard[:,0],idxs_hard[:,1]], gt[:,idxs_hard[:,0],idxs_hard[:,1]])
```
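One indexing pitfall worth guarding against here: if the search finds no hard pixels, `np.array([])` has shape `(0,)`, so `idxs_hard[:,0]` raises an `IndexError` (a separate failure from the CUDA assertion). Reshaping the index array to `(-1, 2)` keeps it two-dimensional even when it is empty. `gather_hard` is a hypothetical helper shown with NumPy arrays shaped like `density` `(B, 1, H, W)` and `gt` `(B, H, W)`; PyTorch tensors support similar integer-array indexing, but whether the shapes match in your setup is an assumption to check.

```python
import numpy as np

def gather_hard(density, gt, hard_pixels):
    """Hypothetical helper: pick prediction and ground-truth values at hard pixels.

    density: array of shape (B, 1, H, W); gt: array of shape (B, H, W).
    hard_pixels: list of [row, col] pairs, possibly empty.
    Returns two arrays of shape (B, N), where N = len(hard_pixels).
    """
    # reshape(-1, 2) yields shape (0, 2) for an empty list instead of (0,),
    # so the column selections below stay valid.
    idxs = np.asarray(hard_pixels, dtype=np.int64).reshape(-1, 2)
    rows, cols = idxs[:, 0], idxs[:, 1]
    # The scalar 0 and the two index arrays are adjacent, so the result is (B, N)
    return density[:, 0, rows, cols], gt[:, rows, cols]
```

With an empty `hard_pixels`, both outputs have shape `(B, 0)`, so a sum-reduced MSE over them is simply 0 instead of an exception.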

Thanks in advance if you can spot the problem.
Here is the recursive function:

```python
import numpy as np

def prasearch(self, Dest, Dgt, start):
    # Dest, Dgt: 2-D arrays (predicted density map and ground truth) for the current block.
    # start: (row, col) of this block's top-left corner in the full map.
    # Returns a list of [row, col] hard-pixel indices in full-map coordinates.
    H, W = Dest.shape
    if H > 1:
        h2, w2 = H // 2, W // 2
        # (row_start, row_end, col_start, col_end) of the four quarters:
        # top-left, bottom-left, top-right, bottom-right
        quarters_idxs = [(0, h2, 0, w2), (h2, H, 0, w2), (0, h2, w2, W), (h2, H, w2, W)]
        pixs = []
        for r0, r1, c0, c1 in quarters_idxs:
            # Only descend into quarters where the predicted count exceeds the ground truth
            if Dest[r0:r1, c0:c1].sum() - Dgt[r0:r1, c0:c1].sum() > 0:
                # Accumulate the offset so returned indices stay in full-map coordinates;
                # passing only the quarter's local corner would drop the parent's offset.
                pixs += self.prasearch(Dest[r0:r1, c0:c1], Dgt[r0:r1, c0:c1],
                                       (start[0] + r0, start[1] + c0))
        return pixs
    else:
        if H == 1 and W == 1:
            return [[start[0], start[1]]]
        else:  # a 1-pixel-high strip remains when the block does not split evenly on both axes
            sub = Dest - Dgt
            idmax = sub.argmax()  # column of the largest positive difference in the strip
            return [[start[0], start[1] + int(idmax)]]
```
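To cross-check the returned indices, the same quarter search can be sketched with an explicit stack instead of recursion. `prasearch_iter` is a hypothetical name; it assumes two 2-D NumPy arrays of the same shape. Each stack entry stores absolute row and column bounds, so there is no `start` offset to carry between calls; it should return the same pixels, in the same order, as `prasearch` provided that offset is accumulated across recursive calls.

```python
import numpy as np

def prasearch_iter(est, gt):
    """Hypothetical non-recursive version of the hard-pixel search."""
    est = np.asarray(est, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    hard = []
    # Each entry: (row_start, row_end, col_start, col_end) in full-map coordinates
    stack = [(0, est.shape[0], 0, est.shape[1])]
    while stack:
        r0, r1, c0, c1 = stack.pop()
        H, W = r1 - r0, c1 - c0
        if H > 1:
            rm, cm = r0 + H // 2, c0 + W // 2
            # Same order as prasearch: top-left, bottom-left, top-right, bottom-right
            quarters = [(r0, rm, c0, cm), (rm, r1, c0, cm), (r0, rm, cm, c1), (rm, r1, cm, c1)]
            # Push in reverse so quarters are popped (visited) in the order above
            for qr0, qr1, qc0, qc1 in reversed(quarters):
                if est[qr0:qr1, qc0:qc1].sum() - gt[qr0:qr1, qc0:qc1].sum() > 0:
                    stack.append((qr0, qr1, qc0, qc1))
        elif W == 1:
            # Single pixel reached
            hard.append([r0, c0])
        else:
            # 1-pixel-high strip: keep the column with the largest over-prediction
            diff = est[r0:r1, c0:c1] - gt[r0:r1, c0:c1]
            hard.append([r0, c0 + int(diff.argmax())])
    return hard
```

For a 4x4 map whose only over-predicted pixel is at row 3, column 2, this returns `[[3, 2]]`; wrapping the result in `np.array(...)` gives the `(N, 2)` index array used for `idxs_hard`.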