# What's wrong with our loss and PyTorch?

Given the samples $\vec{x}_i \in \mathbb{R}^d$, $i \in \{1, \dots, l\}$, where $l$ is the number of training samples and $d$ is the number of input features, the corresponding target values $y_i \in \mathbb{R}$, and the $l \times l$ matrix defined below:

$$S_{i,j} = e^{-\gamma_S \lVert \vec{x}_i - \vec{x}_j \rVert^2} = e^{-\gamma_S (\vec{x}_i^{\top} \vec{x}_i - 2\, \vec{x}_i^{\top} \vec{x}_j + \vec{x}_j^{\top} \vec{x}_j)}$$

where $i, j \in \{1, \dots, l\}$ and $\gamma_S$ is another hyper-parameter, we would like to use the following custom loss for a regression task with PyTorch:
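For reference, a minimal sketch of how $S$ could be built in PyTorch (the function name `similarity_matrix` is ours, not part of the original code):

```python
import torch

def similarity_matrix(X, gamma_s):
    # X: (l, d) matrix of samples; gamma_s: the gamma_S hyper-parameter.
    # Pairwise squared Euclidean distances between all rows of X.
    sq_dists = torch.cdist(X, X, p=2) ** 2
    # Element-wise Gaussian kernel: S[i, j] = exp(-gamma_s * ||x_i - x_j||^2).
    return torch.exp(-gamma_s * sq_dists)
```

By construction $S$ is symmetric with ones on the diagonal.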

$$\sum_{i=1}^l \sum_{j=1}^l \sqrt{|p_i-y_i|}\sqrt{|p_j-y_j|} S_{i,j}$$
where $p_i$ is the $i$-th prediction.
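Equivalently, collecting the square-rooted absolute errors $e_i := \sqrt{|p_i - y_i|}$ into a vector $\vec{e} \in \mathbb{R}^l$, the double sum is the quadratic form

$$\mathcal{L} = \sum_{i=1}^l \sum_{j=1}^l e_i \, e_j \, S_{i,j} = \vec{e}^{\top} S \, \vec{e},$$

which is what the dot/matmul combination in the implementation computes.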

Our loss is implemented with this code:

```python
def ourLoss(out, lab):
    global stra, sc
    # Square roots of the absolute errors, flattened to a vector.
    abserr = torch.abs(out - lab).flatten().float()
    serr = torch.sqrt(abserr)
    # Sub-block of S corresponding to the current batch.
    bm = stra[sc : sc + out.shape[0], sc : sc + out.shape[0]].float()
    # Quadratic form: serr^T @ S_batch @ serr.
    loss = torch.dot(serr, torch.matmul(bm, serr))
    return loss
```


where `stra` is $S$, `sc` is a counter used for batch evaluation, and the Adam optimizer then returns a `nan` loss value…
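A minimal repro of what we suspect is happening (this is our guess, not a confirmed diagnosis): the derivative of $\sqrt{x}$ is $1/(2\sqrt{x})$, which is infinite at $x = 0$, so whenever a prediction matches its target exactly, the backward pass through `torch.sqrt(torch.abs(...))` produces a non-finite gradient:

```python
import torch

out = torch.tensor([1.0, 2.0], requires_grad=True)
lab = torch.tensor([1.0, 3.0])  # first residual is exactly zero

# Same sqrt-of-absolute-error as in ourLoss.
serr = torch.sqrt(torch.abs(out - lab))
serr.sum().backward()

# The zero residual yields a nan gradient (inf from sqrt's derivative
# times 0 from abs's subgradient at 0); the other entry is finite.
print(out.grad)
```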