Hi @ptrblck

I’m implementing a custom loss function, which has a term that involves the gram matrix of a Gaussian RBF kernel.

Say, for each training iteration, I get a mini-batch (batch size 128) of predicted probabilities for K=5 classes, so the predicted probability tensor has shape (128, 5). Now I wish to compute the 128 by 128 Gram matrix of the Gaussian RBF kernel exp(-||p-q||^2), where p and q are predicted probability vectors.

I don’t know if there’s a way of doing this without looping through all 128x128 possible pairs of p and q, while still preserving autograd compatibility so I can use it as part of the loss function.

Could you please help me with some code example? Thank you in advance!

Best


I’m not sure, but wouldn’t `torch.mm(mat, mat.t())` calculate the Gram matrix?

PS: as you can clearly see I’m not an expert in this topic, so tagging certain people might demotivate others to answer in your thread.

Thank you very much for your reply!

If `mat` is the predicted probability matrix with shape (128, 5), then `torch.mm(mat, mat.t())` gives me the 128 by 128 matrix that contains all the inner products between pairs of rows `p, q` in `mat`.

But what I’m hoping to do is compute a more general function `k(p, q)` between all pairs of rows `p, q` in `mat` and store the results in a 128 by 128 matrix. So `torch.mm(mat, mat.t())` can be seen as a special case of this where the function `k(p, q)` is just the inner product `<p, q>`.

OK, I see. Do you have a specific function in mind for `k`?

Yes, for example, `k(p, q) = exp(-||p-q||)`, where the norm `||p-q||` is the L1 norm.

You could add a dummy dimension and use broadcasting for this use case:

```
import torch

a = torch.randn(128, 2)
b = torch.randn(128, 2)

# Broadcasting: a.unsqueeze(1) has shape (128, 1, 2), b has shape (128, 2),
# so the difference has shape (128, 128, 2); the L1 norm over dim=2 gives (128, 128).
res = torch.norm(a.unsqueeze(1) - b, dim=2, p=1)

# Reference implementation with an explicit double loop
res_manual = []
for a_ in a:
    for b_ in b:
        res_manual.append(torch.norm(a_ - b_, dim=0, p=1))
res_manual = torch.stack(res_manual)
res_manual = res_manual.view(128, 128)

print((res - res_manual).abs().max())
> tensor(0.)
```
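Applied to the use case above, the kernel matrix would then just be the elementwise `exp` of the negated distance matrix. A minimal sketch (the variable names and the `softmax` stand-in for predicted probabilities are my assumptions), also showing `torch.cdist` as an equivalent way to get the pairwise distances:

```python
import torch

torch.manual_seed(0)
# Stand-in for a mini-batch of predicted probabilities, shape (128, 5)
probs = torch.randn(128, 5).softmax(dim=1)
probs.requires_grad_()

# Pairwise L1 distances via broadcasting, then k(p, q) = exp(-||p - q||_1)
dists = torch.norm(probs.unsqueeze(1) - probs, dim=2, p=1)
gram = torch.exp(-dists)  # (128, 128) Gram matrix, differentiable

# torch.cdist computes the same pairwise L1 distances directly
gram_cdist = torch.exp(-torch.cdist(probs, probs, p=1))
print(torch.allclose(gram, gram_cdist, atol=1e-6))
```

`torch.cdist` avoids materializing the (128, 128, 2)-style intermediate tensor that broadcasting creates, which can matter for larger batches.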

Thank you so much for the code!

Just to check, both methods in your code example are compatible with autograd, right? Because I want to use the matrix as part of my loss function.

Yes, Autograd will be able to track these operations.
