I implemented a new cost function. It works, but the full-batch loss computation time exploded (from about 10 ms with nn.NLLLoss() to about 900 ms) on MNIST (10 classes).
System: torch 2.5.1, CUDA 12, Python 3.12.3 on Ubuntu 24.04.
Algorithm idea (for each (x, y) sample):
- shift the x tensor to be mean-free by an offset m (m = mean over the non-target entries only),
- compute norm(x - m),
- combine these into a final cosine-similarity-like value (see the formula below).
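Written out as a formula (this is how I read the code below), with C = 10 classes, batch size B and eps = 1e-6:

$$
m_i = \frac{1}{C-1}\sum_{j \neq y_i} x_{ij}, \qquad
\mathcal{L}(x, y) = 1 - \frac{1}{B}\sum_{i=1}^{B} \frac{x_{i,y_i} - m_i}{\lVert x_i - m_i \mathbf{1} \rVert + \varepsilon}
$$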
(In the code below, I marked the critical lines with !!!.)
import torch as tt

def cos_loss(x: tt.Tensor, y: tt.Tensor) -> tt.Tensor:  # x.dim()==2, y.dim()==1
    "custom loss: shifted cosine (by A. Kleinsorge + A. Fauck)"
    # 1 - cos(x - m, yy), |yy| = 1, yy = (1, 0, 0, ...), cos = nn.CosineSimilarity(dim=...)
    bs: int = y.numel()  # batch size; the number of classes is hard-coded to 10 below
    # same as: for i in range(bs): xy[i] = x[i][y[i]]  (pick the target entry per row)
    xy: tt.Tensor = x[tt.arange(bs, device=x.device), y.int()]  # x.index_select(?) !!!!
    m: tt.Tensor = (x.sum(dim=1) - xy) * (1.0 / (10 - 1))  # mean over all non-target entries
    xmn: tt.Tensor = tt.zeros(bs, device=y.device)  # output buffer for the per-sample norms
    for i, (x1, m1) in enumerate(zip(x, m)):  # Python loop over the batch !!!!
        xmn[i] = (x1 - m1).norm()  # per-sample ||x_i - m_i||, cf. (x - m).norm(dim=0)?
    return 1.0 - ((xy - m) / (xmn + 1e-6)).mean()
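A minimal sketch of how the timing can be reproduced (random MNIST-shaped inputs; the batch size of 10000 and the device handling here are just an illustration, not my real training loop):

import time
import torch as tt

device = "cuda" if tt.cuda.is_available() else "cpu"
x = tt.randn(10000, 10, device=device)           # fake scores, 10 classes
y = tt.randint(0, 10, (10000,), device=device)   # fake targets

if device == "cuda":
    tt.cuda.synchronize()                        # make sure pending kernels are done
t0 = time.perf_counter()
loss = cos_loss(x, y)
if device == "cuda":
    tt.cuda.synchronize()                        # wait for the loss kernels to finish
print(loss.item(), (time.perf_counter() - t0) * 1000.0, "ms")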
Any idea how to speed up the code?
This algorithm has some nice mathematical properties, but that is another story.
Thanks for reading.
Alex