Loss function for image classification considering similarity

Hi, I’m not sure whether it’s appropriate to ask this kind of question here.

If this question doesn’t belong on this forum and you know of a better place for it, please let me know!

I have a general question about loss functions for image classification.

Why doesn’t my loss function work well?

I have a CNN model that classifies images, and it trains well with cross-entropy loss (the target probability distribution p is a one-hot vector, and loss = -sum_i p_i * log(q_i)).
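For context, this baseline is just standard cross-entropy; in PyTorch it would look something like the sketch below (the framework and names are only for illustration, not necessarily what I actually use):

```python
import torch.nn as nn

# Standard cross-entropy baseline: targets are class indices (one-hot p),
# so the loss is -sum_i p_i * log(q_i).
criterion = nn.CrossEntropyLoss()
# loss = criterion(logits, targets)   # logits: [batch, num_classes], targets: [batch]
```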

I want to give partial credit even when the model predicts incorrectly. In other words, I want the loss calculation to take the similarity between categories into account.

For example, I want to classify a given image into one of 4 categories: c1, c2, c3, c4. I have a score table like the one below.

         c1      c2      c3      c4
 c1     2.0     1.0     0.0    -1.0
 c2     1.0     2.0    -1.0     0.0
 c3     0.0    -1.0     0.0     1.0
 c4    -1.0     0.0     1.0     2.0

The output of my model is a probability vector over the categories, normalized by a softmax function: output = [q1, q2, q3, q4], where q1 + q2 + q3 + q4 = 1.0.

I want my model to maximize the expected score, which is calculated with the similarity matrix.

When the given image belongs to c1, i.e. the ground-truth probability distribution is p = [1.0, 0.0, 0.0, 0.0],
I want to calculate the loss as follows: loss = -( S_11*q1 + S_12*q2 + S_13*q3 + S_14*q4 ), where S_ij is the element of the similarity matrix described above.
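Concretely, here is a rough sketch of how I compute this loss (written in PyTorch just for illustration; names like `sim_matrix` and `expected_score_loss` are mine, not from any library):

```python
import torch

# Score matrix S from the table above (rows = true class, columns = predicted class).
sim_matrix = torch.tensor([
    [ 2.0,  1.0,  0.0, -1.0],
    [ 1.0,  2.0, -1.0,  0.0],
    [ 0.0, -1.0,  0.0,  1.0],
    [-1.0,  0.0,  1.0,  2.0],
])

def expected_score_loss(logits, targets):
    """Negative expected score: loss = -sum_j S[target, j] * q_j, averaged over the batch."""
    q = torch.softmax(logits, dim=1)           # [batch, 4] predicted probabilities
    scores = sim_matrix[targets]               # [batch, 4] row of S for each true class
    expected_score = (scores * q).sum(dim=1)   # expected score per sample
    return -expected_score.mean()              # maximizing the score = minimizing its negative
```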
However, training is really slow: the loss value decreases very slowly, and the other metrics I track for model performance also indicate that training is slow.

I think my loss function has the same objective as the cross-entropy loss: when an image of class ci is given, maximize qi.
Why doesn’t my loss function work well?

I have already considered a cross-entropy-style loss that accounts for similarity, loss = -sum_i p_i * log(q_i), where [p1, p2, p3, p4] is no longer a one-hot vector. But it is difficult to obtain a probability distribution p from the similarity matrix, and this is not really what I want.
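For completeness, one way to get soft targets from the similarity matrix would be a softmax over the corresponding row, as in the sketch below, but I’m not convinced this conversion is meaningful, which is part of why I’d rather use the expected-score loss directly:

```python
import torch
import torch.nn.functional as F

# Turn a row of the similarity matrix into a probability distribution via softmax
# (just one possible conversion; the temperature is an arbitrary knob).
def soft_targets_from_similarity(sim_matrix, targets, temperature=1.0):
    return torch.softmax(sim_matrix[targets] / temperature, dim=1)   # [batch, 4]

# Cross entropy against those soft targets: loss = -sum_i p_i * log(q_i).
def soft_label_cross_entropy(logits, soft_targets):
    log_q = F.log_softmax(logits, dim=1)
    return -(soft_targets * log_q).sum(dim=1).mean()
```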

Actually, I’m working on physical chemistry and bioinformatics, not image classification.

Thank you for reading.