To be honest I did not read the entire thing, but if you keep that in mind you should be able to get rid of all your for loops.

Since I did not read your entire cost function I'm not sure, but your loss seems WAY too complicated to be a cost function. If you want to do gradient descent on it, make sure it is smooth enough that you can find a decent solution.

@GuillaumeLeclerc Your answer points out the main problem with my code.

I am trying to write code that works as follows:

fm means feature map, whose shape is [batch_size, channel, height, width]. mask is the instance mask, marking which pixels belong to the same instance (pixels of the same instance share the same tag). Its shape is the same as fm.
`num` means how many instances are in the corresponding sample of the batch. Its shape is [batch_size].

For each sample in the batch, the intention is to pull the values of the pixels in fm that share the same tag as close together as possible, while pushing the values of pixels with different tags as far apart as possible.

I will try another way to implement this kind of function. Would you please share your solution with me if you come up with any idea to solve this problem using vectorized operations? Thanks.

To be honest I don't quite get your explanation, but I think it is good that you try to experiment by yourself.

A few additional insights:

It seems that your batches are completely independent, therefore it is almost certain that you can vectorize it.

(refer_h - cur_select) ** 2 looks like the squared distance to me. When you encounter a common operation, always use the PyTorch implementation instead of rewriting it yourself. You can use torch.nn.functional.mse_loss here. It is implemented by the backend and may be significantly faster.
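For example, a quick sanity check (with made-up tensors) that the manual expression and F.mse_loss agree under the default 'mean' reduction:

```python
import torch
import torch.nn.functional as F

refer_h = torch.randn(10)
cur_select = torch.randn(10)

# Hand-rolled mean squared error
manual = ((refer_h - cur_select) ** 2).mean()

# Built-in equivalent (reduction='mean' is the default)
builtin = F.mse_loss(cur_select, refer_h)

assert torch.allclose(manual, builtin)
```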

Let me answer again: are you sure this is a "learnable" loss? Can you provide a mathematical formulation of it so I can have a better sense of what it is supposed to do?

This loss did help my model learn to assign similar tag values to different parts of the same instance, but distinctly different tag values to parts from any two different instances.

The formula is written as follows:
The reference tag value for each instance is denoted as h. Assume that each instance has K parts (where K is not a fixed number) and each image has N instances. n denotes each instance and n' denotes every instance except the n-th one. L_g is the loss.
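The formula itself seems to have been lost from the post (likely an image that did not come through). For reference, one plausible formulation that matches the symbols above and this pull/push behavior, in the style of the associative-embedding grouping loss (the exp push term and the scale sigma are assumptions, not necessarily the exact original), would be:

```latex
\bar{h}_n = \frac{1}{K}\sum_{k=1}^{K} h_{nk}
\qquad\text{(reference tag of instance } n\text{)}

L_g = \frac{1}{NK}\sum_{n=1}^{N}\sum_{k=1}^{K}\left(\bar{h}_n - h_{nk}\right)^2
    \;+\; \frac{1}{N^2}\sum_{n=1}^{N}\sum_{n' \neq n}
      \exp\!\left(-\frac{\left(\bar{h}_n - \bar{h}_{n'}\right)^2}{2\sigma^2}\right)
```

where h_{nk} is the tag value of the k-th part of instance n: the first term pulls each part toward its instance's reference tag, and the second term pushes the reference tags of different instances apart.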

The main challenge is that N and K are not fixed.

Assume that we have a feature map, denoted as fm. Its shape is [channels, height, width] when we only consider a single sample of the batch. What we need to do is pull some values on this feature map as close as possible while pushing other values as far apart as possible. So I use a mask to denote which pixels belong to the same instance.
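As a sketch of how that per-sample computation could be vectorized: assume a 1-channel tag map and the per-instance masks stacked into a boolean tensor of shape [N, H, W] (the shapes and the Gaussian push term are assumptions borrowed from the associative-embedding loss, not necessarily the exact original formulation). Because everything is driven by the mask, nothing here depends on N or K being fixed:

```python
import torch

# Hypothetical single-sample sketch: 1-channel tag map, boolean instance masks.
# fm:   [H, W]     predicted tag value per pixel
# mask: [N, H, W]  mask[n] marks the pixels belonging to instance n
H, W, N = 4, 6, 3
fm = torch.randn(H, W)
mask = torch.zeros(N, H, W, dtype=torch.bool)
mask[0, :2, :] = True    # instance 0: top rows
mask[1, 2:, :3] = True   # instance 1: bottom-left
mask[2, 2:, 3:] = True   # instance 2: bottom-right

counts = mask.sum(dim=(1, 2)).clamp(min=1)    # parts per instance, [N]
h = (fm * mask).sum(dim=(1, 2)) / counts      # reference tag per instance, [N]

# Pull term: every pixel of instance n toward its reference tag h[n]
pull = (((fm - h.view(N, 1, 1)) ** 2) * mask).sum() / mask.sum()

# Push term: penalize reference tags of different instances being close
diff = h.unsqueeze(0) - h.unsqueeze(1)        # pairwise differences, [N, N]
off_diag = ~torch.eye(N, dtype=torch.bool)    # exclude n == n'
push = (torch.exp(-diff ** 2) * off_diag).sum() / off_diag.sum()

loss = pull + push
```

To extend this over a batch where N varies per sample, one option is to pad mask to [batch_size, N_max, H, W] and use `num` to mask out the padded instances before summing.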