I’m implementing a custom loss function in Pytorch 0.4. Reading the docs and the forums, it seems that there are two ways to define a custom loss function:

Extending Function and implementing forward and backward methods.

Extending Module and implementing only the forward method.

With that in mind, my questions are:

Can I write a python function that takes my model outputs as inputs and use torch.* functions to compute my loss function (without extending Function or Module)? If not, why?

This is a quite simple implementation of custom loss functions while there are not extra parameters. Could you please share some solutions to fix this problem?
For example, in keras, you can implement weighted loss by following:

Depending on your loss function, you could just multiply the positive and negative losses with your weights.
Maybe nn.BCEWithLogitsLoss might fit your use case providing pos_weight.

Could you further explain the weight in nn.CrossentropyLoss() and nn.BCELoss, pos_weight in nn.BCEWithLogitsLoss()?

weight in CrossentropyLoss is a Tensor of size C, but why does it should have the size of nbatch in nn.BCELoss()? And it seems that weight in BCELoss does not work for unbalanced data, right? ( because the weight is related to nbatch )

Does pos_weight has the same effect with weight in nn.CrossentropyLoss?

The weight argument in nn.BCE(WithLogits)Loss has the shape of the input batch, since the loss functions take floating point targets, which does not correspond to a class weighting schema. pos_weight on the other side is closer to a class weighting, as it only weights the positive examples. Furthermore, you can balance the recall and precision changing the pos_weight argument.

i mean, for each batch the input of the loss function is a list of all predictions and labels in the current batch, and the loss is built for input of only one prediction and label

so how it should be implement?

and another thing - how the backward() of costume function should be implemented?

i mean, for each batch the input of the loss function is a list of all predictions and labels in the current batch, and the loss is built for input of only one prediction and label

so how it should be implement?

and another thing - how the backward() of costume function should be implemented?

@netaglazer
I believe if you are worried about the first dimension being the Batch index, pytorch automatically extracts the individual predictions and accumulated the loss as batch loss. So, you can write your loss function assuming your batch has only one sample. @ptrblck could you please correct me if my understanding about loss function above is wrong?

my_cross_entropy is implemented as a simple function so you can just call it.
You could of course wrap it in an nn.Module and put the operations in the forward method, if that’s more convenient or if you need to store some internal states.