# MultiLabel Problem

Hi,

I’m trying to solve a multi-label problem.
My input is a tensor of around 400*2000 values.
Each of the 2000 values is a zero or a one, but on average only 10 of the 2000 values in a vector are ones; the others are zeros.
A one should have more importance than a zero.
I standardize the values with a mean square algorithm.
So this is my first question: is this a good idea in this case?

I also have output tensors with 60 classes, which are not mutually exclusive. In each one, 10 classes are always one and the others are zero.

This is my network

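```python
import torch

# Three fully connected layers; the final Linear layer outputs raw logits.
network = torch.nn.Sequential(
    torch.nn.Linear(len(self.getVector()), 250),
    torch.nn.ReLU(),
    torch.nn.Linear(250, 150),
    torch.nn.ReLU(),
    torch.nn.Linear(150, 60),
)

loss_function = torch.nn.MultiLabelSoftMarginLoss()
# The optimizer is not shown in this post; it is assumed to be created
# beforehand, e.g. torch.optim.Adam(network.parameters()).

network.train()
for i in range(500):
    optimizer.zero_grad()  # clear gradients left over from the previous step
    predicted_value = network(test_input_tensor)
    loss = loss_function(predicted_value, test_output_tensor)
    print(i, loss.item())
    loss.backward()
    optimizer.step()

network.eval()
with torch.no_grad():  # no gradients are needed for inference
    output = network(prognostic_input_tensor)
```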

As I don’t have much experience in machine learning, I would like to know
whether this is a good approach for a multi-label problem with the
features mentioned above. Do you have any advice?
It seems to me that the network predicts a lot of negative values, which I don’t understand.

Hi Log!

Because the output of your model is the output of your last `Linear`
layer, you are predicting raw-score logits. A logit value that is less
than zero corresponds to a predicted probability less than one half.
Typically a probability of less than one half for the “1” state when
interpreted as a hard “yes-no” prediction would be taken to be a
“0”-state prediction (and greater than one half would be the “1” state).
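
As a rough illustration of the logit-to-probability relationship described above, here is a minimal sketch. The logit values are made up for the example and do not come from the model in the question.

```python
import torch

# Hypothetical raw scores (logits) for one sample and five classes.
logits = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

# Sigmoid maps a logit to a probability; a logit of 0 maps to 0.5.
probs = torch.sigmoid(logits)

# Thresholding the probability at 0.5 is the same as thresholding the logit at 0.
hard_from_probs = (probs > 0.5).int()
hard_from_logits = (logits > 0.0).int()
```

So negative outputs are not a sign that something is broken; they are simply “0”-state predictions.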

Many more of your “output-tensor” `target` values are `0`'s than are
`1`'s, so if you weight each individual `target` value equally in the loss
function, your model can train to do a good job on the loss function
by preferentially predicting `0`'s (that is, predicting negative logits),
regardless of the input data.
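
To make the imbalance argument concrete, here is a small sketch with a made-up target (10 of 60 classes set to `1`). It compares an input-independent predictor that always outputs the same negative logit against a maximally undecided one, using an unweighted loss. The variable names are only illustrative.

```python
import math
import torch

# Made-up target: 10 of the 60 classes are in the 1 state.
target = torch.zeros(1, 60)
target[0, :10] = 1.0

loss_function = torch.nn.BCEWithLogitsLoss()  # every element weighted equally

# A "model" that ignores its input and always outputs the same negative
# logit, here the logit of the base rate 10 / 60.
base_rate_logit = math.log(10 / 50)
constant_logits = torch.full((1, 60), base_rate_logit)

# A "model" that is maximally undecided (probability one half everywhere).
undecided_logits = torch.zeros(1, 60)

constant_loss = loss_function(constant_logits, target).item()
undecided_loss = loss_function(undecided_logits, target).item()

# Element-wise accuracy of the hard all-zeros prediction.
hard_prediction = (constant_logits > 0.0).float()
accuracy = (hard_prediction == target).float().mean().item()
```

The constant predictor reaches a loss of about 0.45 versus about 0.69 for the undecided one, and its hard predictions are right on 50 of 60 entries, all without looking at the input.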

The common approach to addressing this is to weight your
less-frequent `1` `target` values more heavily in your loss function.

Note that `BCEWithLogitsLoss` is essentially the same as
`MultiLabelSoftMarginLoss` but has a `pos_weight` argument
that you can pass to its constructor.
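
If you want to check the equivalence yourself, a quick sketch along these lines should do, assuming the default `mean` reduction for both losses. The random tensors are stand-ins, not your data.

```python
import torch

torch.manual_seed(0)

# Random stand-ins: a batch of 8 samples with 60 classes each.
logits = torch.randn(8, 60)
target = (torch.rand(8, 60) < 0.2).float()

soft_margin = torch.nn.MultiLabelSoftMarginLoss()(logits, target)
bce = torch.nn.BCEWithLogitsLoss()(logits, target)

# With default settings the two values agree up to floating-point round-off.
same = torch.allclose(soft_margin, bce, atol=1e-5)
```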

You say that you have 60 classes, and that any given sample `target`
has 10 classes in the `1` state and 50 in the `0` state. If all of your
classes are about equally likely to be in the `1` state, you could use
the same `pos_weight` for all of them. A reasonable value would be
`pos_weight = n_negative / n_positive`. So:

```python
loss_function = torch.nn.BCEWithLogitsLoss(pos_weight=torch.tensor([5.0]))
```

If the likelihoods of your different classes having `target` value `1` are
not all broadly similar, then you would pass in a tensor of length 60
for your `pos_weight`, that is, a different `pos_weight` value for each
class.
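
One possible way to build such a per-class tensor is to count positives and negatives per class over the training targets. The sketch below uses randomly generated targets (400 samples, 10 ones per sample) purely as a placeholder, and the `clamp` guard is my own addition for classes that never occur.

```python
import torch

torch.manual_seed(0)

# Made-up targets: 400 samples, 60 classes, exactly 10 ones per sample.
n_samples, n_classes, n_ones = 400, 60, 10
target = torch.zeros(n_samples, n_classes)
ones_idx = torch.rand(n_samples, n_classes).topk(n_ones, dim=1).indices
target.scatter_(1, ones_idx, 1.0)

# Per-class counts of positives and negatives over the training set.
n_positive = target.sum(dim=0)
n_negative = n_samples - n_positive

# One weight per class; the clamp avoids division by zero for a class
# that never appears in the 1 state.
pos_weight = n_negative / n_positive.clamp(min=1.0)

loss_function = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
loss = loss_function(torch.zeros(n_samples, n_classes), target)
```

The length-60 `pos_weight` broadcasts over the batch dimension, so each class column gets its own weight.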

Best.

K. Frank

But for the input tensors, do I have to normalize them before passing them to the network,
or can I input tensors consisting of ones and zeros?

Hi Log!

Passing in the “raw” tensors should be fine. Being ones and zeros,
they are already close to being normalized. Changing them to, say,
`-1` and `1` so (if they were fifty-fifty) they would have a mean of 0 and
a standard deviation of 1 wouldn’t affect things much. (Try it both
ways – I doubt you’ll see any difference.)

(In contrast, think about a 16-bit grayscale image as input to a
network. The pixel values run from zero to about 65,000, so they
can be rather large. Normalizing the pixel values so that they are
of order one makes life easier for the network.)
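
For comparison, here is a minimal sketch of both cases. The random "image" and the tiny binary vector are placeholders, and dividing by 65535 is just one common choice of scaling.

```python
import torch

torch.manual_seed(0)

# Stand-in for a 16-bit grayscale image: integers in [0, 65535].
image = torch.randint(0, 65536, (1, 32, 32))

# Rescale to [0, 1] so that the values are of order one.
scaled = image.float() / 65535.0

# Binary 0/1 features already lie in that range; mapping them to -1/+1
# is a simple affine change if you want to try it.
binary = torch.tensor([0.0, 1.0, 1.0, 0.0])
plus_minus_one = 2.0 * binary - 1.0
```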

Best.

K. Frank