Hello Andrew!

First, a general comment:

We will be able to give you advice that is more likely to be useful to you if you give us some concrete detail about the problem you are working on.

How big are your images? How many will you be training on? What do they look like? What is the typical distribution of your two attributes? What is the conceptual meaning of your attributes?

I don’t believe that `nn.MSELoss` has a built-in way to include these relative weights. There are a number of straightforward approaches to including such weights.
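For instance, one such approach (a sketch, not necessarily what I would do) is to construct `nn.MSELoss` with `reduction = 'none'` so that it returns the per-element squared errors, and then apply the weights and average yourself. (The weight values here are just placeholders.)

```python
import torch

# MSELoss with reduction = 'none' returns per-element squared errors
loss_fn = torch.nn.MSELoss (reduction = 'none')

# dummy predictions and targets, batch of 10 samples, 2 attributes
y_pred = torch.randn (10, 2)
y_targ = torch.randn (10, 2)

# per-attribute weights (placeholder values)
wts = torch.tensor ([1.0e-6, 0.25])

# weight the per-element squared errors, then average
loss = (wts * loss_fn (y_pred, y_targ)).mean()
```

This gives the same result as writing the weighted squared error out by hand.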

Myself, I would just write my own loss function, something like this:

```
import torch
# define weighted loss function
def wtSqErr (pred, targ, wts):
    return (wts * (pred - targ)**2).mean()
# construct some sample data
# use a batch size of 10
# y_targ are the actuals, y_pred are the predictions
# which, for this example, are the actuals plus noise
y_targ = torch.tensor ([1000.0, 2.0]) * torch.randn (10, 2) + torch.tensor ([2000.0, 3.0])
y_targ
y_pred = y_targ + torch.tensor ([100.0, 0.15]) * torch.randn (10, 2)
y_pred.requires_grad = True
y_pred
# set up the weights for the loss
wtA = 1.0 / 1000.0**2
wtB = 1.0 / 2.0**2
wtAB = torch.tensor ([wtA, wtB])
wtAB
# calculate loss
loss = wtSqErr (y_pred, y_targ, wtAB)
loss
# show that autograd works
print (y_pred.grad)
loss.backward()
print (y_pred.grad)
```

Here is the output of the above script:

```
>>> import torch
>>>
>>> # define weighted loss function
...
>>> def wtSqErr (pred, targ, wts):
...     return (wts * (pred - targ)**2).mean()
...
>>> # construct some sample data
... # use a batch size of 10
... # y_targ are the actuals, y_pred are the predictions
... # which, for this example, are the actuals plus noise
...
>>> y_targ = torch.tensor ([1000.0, 2.0]) * torch.randn (10, 2) + torch.tensor ([2000.0, 3.0])
>>> y_targ
tensor([[2.3612e+03, 2.4401e+00],
        [2.2880e+03, 7.0144e+00],
        [1.2435e+02, 4.6300e+00],
        [3.7007e+03, 1.4845e+00],
        [1.7911e+03, 2.0490e+00],
        [2.6058e+03, 2.2381e+00],
        [6.1270e+02, 2.1648e+00],
        [6.9680e+02, 1.4656e+00],
        [1.2903e+03, 2.8559e+00],
        [1.6696e+03, 5.5197e+00]])
>>> y_pred = y_targ + torch.tensor ([100.0, 0.15]) * torch.randn (10, 2)
>>> y_pred.requires_grad = True
>>> y_pred
tensor([[2.6065e+03, 2.5329e+00],
        [2.3034e+03, 7.2111e+00],
        [2.9170e+02, 4.4378e+00],
        [3.7426e+03, 1.4848e+00],
        [1.8188e+03, 2.2676e+00],
        [2.8676e+03, 2.3148e+00],
        [5.6415e+02, 2.1441e+00],
        [7.7348e+02, 1.4650e+00],
        [1.2437e+03, 2.9639e+00],
        [1.5545e+03, 5.5731e+00]], requires_grad=True)
>>>
>>> # set up the weights for the loss
...
>>> wtA = 1.0 / 1000.0**2
>>> wtB = 1.0 / 2.0**2
>>>
>>> wtAB = torch.tensor ([wtA, wtB])
>>> wtAB
tensor([1.0000e-06, 2.5000e-01])
>>>
>>> # calculate loss
...
>>> loss = wtSqErr (y_pred, y_targ, wtAB)
>>> loss
tensor(0.0111, grad_fn=<MeanBackward1>)
>>>
>>> # show that autograd works
...
>>> print (y_pred.grad)
None
>>> loss.backward()
>>> print (y_pred.grad)
tensor([[ 2.4533e-05,  2.3200e-03],
        [ 1.5451e-06,  4.9169e-03],
        [ 1.6735e-05, -4.8059e-03],
        [ 4.1961e-06,  8.9884e-06],
        [ 2.7656e-06,  5.4653e-03],
        [ 2.6174e-05,  1.9158e-03],
        [-4.8546e-06, -5.1618e-04],
        [ 7.6680e-06, -1.5900e-05],
        [-4.6640e-06,  2.7002e-03],
        [-1.1507e-05,  1.3345e-03]])
```

Note that if you use pytorch tensors to do your calculations, autograd will work for you without your having to do anything special.

Pytorch naturally works with batches. The first index of your input data, predictions, and target data tensors is the index that indexes over samples in the batch.

In the above example, you can understand the generated data to be a batch of 10 samples.

(In fact, pytorch loss functions *require* batches, even if the batch size is only 1. Following the above example, for batch-size = 1, a “batch” of, say, predictions would then have a shape of `y_pred.shape = torch.Size ([1, 2])`.)
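To illustrate the batch-size-1 case, a quick check (just a sketch with dummy data):

```python
import torch

# a "batch" of a single sample still has a leading batch dimension
y_pred = torch.randn (1, 2)
print (y_pred.shape)   # torch.Size([1, 2])
```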

If your training data fits in memory (we don’t know – you’ve told us nothing concrete about your problem), you can read it all into one tensor, and then use “indexing” or “slicing” to get your batches.

```
import torch
all_data = torch.ones (10, 3)
first_batch_of_two = all_data[0:2]
second_batch_of_two = all_data[2:4]
```

Doing this does not create new tensors with their own storage – it just sets up a view into the existing `all_data` tensor.
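You can check the shared storage yourself (a quick sanity check, continuing the example above):

```python
import torch

all_data = torch.ones (10, 3)
first_batch_of_two = all_data[0:2]

# the slice is a view -- it starts at the same underlying storage
print (first_batch_of_two.data_ptr() == all_data.data_ptr())   # True

# modifying the view therefore modifies all_data as well
first_batch_of_two[0, 0] = 99.0
print (all_data[0, 0])   # tensor(99.)
```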

Good luck.

K. Frank