Hello Andrew!

First, a general comment:

We will be able to give you advice that is more likely to be useful to you if you give us some concrete detail about the problem you are working on.

How big are your images? How many will you be training on? What do they look like? What is the typical distribution of your two attributes? What is the conceptual meaning of your attributes?

I don’t believe that `nn.MSELoss` has a built-in way to include these relative weights. There are a number of straightforward approaches to including such weights.
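For instance, one such approach (a sketch, not necessarily what I would do) is to construct `nn.MSELoss` with `reduction = 'none'` so that it returns the per-element squared errors, and then apply the weights and average yourself. (The weight values here are just placeholders.)

```python
import torch

# MSELoss with reduction = 'none' returns per-element squared errors
loss_fn = torch.nn.MSELoss (reduction = 'none')

# dummy predictions and targets, batch of 10 samples, 2 attributes
y_pred = torch.randn (10, 2)
y_targ = torch.randn (10, 2)

# per-attribute weights (placeholder values)
wts = torch.tensor ([1.0e-6, 0.25])

# weight the per-element squared errors, then average
loss = (wts * loss_fn (y_pred, y_targ)).mean()
```

This gives the same result as writing the weighted squared error out by hand.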

Myself, I would just write my own loss function, something like this:

```
import torch
# define weighted loss function
def wtSqErr (pred, targ, wts):
    return (wts * (pred - targ)**2).mean()
# construct some sample data
# use a batch size of 10
# y_targ are the actuals, y_pred are the predictions
# which, for this example, are the actuals plus noise
y_targ = torch.tensor ([1000.0, 2.0]) * torch.randn (10, 2) + torch.tensor ([2000.0, 3.0])
y_targ
y_pred = y_targ + torch.tensor ([100.0, 0.15]) * torch.randn (10, 2)
y_pred.requires_grad = True
y_pred
# set up the weights for the loss
wtA = 1.0 / 1000.0**2
wtB = 1.0 / 2.0**2
wtAB = torch.tensor ([wtA, wtB])
wtAB
# calculate loss
loss = wtSqErr (y_pred, y_targ, wtAB)
loss
# show that autograd works
print (y_pred.grad)
loss.backward()
print (y_pred.grad)
```

Here is the output of the above script:

```
>>> import torch
>>>
>>> # define weighted loss function
...
>>> def wtSqErr (pred, targ, wts):
...     return (wts * (pred - targ)**2).mean()
...
>>> # construct some sample data
... # use a batch size of 10
... # y_targ are the actuals, y_pred are the predictions
... # which, for this example, are the actuals plus noise
...
>>> y_targ = torch.tensor ([1000.0, 2.0]) * torch.randn (10, 2) + torch.tensor ([2000.0, 3.0])
>>> y_targ
tensor([[2.3612e+03, 2.4401e+00],
        [2.2880e+03, 7.0144e+00],
        [1.2435e+02, 4.6300e+00],
        [3.7007e+03, 1.4845e+00],
        [1.7911e+03, 2.0490e+00],
        [2.6058e+03, 2.2381e+00],
        [6.1270e+02, 2.1648e+00],
        [6.9680e+02, 1.4656e+00],
        [1.2903e+03, 2.8559e+00],
        [1.6696e+03, 5.5197e+00]])
>>> y_pred = y_targ + torch.tensor ([100.0, 0.15]) * torch.randn (10, 2)
>>> y_pred.requires_grad = True
>>> y_pred
tensor([[2.6065e+03, 2.5329e+00],
        [2.3034e+03, 7.2111e+00],
        [2.9170e+02, 4.4378e+00],
        [3.7426e+03, 1.4848e+00],
        [1.8188e+03, 2.2676e+00],
        [2.8676e+03, 2.3148e+00],
        [5.6415e+02, 2.1441e+00],
        [7.7348e+02, 1.4650e+00],
        [1.2437e+03, 2.9639e+00],
        [1.5545e+03, 5.5731e+00]], requires_grad=True)
>>>
>>> # set up the weights for the loss
...
>>> wtA = 1.0 / 1000.0**2
>>> wtB = 1.0 / 2.0**2
>>>
>>> wtAB = torch.tensor ([wtA, wtB])
>>> wtAB
tensor([1.0000e-06, 2.5000e-01])
>>>
>>> # calculate loss
...
>>> loss = wtSqErr (y_pred, y_targ, wtAB)
>>> loss
tensor(0.0111, grad_fn=<MeanBackward1>)
>>>
>>> # show that autograd works
...
>>> print (y_pred.grad)
None
>>> loss.backward()
>>> print (y_pred.grad)
tensor([[ 2.4533e-05,  2.3200e-03],
        [ 1.5451e-06,  4.9169e-03],
        [ 1.6735e-05, -4.8059e-03],
        [ 4.1961e-06,  8.9884e-06],
        [ 2.7656e-06,  5.4653e-03],
        [ 2.6174e-05,  1.9158e-03],
        [-4.8546e-06, -5.1618e-04],
        [ 7.6680e-06, -1.5900e-05],
        [-4.6640e-06,  2.7002e-03],
        [-1.1507e-05,  1.3345e-03]])
```

Note that if you use pytorch tensors to do your calculations, autograd will work for you without your having to do anything special.

Pytorch naturally works with batches. The first index of your input data, predictions, and target data tensors is the index that indexes over samples in the batch.

In the above example, you can understand the generated data to be a batch of 10 samples.

(In fact, pytorch loss functions *require* batches, even if the batch size is only 1. Following the above example, for batch-size = 1, a “batch” of, say, predictions would then have a shape of `y_pred.shape = torch.Size ([1, 2])`.)
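To illustrate the batch-size-1 case, a quick check (just a sketch with dummy data):

```python
import torch

# a "batch" of a single sample still has a leading batch dimension
y_pred = torch.randn (1, 2)
print (y_pred.shape)   # torch.Size([1, 2])
```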

If your training data fits in memory (we don’t know – you’ve told us nothing concrete about your problem), you can read it all into one tensor, and then use “indexing” or “slicing” to get your batches.

```
import torch
all_data = torch.ones (10, 3)
first_batch_of_two = all_data[0:2]
second_batch_of_two = all_data[2:4]
```

Doing this does not create new tensors with their own storage – it just sets up a view into the existing `all_data` tensor.
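You can check the shared storage yourself (a quick sanity check, continuing the example above):

```python
import torch

all_data = torch.ones (10, 3)
first_batch_of_two = all_data[0:2]

# the slice is a view -- it starts at the same underlying storage
print (first_batch_of_two.data_ptr() == all_data.data_ptr())   # True

# modifying the view therefore modifies all_data as well
first_batch_of_two[0, 0] = 99.0
print (all_data[0, 0])   # tensor(99.)
```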

Good luck.

K. Frank