# Receiving `nan` for losses during training

I have a training set with 43 variables and 7471 observations. Each observation has 6 target outputs.

The input, denoted by `X`, has a shape of `(7471, 43)`, and the output, denoted by `y`, has a shape of `(7471, 6)`.

I want to implement a supervised regression model. To do so, I built a neural network based on the tutorial here.

All I did was change the input dimension, denoted by `D_in`, the size of the hidden layer, denoted by `H`, and the output dimension, denoted by `D_out`.

In the end, my code looks like this:

```python
import random

import torch


class DynamicNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we construct three nn.Linear instances that we will use
        in the forward pass.
        """
        super(DynamicNet, self).__init__()
        self.input_linear = torch.nn.Linear(D_in, H)
        self.middle_linear = torch.nn.Linear(H, H)
        self.output_linear = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        For the forward pass of the model, we randomly choose either 0, 1, 2, or 3
        and reuse the middle_linear Module that many times to compute hidden layer
        representations.

        Since each forward pass builds a dynamic computation graph, we can use normal
        Python control-flow operators like loops or conditional statements when
        defining the forward pass of the model.

        Here we also see that it is perfectly safe to reuse the same Module many
        times when defining a computational graph. This is a big improvement from Lua
        Torch, where each Module could be used only once.
        """
        h_relu = self.input_linear(x).clamp(min=0)
        for _ in range(random.randint(0, 3)):
            h_relu = self.middle_linear(h_relu).clamp(min=0)
        y_pred = self.output_linear(h_relu)
        return y_pred


# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.

# input_x_train and input_y_train are NumPy arrays loaded elsewhere.
# input_x_train has shape (7471, 43), so D_in is 43.
D_in = input_x_train.shape[1]

# input_y_train has shape (7471, 6), so D_out is 6.
H, D_out = 512, input_y_train.shape[1]

# Construct our model by instantiating the class defined above
model = DynamicNet(D_in, H, D_out)

# Construct our loss function and an Optimizer. Training this strange model with
# vanilla stochastic gradient descent is tough, so we use momentum
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)

losses = []
record_loss = losses.append

x = torch.from_numpy(input_x_train).float()
y = torch.from_numpy(input_y_train).float()

for t in range(500):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    record_loss(loss.item())
    print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

However, the output I get during training looks like this:

```
0 7943298220032.0
1 nan
2 nan
3 nan
4 nan
5 nan
6 nan
7 nan
8 nan
9 nan
10 nan
11 nan
12 nan
13 nan
14 nan
15 nan
16 nan
```

I’m not really sure whether there is something wrong with my implementation. Does anybody know what I am missing? Thanks in advance.

I think your loss grows very quickly because `reduction='sum'` adds up the squared errors of all `7471` samples in a single batch, which also makes the gradients very large.
Try `reduction='mean'` (called `'elementwise_mean'` in PyTorch 0.4.1) or a smaller batch size, and check whether you still get `nan` values.
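
Here is a minimal sketch of both suggestions, not a drop-in fix. It assumes synthetic random data with the same shapes as in the question, a plain two-layer network in place of `DynamicNet`, and a batch size of 64; all three are placeholders chosen for illustration. It first compares the two reductions on the same predictions, then trains on mini-batches with the mean loss.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Assumption: random stand-in data with the shapes from the question.
N, D_in, H, D_out = 7471, 43, 512, 6
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Assumption: a fixed two-layer network instead of DynamicNet, for brevity.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)

# Same predictions, two reductions: 'sum' is N * D_out times larger than 'mean',
# because 'mean' divides by the total number of elements.
with torch.no_grad():
    pred = model(x)
    loss_sum = torch.nn.MSELoss(reduction='sum')(pred, y)
    loss_mean = torch.nn.MSELoss(reduction='mean')(pred, y)
ratio = (loss_sum / loss_mean).item()
print(ratio, N * D_out)

# Train on shuffled mini-batches with the mean loss.
criterion = torch.nn.MSELoss(reduction='mean')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
loader = DataLoader(TensorDataset(x, y), batch_size=64, shuffle=True)

epoch_losses = []
for epoch in range(3):
    running = 0.0
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
        # Weight each batch by its size so the epoch value is a per-sample average.
        running += loss.item() * xb.shape[0]
    epoch_losses.append(running / N)
    print(epoch, epoch_losses[-1])
```

With `reduction='mean'` the loss and its gradients shrink by a factor of `7471 * 6 = 44826` compared with the summed version. Note that the first loss in the question, divided by that factor, is still roughly `1.8e8` per element, so rescaling the inputs and targets or lowering the learning rate may also be needed; that is a guess, not something the thread confirms.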