Hello, I am trying to train a convolutional autoencoder:
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvAutoencoder(nn.Module):
    def __init__(self):
        super(ConvAutoencoder, self).__init__()
        # Encoder
        self.conv1 = nn.Conv2d(1, 16, 3, padding=1)  # input: batch_size, channels, height, width
        # 32/64, 2, 10, 8
        self.conv2 = nn.Conv2d(16, 4, 3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        # Decoder
        self.t_conv1 = nn.ConvTranspose2d(4, 16, 2, stride=2)
        self.t_conv2 = nn.ConvTranspose2d(16, 1, 2, stride=2)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        x = F.relu(self.conv2(x))
        x = self.pool(x)
        x = F.relu(self.t_conv1(x))
        x = torch.sigmoid(self.t_conv2(x))  # F.sigmoid is deprecated in recent PyTorch
        return x
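For reference, here is how the shapes flow through this model, assuming 8x8 single-channel inputs (matching img_height=8 in the training code below; the input size is my assumption, not stated explicitly in the question). Each layer is applied standalone just to trace the shape arithmetic:

```python
import torch
import torch.nn as nn

# Shape walk-through for the encoder/decoder above; conv with padding=1
# preserves height/width, each pool halves them, each transposed conv
# with kernel 2 and stride 2 doubles them.
x = torch.randn(2, 1, 8, 8)                    # batch of 2 single-channel 8x8 inputs
x = nn.Conv2d(1, 16, 3, padding=1)(x)          # -> (2, 16, 8, 8)
x = nn.MaxPool2d(2, 2)(x)                      # -> (2, 16, 4, 4)
x = nn.Conv2d(16, 4, 3, padding=1)(x)          # -> (2, 4, 4, 4)
x = nn.MaxPool2d(2, 2)(x)                      # -> (2, 4, 2, 2)   bottleneck
x = nn.ConvTranspose2d(4, 16, 2, stride=2)(x)  # -> (2, 16, 4, 4)
x = nn.ConvTranspose2d(16, 1, 2, stride=2)(x)  # -> (2, 1, 8, 8)   back to input size
print(x.shape)
```

So the decoder does restore the input resolution; the issue described below is about the value range, not the shapes.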
With the train function:
n_epochs = 10000
optimizer = torch.optim.Adam(model.parameters(), lr=0.0002)
criterion = nn.MSELoss()
train_matrix_array = get_matrix(train_dataset, img_height=8, batch_size=2)
test_matrix_array = get_matrix(test_dataset, img_height=8, batch_size=2)
for epoch in range(1, n_epochs + 1):
    # monitor training loss
    train_loss = 0.0
    # Training
    for data in train_matrix_array:
        images = data.to(device)
        optimizer.zero_grad()
        outputs = model(images)  # 32x3 tensors
        # print("outputs: ", outputs, " images: ", images)
        loss = criterion(outputs, images)
        loss.backward()
        optimizer.step()
        train_loss += loss.item() * images.size(0)
    train_loss = train_loss / len(train_matrix_array)
The problem is that the input contains many tensors with negative values, but the output only ever takes positive values. The outputs keep getting smaller and smaller; I understand this is because they are being pushed toward negative targets they can never reach, and this also drags down the predictions for the positive targets, so the predicted tensors end up very small.
After a bit of training, the real input values and the predictions look like this:
Prediction: tensor([1.4951e-20, 1.7295e-21, 1.8791e-31, 1.5755e-20, 2.2758e-29, 3.1556e-31,
0.0000e+00, 1.6123e-28], grad_fn=<SelectBackward0>)
Real values: tensor([-0.3655, 0.1866, -0.0160, -0.0876, 0.5151, -1.2745, -0.0676, -2.1106])
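To illustrate what I mean about the range (a minimal sketch, not my actual model): sigmoid outputs are strictly inside (0, 1), so very negative pre-activations produce tiny positive numbers like the 1e-20-scale predictions above, but never anything negative:

```python
import torch

# Sweep of possible pre-activations (logits) going into a sigmoid output layer.
z = torch.linspace(-10.0, 10.0, 101)
out = torch.sigmoid(z)

# Outputs stay strictly between 0 and 1: they can get arbitrarily close
# to 0 but never negative, so a target like -0.3655 is unreachable and
# MSE just keeps pushing the pre-activations further down.
print(out.min().item(), out.max().item())
```

So no matter how the logits move, a negative target leaves an irreducible MSE error under this output activation.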
I assume this has to do with the last layer of the model, but I don't know which output activation would be more appropriate.