# Understanding tensor contents

Hi, I’m new here and trying to learn PyTorch. I tried to write the simplest possible test script: a single-layer conv net with a single fully connected output layer. I picked three directions (up, right, diagonal) and tried to make a net that recognises them and outputs a 3-item vector, where each element stands for one of the directions.

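```python
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt

learning_rate = 0.001

# Three single-channel 6x6 images: shape (N, C, H, W) = (3, 1, 6, 6)
inp_tens = torch.from_numpy(np.array([
    [[[0, 0, 0, 0, 0, 0],
      [0, 1, 0, 0, 0, 0],
      [0, 1, 0, 0, 0, 0],
      [0, 1, 0, 0, 0, 0],
      [0, 1, 0, 0, 0, 0],
      [0, 0, 0, 0, 0, 0]]],

    [[[0, 0, 0, 0, 0, 0],
      [0, 0, 0, 0, 1, 0],
      [0, 0, 0, 1, 0, 0],
      [0, 0, 1, 0, 0, 0],
      [0, 1, 0, 0, 0, 0],
      [0, 0, 0, 0, 0, 0]]],

    [[[0, 0, 0, 0, 0, 0],
      [0, 0, 0, 0, 0, 0],
      [0, 0, 0, 0, 0, 0],
      [0, 1, 1, 1, 1, 0],
      [0, 0, 0, 0, 0, 0],
      [0, 0, 0, 0, 0, 0]]],
], dtype=np.float32))

# One-hot targets: shape (3, 1, 1, 3)
tgt_tens = torch.from_numpy(np.array([
    [[[1, 0, 0]]],
    [[[0, 1, 0]]],
    [[[0, 0, 1]]],
], dtype=np.float32))

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
inp_tens = inp_tens.to(device)  # data must live on the same device as the model
tgt_tens = tgt_tens.to(device)

class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        self.layer1 = nn.Sequential(
            # Conv layer (not shown in the original listing): BatchNorm2d(5)
            # needs 5 input channels; kernel size and padding are assumed.
            nn.Conv2d(1, 5, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(5),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=6, stride=6))  # (N, 5, 6, 6) -> (N, 5, 1, 1)
        self.fc = nn.Linear(1*1*5, 3)

    def forward(self, x):
        out = self.layer1(x)
        out = out.reshape(out.size(0), -1)  # flatten to (N, 5)
        out = self.fc(out)
        return out

model = ConvNet().to(device)

criterion = nn.MSELoss(reduction='sum')
# The optimizer is not defined in the original listing; Adam is assumed here
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for epoch in range(1000):
    optimizer.zero_grad()  # reset gradients, otherwise they accumulate across steps
    outputs = model(inp_tens)
    loss = criterion(outputs, tgt_tens)

    loss.backward()
    optimizer.step()
    if epoch % 100 == 0:  # print occasionally instead of on every epoch
        print(outputs)
        print(loss.item())
```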
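It can help to push a dummy batch through the layers one at a time and print the shape after each step. This sketch assumes a first layer of `nn.Conv2d(1, 5, kernel_size=3, padding=1)` (the convolution is not shown in the listing, but `nn.BatchNorm2d(5)` and `nn.Linear(1*1*5, 3)` imply 5 channels and a 1x1 map after pooling); BatchNorm and ReLU are skipped because they do not change the shape.

```python
import torch
import torch.nn as nn

x = torch.zeros(3, 1, 6, 6)                          # dummy batch: (N, C, H, W)
conv = nn.Conv2d(1, 5, kernel_size=3, padding=1)     # assumed conv layer
pool = nn.MaxPool2d(kernel_size=6, stride=6)
fc = nn.Linear(1 * 1 * 5, 3)

h = conv(x)
print(h.shape)    # torch.Size([3, 5, 6, 6]): padding=1 keeps the 6x6 size
h = pool(h)
print(h.shape)    # torch.Size([3, 5, 1, 1]): one maximum per channel
h = h.reshape(h.size(0), -1)
print(h.shape)    # torch.Size([3, 5]): the 1*1*5 inputs of the FC layer
out = fc(h)
print(out.shape)  # torch.Size([3, 3]): one 3-item vector per image
```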

The problem is that, instead of converging to the specific target (e.g. [1, 0, 0]), every output converges to the average, [1/3, 1/3, 1/3].
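One likely cause, assuming the model and targets are as posted: the model returns a tensor of shape (3, 3), but `tgt_tens` has shape (3, 1, 1, 3). `nn.MSELoss` broadcasts the two (PyTorch warns that the target size differs from the input size), so every output row is compared with every target row. The output that minimises the summed squared distance to all three one-hot targets at once is their mean, [1/3, 1/3, 1/3]. This sketch evaluates that broadcast loss for a perfect prediction and for the average prediction:

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss(reduction='sum')
targets = torch.eye(3).reshape(3, 1, 1, 3)  # same layout as tgt_tens: (3, 1, 1, 3)

perfect = torch.eye(3)                      # the outputs we want, shape (3, 3)
average = torch.full((3, 3), 1 / 3)         # every row predicts [1/3, 1/3, 1/3]

bshape = torch.broadcast_shapes(perfect.shape, targets.shape)
print(bshape)  # torch.Size([3, 1, 3, 3]): all 9 output/target pairs

loss_perfect = criterion(perfect, targets)  # emits a size-mismatch UserWarning
loss_average = criterion(average, targets)
print(loss_perfect.item(), loss_average.item())  # 12.0 vs. roughly 6.0
```

Under this broadcast loss the "average" answer scores better than the correct one, so gradient descent is pushed towards it.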
It does seem to work if I split the inputs and targets into 3 tensors with one item each, but when I then run all three together, or slightly modified versions of the original inputs (with the values shifted), the results are again not what I’d expect.
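The same broadcasting would also explain why one sample at a time appears to work: with a batch of one, the output is (1, 3) and the target is (1, 1, 1, 3), so the broadcast contains a single output/target pair and nothing gets averaged. A minimal sketch, assuming the targets are laid out as in the question, of that shape arithmetic and of flattening the targets to (N, 3) so they match the model output exactly:

```python
import torch

# One sample: output (1, 3) vs. target (1, 1, 1, 3) -> only one pair
single = torch.broadcast_shapes((1, 3), (1, 1, 1, 3))
print(single)  # torch.Size([1, 1, 1, 3])

# Targets laid out as in the question: (3, 1, 1, 3)
tgt_tens = torch.eye(3).reshape(3, 1, 1, 3)

# Flatten everything after the batch dimension so the target becomes (3, 3)
tgt_flat = tgt_tens.reshape(tgt_tens.size(0), -1)
print(tgt_flat.shape)  # torch.Size([3, 3])

# The shapes now agree, so the loss pairs output row k with target row k only
print(torch.broadcast_shapes((3, 3), tgt_flat.shape))  # torch.Size([3, 3])
```

Building the target array directly as `[[1, 0, 0], [0, 1, 0], [0, 0, 1]]` gives the same (3, 3) layout without the reshape.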

That’s probably a trivial problem that should be easy to spot, but I couldn’t find the answer either here or on Stack Overflow. I could probably use a DataLoader, but my goal here is also to understand the input, so I’d prefer to create everything manually.
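Since the goal is to build everything by hand, here is a minimal DataLoader-free sketch of the whole loop with matching shapes. The conv layer (5 channels, 3x3 kernel, padding 1), Adam with `lr=0.01`, 500 epochs and the fixed seed are illustrative assumptions, not taken from the question; the images are drawn with tensor indexing instead of NumPy literals, and `torch.eye(3)` supplies one one-hot target row per image.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the run is repeatable

# Inputs (N, C, H, W) = (3, 1, 6, 6), same order as in the question
x = torch.zeros(3, 1, 6, 6)
x[0, 0, 1:5, 1] = 1.0            # vertical line
for i in range(4):
    x[1, 0, 1 + i, 4 - i] = 1.0  # diagonal line
x[2, 0, 3, 1:5] = 1.0            # horizontal line
y = torch.eye(3)                 # targets (N, 3): same shape as the model output

model = nn.Sequential(
    nn.Conv2d(1, 5, kernel_size=3, padding=1),  # assumed conv layer
    nn.BatchNorm2d(5),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=6, stride=6),      # (N, 5, 6, 6) -> (N, 5, 1, 1)
    nn.Flatten(),                               # -> (N, 5)
    nn.Linear(5, 3),                            # -> (N, 3)
)
criterion = nn.MSELoss(reduction='sum')
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(500):
    optimizer.zero_grad()        # clear gradients from the previous step
    outputs = model(x)           # (3, 3), no broadcasting against y
    loss = criterion(outputs, y)
    loss.backward()
    optimizer.step()

with torch.no_grad():
    pred = model(x).argmax(dim=1)  # index of the largest output per image
print(loss.item(), pred.tolist())
```

For classification, `nn.CrossEntropyLoss` with class-index targets such as `torch.tensor([0, 1, 2])` and raw logits of shape (N, 3) is the more common choice and avoids one-hot targets altogether.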

Edit: OK, the problem with the shifted inputs was that I forgot to max-pool, so the features going into the FC layer were necessarily different.
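The point about max pooling can be checked directly. A minimal sketch, assuming the same kind of conv layer (5 channels, 3x3 kernel, padding 1; BatchNorm and ReLU omitted for brevity): a vertical line and the same line shifted one column to the right produce different flattened feature maps, but identical features after a 6x6 max pool, because the pool keeps only each channel’s maximum over the whole map. This holds here because the shifted line stays clear of the image border; a pattern pushed against the edge would interact with the zero padding differently.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

conv = nn.Conv2d(1, 5, kernel_size=3, padding=1)  # assumed conv layer
pool = nn.MaxPool2d(kernel_size=6, stride=6)

a = torch.zeros(1, 1, 6, 6)
a[0, 0, 1:5, 1] = 1.0  # vertical line in column 1
b = torch.zeros(1, 1, 6, 6)
b[0, 0, 1:5, 2] = 1.0  # the same line shifted one column right

with torch.no_grad():
    fa, fb = conv(a), conv(b)
    flat_a, flat_b = fa.flatten(1), fb.flatten(1)  # FC input without pooling: (1, 180)
    pooled_a = pool(fa).flatten(1)                 # FC input with pooling: (1, 5)
    pooled_b = pool(fb).flatten(1)

print(torch.equal(flat_a, flat_b))         # False: features depend on position
print(torch.allclose(pooled_a, pooled_b))  # True: the global max ignores the shift
```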