Hello, I have some time series as input, with shape (batch_size, num_channels, input_size).

I want to process each channel independently through some linear layers that share the same weights across all channels. Then I need to take these outputs, concatenate them, pass them through some other linear layers, compute the loss, optimize it, etc.

Here’s a simplified version of what I’m asking, with only two channels:

```
import torch
import torch.nn as nn
import torch.optim as optim


class network(nn.Module):
    def __init__(self):
        super(network, self).__init__()
        # Applied to every channel with the same weights
        self.shared_linear = nn.Linear(4, 3)
        # Takes the concatenated per-channel outputs (2 channels * 3 features)
        self.final_linear = nn.Linear(6, 3)

    def forward(self, x):
        # x: (batch_size, num_channels, input_size)
        for i in range(x.shape[1]):
            out_i = self.shared_linear(x[:, i, :]).unsqueeze(1)
            if i == 0:
                shared_out = out_i
            else:
                shared_out = torch.cat((shared_out, out_i), 1)
        # Flatten (batch_size, num_channels, 3) to (batch_size, num_channels * 3)
        concatenated_out = shared_out.view(shared_out.shape[0], -1)
        return self.final_linear(concatenated_out)


model = network()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

optimizer.zero_grad()
target = torch.tensor([0])
x = torch.randn(1, 2, 4)
out = model(x)
loss = criterion(out, target)
loss.backward()
optimizer.step()
```

Is this approach correct? Does backward() automatically accumulate the gradients of the shared weights across channels, or should I do something else?
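
For reference, here is a minimal sketch of how I think the accumulation could be checked numerically. It compares the gradient of a shared layer when both channels pass through it in one graph against the sum of the gradients obtained one channel at a time. The names `shared`, `joint_grad`, `manual_grad` and `grads_match` are only illustrative and not part of the model above, and I use a plain sum as a stand-in for the real loss.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

shared = nn.Linear(4, 3)   # same sizes as shared_linear above (illustrative name)
x = torch.randn(1, 2, 4)   # (batch_size, num_channels, input_size)

# Gradient when both channels go through the shared layer in a single graph.
out = torch.cat([shared(x[:, i, :]) for i in range(x.shape[1])], dim=1)
out.sum().backward()       # plain sum used as a stand-in loss
joint_grad = shared.weight.grad.clone()

# Gradient of each channel computed separately, then summed by hand.
manual_grad = torch.zeros_like(shared.weight)
for i in range(x.shape[1]):
    shared.zero_grad()
    shared(x[:, i, :]).sum().backward()
    manual_grad += shared.weight.grad

# If autograd sums the per-channel contributions, the two should agree.
grads_match = torch.allclose(joint_grad, manual_grad)
print(grads_match)
```

If the two gradients agree, a single loss.backward() already sums the contributions from every use of the shared layer, which is the behaviour I am hoping for.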