Multiple inputs in shared weight layers

Hello, I have some time series as input, of shape (batch_size, num_channels, input_size).
I want to process each channel independently through some linear layers that share the same weights across channels. Then I need to take these outputs, concatenate them, pass them through some other linear layers, compute the loss, optimize it, etc.
Here’s a simplified version of what I’m asking with only two inputs:

import torch
import torch.nn as nn
import torch.optim as optim

class network(nn.Module):
    def __init__(self):
        super(network, self).__init__()
        # shared layer, applied to every channel with the same weights
        self.shared_linear = nn.Linear(4, 3)
        # final layer, applied to the concatenated per-channel outputs (2 * 3 = 6 features)
        self.final_linear = nn.Linear(6, 3)

    def forward(self, x):
        # x has shape (batch_size, num_channels, input_size)
        for i in range(x.shape[1]):
            # apply the shared layer to channel i and keep the channel dimension
            out_i = self.shared_linear(x[:, i, :]).unsqueeze(1)
            if i == 0:
                shared_out = out_i
            else:
                shared_out = torch.cat((shared_out, out_i), 1)

        # flatten the per-channel outputs before the final layer
        concatenated_out = shared_out.view(shared_out.shape[0], -1)
        return self.final_linear(concatenated_out)

model = network()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)
optimizer.zero_grad()

target = torch.tensor([0])
x = torch.randn(1, 2, 4)
out = model(x)

loss = criterion(out, target)
loss.backward()
optimizer.step()

Is this approach correct? Does backward automatically accumulate the gradients or shall I do something else?

Your approach looks correct, and the backward pass will work since no detaching operation was used. The gradient accumulation happens automatically: because shared_linear is applied to every channel, backward sums the contributions from all channels into its parameters' .grad attributes.
You can verify this by checking the .grad attributes of the trainable parameters before and after the backward call.
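For example, a minimal check along these lines (using shared_linear.weight just for illustration):

model = network()
criterion = nn.CrossEntropyLoss()

# before backward, no gradient has been computed yet
print(model.shared_linear.weight.grad)  # None

out = model(torch.randn(1, 2, 4))
loss = criterion(out, torch.tensor([0]))
loss.backward()

# after backward, the shared layer holds the gradient accumulated from both channels
print(model.shared_linear.weight.grad)  # tensor of shape [3, 4]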

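As a side note, nn.Linear is applied to the last dimension of its input, so you could also drop the Python loop and push the whole (batch_size, num_channels, input_size) tensor through the shared layer at once. A minimal sketch of that variant (the class name is just illustrative; same layer sizes as in your code):

class NetworkVectorized(nn.Module):
    def __init__(self):
        super().__init__()
        self.shared_linear = nn.Linear(4, 3)
        self.final_linear = nn.Linear(6, 3)

    def forward(self, x):
        # nn.Linear acts on the last dim, so every channel is transformed
        # with the same weights: (B, 2, 4) -> (B, 2, 3)
        shared_out = self.shared_linear(x)
        # flatten the per-channel outputs: (B, 2, 3) -> (B, 6)
        return self.final_linear(shared_out.flatten(start_dim=1))

The loop version and this one compute the same forward pass; the vectorized one just avoids building the output channel by channel.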