A new idea about optimizer usage and neural network design

Hi all, I have many datasets with different numbers of features but the same feature type. For example, d1 = 1000×200 and d2 = 1000×400 (samples × features).
Therefore, I intend to design a three-layer neural network in which all datasets share the last two layers but not the first layer; each dataset has its own first layer.

Therefore, my optimization loop looks like this:

model = Model()                              # three-layer model with layer1, layer2, layer3
opt = torch.optim.Adam(model.parameters())   # any optimizer works here

opt.zero_grad()
# swap in the dataset-specific first layer for dataset i
model.layer1 = layer[i]

pre = model(x)
loss = loss_fn(pre, true)

loss.backward()
opt.step()

Is this design correct? I think the first layer will also be optimized.

I don't know where the layer list is defined, but I would recommend avoiding manipulating the model after its creation, since the parameter set could change and your optimizer might no longer hold valid references to the parameters. Since the shapes of your input tensors differ, maybe you could add a condition to the forward method and use the appropriate first layer for the current input?
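For example, here is a minimal sketch of that idea; the class and attribute names (MultiHeadNet, first_layers) and the dimensions are just placeholders for illustration:

import torch
import torch.nn as nn

class MultiHeadNet(nn.Module):
    def __init__(self, feature_dims=(200, 400), hidden_dim=64, out_dim=10):
        super().__init__()
        # one dataset-specific first layer per feature count,
        # registered in a ModuleList so the optimizer sees all of them
        self.first_layers = nn.ModuleList(
            [nn.Linear(d, hidden_dim) for d in feature_dims]
        )
        # map from input feature count to the index of the matching first layer
        self.dim_to_idx = {d: i for i, d in enumerate(feature_dims)}
        # the last two layers are shared across all datasets
        self.layer2 = nn.Linear(hidden_dim, hidden_dim)
        self.layer3 = nn.Linear(hidden_dim, out_dim)

    def forward(self, x):
        # pick the first layer based on the shape of the current input
        first = self.first_layers[self.dim_to_idx[x.size(-1)]]
        h = torch.relu(first(x))
        h = torch.relu(self.layer2(h))
        return self.layer3(h)

model = MultiHeadNet()
opt = torch.optim.Adam(model.parameters())  # sees every first layer from the start

This way the parameter set is fixed when the optimizer is created, and in each step only the first layer that was actually used receives gradients and gets updated, together with the shared layers.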

Thanks a lot. But since I did not observe any errors during training, and my model did indeed learn something, it is still pretty confusing for me to understand the difference between your solution and my plan. I will try to explore it. Thanks a lot.
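As a first step I will check whether the parameters of the swapped-in layer are actually tracked by my optimizer, roughly like this:

tracked = {id(p) for group in opt.param_groups for p in group["params"]}
model.layer1 = layer[i]
print(all(id(p) in tracked for p in model.layer1.parameters()))
# if this prints False, opt.step() will not update the swapped-in first layer,
# even though loss.backward() still computes gradients for it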