Building a Modular Model: State dict vs Chaining Models

Hi all,

I’m working on a distributed learning system where I’m splitting a big model into smaller parts.
I call this big network a chain consisting of smaller networks called links.
The use case is that each link can be a client in a distributed learning system.

Currently I’m initializing separate models and forwarding the output of each link to the next link in the chain. This works well.
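
To make the setup concrete, here is a minimal sketch of the chaining. The link definitions, layer sizes, and the `forward_chain` helper are made up for illustration and are not my actual models.

```python
import torch
import torch.nn as nn

# Hypothetical links; the layer sizes are placeholders for illustration.
link_a = nn.Sequential(nn.Linear(8, 16), nn.ReLU())
link_b = nn.Sequential(nn.Linear(16, 16), nn.ReLU())
link_c = nn.Linear(16, 4)

chain = [link_a, link_b, link_c]


def forward_chain(links, x):
    """Feed the output of each link into the next link in the chain."""
    for link in links:
        x = link(x)
    return x


x = torch.randn(2, 8)          # batch of 2 samples with 8 features
out = forward_chain(chain, x)  # shape: (2, 4)
```

Replacing a link in this setup just means putting a different module at that position in `chain`.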

When I want to replace a link in the chain with another, I currently just call a different model. For the sake of implementation, it would be easier to replace the state dict of the relevant layers in the chain with the state dict of another link.
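
A rough sketch of the state dict swap I have in mind, assuming both links share exactly the same architecture (`make_link` is a hypothetical factory, not my real code):

```python
import torch
import torch.nn as nn


def make_link():
    # Hypothetical link architecture; both links must have identical structure
    # so that the state dict keys and tensor shapes match.
    return nn.Sequential(nn.Linear(16, 16), nn.BatchNorm1d(16), nn.ReLU())


current_link = make_link()
replacement_link = make_link()

# Copy the replacement's parameters and buffers into the existing link.
# load_state_dict copies values in place, so the two links do not share tensors.
current_link.load_state_dict(replacement_link.state_dict())

# Check that every entry now matches the replacement link.
swapped = all(
    torch.equal(current_link.state_dict()[key], value)
    for key, value in replacement_link.state_dict().items()
)
```

With the default `strict=True`, `load_state_dict` raises an error if the keys of the two links do not match, so this only works when the links have identical layer layouts.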

My question: Does the state dict of a set of layers encompass all of those layers’ properties/attributes/links?
(I’m familiar with the main ones, such as weights, biases, and gradients, but I’ve switched to PyTorch quite recently and don’t know what other information might stay linked.)
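
For reference, this is how I have been inspecting a state dict so far. It is a small sketch on a toy module (not one of my links), and I am not sure it covers everything that matters for swapping.

```python
import torch
import torch.nn as nn

# Toy module for inspection only; not one of the real links.
layer = nn.Sequential(nn.Linear(4, 4), nn.BatchNorm1d(4))

# Run a forward/backward pass so that gradients exist on the parameters.
layer(torch.randn(3, 4)).sum().backward()

# The state dict lists parameters and buffers (e.g. BatchNorm running stats).
state_keys = list(layer.state_dict().keys())
print(state_keys)

# Gradients live on the parameters (.grad), not under separate state dict keys.
has_grad = layer[0].weight.grad is not None
grads_in_state = any("grad" in key for key in state_keys)
print(has_grad, grads_in_state)
```

From this it looks like the state dict holds parameters and buffers, while gradients stay on the module's parameters and would not be carried over by a swap.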