I’m working on a distributed learning system where I’m splitting a big model into smaller parts.
I call this big network a chain consisting of smaller networks called links.
The use case is that each link can act as a client in the distributed learning system.
Currently I'm initializing separate models and forwarding the output of each link to the next link in the chain; this works well.
When I want to replace one link in the chain with another, I currently just call a different model. For implementation's sake, it would be easier to replace the state_dict of several layers in the chain with the state_dict of another link.
My question: does the state_dict of a layer encompass all of that layer's properties/attributes/state?
(I'm familiar with the usual ones: weights, biases, gradients. But I've switched to PyTorch quite recently and don't know what other information might stay attached to a layer.)
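For context, here is a minimal sketch of the swap I have in mind. The `Link` class and its dimensions are hypothetical stand-ins for one sub-network of the chain; the point is that `state_dict()` contains parameters (weights, biases) and registered buffers, but not gradients, optimizer state, hooks, or plain Python attributes:

```python
import torch
import torch.nn as nn

class Link(nn.Module):
    """Hypothetical 'link': one small sub-network of the chain."""
    def __init__(self, in_dim=8, out_dim=8):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)
        # Registered buffers (e.g. BatchNorm running stats) ARE saved
        # in the state_dict, alongside parameters.
        self.register_buffer("version", torch.zeros(1))
        # Plain Python attributes are NOT part of the state_dict.
        self.name = "link"

    def forward(self, x):
        return torch.relu(self.fc(x))

link_a = Link()
link_b = Link()

# Copy link_b's tensors into link_a; strict=True (the default)
# raises if the two architectures' keys don't match.
link_a.load_state_dict(link_b.state_dict())

# After the swap, both links compute the same function.
x = torch.randn(2, 8)
print(torch.allclose(link_a(x), link_b(x)))

# The state_dict holds only parameters and buffers:
print(sorted(link_a.state_dict().keys()))
```

Note that `.grad` attributes and any optimizer state (momentum, Adam moments, etc.) stay attached to the old tensors, so after swapping state dicts you would typically also reset or re-create the optimizer for that link.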