General Questions on DCGs

  1. What is the semantic meaning of varying the number of hidden layers? Is the idea similar to weight sharing?
  2. How are gradient updates made with dynamic computation graphs (DCGs)? Say you increase the width of a hidden layer dynamically in one iteration and then decrease the width of the same layer in the next iteration: what happens to the weights in that layer? Is the common subset of weights preserved, or are the weights reinitialized each iteration? (A sketch of this scenario appears after this list.)
  3. Does it make sense to manually store and retrieve learned layers dynamically, or does PyTorch natively do this? (See the second sketch below.)
  4. If I wanted to make dynamic predictions, how would I handle a dynamic output layer? (The second sketch below includes per-branch output heads as one possibility.)
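
To make question 2 concrete, here is a minimal sketch of the resize scenario, assuming PyTorch. The helper `resize_linear` is hypothetical: it rebuilds an `nn.Linear` at the new width and copies the overlapping slice of learned weights by hand, since, as far as I can tell, nothing preserves them automatically when you swap the layer out.

```python
# Hypothetical sketch: manually preserving the common subset of weights
# when a hidden layer's width changes between iterations.
import torch
import torch.nn as nn

def resize_linear(old: nn.Linear, new_out: int) -> nn.Linear:
    """Hypothetical helper: rebuild a Linear at a new output width,
    copying the overlapping rows of the learned weight and bias."""
    new = nn.Linear(old.in_features, new_out)
    n = min(old.out_features, new_out)
    with torch.no_grad():
        new.weight[:n] = old.weight[:n]  # shared rows keep learned values
        new.bias[:n] = old.bias[:n]      # remaining rows stay freshly initialized
    return new

layer = nn.Linear(10, 32)
layer = resize_linear(layer, 64)  # widen: rows 0..31 carry over
layer = resize_linear(layer, 32)  # narrow: the original 32 rows survive
```

A real training loop would presumably also need to hand the new layer's parameters back to the optimizer, since the old parameter tensors are replaced.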
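For question 3, one pattern I'm aware of is `nn.ModuleDict`: all candidate sub-layers stay registered (so their weights persist and appear in `state_dict`) even when only one branch runs in a given forward pass. The per-branch output heads also gesture at question 4. This is a sketch under that assumption, not necessarily the idiomatic answer:

```python
# Sketch: pre-registering candidate layers in an nn.ModuleDict so PyTorch
# tracks all of their parameters, while each forward pass picks one branch.
import torch
import torch.nn as nn

class DynamicNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.ModuleDict({
            "narrow": nn.Linear(10, 16),
            "wide":   nn.Linear(10, 64),
        })
        # Separate output heads per branch (one way to handle question 4).
        self.heads = nn.ModuleDict({
            "narrow": nn.Linear(16, 2),
            "wide":   nn.Linear(64, 2),
        })

    def forward(self, x, width):
        # Only the chosen branch joins the graph this iteration,
        # so only its parameters receive gradients.
        h = torch.relu(self.hidden[width](x))
        return self.heads[width](h)

net = DynamicNet()
out = net(torch.randn(4, 10), width="wide")
```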