The network is actually divided into two parts: 1) a memory side that is trained on a self-supervised task, where the task is generated by two learned projections, and 2) the language model.
I’m having problems implementing it: during the backward pass, PyTorch raises an error saying that the weights of (1) were changed. I’m not sure what I should look at as a possible fix:
How can I instruct PyTorch to create two backpropagation graphs? Should I freeze the weights and do several passes?
I’m not sure I understand the issue correctly, but I assume PyTorch fails during the backward pass claiming some parameters were already updated? Depending on the logic you want to use to train your model, you might need to detach the output of the first part so that you no longer backpropagate through it.
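A minimal sketch of what I mean, using stand-in linear layers and placeholder losses (`memory_net`, `lm`, and the loss terms are hypothetical, not your actual modules): the memory side gets its own optimizer step on its self-supervised loss, and the language model is then trained on a detached copy of the memory output.

```python
import torch
import torch.nn as nn

# Hypothetical modules: `memory_net` stands in for part (1),
# `lm` for the language model (2).
memory_net = nn.Linear(16, 16)
lm = nn.Linear(16, 8)

opt_mem = torch.optim.Adam(memory_net.parameters(), lr=1e-3)
opt_lm = torch.optim.Adam(lm.parameters(), lr=1e-3)

x = torch.randn(4, 16)

# Step 1: update the memory side on its own loss
# (a stand-in reconstruction objective here).
mem_out = memory_net(x)
mem_loss = (mem_out - x).pow(2).mean()
opt_mem.zero_grad()
mem_loss.backward()
opt_mem.step()

# Step 2: feed a *detached* memory output into the language model.
# detach() cuts the graph, so lm_loss.backward() will not try to
# backpropagate through memory_net's already-updated parameters.
lm_out = lm(memory_net(x).detach())
lm_loss = lm_out.pow(2).mean()  # stand-in LM loss
opt_lm.zero_grad()
lm_loss.backward()
opt_lm.step()
```

With this pattern each `backward()` only ever touches one set of parameters, so the in-place update error should disappear.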
DCGAN might align with some of what you’re attempting to do and could be worth a look. With a DCGAN you train two networks, a discriminator and a generator, and the example shows how to properly handle the detach in the training pipeline; a condensed sketch of that pattern follows below. Here is a tutorial:
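In the meantime, here is a condensed sketch of the update pattern used in that setting (with stand-in linear layers rather than the real DCGAN architecture): the discriminator is trained on a detached generator output, and the generator then gets its own backward pass through both networks.

```python
import torch
import torch.nn as nn

# Stand-in networks, not the actual DCGAN convolutional architecture.
netG = nn.Linear(8, 16)   # generator
netD = nn.Linear(16, 1)   # discriminator
criterion = nn.BCEWithLogitsLoss()
optD = torch.optim.Adam(netD.parameters(), lr=2e-4)
optG = torch.optim.Adam(netG.parameters(), lr=2e-4)

noise = torch.randn(4, 8)
fake = netG(noise)

# Discriminator update: detach `fake` so this backward pass
# does not reach into the generator's graph.
optD.zero_grad()
d_loss = criterion(netD(fake.detach()), torch.zeros(4, 1))
d_loss.backward()
optD.step()

# Generator update: reuse the same `fake` *without* detach so
# gradients flow back through netG. The second forward through
# netD builds a fresh graph, so its in-between parameter update
# does not conflict with this backward pass.
optG.zero_grad()
g_loss = criterion(netD(fake), torch.ones(4, 1))
g_loss.backward()
optG.step()
```

Your memory-side/language-model split is analogous: decide which loss is allowed to reach which parameters, and detach at the boundary for every loss that should stop there.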