Split network, clone and autograd

I’ve followed a few posts here about splitting networks and autograd, but when I plot my weights/parameters it looks like the split layers are not learning properly.

My setup:

input = Variable(torch.FloatTensor(data_batch))
target = Variable(torch.FloatTensor(target_batch))

x = input[:, :-1]                     # all columns except the last
y = input[:, -1].reshape(batch, -1)   # last column, needed again at the output

z = x.clone()

z = f(z) # common layer

z1 = z.clone() # branch 1
z1 = f(z1)

z2 = z.clone() # branch 2
z2 = f(z2)

output = z1 + y*z2 # output

Gradients (red) and weights for ten epochs, shared layer: [plot omitted]

For branch 1: [plot omitted]

For branch 2: [plot omitted]

I suspect I’m not splitting properly; it looks to me like the split layers are simply not learning. I have verified that cloning creates a new variable object in memory (quick check below), but I’m still missing something here.
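Roughly how I checked the clone, as a minimal sketch (using plain tensors with requires_grad rather than the Variable wrapper above):

import torch

a = torch.randn(3, requires_grad=True)
b = a.clone()
print(b is a)     # False: clone() returns a new tensor object
b.sum().backward()
print(a.grad)     # all ones: gradients still flow back through the clone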

Any insight appreciated.

Hi,

You don’t actually need to clone here. You can re-use z as the input for f multiple times with no issues.
Your weights do seem to change in both branches. Do you have any issue with the loss value?
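For reference, something like this works without any clone() (a minimal sketch; the sizes and the names shared/branch1/branch2 are placeholders, not taken from your code):

import torch
import torch.nn as nn

shared = nn.Linear(8, 16)     # stand-in for your common layer f
branch1 = nn.Linear(16, 1)    # stand-in for branch 1
branch2 = nn.Linear(16, 1)    # stand-in for branch 2

data_batch = torch.randn(4, 9)          # batch of 4, 9 columns
x = data_batch[:, :-1]                  # all columns except the last
y = data_batch[:, -1].reshape(4, -1)    # last column, used at the output

z = torch.relu(shared(x))     # common layer, no clone() needed
z1 = branch1(z)               # branch 1 reads z directly
z2 = branch2(z)               # branch 2 reads the same z

output = z1 + y * z2
loss = output.pow(2).mean()   # placeholder loss
loss.backward()

# all three modules receive gradients
print(shared.weight.grad.abs().sum())
print(branch1.weight.grad.abs().sum())
print(branch2.weight.grad.abs().sum())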

Got it, thanks. You are right that the weights change a bit in both branches. And yes, the loss decreases extremely slowly and not by much.