Split network, clone and autograd

qwerty1024 · April 27, 2020, 6:18pm

I’ve followed the few posts here about splitting networks and autograd, but when I plot my weights/parameters, the split layers do not appear to be learning properly.

My setup:

input = torch.FloatTensor(data_batch)  # Variable wrapper is not needed since PyTorch 0.4
target = torch.FloatTensor(target_batch)

x = input[:, :-1]  # all columns except the last
y = input[:, -1].reshape(batch, -1)  # last column of input, needed at the end

z = x.clone()

z = f(z) # common layer

z1 = z.clone() # branch 1
z1 = f(z1)

z2 = z.clone() # branch 2
z2 = f(z2)

output = z1 + y*z2 # output

Gradients (red) and weights for ten epochs, shared layer:

For branch 1:

For branch 2:

I suspect I’m not splitting properly; it looks to me like the split layers are simply not learning. I have verified that cloning creates a new variable object in memory, but I’m missing something here.

Any insight appreciated.
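
For reference, a minimal sketch of how clone behaves under autograd. The tensor values and the names same_storage and expected_grad are illustrative assumptions, not part of the setup above. It shows that clone() allocates new memory but stays connected to the graph, so gradients still flow back to the original tensor:

```python
import torch

# Illustrative leaf tensor (assumed values, not from the original post).
x = torch.ones(3, requires_grad=True)

# clone() copies the data into new memory but is a differentiable op,
# so z remains connected to x in the autograd graph.
z = x.clone()

loss = (2.0 * z).sum()
loss.backward()

# The clone lives in different memory than x ...
same_storage = z.data_ptr() == x.data_ptr()

# ... yet the gradient of the loss still reaches x through the clone.
expected_grad = torch.full((3,), 2.0)
```

So a new object in memory does not by itself mean the graph was cut; only operations such as detach() do that.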

albanD · April 27, 2020, 8:16pm

Hi,

You don’t actually need to clone here. You can re-use z as the input for f multiple times with no issues.
Your weights do seem to change in both branches. Do you have any issue with the loss value?
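
A minimal sketch of the same split without any clone calls. The question calls every layer f, so the three separate nn.Linear modules, the layer sizes, the tanh activation, the random data and the MSE loss below are all assumptions made for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumed shapes and random data, standing in for data_batch / target_batch.
batch, n_features = 8, 5
data = torch.randn(batch, n_features + 1)
target = torch.randn(batch, 1)

x = data[:, :-1]                     # all columns except the last
y = data[:, -1].reshape(batch, -1)   # last column, used in the output

# Hypothetical layers: one shared layer followed by two branches.
shared = nn.Linear(n_features, 4)
branch1 = nn.Linear(4, 1)
branch2 = nn.Linear(4, 1)

z = torch.tanh(shared(x))  # common layer
z1 = branch1(z)            # branch 1 reuses z directly, no clone
z2 = branch2(z)            # branch 2 reuses z directly, no clone
output = z1 + y * z2

loss = nn.functional.mse_loss(output, target)
loss.backward()

# Gradient norm of each layer's weight after one backward pass.
grad_norms = {
    name: layer.weight.grad.norm().item()
    for name, layer in [("shared", shared), ("branch1", branch1), ("branch2", branch2)]
}
```

Autograd sums the contributions from both branches into the gradient of the shared layer, so all three entries of grad_norms should be nonzero here.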

qwerty1024 · April 27, 2020, 8:59pm

Got it, thanks. You are right that the weights change a bit across branches. And yes, the loss decreases extremely slowly and not by much.