Are combined views of multiple tensors possible?

I need to add an arbitrary penalty to the weights of my network, but the normal way of doing this, i.e.

pen = 0
for p in net.parameters():
    pen += penaltyfunc(p)

is a bit too slow at high layer counts.

To solve this, I want to create a view(-1) of each of the weight/bias matrices and then combine them into a new tensor. From there I can do the whole penalization in one step. Normally with a view, the value of the view changes along with whatever you are viewing. However, I'm not sure how to combine all these views into a single view. If I use the cat command, it returns a snapshot of the values of the views, i.e. it won't change as the actual weights change.
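To make the snapshot behavior concrete, here is a minimal sketch (the names are just for illustration):

import torch

w = torch.zeros(2, 2)
flat = w.view(-1)                   # a true view: shares storage with w
combined = torch.cat([flat, flat])  # cat allocates new storage

w.fill_(1.0)
print(flat)      # tensor([1., 1., 1., 1.]) -- tracks w
print(combined)  # still all zeros -- a snapshot taken at cat time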

Is there a way to do this, or do I just need to execute the cat command each time I want to calculate the penalty?

I've tried searching for "view multiple tensors", "multiview", and "view+cat", but nothing really relevant comes up.

Hi,

We used to do this in torch7: there was a specific function on the model that made all the weights views into a single big buffer. The short version is that this is a nightmare to maintain properly.
Anything you do that copies a Tensor will break that link, and then your optimizer (or penalty function here) will look at a different version of your weights and everything silently breaks.
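A small illustration of the failure mode (hypothetical names, just to show the mechanics): once a weight is a view into a shared buffer, any copy silently detaches it:

import torch

buf = torch.zeros(4)  # the single big buffer
w = buf[:2]           # a weight living as a view into buf
w2 = w.clone()        # any copying op (clone, cat, ...) breaks the link

buf.fill_(1.0)
print(w)   # tensor([1., 1.]) -- still tracks the buffer
print(w2)  # tensor([0., 0.]) -- stale copy, silently out of date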

In general, doing this for-loop should not be too slow. Is your model very different from a standard CNN?

No, it's a combination of convolutions, pools, and fully connected layers, so fairly straightforward, but it's likely to get weirder once we get the base working and try new things. The issue is this:

My benchmarks:

slowpen
  1000 loops, best of 3: 1.88 ms per loop
  100 loops, best of 3: 1.82 ms per loop
fastpen
  1000 loops, best of 3: 258 µs per loop
  1000 loops, best of 3: 443 µs per loop
create
  10000 loops, best of 3: 132 µs per loop

So currently I'm creating a list of views and then using the stack command to combine them into a snapshot each time I want to calculate the penalty. The speed for the forward/backward steps is listed above. It's already about 7 times faster than the for-loop method, and the gap only widens as the network gets deeper.
More importantly, that cat step ('create' above) takes about half the calculation time of the forward pass alone.
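Roughly, the fast version looks like this (a sketch; penaltyfunc stands in for whatever norm/mean I apply, and I use cat on the flattened views here since the pieces have different lengths):

import torch

def fastpen(params, penaltyfunc):
    # Flatten every parameter and combine into one big tensor.
    # cat copies, so this snapshot must be rebuilt every step
    # (the 'create' timing above).
    flat = torch.cat([p.view(-1) for p in params])
    # Apply the penalty to all the weights in one shot.
    return penaltyfunc(flat)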

Further, this is only for a <10 layer network; the time only gets worse as I add more layers (making them wider doesn't make a huge difference, though).

All I need is a way to combine two view objects into a new view object without breaking their respective links. If that command exists, that's all I need, but thus far I haven't been able to find it. I don't need to copy the whole buffer, just do a series of straightforward calculations on it, like various norms or means.
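For what it's worth, the reductions I have in mind do decompose over the pieces, which is why the for-loop version is correct in the first place. For example, for an L2 penalty (a sketch):

import torch

def l2_penalty(params):
    # The L2 norm of the concatenation equals the square root of the
    # sum of the per-tensor squared sums, so the value itself never
    # requires a combined view.
    return sum(p.pow(2).sum() for p in params).sqrt()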