Problem when combining two neural networks

I have two trained neural networks (NNs) that I want to combine to create a new neural network (with the same structure) but whose weights are a combination of the previous two neural networks’ weights.

The two NNs each have an accuracy of ~97%, but when I combine them I obtain an accuracy of around 47%. The surprising part is not the drop itself (combining non-linear models does not always work), but that when I run the code below (I have it in a Jupyter cell) several times, the accuracy increases back up to around ~97%.

Any ideas what is happening?

The code that I have is this:

# The interpolation parameter    
beta = 0.5

# Get the networks' parameters
params1 = net1.named_parameters()
params2 = net2.named_parameters()

dict_params = dict(params2)

# Do the linear combination of the 2 NNs
for name1, param1 in params1:
    if name1 in dict_params:
        dict_params[name1].data.copy_(beta*param1.data + (1.0 - beta)*dict_params[name1].data)

# Create a new NN with the parameters
net_combined = Net().to(device)
net_combined.load_state_dict(dict_params)

If you execute the update logic a few times, the value converges towards params1, as seen here:

import torch

params1 = {'a': torch.empty(1).uniform_(-100, 100).item()}
dict_params = {'a': torch.empty(1).uniform_(-100, 100).item()}
beta = 0.5

print('target {}\nother {}'.format(params1, dict_params))

for _ in range(20):
    # Do the linear combination of the 2 NNs
    for name1 in params1:
        param1 = params1[name1]
        if name1 in dict_params:
            dict_params[name1] = (beta*param1 + (1.0 - beta)*dict_params[name1])
    
    print(dict_params[name1])

E.g. for random values:

target {'a': 10.199392318725586}
other {'a': -97.52051544189453}
-43.66056156158447
-16.730584621429443
-3.2655961513519287
3.4668980836868286
6.833145201206207
8.516268759965897
9.357830539345741
9.778611429035664
9.989001873880625
10.094197096303105
10.146794707514346
10.173093513119966
10.186242915922776
10.192817617324181
10.196104968024883
10.197748643375235
10.19857048105041
10.198981399887998
10.199186859306792
10.199289589016189

target {'a': 14.445125579833984}
other {'a': 57.088409423828125}
35.766767501831055
25.10594654083252
19.775536060333252
17.110330820083618
15.777728199958801
15.111426889896393
14.778276234865189
14.611700907349586
14.528413243591785
14.486769411712885
14.465947495773435
14.45553653780371
14.450331058818847
14.447728319326416
14.4464269495802
14.445776264707092
14.445450922270538
14.445288251052261
14.445206915443123
14.445166247638554

target {'a': -5.63586950302124}
other {'a': 13.969194412231445}
4.1666624546051025
-0.7346035242080688
-3.1852365136146545
-4.410553008317947
-5.023211255669594
-5.329540379345417
-5.482704941183329
-5.559287222102284
-5.597578362561762
-5.616723932791501
-5.626296717906371
-5.6310831104638055
-5.633476306742523
-5.6346729048818815
-5.635271203951561
-5.635570353486401
-5.63571992825382
-5.63579471563753
-5.635832109329385
-5.635850806175313

Thanks @ptrblck !
But if I run the piece of code in a Jupyter cell, the values of params1 are reset to the original ones each time, aren't they? Every time the cell runs I re-create the variable from the original values and then operate on it, which should be the same as wrapping all your code inside a for loop that repeats the execution.

If you are manipulating the values in place (via the unsupported .data attribute and .copy_), the manipulations will be reflected in the model.
You can run a quick check and just print the values.
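
A minimal sketch of that quick check, using a single nn.Linear as a stand-in for net2 (the layer and its sizes are placeholders): the dict built from named_parameters() holds references to the model's own tensors, so the in-place copy_ changes the model itself.

import torch
import torch.nn as nn

net2 = nn.Linear(2, 2)           # stand-in for the real net2
before = net2.weight.clone()     # snapshot of the original weights

# named_parameters() yields the model's own tensors, so these dict
# values are references, not copies
dict_params = dict(net2.named_parameters())

# in-place update through .data, as in the original loop
# (here interpolating towards zeros just to make the change visible)
dict_params['weight'].data.copy_(0.5 * torch.zeros_like(before) + 0.5 * dict_params['weight'].data)

# net2 itself has changed, so a second run would start from the
# already-averaged weights
print(torch.equal(net2.weight, before))           # False
print(torch.allclose(net2.weight, 0.5 * before))  # True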

Thanks @ptrblck !

Yes, as you said, the problem was manipulating the values in place. Manipulating the dictionaries as regular Python values is much easier.
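
For reference, a minimal sketch of that out-of-place approach, assuming the same net1, net2, Net and device as in the original post; the arithmetic allocates new tensors, so neither model is modified:

beta = 0.5

# state_dict() still returns the models' own tensors, but the
# interpolation below creates new tensors, so net1 and net2 stay untouched
state1 = net1.state_dict()
state2 = net2.state_dict()

combined_state = {name: beta * state1[name] + (1.0 - beta) * state2[name]
                  for name in state1}

net_combined = Net().to(device)
net_combined.load_state_dict(combined_state)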

Another solution (in case .copy_ and .data are needed) would be to deep-copy the dictionary, so the values are no longer linked to the model when iterating in the loop.

import copy

[...]
dict_params = copy.deepcopy(dict(params2))
[...]
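
Put together, a minimal sketch of the deep-copy variant (again assuming net1, net2, Net and device from the original post):

import copy

beta = 0.5

params1 = net1.named_parameters()
# deepcopy detaches the dict's tensors from net2's storage,
# so the in-place copy_ below no longer modifies net2
dict_params = copy.deepcopy(dict(net2.named_parameters()))

for name1, param1 in params1:
    if name1 in dict_params:
        dict_params[name1].data.copy_(beta * param1.data + (1.0 - beta) * dict_params[name1].data)

net_combined = Net().to(device)
net_combined.load_state_dict(dict_params)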