I am currently looking for the optimal learning rate when training a GAN. To do this, I created a generator and a discriminator model and copied both models three times, so that I can evaluate four different learning rates.
For copying, I tried both
copy.deepcopy(module)
and
module_copy = copy.deepcopy(module)
module_copy.load_state_dict(module.state_dict())
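For context, this is roughly how I set up the four runs. It is a simplified sketch: the learning-rate values, the use of Adam, and the name MFCCDiscriminator are placeholders standing in for my actual setup.

import copy
import torch

# Placeholder values; my real learning rates differ
learning_rates = [1e-2, 1e-3, 1e-4, 1e-5]

base_generator = MFCCGenerator()          # my generator class
base_discriminator = MFCCDiscriminator()  # hypothetical name for my discriminator class

# One independent copy of each model per learning rate
generators = [copy.deepcopy(base_generator) for _ in learning_rates]
discriminators = [copy.deepcopy(base_discriminator) for _ in learning_rates]

# One optimizer per copy, so no optimizer state can leak between runs
gen_opts = [torch.optim.Adam(g.parameters(), lr=lr)
            for g, lr in zip(generators, learning_rates)]
disc_opts = [torch.optim.Adam(d.parameters(), lr=lr)
             for d, lr in zip(discriminators, learning_rates)]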
However, both approaches yield strange results when training: they strongly suggest that the second GAN does not start training from scratch but continues where the training of the first model ended; the third GAN continues where the second ended, and so on.
I checked that the models do not share parameters: after training, the different models' parameters have different values. I have no idea what the problem is.
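Concretely, the check for shared parameters looked roughly like this (a sketch building on the setup above; it compares the memory addresses of all parameter tensors across the copies):

def share_parameters(model_a, model_b):
    # Parameters copied by reference rather than by value would
    # point at the same underlying memory
    ptrs_a = {p.data_ptr() for p in model_a.parameters()}
    ptrs_b = {p.data_ptr() for p in model_b.parameters()}
    return bool(ptrs_a & ptrs_b)

for i in range(len(generators)):
    for j in range(i + 1, len(generators)):
        assert not share_parameters(generators[i], generators[j])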
These are the modules of the generator, enumerated recursively; the discriminator is very similar:
0: MFCCGenerator(
  (model_before): Sequential(
    (0): CombinedLinear(
      (layers): Sequential(
        (0): Linear(in_features=174, out_features=50, bias=True)
        (1): LayerNorm(torch.Size([50]), eps=1e-05, elementwise_affine=True)
        (2): LeakyReLU(negative_slope=0.01)
      )
    )
    (1): AlwaysDropout(p=0.5)
    (2): CombinedLinear(
      (layers): Sequential(
        (0): Linear(in_features=50, out_features=15, bias=True)
        (1): LayerNorm(torch.Size([15]), eps=1e-05, elementwise_affine=True)
        (2): LeakyReLU(negative_slope=0.01)
      )
    )
  )
  (model_after): Sequential(
    (0): CombinedLinear(
      (layers): Sequential(
        (0): Linear(in_features=23, out_features=20, bias=True)
        (1): LayerNorm(torch.Size([20]), eps=1e-05, elementwise_affine=True)
        (2): LeakyReLU(negative_slope=0.01)
      )
    )
    (1): AlwaysDropout(p=0.5)
    (2): CombinedLinear(
      (layers): Sequential(
        (0): Linear(in_features=20, out_features=16, bias=True)
        (1): LayerNorm(torch.Size([16]), eps=1e-05, elementwise_affine=True)
        (2): LeakyReLU(negative_slope=0.01)
      )
    )
    (3): Linear(in_features=16, out_features=13, bias=True)
  )
)
1: Sequential(
  (0): CombinedLinear(
    (layers): Sequential(
      (0): Linear(in_features=174, out_features=50, bias=True)
      (1): LayerNorm(torch.Size([50]), eps=1e-05, elementwise_affine=True)
      (2): LeakyReLU(negative_slope=0.01)
    )
  )
  (1): AlwaysDropout(p=0.5)
  (2): CombinedLinear(
    (layers): Sequential(
      (0): Linear(in_features=50, out_features=15, bias=True)
      (1): LayerNorm(torch.Size([15]), eps=1e-05, elementwise_affine=True)
      (2): LeakyReLU(negative_slope=0.01)
    )
  )
)
2: CombinedLinear(
  (layers): Sequential(
    (0): Linear(in_features=174, out_features=50, bias=True)
    (1): LayerNorm(torch.Size([50]), eps=1e-05, elementwise_affine=True)
    (2): LeakyReLU(negative_slope=0.01)
  )
)
3: Sequential(
  (0): Linear(in_features=174, out_features=50, bias=True)
  (1): LayerNorm(torch.Size([50]), eps=1e-05, elementwise_affine=True)
  (2): LeakyReLU(negative_slope=0.01)
)
4: Linear(in_features=174, out_features=50, bias=True)
5: LayerNorm(torch.Size([50]), eps=1e-05, elementwise_affine=True)
6: LeakyReLU(negative_slope=0.01)
7: AlwaysDropout(p=0.5)
8: CombinedLinear(
  (layers): Sequential(
    (0): Linear(in_features=50, out_features=15, bias=True)
    (1): LayerNorm(torch.Size([15]), eps=1e-05, elementwise_affine=True)
    (2): LeakyReLU(negative_slope=0.01)
  )
)
9: Sequential(
  (0): Linear(in_features=50, out_features=15, bias=True)
  (1): LayerNorm(torch.Size([15]), eps=1e-05, elementwise_affine=True)
  (2): LeakyReLU(negative_slope=0.01)
)
10: Linear(in_features=50, out_features=15, bias=True)
11: LayerNorm(torch.Size([15]), eps=1e-05, elementwise_affine=True)
12: LeakyReLU(negative_slope=0.01)
13: Sequential(
  (0): CombinedLinear(
    (layers): Sequential(
      (0): Linear(in_features=23, out_features=20, bias=True)
      (1): LayerNorm(torch.Size([20]), eps=1e-05, elementwise_affine=True)
      (2): LeakyReLU(negative_slope=0.01)
    )
  )
  (1): AlwaysDropout(p=0.5)
  (2): CombinedLinear(
    (layers): Sequential(
      (0): Linear(in_features=20, out_features=16, bias=True)
      (1): LayerNorm(torch.Size([16]), eps=1e-05, elementwise_affine=True)
      (2): LeakyReLU(negative_slope=0.01)
    )
  )
  (3): Linear(in_features=16, out_features=13, bias=True)
)
14: CombinedLinear(
  (layers): Sequential(
    (0): Linear(in_features=23, out_features=20, bias=True)
    (1): LayerNorm(torch.Size([20]), eps=1e-05, elementwise_affine=True)
    (2): LeakyReLU(negative_slope=0.01)
  )
)
15: Sequential(
  (0): Linear(in_features=23, out_features=20, bias=True)
  (1): LayerNorm(torch.Size([20]), eps=1e-05, elementwise_affine=True)
  (2): LeakyReLU(negative_slope=0.01)
)
16: Linear(in_features=23, out_features=20, bias=True)
17: LayerNorm(torch.Size([20]), eps=1e-05, elementwise_affine=True)
18: LeakyReLU(negative_slope=0.01)
19: AlwaysDropout(p=0.5)
20: CombinedLinear(
  (layers): Sequential(
    (0): Linear(in_features=20, out_features=16, bias=True)
    (1): LayerNorm(torch.Size([16]), eps=1e-05, elementwise_affine=True)
    (2): LeakyReLU(negative_slope=0.01)
  )
)
21: Sequential(
  (0): Linear(in_features=20, out_features=16, bias=True)
  (1): LayerNorm(torch.Size([16]), eps=1e-05, elementwise_affine=True)
  (2): LeakyReLU(negative_slope=0.01)
)
22: Linear(in_features=20, out_features=16, bias=True)
23: LayerNorm(torch.Size([16]), eps=1e-05, elementwise_affine=True)
24: LeakyReLU(negative_slope=0.01)
25: Linear(in_features=16, out_features=13, bias=True)
Thanks for any ideas you might have!