Just some mistakes I've noticed:
- The padding in `CausalConv1d` is not causal. You pad `(kernel_size - 1) * dilation` on both sides of the input's last dimension. The correct way is to pad the input inside the forward call, before the `Conv1d` layer, and set `padding=0` in `Conv1d`:
```python
# inside CausalConv1d (uses torch.nn as nn, torch.nn.functional as F)
...
self.pad = (kernel_size - 1) * dilation  # left-pad by the receptive-field offset
self.conv = nn.Conv1d(in_channels, out_channels, kernel_size,
                      padding=0, dilation=dilation, **kwargs)

def forward(self, x):
    x = F.pad(x, (self.pad, 0))  # pad only on the left side
    return self.conv(x)
```
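As a quick sanity check (a minimal sketch, assuming the class is otherwise wired up like yours, with a `CausalConv1d(in_channels, out_channels, kernel_size, dilation=...)` signature):

```python
import torch

conv = CausalConv1d(1, 1, 2, dilation=4)
x = torch.randn(1, 1, 16)
y1 = conv(x)[..., :8].clone()
x[..., 8:] = 0.0                    # perturb only the "future" samples
y2 = conv(x)[..., :8]
assert torch.allclose(y1, y2)       # output[:8] depends only on input[:8]
assert conv(x).shape[-1] == x.shape[-1]  # length is preserved
```

This assertion only holds with left-only padding; with symmetric padding, the early outputs would see future samples.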
- The `skip_size` param doesn't make sense to me, especially in this line:

```python
skip = skip[:, :, -self.skip_size:]  # dim control for adding skips
```
You set `skip_size` to 256 to match the one-hot vector of y, which represents the value of the sample right after the 1024 input samples. But you take 256 values from the last axis, which represents time, so you're effectively mapping 256 different time positions to 256 different amplitude values, which looks wrong to me. Those 256 values should come from the second axis, which represents the hidden channels.
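A minimal sketch of what I mean, assuming `skip` has shape `(batch, channels, time)` and the skip channels have already been projected down to 256:

```python
# keep all 256 channels, take only the last time step: the
# prediction for the sample following the 1024 inputs
out = skip[:, :, -1:]   # (batch, 256, time) -> (batch, 256, 1)
```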
- You directly cast x to float, which makes the quantization completely useless. In my opinion, x should have shape `(batch, 256, 1024)` with one-hot vectors along the second axis, and y should have shape `(batch, 256, 1)` with one-hot vectors along the second axis.
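For example, something like this (a sketch, assuming `x_q` already holds the mu-law quantized samples as integers in `[0, 255]`):

```python
import torch
import torch.nn.functional as F

x_q = torch.randint(0, 256, (8, 1024))        # (batch, 1024) class indices
x = F.one_hot(x_q, num_classes=256).float()   # (batch, 1024, 256)
x = x.transpose(1, 2)                         # (batch, 256, 1024)
```

Casting to float here is fine, because the class identity now lives in the one-hot axis rather than in the raw amplitude value.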
I think you might have some misunderstandings of WaveNet, but that's normal: the original paper doesn't explain everything very clearly (to me, at least), so I recommend taking a look at other implementations to get a wider view of WaveNet.
The WaveNet model I have implemented: