PyTorch WaveNet model loss is not decreasing (help)

I now understand how the embedding layers function in your implementation, but in what sense would feeding the quantized values directly “lose complexity on the input presentation”? What considerations about model performance led you to that conclusion?
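
To make sure I'm picturing the two alternatives correctly, here's a minimal sketch of the contrast I have in mind (the embedding dimension of 64 and the batch/sequence sizes are just placeholders I picked for illustration):

```python
import torch
import torch.nn as nn

x = torch.randint(0, 256, (4, 1024))        # quantized samples as class indices

# Option A: a learned embedding per quantization level
emb = nn.Embedding(num_embeddings=256, embedding_dim=64)  # 64 is arbitrary
x_emb = emb(x)                               # (4, 1024, 64)

# Option B: feed the scalar indices directly as a single channel
x_scalar = x.float().unsqueeze(1) / 255.0    # (4, 1, 1024), normalized to [0, 1]
```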

Also, in the more efficient training method you proposed, are the dimensions you are assuming for x (batch, 256, seq_len)? Is it normal to create a one-hot encoding by giving each encoded vector the dimensions (num_classes, 1)? I ask because, in PyTorch, I thought this would mean we have 256 channels, each holding a 0/1 value marking class membership, or, in the case of the output, the probability that the next sample belongs to each class.
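
To pin down the shapes I'm describing, here's a small sketch of how I would build that one-hot input (the batch and sequence sizes are placeholders):

```python
import torch
import torch.nn.functional as F

batch, seq_len, num_classes = 4, 1024, 256

# Quantized mu-law samples: integer class indices in [0, 255]
x = torch.randint(0, num_classes, (batch, seq_len))

# F.one_hot gives (batch, seq_len, num_classes); Conv1d wants channels
# in dim 1, so permute to (batch, 256, seq_len)
x_onehot = F.one_hot(x, num_classes=num_classes).float().permute(0, 2, 1)
print(x_onehot.shape)  # torch.Size([4, 256, 1024])
```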

The reason I ask is that I am currently using CrossEntropyLoss, which matches the categorical loss described in the WaveNet paper. According to the PyTorch documentation, it expects input of shape (minibatch, num_classes) and a target containing class indices (in my case, integers in [0, 255]). Given this, I would think I should arrange the one-hot-encoded vectors so that the input has dimensions (batch, seq_len, 256) and the output is (batch, 1, 256) or (batch, 256).
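
For reference, this is roughly how I'm calling the loss right now (simplified; the random logits stand in for my network's final-layer output):

```python
import torch
import torch.nn as nn

batch, num_classes = 4, 256
criterion = nn.CrossEntropyLoss()

# Raw logits for the next sample, shape (batch, num_classes)
logits = torch.randn(batch, num_classes)

# Target is a class index in [0, 255], not a one-hot vector
target = torch.randint(0, num_classes, (batch,))

loss = criterion(logits, target)
```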

How would using those dimensions affect the model, given that it uses convolutional layers? Would they still work? Or am I misunderstanding CrossEntropyLoss?
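
To make the conflict concrete, here is the shape mismatch I'm worried about, plus the only workaround I can think of (flattening the time dimension before the loss; all sizes are placeholders):

```python
import torch
import torch.nn as nn

# Conv1d expects (batch, channels, seq_len), i.e. classes in dim 1 ...
conv = nn.Conv1d(in_channels=256, out_channels=256, kernel_size=2, dilation=1)
x = torch.randn(4, 256, 1024)
logits = conv(x)                 # (4, 256, 1023)

# ... which seems at odds with my reading that the loss wants classes last.
# Flattening per time step is the only workaround I can think of:
target = torch.randint(0, 256, (4, 1023))
loss = nn.CrossEntropyLoss()(
    logits.permute(0, 2, 1).reshape(-1, 256),  # (batch * seq, 256)
    target.reshape(-1),                        # (batch * seq,)
)
```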

Thank you for the help! With your guidance I've made more progress in the last week, both conceptually and in code, than I had in the past few months. I really appreciate it!