I want to implement a denoising autoencoder for typo-correction tasks, which works at the byte level (UTF-8 encoded strings), not at the character level. As a baseline, I want to create a simple dense autoencoder (see the picture below) with a multi-softmax layer at the output. Each softmax represents one byte of the output, one-hot encoded. Can I achieve this behavior using CrossEntropyLoss(), or do I need to do it differently?
CrossEntropyLoss is just LogSoftmax combined with NLLLoss in a single class, so using it will neither help nor hinder you compared to applying those two yourself.
If the softmaxes are all over the same number of elements, you could use the K-dimensional variant, though.
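To make the equivalence concrete, here is a minimal sketch (the shapes are arbitrary) showing that CrossEntropyLoss gives the same result as LogSoftmax followed by NLLLoss:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 256)           # (batch, num_classes), raw scores
targets = torch.randint(0, 256, (4,))  # class indices, not one-hot

ce = F.cross_entropy(logits, targets)
nll = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(ce, nll))  # True
```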
Thank you. I was reading about the K-dimensional variant but I’m not quite sure how it works. Can you give me an example? At the output I have a tensor of shape (batch, n_bytes, max_bytes_in_string), where n_bytes=256 and max_bytes_in_string=28. I want each byte of the output to be processed by a separate softmax. How can I achieve that?
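A minimal sketch of how the K-dimensional variant could apply to those shapes: the logits go in as (batch, n_bytes, max_bytes_in_string), and the targets are plain byte indices of shape (batch, max_bytes_in_string). The batch size of 8 below is an arbitrary assumption for illustration.

```python
import torch
import torch.nn as nn

batch, n_bytes, max_len = 8, 256, 28  # shapes from the question

# Raw logits from the decoder, NOT softmaxed: CrossEntropyLoss applies
# log-softmax internally over dim=1 (the class dimension).
logits = torch.randn(batch, n_bytes, max_len, requires_grad=True)

# Targets are byte values (class indices in [0, 255]), one per position,
# shape (batch, max_len) -- no one-hot encoding needed.
targets = torch.randint(0, n_bytes, (batch, max_len))

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, targets)  # averaged over all batch * max_len positions
loss.backward()

# At inference time, the predicted byte at each position is the argmax
# over the class dimension -> shape (batch, max_len).
pred_bytes = logits.argmax(dim=1)
```

Because the log-softmax is taken over dim=1 independently at each of the 28 positions, this should give you exactly the "one softmax per output byte" behavior, without having to reshape or loop over positions yourself.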