I’m building a model that generates sequences. Internally, the sequences are an (N, 1, length, W) tensor of width-W one-hots, but part of the loss function involves converting each sequence to a string (using a dictionary) and passing it as an argument to another PyTorch neural network that returns a scalar. Furthermore, certain one-hots are (deterministically) ignored during this conversion.
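To make the setup concrete, here is a minimal sketch of the conversion step (the alphabet, the `idx_to_char` dictionary, and the set of ignored indices are placeholders, not my real values):

```python
import torch

# Hypothetical alphabet standing in for my real dictionary.
idx_to_char = {0: "A", 1: "B", 2: "C", 3: "-"}
IGNORED = {3}  # one-hot indices deterministically skipped during conversion

def to_strings(one_hots: torch.Tensor) -> list[str]:
    # Collapse each width-W one-hot to its index. Note that argmax
    # returns an integer tensor with no grad history.
    indices = one_hots.argmax(dim=-1).squeeze(1)  # (N, 1, length, W) -> (N, length)
    return [
        "".join(idx_to_char[i] for i in row.tolist() if i not in IGNORED)
        for row in indices
    ]

# One sequence of length 4 over a width-4 alphabet: shape (1, 1, 4, 4).
one_hots = torch.eye(4)[torch.tensor([[0, 1, 3, 2]])].unsqueeze(1)
print(to_strings(one_hots))  # index 3 ('-') is dropped, giving "ABC"
```

The resulting strings are then fed to the second network, which produces the scalar used in the loss.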
Is this okay to do? It isn’t clear to me how gradients could be backpropagated through the string conversion, yet I’m not getting any errors.