If you expect that some of the parameters in the model are not used, you can pass allow_unused=True as suggested in the error message.
Otherwise, make sure you only perform differentiable ops on Tensors that can require gradients. In particular, only floating-point Tensors can require gradients.
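A minimal sketch of the allow_unused=True case, assuming PyTorch is installed (the tensor names here are made up for illustration):

```python
import torch

# Hypothetical two-parameter setup where `b` never contributes to the
# output, i.e. it is an "unused" parameter from autograd's perspective.
w = torch.randn(3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
x = torch.randn(3)

out = (w * x).sum()  # `b` is never used in the graph

# Without allow_unused=True this call raises the error from above;
# with it, the gradient for the unused parameter is simply None.
grads = torch.autograd.grad(out, [w, b], allow_unused=True)
# grads[0] is the gradient w.r.t. w; grads[1] is None because b is unused.

# Also note: only floating-point Tensors can require gradients, so e.g.
# torch.tensor([1, 2, 3]).requires_grad_(True) would raise a RuntimeError.
```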
How do I pass a tensor output produced with one tokenizer (BART) to another tokenizer (XLNet) without converting it to a string? The string conversion is where the model becomes non-differentiable.
I have output from the BART model that has to be passed to the XLNet model. To do so, I have to convert the BART output into the format XLNet expects, which means converting it to a string and passing it through the XLNet tokenizer. Because of that conversion, the pipeline becomes non-differentiable w.r.t. the BART model when using autograd.
If your op is not differentiable, there isn’t much we can do here.
If, for your particular use case, you want to specify a special backward for the part that is not differentiable, you can do so with a custom Function: Extending PyTorch — PyTorch 1.7.1 documentation
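As a sketch of what such a custom Function can look like: the example below (my own illustration, not from the linked docs) wraps a non-differentiable op, rounding, and supplies a straight-through (identity) backward for it:

```python
import torch

class RoundSTE(torch.autograd.Function):
    """Non-differentiable rounding in forward; straight-through
    (identity) gradient in backward."""

    @staticmethod
    def forward(ctx, input):
        return torch.round(input)

    @staticmethod
    def backward(ctx, grad_output):
        # Pretend the op was the identity and pass gradients through.
        return grad_output

x = torch.tensor([0.4, 1.6], requires_grad=True)
y = RoundSTE.apply(x)
y.sum().backward()
# x.grad is all ones: gradients flowed "through" the rounding op,
# which plain torch.round would have blocked.
```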
I found a simple workaround for my task, but it requires using argmax in the loss function.
So I need a differentiable argmax operator. Can you help me in this regard?
Well, it is not differentiable either, so you either have to use a "soft" version of it, like softmax, if that works for your use case, or a custom Function as I mentioned above.
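One common "soft" surrogate (a sketch, not part of the original thread; the helper name soft_argmax is my own) is a temperature-controlled softmax over the logits, used to take an expectation over index positions. As the temperature goes to zero this approaches the hard argmax index, while staying differentiable:

```python
import torch

def soft_argmax(logits, tau=1.0):
    """Differentiable surrogate for argmax: softmax weights over the
    index positions. Smaller tau -> closer to the hard argmax index."""
    weights = torch.softmax(logits / tau, dim=-1)
    positions = torch.arange(logits.size(-1), dtype=logits.dtype)
    return (weights * positions).sum(-1)

logits = torch.tensor([0.1, 2.0, 0.3], requires_grad=True)
idx = soft_argmax(logits, tau=0.1)  # close to the hard argmax, 1
idx.backward()
# logits.grad is populated: unlike torch.argmax, the surrogate
# lets gradients flow back to the logits.
```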