Checkpoint with no grad requiring inputs PROBLEM

A small update. I’m reading a tutorial on checkpointing models by the original author @Priya_Goyal, and this part seems relevant to your use case:

NOTE: When checkpointing, if none of the inputs require grad but the outputs do, then passing the inputs as is makes the output of Checkpoint a variable that doesn’t require grad, and the autograd tape breaks there. To get around this, you can pass a dummy input that requires grad but isn’t necessarily used in the computation.
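
Here is a minimal sketch of that workaround as I understand it. The names `layer`, `run_layer` and `dummy` are made up for illustration, and I'm assuming a PyTorch version where `torch.utils.checkpoint.checkpoint` accepts the `use_reentrant` argument (passing `use_reentrant=True` selects the original behaviour the note describes).

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)

layer = nn.Linear(4, 2)   # parameters require grad
x = torch.randn(3, 4)     # plain input, requires_grad=False


def run_layer(inp, dummy):
    # `dummy` is a hypothetical extra argument. It is deliberately unused;
    # it only exists so that checkpoint sees an input that requires grad.
    return layer(inp)


# Without the dummy: no input requires grad, so the checkpointed output
# does not require grad either (PyTorch also emits a warning here).
out_plain = checkpoint(layer, x, use_reentrant=True)
print(out_plain.requires_grad)  # False

# With the dummy: the output is attached to the autograd graph again.
dummy = torch.ones(1, requires_grad=True)
out = checkpoint(run_layer, x, dummy, use_reentrant=True)
print(out.requires_grad)  # True

# Backward re-runs the forward pass and fills the parameter gradients.
out.sum().backward()
print(layer.weight.grad is not None)  # True
```

As far as I can tell from the docs, the non-reentrant variant (`use_reentrant=False`) does not need an input that requires grad, so on recent PyTorch versions the dummy input may be unnecessary.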