Loss in accuracy due to checkpointing

I am training GoogLeNet using the checkpointing feature (torch.utils.checkpoint) and I see a surprisingly large drop in accuracy (about 40 percent). One of my segments contains only one layer (the convolution at the start of the network). Is this the problem? Can someone point me to a resource that explains the constraints on how segments can be formed, or to any documentation that describes what happens in edge cases such as a segment containing a single layer?

Could you check this example of the checkpoint util?
If you don’t find any obvious differences/errors, could you post your (simplified) code?
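
In the meantime, here is a minimal sketch of a sanity check you could run: it compares the loss and parameter gradients of a plain forward pass against a checkpointed one on identical weights. The small network, the segment boundaries, and the helper name `forward_with_checkpoints` are all made up for illustration and are not your GoogLeNet setup. The first segment deliberately contains only the initial convolution. I am passing `use_reentrant=False`, which assumes a reasonably recent PyTorch; the docs note that with the reentrant variant a checkpointed segment whose inputs do not require gradients will not get gradients for its parameters, which may or may not be what you are hitting.

```python
import copy

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)

# Hypothetical stand-in for the real network: a first conv followed by a few layers.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
)
ckpt_model = copy.deepcopy(model)  # identical weights for a fair comparison


def forward_with_checkpoints(net, inp):
    # Segment 1 holds only the first convolution (the case from the question).
    h = checkpoint(net[0:1], inp, use_reentrant=False)
    # Segment 2 holds the next three layers.
    h = checkpoint(net[1:4], h, use_reentrant=False)
    # The tail runs without checkpointing.
    return net[4:](h)


x = torch.randn(4, 3, 16, 16)  # note: the input itself does not require grad
target = torch.randint(0, 10, (4,))
loss_fn = nn.CrossEntropyLoss()

# Reference: ordinary forward/backward.
loss_ref = loss_fn(model(x), target)
loss_ref.backward()

# Checkpointed forward/backward on the copied model.
loss_ckpt = loss_fn(forward_with_checkpoints(ckpt_model, x), target)
loss_ckpt.backward()

# Collect parameters that received no gradient and the largest gradient mismatch.
missing = []
max_grad_diff = 0.0
for (name, p_ref), (_, p_ckpt) in zip(
    model.named_parameters(), ckpt_model.named_parameters()
):
    if p_ckpt.grad is None:
        missing.append(name)
    else:
        diff = (p_ref.grad - p_ckpt.grad).abs().max().item()
        max_grad_diff = max(max_grad_diff, diff)

print("parameters without gradients:", missing)
print("max gradient difference:", max_grad_diff)
```

If `missing` is non-empty or the difference is large when you run the same comparison on your own model, the segmentation is the likely culprit. If both look fine, stateful layers such as batch norm (running statistics) or dropout are worth checking next, since each checkpointed segment's forward pass runs a second time during backward.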