Training with gradient checkpoints (torch.utils.checkpoint) appears to reduce performance of model

When I do that I appear to get a new error:

[4]<stderr>:    loss.backward()
[4]<stderr>:  File "/pythonhome_pypi/lib/python3.6/site-packages/torch/tensor.py", line 195, in backward
[4]<stderr>:    torch.autograd.backward(self, gradient, retain_graph, create_graph)
[4]<stderr>:  File "/pythonhome_pypi/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
[4]<stderr>:    allow_unreachable=True)  # allow_unreachable flag
[4]<stderr>:  File "/pythonhome_pypi/lib/python3.6/site-packages/torch/autograd/function.py", line 77, in apply
[4]<stderr>:    return self._forward_cls.backward(self, *args)
[4]<stderr>:  File "/pythonhome_pypi/lib/python3.6/site-packages/torch/utils/checkpoint.py", line 99, in backward
[4]<stderr>:    torch.autograd.backward(outputs, args)
[4]<stderr>:  File "/pythonhome_pypi/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
[4]<stderr>:    allow_unreachable=True)  # allow_unreachable flag
[4]<stderr>:RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

To be specific I added .requires_grad_():

res2, res3, res4, res5 = checkpoint.checkpoint(self.resnet_backbone, data['data'].requires_grad_())