Use CLIP as a pretrained module -- Should/Can I use `requires_grad=False`?

I’m interested in using OpenAI’s CLIP as a pretrained module, as part of a network with other trainable modules. The input will first be passed through CLIP (e.g., its image encoder) for feature extraction, and the extracted features will then be fed into a trainable module. The whole network needs to be trained with back-propagation under some criterion, and so on.

My question is whether it suffices to put CLIP in eval mode (`.eval()`), or whether I also need to set its parameters’ `requires_grad` to `False`. More importantly, do I actually NEED CLIP’s parameters to have `requires_grad=True` in order to back-propagate and update the trainable module’s weights? I’m asking because if I don’t set `requires_grad=False`, I get an error during the `loss.backward()` call suggesting that I use `retain_graph=True`, etc. It feels like I shouldn’t need `retain_graph=True` in the backward call, but I might obviously be wrong.
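
For context, here’s a minimal sketch of the setup I have in mind (using the official `openai/clip` package; the linear head, loss, and data loader are placeholders, not my actual code):

```python
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"

# Pretrained CLIP used only as a frozen feature extractor
clip_model, preprocess = clip.load("ViT-B/32", device=device)
clip_model.eval()  # is this enough on its own?
for p in clip_model.parameters():
    p.requires_grad = False  # ...or is this needed too?

# Trainable module on top of CLIP features (ViT-B/32 image features are 512-dim)
head = torch.nn.Linear(512, 10).to(device)
optimizer = torch.optim.Adam(head.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

for images, labels in loader:  # `loader` is a placeholder DataLoader
    images, labels = images.to(device), labels.to(device)
    features = clip_model.encode_image(images)  # frozen feature extraction
    logits = head(features.float())             # CLIP may return fp16 on GPU
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()  # without requires_grad=False, this is where I hit the retain_graph error
    optimizer.step()
```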

Thank you!