Changing the first model without changing the second, while still using the gradients of the second

Hello, I am working on a pipeline with two models, model A and model B.

Model B is a frozen model, and model A is a model that changes the input for model B.

So, the forward pipeline is: input_model_A → model_A → output_model_A (input_model_B) → model_B → output_model_B.

For the backward pass, I want to use the gradients flowing through model_B to update the parameters of model_A, without changing model_B. To do that, I freeze model_B before computing the loss:

        for param in model_B.parameters():
            param.requires_grad = False

With this, model_B does not change its weights, its batch norm and dropout layers are deactivated, and the optimizer only uses the model_A params.
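To make the setup concrete, here is a minimal sketch of what I mean. The two small networks, the SGD optimizer, the cross-entropy loss, and the random data are hypothetical stand-ins for my real pipeline. The `model_B.eval()` call is also an assumption I am adding here: as far as I understand, setting `requires_grad = False` only stops gradients from being stored for the parameters, and it is `eval()` that switches batch norm and dropout to inference behaviour.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the real models.
model_A = nn.Sequential(nn.Linear(8, 8), nn.Tanh())
model_B = nn.Sequential(
    nn.Linear(8, 16), nn.BatchNorm1d(16), nn.ReLU(), nn.Dropout(0.5), nn.Linear(16, 2)
)

# Freeze model_B: no gradients are stored for its parameters.
for param in model_B.parameters():
    param.requires_grad = False

# Assumption: eval() is what disables dropout and stops batch norm
# from updating its running statistics.
model_B.eval()

# The optimizer only sees model_A's parameters.
optimizer = torch.optim.SGD(model_A.parameters(), lr=0.1)

# Dummy batch (hypothetical data).
input_model_A = torch.randn(4, 8)
target = torch.randint(0, 2, (4,))

# Snapshot model_B's parameters and buffers to compare after the step.
b_params_before = [p.detach().clone() for p in model_B.parameters()]
b_buffers_before = [b.detach().clone() for b in model_B.buffers()]

model_A.train()
optimizer.zero_grad()
output_model_A = model_A(input_model_A)   # this is input_model_B
output_model_B = model_B(output_model_A)  # the graph still passes through model_B
loss = nn.functional.cross_entropy(output_model_B, target)
loss.backward()
optimizer.step()

# Gradients reach model_A, while model_B stores none and stays unchanged.
a_has_grad = all(p.grad is not None for p in model_A.parameters())
b_has_no_grad = all(p.grad is None for p in model_B.parameters())
b_params_unchanged = all(
    torch.equal(before, after)
    for before, after in zip(b_params_before, model_B.parameters())
)
b_buffers_unchanged = all(
    torch.equal(before, after)
    for before, after in zip(b_buffers_before, model_B.buffers())
)
print(a_has_grad, b_has_no_grad, b_params_unchanged, b_buffers_unchanged)
```

The four flags at the end are the same kind of sanity check I describe below: model_A receives gradients through model_B, while model_B's weights and batch norm statistics do not move.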

I checked model_B's metrics during training with the original input, and they stay fixed (a good sign, since model_B is frozen), while the metrics for the new input are increasing (a sign that something is being optimized). Also, the computation graph shows that the full pipeline is connected. Does this approach make sense? Or should I put model_B in .train() mode and disable only the BN and dropout layers (since the optimizer already holds only the model_A params)?
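For reference, this is a rough sketch of the second option I am asking about: model_B stays in `.train()` mode and only its batch norm and dropout layers are switched to eval. The helper name `set_norm_and_dropout_eval` and the tiny model are hypothetical. The sketch also shows why I am unsure: with frozen parameters but train mode, the batch norm running statistics (which are buffers, not parameters) still get updated on every forward pass.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Layer types to switch to eval mode (extend as needed for other layers).
NORM_AND_DROPOUT = (
    nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d,
    nn.Dropout, nn.Dropout2d, nn.Dropout3d,
)

def set_norm_and_dropout_eval(model):
    """Hypothetical helper: put only BN and dropout submodules in eval mode."""
    for module in model.modules():
        if isinstance(module, NORM_AND_DROPOUT):
            module.eval()

# Hypothetical stand-in for model_B.
model_B = nn.Sequential(nn.Linear(8, 16), nn.BatchNorm1d(16), nn.Dropout(0.5))
for param in model_B.parameters():
    param.requires_grad = False

x = torch.randn(4, 8)
bn = model_B[1]

# Case 1: frozen parameters, but the whole model in train mode.
model_B.train()
stats_before = bn.running_mean.clone()
model_B(x)
stats_changed_in_train = not torch.equal(stats_before, bn.running_mean)

# Case 2: train mode, with BN and dropout switched to eval individually.
model_B.train()
set_norm_and_dropout_eval(model_B)
stats_before = bn.running_mean.clone()
model_B(x)
stats_changed_with_helper = not torch.equal(stats_before, bn.running_mean)

print(stats_changed_in_train, stats_changed_with_helper)
print(model_B.training, bn.training)
```

One caveat with this variant: any later call to `model_B.train()` puts the BN and dropout layers back into train mode, so the helper would have to be called again after it.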