Let’s say *I* is a domain of images, with f : *I* -> *E* and g : *E* -> *F*, where *E* and *F* are embedding spaces.

There are two cases:

1. freeze f
2. unfreeze f

My question: when training g(f(*I*)), I expected case 1 to train much faster than case 2. In my experience, though, the training-time ratio is only about 1:1.5, even though f has far more parameters than g (roughly 100x). I'd like to save time when f is frozen. Any suggestions?

If there’s a way to save f(A) for all A in *I*, so that I only need to forward/backward propagate through g, please let me know. Thanks in advance.
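
To make the idea concrete, here is a minimal sketch of the caching I have in mind, in plain Python with toy stand-ins. Everything in it is hypothetical: `f` is a cheap placeholder for the real frozen encoder, g is a tiny linear model, and `precompute_embeddings` and `train_g` are illustrative helper names, not any library's API. It also assumes f is deterministic and that no random augmentation is applied to the images, since otherwise the cached f(A) would not match what f would produce during training.

```python
import random

# Counts how often the (expensive) frozen encoder is actually run.
calls = {"f": 0}

def f(image):
    """Toy stand-in for the frozen encoder f : I -> E (hypothetical)."""
    calls["f"] += 1
    return [sum(image) / len(image), max(image)]

def precompute_embeddings(images):
    """Run the frozen f exactly once per image and keep the results."""
    return [f(img) for img in images]

def predict(w, b, e):
    """Toy g : E -> F, here a linear map to a single number."""
    return sum(wi * ei for wi, ei in zip(w, e)) + b

def mse(embeddings, targets, w, b):
    """Mean squared error of g over the cached embeddings."""
    errors = [(predict(w, b, e) - y) ** 2 for e, y in zip(embeddings, targets)]
    return sum(errors) / len(errors)

def train_g(embeddings, targets, epochs=200, lr=0.1):
    """Fit g by plain SGD on the cached embeddings; f is never called here."""
    w, b = [0.0] * len(embeddings[0]), 0.0
    for _ in range(epochs):
        for e, y in zip(embeddings, targets):
            err = predict(w, b, e) - y
            # Gradient step on g's parameters only.
            w = [wi - lr * err * ei for wi, ei in zip(w, e)]
            b -= lr * err
    return w, b

# Toy data: 8 "images" of 16 pixels each, with a made-up regression target.
random.seed(0)
images = [[random.random() for _ in range(16)] for _ in range(8)]
targets = [sum(img) / len(img) for img in images]

cache = precompute_embeddings(images)   # the only place f runs
w, b = train_g(cache, targets)          # many epochs, zero extra calls to f

print("calls to f:", calls["f"])
print("final loss:", mse(cache, targets, w, b))
```

With this layout the cost of f is paid once per image rather than once per image per epoch, so each training step only involves the forward and backward pass of g. In a real setup the cache would presumably be written to disk instead of kept in a list.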