Let’s say I is a domain of images, with f : I -> E and g : E -> F, where E and F are embedding spaces.
There are two cases:
- Case 1: freeze f (train only g)
- Case 2: unfreeze f (train f and g together)
My question: when training g(f(I)), I expected case 1 to train much faster than case 2. In my experience, though, the ratio of training times is only about 1:1.5, even though f has far more parameters (about 100x) than g. I want to save time when f is frozen. Any suggestions?
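For reference, here is a minimal sketch of what I mean by case 1, assuming PyTorch (the framework is my assumption here, and the tiny `f` and `g` modules, shapes, and loss are hypothetical stand-ins, not my real models). Even with f frozen, the forward pass through f still runs on every step; only the backward pass through f is skipped, which may be part of why the speed-up is modest.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins: f is the large encoder to freeze, g the small trainable head.
f = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 64), nn.ReLU(), nn.Linear(64, 32))
g = nn.Linear(32, 16)

# Freeze f: its parameters get no gradients, and eval mode keeps layers such as
# dropout or batch norm (if present) from changing behavior or statistics.
f.requires_grad_(False)
f.eval()

# Only g's parameters are handed to the optimizer.
optimizer = torch.optim.SGD(g.parameters(), lr=0.1)

images = torch.randn(4, 3, 8, 8)   # dummy batch of images from I
targets = torch.randn(4, 16)       # dummy targets in F

# Snapshot of f's parameters, used only to check that they do not change.
f_before = [p.clone() for p in f.parameters()]

with torch.no_grad():              # no autograd graph is built for f
    embeddings = f(images)         # the forward pass through f still costs time

loss = nn.functional.mse_loss(g(embeddings), targets)
optimizer.zero_grad()
loss.backward()                    # the backward pass only goes through g
optimizer.step()
```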
If there is a way to compute and save f(A) once for every A in I, so that I only need to forward/backward propagate the saved embeddings through g, please let me know. Thanks in advance.
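To make the idea concrete, here is a rough sketch of the caching I have in mind, again assuming PyTorch. The models, tensor shapes, batch size, and loss are all hypothetical placeholders. The frozen f is run once over the whole dataset, and g is then trained on the stored embeddings only.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical stand-ins for the frozen encoder f and the trainable head g.
f = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 32))
g = nn.Linear(32, 16)
f.requires_grad_(False)
f.eval()

images = torch.randn(20, 3, 8, 8)  # dummy dataset standing in for I
targets = torch.randn(20, 16)      # dummy targets in F

# Step 1: a single pass over the data to compute f(A) for every image A.
image_loader = DataLoader(TensorDataset(images), batch_size=8, shuffle=False)
with torch.no_grad():
    cached = torch.cat([f(batch) for (batch,) in image_loader])

# Step 2: train g on the cached embeddings; f is never called again.
train_loader = DataLoader(TensorDataset(cached, targets), batch_size=8, shuffle=True)
optimizer = torch.optim.SGD(g.parameters(), lr=0.1)

epoch_losses = []
for epoch in range(3):
    running = 0.0
    for emb, tgt in train_loader:
        loss = nn.functional.mse_loss(g(emb), tgt)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running += loss.item() * emb.size(0)
    epoch_losses.append(running / len(cached))
```

The cached tensor could also be written to disk with `torch.save` and reloaded with `torch.load` between runs. This only works if the inputs to f are the same every epoch: with random data augmentation applied before f, the stored embeddings would no longer match what f would produce during training.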