I use @autocast() to decorate my modules' forward functions, and I'm wondering: during inference (when there are no weight updates), are the model's half-precision weights reused, or are they recomputed on every forward pass? (I.e., will float2half be called for each weight every time?)
You could wrap the entire inference evaluation in autocast, which would then use the internal caching to avoid casting the parameters on every forward pass.
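A minimal sketch of this pattern (the model and input tensors are placeholders; CUDA users would typically use `device_type="cuda"` with `torch.float16`, here CPU with `bfloat16` is shown so it runs anywhere):

```python
import torch

model = torch.nn.Linear(16, 4).eval()
batches = [torch.randn(8, 16) for _ in range(3)]

# One autocast region around the whole evaluation loop: the internal
# weight-cast cache stays alive across all forward calls inside it,
# so each parameter is cast to the lower precision only once.
with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    outputs = [model(batch) for batch in batches]

print(outputs[0].dtype)  # autocast-eligible ops run in bfloat16
```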
So, if I use @autocast() to decorate a forward function (without wrapping the entire model), float2half will be called every time that forward is invoked?
Yes, I think exiting the outermost autocast context clears the cache.
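To illustrate the decorator case (the module is a hypothetical example; again using CPU with `bfloat16` so it runs without a GPU): when only `forward` is decorated, each call opens and then exits its own outermost autocast region, so the weight-cast cache does not survive between calls and the casts are redone.

```python
import torch

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(16, 4)

    # Each call to forward enters a fresh outermost autocast region;
    # on exit the internal cast cache is cleared, so the next call
    # casts the weights again.
    @torch.autocast(device_type="cpu", dtype=torch.bfloat16)
    def forward(self, x):
        return self.fc(x)

net = Net().eval()
with torch.no_grad():
    y1 = net(torch.randn(2, 16))  # weights cast here...
    y2 = net(torch.randn(2, 16))  # ...and cast again here

print(y1.dtype)
```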