During inference (no weight updates), are the model's half-precision weights reused or recomputed on every forward pass?

I use @autocast() to decorate my modules' forward functions, and I wonder: during inference (so there are no weight updates), are the model's half-precision weights reused, or are they recomputed on every forward pass? (i.e., will float2half be called for each weight every time?)

You could wrap the entire inference evaluation in autocast, which would then use the internal caching to avoid casting the parameters every time.
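A minimal sketch of "wrap the entire inference evaluation in autocast", assuming a hypothetical toy model and random input batches. It uses device_type="cpu" with torch.bfloat16 only so it runs without a GPU; on a GPU you would typically use device_type="cuda" with torch.float16. The point is that a single autocast context encloses the whole loop, so the cast cache is not cleared between batches.

```python
import torch
import torch.nn as nn

# Hypothetical toy model and data; substitute your own.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4)).eval()
batches = [torch.randn(8, 16) for _ in range(5)]

# CPU + bfloat16 so this sketch runs anywhere; on a GPU use
# device_type="cuda", dtype=torch.float16 instead.
DEVICE_TYPE, DTYPE = "cpu", torch.bfloat16

outputs = []
# One autocast context around the whole evaluation loop: the cast cache
# lives until this outermost context exits, so low-precision copies of the
# weights made for the first batch can be reused for the later batches.
with torch.no_grad(), torch.autocast(device_type=DEVICE_TYPE, dtype=DTYPE):
    for x in batches:
        outputs.append(model(x))

# The parameters themselves stay float32; autocast works on casted copies.
print(outputs[0].dtype, next(model.parameters()).dtype)
```

torch.autocast also takes a cache_enabled argument (default True) that controls this weight cache.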

So, if I use @autocast() to decorate a forward function (without wrapping the entire model), float2half will be called every time that forward is called?

Yes, I think exiting the outermost autocast context would clear the cache.
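To illustrate the difference described in the reply, here is a sketch with a hypothetical module Net whose forward is decorated with autocast (again CPU + bfloat16 as a stand-in for CUDA + float16). Called on its own, the decorator's context is the outermost one, so per the reply its cache is cleared each time forward returns. Adding an outer autocast context around the calls makes the decorator's context a nested one, so the cache survives until the outer block exits. The sketch shows the structure only; it does not measure the cache directly.

```python
import torch
import torch.nn as nn

class Net(nn.Module):  # hypothetical module for illustration
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(16, 4)

    # Entering/exiting autocast on every forward call. When this is the
    # outermost autocast context, exiting it clears the cast cache.
    @torch.autocast(device_type="cpu", dtype=torch.bfloat16)
    def forward(self, x):
        return self.fc(x)

model = Net().eval()
x = torch.randn(8, 16)

with torch.no_grad():
    # Case 1: decorator only; each call is its own outermost context,
    # so the weights are cast again on every call.
    y1 = model(x)
    y2 = model(x)

    # Case 2: an outer context makes the decorated forward's context nested,
    # so casted weights can be reused until this `with` block exits.
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        ys = [model(x) for _ in range(3)]
```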

@ptrblck Thank you for your quick reply; it helps a lot :smiley: