I’m reading the documentation for the AdamW optimizer. In the updates for the m_t and v_t terms, the beta values have a superscript t. I thought these beta values were fixed. Any insight into what this superscript means would be much appreciated.
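Check Algorithm 2 in the paper referenced in the docs:
beta_1^t is beta_1 taken to the power of t
beta_2^t is beta_2 taken to the power of t
The beta values themselves are fixed. Only the exponent t (the step count) changes, and these powers are used in the bias-correction terms 1 - beta_1^t and 1 - beta_2^t that rescale m_t and v_t.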
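To see why a power of t shows up at all, here is a minimal sketch in plain Python (not the library's actual implementation). The function name `bias_corrected_mean` and the constant toy gradient are my own assumptions for illustration. It runs the first-moment moving average with a fixed beta and divides by 1 - beta^t at each step:

```python
def bias_corrected_mean(grads, beta=0.9):
    """Hypothetical helper: track the moving average m_t of the gradients
    and its bias-corrected version m_t / (1 - beta**t).

    beta is a fixed constant; only the exponent t (the step count) changes.
    """
    m = 0.0  # m_0 is initialised to zero, which biases early estimates toward 0
    history = []
    for t, g in enumerate(grads, start=1):
        m = beta * m + (1 - beta) * g   # moving-average update with fixed beta
        m_hat = m / (1 - beta ** t)     # beta raised to the power of the step t
        history.append((t, m, m_hat))
    return history


# Toy input: a constant gradient of 1.0 for three steps.
results = bias_corrected_mean([1.0, 1.0, 1.0])
for t, m, m_hat in results:
    print(f"t={t}  m_t={m:.4f}  m_hat_t={m_hat:.4f}")
```

With a constant gradient, the raw m_t starts near zero and creeps upward, while the corrected value stays at (approximately) the true gradient from the first step. As t grows, beta^t goes to zero, so the correction fades out. The same reasoning applies to v_t with beta_2.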