I’m reading the documentation for the AdamW optimizer. In the updates for the m_t and v_t terms, the beta values have a superscript of t. I thought these beta values were fixed. Any insight into what this superscript means would be much appreciated.

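Check Algorithm 2 in the paper referenced in the docs. The superscript is an exponent, not a time index:

`beta_1^t` is `beta_1` taken to the power of `t`

`beta_2^t` is `beta_2` taken to the power of `t`

The beta values themselves stay fixed; only the step count `t` changes. These powers show up in the bias-correction terms, e.g. `m_t / (1 - beta_1^t)` and `v_t / (1 - beta_2^t)`, which compensate for `m` and `v` starting at zero.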
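
To see the exponent in action, here is a minimal plain-Python sketch of the moment updates with bias correction. It is an illustration only, not the library's actual implementation: the function name `adam_moments` and the scalar gradients are made up for this example, and weight decay and the parameter update are left out.

```python
def adam_moments(grads, beta1=0.9, beta2=0.999):
    """Return bias-corrected (m_hat, v_hat) for each step t = 1, 2, ...

    Hypothetical helper for illustration; works on scalar gradients.
    """
    m, v = 0.0, 0.0  # moment estimates start at zero
    history = []
    for t, g in enumerate(grads, start=1):
        # beta1 and beta2 are fixed constants in the moving averages
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # here the superscript t is an exponent: beta1 ** t, beta2 ** t
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        history.append((m_hat, v_hat))
    return history


# With a constant gradient of 1.0, the corrected estimates stay close to 1.0
# from the first step on, even though the raw m and v start near zero.
for t, (m_hat, v_hat) in enumerate(adam_moments([1.0] * 5), start=1):
    print(t, m_hat, v_hat)
```

As `t` grows, `beta ** t` shrinks toward zero, so the correction factor approaches 1 and stops mattering after enough steps.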