Hello! AdamW has a
foreach parameter, whose documentation states:
foreach (bool, optional) – whether foreach implementation of optimizer is used (default: None)
I tried searching for what the “foreach implementation” is, but couldn’t find anything. Could anyone explain what it is?
Thank you in advance!
There are some functions with the prefix
_foreach_, such as
torch._foreach_add, that take one or more lists of tensors. They apply a counterpart native function, such as
torch.add, to each element of the input tensor list(s). If certain conditions are met, such as all tensors being hosted on the same device and having the same dtype, we can expect far fewer CUDA kernel launches than iterating over the input lists and calling the torch function on each element individually.
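As a small sketch of the idea, here is a comparison between looping over a list of tensors with torch.add and using torch._foreach_add on the whole list at once (note that _foreach_ functions are a private API and may change between releases):

```python
import torch

# A list of tensors sharing the same device and dtype,
# which is the case where the fused path can kick in.
tensors = [torch.ones(3) for _ in range(4)]

# Loop version: one torch.add call (and, on CUDA, roughly one
# kernel launch) per tensor in the list.
looped = [torch.add(t, 1) for t in tensors]

# _foreach_ version: the whole list is handled in far fewer
# kernel launches when device and dtype match.
fused = torch._foreach_add(tensors, 1)

# Both produce the same results element-wise.
assert all(torch.equal(a, b) for a, b in zip(looped, fused))
```

On CPU the numerical results are identical either way; the benefit of the foreach path is mainly the reduced kernel-launch overhead on CUDA, which is why optimizers like AdamW expose it as an option.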
I see! Thank you for the clarification.