Following on from the PyTorch tutorials for AMP here:
Here is how I apply AMP:
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()
for data, label in data_iter:
    optimizer.zero_grad()
    # Casts operations to mixed precision
    with autocast():
        output = model(data)
        loss = loss_fn(output, label)  # loss_fn is my criterion
    # Scale the loss, then call backward() on the scaled loss
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
for data, label in data_iter_valid:
    # ??
I was wondering if/when to apply mixed precision to the validation data. Do you need to use the scaler on it, or is that even necessary? Wouldn't autocast at least be needed if you wanted to keep the peak memory requirements the same during training and validation?
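For what it's worth, here is a minimal sketch of what I'm guessing the validation loop should look like (assuming autocast alone under no_grad is enough, with no GradScaler, since there is no backward() call to scale; loss_fn is the same criterion as above):

import torch

model.eval()
with torch.no_grad():
    for data, label in data_iter_valid:
        # Run the forward pass in mixed precision too, so validation
        # uses the same numerics and memory profile as training
        with autocast():
            output = model(data)
            val_loss = loss_fn(output, label)
        # No scaler here: loss scaling only exists to keep fp16
        # gradients from underflowing during backward()

Is that the right way to think about it, or should validation just run in full precision?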