What are the downsides of using both mixed-precision packages (native torch.cuda.amp and NVIDIA apex) together? Currently, my script does something like this to choose one or the other:
with autocast(enabled=device.type == 'cuda' and apex is None):
    logps = model(images)
    loss = criterion(logps, labels)
if apex is None:
    # Native AMP path: GradScaler handles the loss scaling.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
else:
    # apex path: amp.scale_loss handles the loss scaling instead.
    with amp.scale_loss(loss, optimizer) as scaled_loss:  # type: torch.FloatTensor
        scaled_loss.backward()
    optimizer.step()
Is this conditional split necessary? Is there a specific reason not to mix the two, or is it relatively benign? Removing the apex is None checks would make my code cleaner, but I'm not an expert on mixed precision, so I don't know what issues this might cause. I am considering something like this:
with autocast(enabled=device.type == 'cuda'):
    logps = model(images)
    loss = criterion(logps, labels)
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaler.scale(scaled_loss).backward()
scaler.step(optimizer)
scaler.update()
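For context, here is a minimal self-contained version of my current conditional setup; the toy model, synthetic data, and opt_level='O1' are just placeholders for illustration, not my real training code:

import torch
from torch import nn
from torch.cuda.amp import autocast, GradScaler

try:
    import apex
    from apex import amp
except ImportError:
    apex = None

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
if apex is not None and device.type != 'cuda':
    apex = None  # apex needs CUDA; fall back to native AMP / FP32

# Toy model, loss, and optimizer standing in for the real training objects.
model = nn.Linear(8, 2).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

# GradScaler is only active on the native-AMP path.
scaler = GradScaler(enabled=device.type == 'cuda' and apex is None)

if apex is not None:
    # apex inserts its own FP16 casts, so autocast stays disabled below.
    model, optimizer = amp.initialize(model, optimizer, opt_level='O1')

images = torch.randn(4, 8, device=device)
labels = torch.randint(0, 2, (4,), device=device)

optimizer.zero_grad()
with autocast(enabled=device.type == 'cuda' and apex is None):
    logps = model(images)
    loss = criterion(logps, labels)

if apex is None:
    # Native AMP: GradScaler scales the loss and unscales before step().
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
else:
    # apex: scale_loss scales the loss; gradients are unscaled on exit.
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()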