student_result has shape (4, 7, 128, 128, 128) and label has shape (4, 1, 128, 128, 128); both are on the GPU.
student_result comes from the model's final layer, which is a single Conv3d() that changes the channel count to 7.
I have tried applying torch.softmax(student_result, dim=1) to fix the error, but it did not help.
When I move both student_result and label to the CPU and comment out the with autocast() block, everything works fine. So what should I do to make this code run correctly on the GPU in fp16 mode?
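Side note: from what I have read, a device-side assert inside a loss kernel often means a class index is out of range. A minimal sanity check, assuming my 7 output channels correspond to classes 0-6 (so any label outside that range would be invalid), would be something like:

import torch

# Hypothetical one-off check, run on CPU before training:
# rule out labels outside the assumed valid range 0..6.
for batch in DataLoader:
    hard_target = batch['hard_target'].to(torch.long)
    lo, hi = int(hard_target.min()), int(hard_target.max())
    assert 0 <= lo and hi <= 6, f"label out of range: min={lo}, max={hi}"

Here is my training loop: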
amp_grad_scaler = GradScaler()
for epoch in range(begin_epoch, end_epoch):
    student_module.train()
    for i, batch in enumerate(DataLoader):
        data = batch['data'].to(torch.float32).cuda()
        hard_target = batch['hard_target'].to(torch.long).cuda()
        with torch.no_grad():
            teacher_result = teacher_module(data)
        with autocast():
            student_result = student_module(data)
            loss = LossFunction(teacher_result, student_result, hard_target, Temperature, 0.7)
        amp_grad_scaler.scale(loss).backward()
        amp_grad_scaler.step(optimizer)
        amp_grad_scaler.update()
        sum_loss += loss
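I have also seen the suggestion to keep the forward pass under autocast but compute the loss itself in float32, so the softmax and KL-divergence inside the loss do not overflow or underflow in fp16. A sketch of that pattern (not verified on this model; LossFunction is the function from my traceback):

from torch.cuda.amp import autocast

with autocast():
    student_result = student_module(data)
# Leave the autocast region so everything inside LossFunction runs in float32.
with autocast(enabled=False):
    loss = LossFunction(teacher_result.float(), student_result.float(),
                        hard_target, Temperature, 0.7)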
Error message:
Traceback (most recent call last):
  File "/home/XXX/Code_Wrap/Distilling_NNUNET-main/main.py", line 133, in main
    loss=LossFunction(teacher_result,student_result,hard_target,Temperature,0.7)
  File "/home/XXX/Code_Wrap/Distilling_NNUNET-main/main.py", line 53, in LossFunction
    loss1=weight*loss_function(torch.softmax(student_result/T,dim=1),torch.softmax(softtarget/T,dim=1),)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Although this traceback points at loss1, the "unable to get repr" error actually occurred in loss2, which I pasted above.
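As the message suggests, I will try rerunning with CUDA_LAUNCH_BLOCKING=1 so kernel launches are synchronous and the stack trace points at the real failure; setting the variable at the top of main.py, before torch is imported, should be enough:

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # force synchronous CUDA kernel launches
import torch  # must be imported after the environment variable is set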