I've been stuck on a project for a while: I fit the model and then test it on the same data, yet the gap between the training loss and the test loss is huge. I have no idea why this is happening, so I'll provide as much information as I can. Thanks for your patience!
Here is the relevant part of my train.py:
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])

# To evaluate on exactly the same data, the train and test loaders point at the same file list.
training_set = Dataloader(root=args.root, list_file=args.files, transform=transform, input_size=600)
training_set_loader = torch.utils.data.DataLoader(training_set, batch_size=args.batch_size, shuffle=False, collate_fn=training_set.collate_fn)
test_set = Dataloader(root=args.root, list_file=args.files, transform=transform, input_size=600)
test_set_loader = torch.utils.data.DataLoader(test_set, batch_size=args.batch_size, shuffle=False, collate_fn=test_set.collate_fn)
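# Quick sanity check (a sketch I could add here; it assumes collate_fn
# returns plain tensors): with shuffle=False and the same file list,
# both loaders should yield identical batches.
first_train_batch = next(iter(training_set_loader))
first_test_batch = next(iter(test_set_loader))
for a, b in zip(first_train_batch, first_test_batch):
    assert torch.equal(a, b), "train/test loaders are not aligned"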
# Model setup: load pretrained RetinaNet weights and move the model to the GPU.
model = Retina_Net(args.num_class)
weights = torch.load(join("weights", "retinet.pth"))
model.load_state_dict(weights)
model.cuda()
def train(epoch, mode=True, beta=0.99):
    model.train(mode)
    global training_iterations
    global avg_loss
    for num_batch, (inputs, loc_targets, cls_targets) in enumerate(training_set_loader):
        inputs = inputs.cuda()
        loc_targets = loc_targets.cuda()
        cls_targets = cls_targets.cuda()
        optimizer.zero_grad()
        # Per-iteration learning-rate and momentum schedules.
        optimizer.param_groups[0]["lr"] = lr_distr[training_iterations]
        optimizer.param_groups[0]["betas"] = (mom_distr[training_iterations].item(), 0.999)
        loc_preds, cls_preds = model(inputs)
        loss = loss_function(loc_preds, loc_targets, cls_preds, cls_targets)
        # Backpropagate and update the parameters.
        loss.backward()
        optimizer.step()
        # Bias-corrected exponential moving average of the training loss.
        avg_loss = loss.item() * (1 - beta) + avg_loss * beta
        smooth_loss = avg_loss / (1 - beta ** (training_iterations + 1))
        training_iterations += 1
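# Aside on the smoothing above: avg_loss is an exponential moving average,
# and dividing by 1 - beta**t is the standard bias correction (the same
# one Adam uses), so early iterations aren't dragged toward the initial
# avg_loss value. A minimal standalone sketch of the idea:
def smoothed_losses(losses, beta=0.99):
    avg = 0.0
    for t, l in enumerate(losses, start=1):
        avg = (1 - beta) * l + beta * avg
        yield avg / (1 - beta ** t)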
def test(epoch, mode=False, beta=0.99):
    model.eval()
    global avg_test_loss
    global testing_iterations
    with torch.no_grad():
        for num_batch, (inputs, loc_targets, cls_targets) in enumerate(test_set_loader):
            # Variable is a no-op since PyTorch 0.4, so plain tensors are enough here.
            inputs = inputs.cuda()
            loc_targets = loc_targets.cuda()
            cls_targets = cls_targets.cuda()
            loc_preds, cls_preds = model(inputs)
            loss = loss_function(loc_preds, loc_targets, cls_preds, cls_targets)
            # Same bias-corrected moving average as in train().
            avg_test_loss = loss.item() * (1 - beta) + avg_test_loss * beta
            smooth_test_loss = avg_test_loss / (1 - beta ** (1 + testing_iterations))
            testing_iterations += 1
I was also careful with the batch-normalization layers: since the backbone is a pretrained ResNet-50, I freeze its batch-norm layers by overriding the model's train() method (fpn here is the ResNet-50 backbone):
def train(self, mode=True):
    super().train(mode)
    for m in self.fpn.modules():
        if isinstance(m, nn.BatchNorm2d):
            # Freeze the BN affine parameters and keep the stored running stats.
            m.weight.requires_grad = False
            m.bias.requires_grad = False
            m.eval()
        elif isinstance(m, nn.Conv2d):
            m.weight.requires_grad = False
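To double-check the override, I can call model.train(True) and then verify that every backbone batch-norm layer really stays in eval mode with frozen parameters. A quick sketch, assuming the backbone is reachable as model.fpn:

model.train(True)
for m in model.fpn.modules():
    if isinstance(m, nn.BatchNorm2d):
        assert not m.training, "BN layer was flipped back to train mode"
        assert not m.weight.requires_grad and not m.bias.requires_grad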
As a small test I trained for 10 epochs, but the results are unexpectedly terrible:
epochs 1/10 [728/726 (100%)] loss: 0.2716
testing [728/726 (100%)] loss: 70.851
epochs 2/10 [728/726 (100%)] loss: 0.2685
testing [728/726 (100%)] loss: 389.855
epochs 3/10 [728/726 (100%)] loss: 0.1624
testing [728/726 (100%)] loss: 148.729
epochs 4/10 [728/726 (100%)] loss: 0.0772
testing [728/726 (100%)] loss: 305.008
epochs 5/10 [728/726 (100%)] loss: 0.0387
testing [728/726 (100%)] loss: 211.030
epochs 6/10 [728/726 (100%)] loss: 0.0283
testing [728/726 (100%)] loss: 174.605
epochs 7/10 [728/726 (100%)] loss: 0.0249
testing [728/726 (100%)] loss: 157.776
epochs 8/10 [728/726 (100%)] loss: 0.0234
testing [728/726 (100%)] loss: 148.920
epochs 9/10 [728/726 (100%)] loss: 0.0224
testing [728/726 (100%)] loss: 145.561
epochs 10/10 [728/726 (100%)] loss: 0.0217
testing [728/726 (100%)] loss: 144.784
Since the training loss keeps decreasing, I assume the model and the training loop are correct, so I suspect there is some issue with the batch-normalization layers. If you need more information to work this out, let me know. This has been bothering me for a long time. Thanks in advance!
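One diagnostic I plan to run to confirm the batch-norm suspicion: push the same batch through the model once in train mode and once in eval mode, with gradients disabled, and compare the losses directly. This is a sketch using the loader and loss_function from above; note that train mode still updates BN running statistics even under no_grad:

inputs, loc_targets, cls_targets = next(iter(training_set_loader))
inputs, loc_targets, cls_targets = inputs.cuda(), loc_targets.cuda(), cls_targets.cuda()
with torch.no_grad():
    model.train()
    loc_preds, cls_preds = model(inputs)
    train_mode_loss = loss_function(loc_preds, loc_targets, cls_preds, cls_targets).item()
    model.eval()
    loc_preds, cls_preds = model(inputs)
    eval_mode_loss = loss_function(loc_preds, loc_targets, cls_preds, cls_targets).item()
print(f"same batch -- train mode: {train_mode_loss:.4f}, eval mode: {eval_mode_loss:.4f}")

If the eval-mode loss is far larger on the very same batch, the gap has to come from layers that behave differently between the two modes, which points at batch norm (or dropout).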