A PyTorch model trained on the GPU and loaded onto the CPU gives different results each time it is tested

spmimi · June 30, 2019, 5:28am

TEST
print('Loading model...')
#-net = RetinaNet()
net = RetinaNet(backbone=cfg.backbone, num_classes=len(cfg.classes),pretrained=False)
map_location = lambda storage, loc: storage
ipdb.set_trace()
#net.load_state_dict(torch.load('.store\ckpt6.pth')['net'])
net.load_state_dict(torch.load('.store/ckpt6.pth', map_location='cpu')['net'], strict=False)

print(torch.load('.store/ckpt6.pth', map_location=map_location)['lr'])
print(torch.load('.store/ckpt6.pth', map_location=map_location)['epoch'])
#z = net.named_modules()
#for x, xx in subnet4.named_parameters():
#    print(x, xx)

ipdb.set_trace()

subnet1=net.resnet
print(net.resnet.training)

subnet2=net.feature_pyramid
print(net.feature_pyramid.training)
subnet3=net.subnet_boxes
print(net.subnet_boxes.training)
subnet4=net.subnet_classes
print(net.subnet_classes.training)
subnet1.eval()
print(net.resnet.training)
for x in subnet1._modules:
    print(subnet1._modules[x].training)

for x in subnet2._modules:
    subnet2._modules[x].eval()
for x in subnet3._modules:
    subnet3._modules[x].eval()
for x in subnet4._modules:
    subnet4._modules[x].eval()

subnet2.eval()
print(net.feature_pyramid.training)
subnet3.eval()
print(net.subnet_boxes.training)
subnet4.eval()
print(net.subnet_classes.training)

TRAIN

net = RetinaNet(backbone=cfg.backbone, num_classes=len(cfg.classes))
net = torch.nn.DataParallel(net, device_ids=range(torch.cuda.device_count()))
net.cuda()

def save_checkpoint(loss, net, n):
    global best_loss
    loss /= n
    if loss < best_loss:
        print('Saving...')
        state = {
            'net': net.state_dict(),
            'loss': loss,
            'epoch': epoch,
            'lr': lr
        }
        # ckpt_path = os.path.join('ckpts', args.exp)
        ckpt_path = '.store'
        if not os.path.isdir(ckpt_path):
            os.makedirs(ckpt_path)
        torch.save(state, os.path.join(ckpt_path, 'ckpt.pth'))
        best_loss = loss
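One detail worth checking in the two scripts above: if the `net` passed to `save_checkpoint` is the `torch.nn.DataParallel` wrapper, every key in the saved state dict starts with `module.`, while the test script loads the checkpoint into an unwrapped `RetinaNet` with `strict=False`. With `strict=False`, `load_state_dict` does not raise on mismatched keys; in recent PyTorch releases it only reports them in its return value. The sketch below reproduces this with a small hypothetical `nn.Sequential` standing in for RetinaNet (an assumption; it does not use the real model), and shows one common way to strip the prefix.

```python
import torch
import torch.nn as nn


def make_model():
    # Hypothetical stand-in for RetinaNet; any nn.Module behaves the same way here.
    return nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))


def strip_module_prefix(state_dict, prefix="module."):
    # Drop the "module." prefix that nn.DataParallel adds to every key.
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in state_dict.items()
    }


# Training side: the model is wrapped in DataParallel before saving.
trained = nn.DataParallel(make_model())
state = {"net": trained.state_dict()}
print(list(state["net"].keys()))  # keys look like "module.0.weight", ...

# Test side: a fresh, unwrapped model, loaded with strict=False.
net = make_model()
result = net.load_state_dict(state["net"], strict=False)
# Nothing is raised, but no key matched, so net keeps its random initialization.
print("missing:", result.missing_keys)
print("unexpected:", result.unexpected_keys)

# Stripping the prefix lets a strict load succeed.
net.load_state_dict(strip_module_prefix(state["net"]), strict=True)
same = all(
    torch.equal(a, b.cpu())
    for a, b in zip(net.state_dict().values(), trained.module.state_dict().values())
)
print("weights match:", same)
```

Loading with `strict=True` (the default) is the safer choice here, because a key mismatch then raises an error instead of silently leaving weights at their random initial values.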

spmimi · June 30, 2019, 10:41am

Is anyone there? Any fellow Chinese developers here?
Hello, Chinese friends. My English is not good.

spmimi · June 30, 2019, 12:53pm

Please help me, thank you!!!

ptrblck · June 30, 2019, 2:00pm

After creating an instance of your model, you can just call model.eval() and all submodules will also be set to eval mode, so you don’t need to call .eval() on each submodule separately.
Could you post a link to the model implementation?
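To illustrate the point about `model.eval()`: it is equivalent to `model.train(False)`, which recurses into every child module. The sketch uses a small hypothetical model with BatchNorm and Dropout layers (a stand-in, not RetinaNet) and checks the `training` flag of every submodule after a single call on the root.

```python
import torch.nn as nn

# Hypothetical model with nested submodules (a stand-in for RetinaNet).
model = nn.Sequential(
    nn.Conv2d(3, 8, 3),
    nn.BatchNorm2d(8),
    nn.Sequential(nn.ReLU(), nn.Dropout(0.5)),
)

model.eval()  # recursively sets training=False on all submodules

for name, module in model.named_modules():
    print(name or "<root>", module.training)
```

All printed flags are `False`, including those of the nested `nn.Sequential`, so the per-submodule `.eval()` loops in the first post are not needed.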

PS: If you would like to post code snippets, you can simply wrap them in three backticks ```

spmimi · June 30, 2019, 3:03pm

TEST
import torch
import torchvision.transforms as transforms
import exps.voc.config2 as cfg
import pandas as pd
from retinanet import RetinaNet
from encoder import DataEncoder
from PIL import Image, ImageDraw
try:
    import ipdb
except ImportError:
    import pdb as ipdb

print('Loading model...')

net = RetinaNet(backbone=cfg.backbone, num_classes=len(cfg.classes), pretrained=False)
map_location = lambda storage, loc: storage
net.load_state_dict(torch.load('.store/ckpt6.pth', map_location='cpu')['net'], strict=False)
print(torch.load('.store/ckpt6.pth', map_location=map_location)['lr'])
print(torch.load('.store/ckpt6.pth', map_location=map_location)['epoch'])
net.eval()
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
])
print('Loading image...')
img = Image.open('./image/0.jpg')
w = img.size[0]
h = img.size[1]
img = img.resize((w, h))
x = transform(img)
x = x.unsqueeze(0)
# Variable(x, volatile=True) has no effect since PyTorch 0.4;
# torch.no_grad() is the current way to disable gradient tracking.
with torch.no_grad():
    loc_preds, cls_preds = net(x)

encoder = DataEncoder()

cls_preds1 = cls_preds.data.squeeze()
score, labels = cls_preds1.max(1)
xxx = pd.DataFrame(score.tolist())
print(xxx.describe())
yyy = pd.DataFrame(labels.tolist())
print(yyy.describe())
yy = labels > 0
print(yy.numel())
print(yy.data.long().sum())
FIRST

count 56088.000000
mean 0.014430
std 0.033506
min -0.333214
25% -0.000789
50% 0.010297
75% 0.027038
max 0.603817
0
count 56088.000000
mean 0.770807
std 0.420318
min 0.000000
25% 1.000000
50% 1.000000
75% 1.000000
max 1.000000
56088
tensor(43233)
SECOND
0
count 56088.000000
mean 0.022885
std 0.063939
min -0.526022
25% -0.011785
50% 0.007780
75% 0.041688
max 0.933956
0
count 56088.000000
mean 0.550260
std 0.497472
min 0.000000
25% 0.000000
50% 1.000000
75% 1.000000
max 1.000000
56088
tensor(30863)
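Since the labels in this binary setup are 0 or 1, the `mean` row of the second `describe()` table is simply the fraction of anchors counted by `yy.sum()`: 43233 / 56088 ≈ 0.7708 for FIRST and 30863 / 56088 ≈ 0.5503 for SECOND. So between the two runs, the number of anchors labeled foreground dropped from 43233 to 30863. The sketch below shows what `score, labels = cls_preds1.max(1)` and `labels > 0` compute, using a made-up three-anchor tensor with two class columns (an assumption; the real tensor has 56088 rows).

```python
import torch

# Made-up logits standing in for cls_preds.data.squeeze():
# one row per anchor, one column per class (two classes assumed here).
cls_preds = torch.tensor([[0.20, -0.10],
                          [-0.30, 0.40],
                          [0.05, 0.06]])

# max over dim 1 returns the largest logit per anchor and its column index.
score, labels = cls_preds.max(1)

# Anchors whose best class is not column 0.
foreground = labels > 0

print(score.tolist())
print(labels.tolist())
print(int(foreground.sum()), "of", foreground.numel())

# With 0/1 labels, the mean of labels equals the foreground fraction.
fraction = labels.float().mean().item()
print(fraction)
```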

spmimi · June 30, 2019, 3:07pm

The above is my code, modified as you suggested. This is a binary object-detection task that separates background from foreground, and I used the RetinaNet architecture. The scores and labels from cls_preds change every time I run the test.

ptrblck · June 30, 2019, 3:46pm

Could you post a link to the implementation you are using for RetinaNet, as I would like to run the code.

spmimi · July 1, 2019, 9:33am

spmimi · July 1, 2019, 9:36am

This is my code: https://github.com/jiaojiechu/retinanet1.git

spmimi · July 2, 2019, 1:39am

Does anyone have a similar problem? (Bumping my own thread.)

ptrblck · July 2, 2019, 10:58am

Thanks for the code!
After removing the hard-coded checkpoint paths, I get the same outputs from the model in eval mode:

import torch
from retinanet import RetinaNet  # from the linked repository

net = RetinaNet(backbone='resnet34', num_classes=10, pretrained=False)
net.eval()
x = torch.randn(2, 3, 224, 224)
output1a, output1b = net(x)
output2a, output2b = net(x)

print((output1a == output2a).all())
# > tensor(1, dtype=torch.uint8)
print((output1b == output2b).all())
# > tensor(1, dtype=torch.uint8)
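The check above compares two forward passes of the same model object. When outputs only change between separate runs, a related check is to load the same checkpoint into two freshly constructed models and compare their outputs; with `strict=True` (the default), a key mismatch raises an error instead of silently leaving random weights in place. A sketch, assuming a small hypothetical model and an in-memory buffer in place of the real `.store/ckpt6.pth` file:

```python
import io

import torch
import torch.nn as nn


def make_model():
    # Hypothetical stand-in for RetinaNet(..., pretrained=False).
    return nn.Sequential(nn.Conv2d(3, 4, 3), nn.BatchNorm2d(4), nn.ReLU())


# An in-memory "checkpoint" in place of '.store/ckpt6.pth'.
buffer = io.BytesIO()
torch.save({"net": make_model().state_dict()}, buffer)

x = torch.randn(2, 3, 16, 16)
outputs = []
for _ in range(2):
    buffer.seek(0)
    checkpoint = torch.load(buffer, map_location="cpu")
    net = make_model()  # a fresh, randomly initialized model each time
    # strict=True (the default) raises if any key is missing or unexpected.
    net.load_state_dict(checkpoint["net"])
    net.eval()
    with torch.no_grad():
        outputs.append(net(x))

print(torch.allclose(outputs[0], outputs[1]))
```

If both fresh models really receive every weight from the checkpoint, their outputs match even though each one started from a different random initialization.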

spmimi · July 2, 2019, 12:58pm

Thank you, but I don’t understand what you mean. Is there something wrong with the model I saved, or is my loading method inappropriate?

ptrblck · July 2, 2019, 1:08pm

I’m not sure how you’ve saved the model, but you’ve hard-coded the checkpoint paths here, which are used even if pretrained=False is passed, so I had to remove them.

After doing so, you’ll see that the outputs are consistent.
I’m also not sure how you’ve created the outputs for FIRST and SECOND, but apparently something changed between these calls.

spmimi · July 2, 2019, 2:54pm

I think my problem is similar to this one:

Can you take a look? Thanks.

spmimi · July 3, 2019, 1:24am

I found that the outputs are consistent within a single run, but they change again in the next run.
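Outputs that stay the same within one run but change from one run to the next often point to weights that are randomly re-initialized at startup, for example parameters that `load_state_dict(..., strict=False)` silently skipped. One way to check is to print a fingerprint of the model's weights right after loading, in each run, and compare. A sketch; `weight_fingerprint` and `make_model` are hypothetical helpers, not part of PyTorch or the linked repository:

```python
import torch
import torch.nn as nn


def weight_fingerprint(model):
    # Hypothetical helper: sum of every parameter and buffer in float64.
    # Two runs that load the same checkpoint correctly should print the same value.
    total = 0.0
    for tensor in model.state_dict().values():
        total += tensor.detach().double().sum().item()
    return total


def make_model():
    # Hypothetical stand-in for the real network.
    return nn.Sequential(nn.Linear(4, 3), nn.BatchNorm1d(3))


checkpoint = make_model().state_dict()

a = make_model()  # random initialization
b = make_model()  # a different random initialization
print(weight_fingerprint(a), weight_fingerprint(b))  # almost surely differ

a.load_state_dict(checkpoint)
b.load_state_dict(checkpoint)
print(weight_fingerprint(a), weight_fingerprint(b))  # identical after loading
```

In the test script, printing the fingerprint right after the `load_state_dict` call in two separate runs would show whether the loaded weights are actually the same each time.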