Hello,
I'm trying to retrain the last layer of the SqueezeNet model downloaded from Caffe2's GitHub model-zoo repo. So far I can retrain it on the CPU (mainly by following this tutorial…, but without the GPU). Now I want to move on and train the model on the GPU, and I have run into some issues.

First, I was not able to set up a `DeviceOption` (from the `caffe2_pb2` namespace), because `caffe2_pb2.DeviceOption` doesn't have a `cuda_gpu_id` property (in the protobuf definition in `pytorch/caffe2/proto/caffe2.proto` I found that `DeviceOption` has `device_id` instead). I then switched to `core.DeviceOption`, but I don't know whether that is right.

After that, other errors started showing up, seemingly at random: either a problem with the protobuf structure of `model.net` (`… CHECK failed: (index) < (current_size_)`) or `Segmentation fault (core dumped)`.
I have been experimenting with `with core.DeviceScope` statements wrapping the `NetDef` initialization and the reinitialization of the last layer, and I tried copying the `DeviceOption` into `NetDef.device_option` with `CopyFrom`, all of that in combination with `model.RunAllOnGPU()`.
But with no luck.
Is there a recommended approach for this?
Thanks in advance.
Code:
```python
from caffe2.proto import caffe2_pb2
from caffe2.python import core, model_helper
from caffe2.python.modeling import initializers
from caffe2.python.modeling.parameter_info import ParameterTags


# classCount - number of labels
def LoadAndTranslateSqueezenetModelv2(name,
                                      lmdbPath, classCount, batchSize, imageDimension,
                                      initNetPath, predictNetPath, deviceOpts,
                                      learningRate=10**-2):
    # with core.DeviceScope(deviceOpts):
    model = model_helper.ModelHelper(name, arg_scope={
        'order': 'NCHW',
        'use_cudnn': True
    })

    predNetPb = caffe2_pb2.NetDef()
    with open(predictNetPath, 'rb') as f:
        predNetPb.ParseFromString(f.read())
    initNetPb = caffe2_pb2.NetDef()
    with open(initNetPath, 'rb') as f:
        initNetPb.ParseFromString(f.read())

    # model.RunAllOnGPU()
    # shapes of the retrained classifier layer
    newShapes = {'conv10_w': [classCount, 512, 1, 1], 'conv10_b': [classCount]}
    for op in initNetPb.op:
        if op.output[0] in newShapes:
            tag = (ParameterTags.WEIGHT if op.output[0].endswith('_w') else ParameterTags.BIAS)
            # register the params inside the model; ExternalInitializer adds no
            # fill op, the fills are added to param_init_net below
            model.create_param(op.output[0], newShapes[op.output[0]],
                               initializers.ExternalInitializer(), tags=tag)

    # drop the pretrained (1000-class) conv10_w and conv10_b fill ops, matching
    # them by output name instead of the hard-coded indices 50 and 51;
    # iterate backwards so each deletion does not shift indices still to visit
    for i in reversed(range(len(initNetPb.op))):
        if initNetPb.op[i].output[0] in newShapes:
            del initNetPb.op[i]

    model.param_init_net = core.Net(initNetPb)
    model.param_init_net.XavierFill([], 'conv10_w', shape=[classCount, 512, 1, 1])
    model.param_init_net.ConstantFill([], 'conv10_b', shape=[classCount])
    model.net = core.Net(predNetPb)
    model.Squeeze("softmaxout", "softmax", dims=[2, 3])
    # creates cross-entropy and average loss, builds SGD for every param of the model
    # (my own helper, not shown)
    ScaffoldModelTrainingOperatorsSqueezenet(model, 'softmax', 'label', 0.1)
    # lines like model.net.Proto().device_option.CopyFrom and reassigning it back to model..
    # for param_init_net and net
    # InscribeDeviceOptionsToModel(model, deviceOpts)
    return model
```