Deepali
(Deepali Patel)
March 12, 2019, 6:22am
1
I am running the resnet50_trainer example on the MNIST dataset.
The following command works:
python resnet50_trainer.py --train_data ~/mnist_train_lmdb --num_gpus 4 --batch_size 64
but adding --num_channels 1 leads to an error:
python resnet50_trainer.py --train_data ~/mnist_train_lmdb --num_gpus 4 --batch_size 64 --num_channels 1
...
...
INFO:ResNe(X)t_trainer:Starting epoch 0/1
[E net_async_base.cc:377] [enforce fail at conv_op_cudnn.cc:555] filter.dim32(1) == C / group_. 1 vs 3
Error from operator:
input: "gpu_1/data" input: "gpu_1/conv1_w" output: "gpu_1/conv1" name: "" type: "Conv" arg { name: "kernel" i: 7 } arg { name: "order" s: "NCHW" } arg { name: "enable_tensor_core" i: 0 } arg { name: "stride" i: 2 } arg { name: "pad" i: 3 } arg { name: "exhaustive_search" i: 1 } arg { name: "ws_nbytes_limit" i: 67108864 } device_option { device_type: 1 device_id: 1 } engine: "CUDNN"
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void const*)
Oli
(Olof Harrysson)
March 12, 2019, 7:03am
2
Can you link to the source code? Perhaps the first conv layer expects the input to have 3 channels.
Deepali
(Deepali Patel)
March 12, 2019, 7:15am
3
Yes, I am guessing the same, but I am not sure where exactly the issue is.
The code snippet where it fails (in conv_op_cudnn.cc) is:
N = X.dim32(0);
C = X.dim32(1);
H = X.dim32(2);
W = X.dim() > 3 ? X.dim32(3) : 1;
D = X.dim() > 4 ? X.dim32(4) : 1;
H_out = Y->dim32(2);
W_out = Y->dim() > 3 ? Y->dim32(3) : 1;
D_out = Y->dim() > 4 ? Y->dim32(4) : 1;
CAFFE_ENFORCE_EQ(filter.dim32(1), C / group_); //fails here
for (int i = 0; i < kernel_.size(); ++i) {
CAFFE_ENFORCE_EQ(filter.dim32(i + 2), kernel_[i]);
}
Oli
(Olof Harrysson)
March 12, 2019, 7:39am
4
Ouch, I can't help you with Caffe. But the line where it crashes is a CAFFE_ENFORCE_EQ check, which I'm guessing works like a Python assert. Try printing those two values just before that line to see what they are; maybe that helps.
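To make the check concrete, here is a minimal sketch in plain Python (not Caffe2 code) of the comparison that the failing line performs. The message "1 vs 3" reads as filter.dim32(1) being 1 while C / group_ is 3, which would mean the conv1 filter was built for 1 input channel but the data blob still has 3. The function name and the shapes below are hypothetical, chosen only to reproduce that mismatch; the batch and image sizes are assumptions.

```python
def check_conv_channels(input_shape, filter_shape, group=1):
    """Plain-Python mirror of: CAFFE_ENFORCE_EQ(filter.dim32(1), C / group_)."""
    c = input_shape[1]          # C = X.dim32(1) in NCHW order
    filter_c = filter_shape[1]  # filter.dim32(1): input channels per group
    expected = c // group
    # Print both values before comparing them, as suggested above.
    print(f"filter.dim32(1) = {filter_c}, C / group_ = {expected}")
    if filter_c != expected:
        raise ValueError(f"{filter_c} vs {expected}")
    return True


# Hypothetical shapes (assumptions, not taken from the actual run):
# a 3-channel input batch and a first-layer filter built for 1 channel.
data_shape = (16, 3, 224, 224)   # N, C, H, W
conv1_w_shape = (64, 1, 7, 7)    # out_channels, in_channels, kH, kW

try:
    check_conv_channels(data_shape, conv1_w_shape)
except ValueError as err:
    print("enforce fail:", err)
```

If the real blobs show the same pattern, the likely place to look is how the input images are decoded (color vs. grayscale) rather than the convolution itself, though that is a guess based only on the error message.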
Deepali
(Deepali Patel)
March 13, 2019, 7:11am
5
Sure, I will check. Thanks for the help.