resnet50_trainer example fails with --num_channels

I am running the resnet50_trainer example on the MNIST dataset.

The following command works:
python resnet50_trainer.py --train_data ~/mnist_train_lmdb --num_gpus 4 --batch_size 64

but adding --num_channels 1 leads to an error:

python resnet50_trainer.py --train_data ~/mnist_train_lmdb --num_gpus 4 --batch_size 64 --num_channels 1
...
...
INFO:ResNe(X)t_trainer:Starting epoch 0/1
[E net_async_base.cc:377] [enforce fail at conv_op_cudnn.cc:555] filter.dim32(1) == C / group_. 1 vs 3
Error from operator:
input: "gpu_1/data" input: "gpu_1/conv1_w" output: "gpu_1/conv1" name: "" type: "Conv" arg { name: "kernel" i: 7 } arg { name: "order" s: "NCHW" } arg { name: "enable_tensor_core" i: 0 } arg { name: "stride" i: 2 } arg { name: "pad" i: 3 } arg { name: "exhaustive_search" i: 1 } arg { name: "ws_nbytes_limit" i: 67108864 } device_option { device_type: 1 device_id: 1 } engine: "CUDNN"
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void const*)

Can you link to the source code? Perhaps the first conv layer expects the input to have 3 channels.

Yes, I am guessing the same, but I'm not sure where exactly the issue is.

The code snippet where it fails (in conv_op_cudnn.cc) is:

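      // NCHW case: input X is [N, C, H, W(, D)]; missing trailing dims default to 1.
      N = X.dim32(0);
      C = X.dim32(1);
      H = X.dim32(2);
      W = X.dim() > 3 ? X.dim32(3) : 1;
      D = X.dim() > 4 ? X.dim32(4) : 1;
      H_out = Y->dim32(2);
      W_out = Y->dim() > 3 ? Y->dim32(3) : 1;
      D_out = Y->dim() > 4 ? Y->dim32(4) : 1;
      // The filter is laid out as [M, C / group_, kernel...], so its dim 1
      // must equal the number of input channels per group.
      CAFFE_ENFORCE_EQ(filter.dim32(1), C / group_);  // fails here
      // Each remaining filter dim must match the configured kernel size.
      for (int i = 0; i < kernel_.size(); ++i) {
        CAFFE_ENFORCE_EQ(filter.dim32(i + 2), kernel_[i]);
      }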
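To make the failing check concrete, here is a small Python sketch that mirrors the shape logic in the snippet above. It assumes NCHW layout (the op's "order" arg in the log is "NCHW") and that CAFFE_ENFORCE_EQ reports its two operands in argument order, so "1 vs 3" in the log would mean filter.dim32(1) is 1 while C / group_ is 3. If that reading is right, the conv1 weights were created for 1 input channel and it is the data blob that still carries 3 channels. The function name and the shapes used (batch of 16, 224x224 input, 64 output filters) are hypothetical, chosen only for illustration.

```python
from typing import Sequence, Tuple


def check_conv_shapes(x_shape: Sequence[int],
                      filter_shape: Sequence[int],
                      kernel: Sequence[int],
                      group: int = 1) -> Tuple[int, int, int, int, int]:
    """Hypothetical Python mirror of the NCHW checks in the Conv snippet."""
    # Unpack the input like the C++ does; missing trailing dims default to 1.
    n, c, h = x_shape[0], x_shape[1], x_shape[2]
    w = x_shape[3] if len(x_shape) > 3 else 1
    d = x_shape[4] if len(x_shape) > 4 else 1
    # CAFFE_ENFORCE_EQ(filter.dim32(1), C / group_): filter dim 1 holds the
    # number of input channels each group sees.
    if filter_shape[1] != c // group:
        raise ValueError(
            f"filter.dim32(1) == C / group_. {filter_shape[1]} vs {c // group}")
    # CAFFE_ENFORCE_EQ(filter.dim32(i + 2), kernel_[i]) for each spatial dim.
    for i, k in enumerate(kernel):
        if filter_shape[i + 2] != k:
            raise ValueError(
                f"filter.dim32({i + 2}) == kernel_[{i}]. "
                f"{filter_shape[i + 2]} vs {k}")
    return n, c, h, w, d


# Hypothetical shapes: 3-channel data feeding a 7x7 conv1 filter built for 1 channel.
try:
    check_conv_shapes((16, 3, 224, 224), (64, 1, 7, 7), kernel=(7, 7))
except ValueError as err:
    print(err)  # filter.dim32(1) == C / group_. 1 vs 3
```

With a 1-channel input of the same size, the check passes and returns (16, 1, 224, 224, 1).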

Ouch, I can't help you with Caffe2. But the line where it crashes is a CAFFE_ENFORCE_EQ, which I'm guessing works like a Python assert. Try printing those two values beforehand to see what they are; maybe that helps.
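One way to print those two values from the Python side is to fetch the two blobs named in the error log (gpu_1/data and gpu_1/conv1_w) and compare their channel dimensions. This is a sketch under assumptions: it assumes NCHW order and group 1 (no group arg appears in the log), and the helper name is hypothetical. It takes a `fetch` callable so it can run without Caffe2; with Caffe2 installed you could pass `workspace.FetchBlob` from `caffe2.python`, which returns a numpy array. Whether both blobs are already populated depends on where you call it in the trainer, which is also an assumption here.

```python
from types import SimpleNamespace
from typing import Any, Callable


def report_conv1_channels(fetch: Callable[[str], Any],
                          data_blob: str = "gpu_1/data",
                          weight_blob: str = "gpu_1/conv1_w") -> str:
    """Hypothetical helper: summarize the two values the Conv op compares.

    `fetch` maps a blob name to an object with a `.shape` attribute, e.g.
    caffe2.python.workspace.FetchBlob (returns a numpy array).
    """
    input_channels = fetch(data_blob).shape[1]     # C (NCHW: dim 1), group 1
    filter_channels = fetch(weight_blob).shape[1]  # filter.dim32(1)
    status = "OK" if input_channels == filter_channels else "MISMATCH"
    return (f"{status}: {weight_blob} expects {filter_channels} input channel(s), "
            f"{data_blob} provides {input_channels}")


# Stand-in blobs with shapes consistent with "1 vs 3" (illustrative only).
fake_blobs = {
    "gpu_1/data": SimpleNamespace(shape=(16, 3, 224, 224)),
    "gpu_1/conv1_w": SimpleNamespace(shape=(64, 1, 7, 7)),
}
print(report_conv1_channels(fake_blobs.__getitem__))
# MISMATCH: gpu_1/conv1_w expects 1 input channel(s), gpu_1/data provides 3

# With Caffe2 available (not run here):
#   from caffe2.python import workspace
#   print(report_conv1_channels(workspace.FetchBlob))
```

Passing the fetch function in keeps the comparison logic separate from Caffe2, so the same helper works on real blobs or on stand-ins.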

Sure, will check. Thanks for the help.