Hello, pretty new to PyTorch. I was wondering if there are any rules of thumb for picking in_channels and out_channels for class torch.nn.Conv2d.
How do these two arguments relate to the images I'm working with? For example, if I want to create a 6-layer CNN to work with 224x224 3-channel images, how do I decide on these arguments for each layer?
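So the hard rule is that a layer's input channels = the output channels of the previous layer (unless you pool across channels, which would be less common). For the first layer, the input channels are the channels of your images, i.e. 3 for RGB.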
Beyond that, I’d recommend looking at e.g. the vgg models for inspiration. Usually you increase sharply from 3 (RGB) to 50 or so (vgg uses 64) and then, say, double that around each pooling step. But there are many philosophies about that, and you should use what best suits your needs.
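To make the chaining rule concrete, here is a minimal sketch of a 6-layer CNN for 224x224 RGB input. The widths (64, 128, 256), the 3x3 kernels and the pooling after every second conv are my assumptions for illustration, loosely modelled on vgg, not a recommendation for your data.

```python
import torch
import torch.nn as nn

# Assumed channel widths: 3 (RGB) in, then vgg-like doubling after each pool.
# Each entry is the out_channels of one layer and the in_channels of the next.
channels = [3, 64, 64, 128, 128, 256, 256]

layers = []
for i, (c_in, c_out) in enumerate(zip(channels[:-1], channels[1:])):
    # in_channels must equal the out_channels of the previous conv layer.
    layers.append(nn.Conv2d(c_in, c_out, kernel_size=3, padding=1))
    layers.append(nn.ReLU())
    if i % 2 == 1:
        # Halve the spatial size after every second conv layer.
        layers.append(nn.MaxPool2d(2))
model = nn.Sequential(*layers)

x = torch.randn(1, 3, 224, 224)  # one fake 224x224 RGB image
out = model(x)
out_shape = tuple(out.shape)
print(out_shape)  # (1, 256, 28, 28): three 2x2 pools take 224 down to 28
```

Note that the image size (224x224) does not appear in any Conv2d argument. Only the channel count does, so the same stack runs on other image sizes too.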
Thanks for the reply @tom
Are there any best practices or good ways to decide what best suits my needs?
I don’t want to just use whatever VGG uses because it works, without knowing why it works.
So in more detail, I’d say
- Do what has worked well for experienced people - this is what using a vgg-like architecture is. Many people advocate using the pretrained weights as well (i.e. do finetuning).
- In his excellent fast.ai courses, Jeremy Howard recommends starting with enough capacity to overfit and then reducing from there. I can wholeheartedly recommend the course.
- Experiment, experiment, experiment. People have built automatic architecture searches, probably because you can’t know what works before you have tried it.
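If you want to try the "start big, then reduce" and "experiment" advice in practice, one option is to put the channel widths behind a single knob. This is only a sketch: `build_cnn`, `count_params`, the `base_width` parameter and the 10-class head are hypothetical names and choices of mine, not something from fast.ai or vgg.

```python
import torch
import torch.nn as nn

def build_cnn(base_width: int, num_classes: int = 10) -> nn.Sequential:
    """Hypothetical helper: 6 conv layers whose widths scale with base_width."""
    w = base_width
    widths = [3, w, w, 2 * w, 2 * w, 4 * w, 4 * w]
    layers = []
    for i in range(6):
        layers.append(nn.Conv2d(widths[i], widths[i + 1], kernel_size=3, padding=1))
        layers.append(nn.ReLU())
        if i % 2 == 1:
            layers.append(nn.MaxPool2d(2))
    # Global average pooling makes the classifier independent of image size.
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(4 * w, num_classes)]
    return nn.Sequential(*layers)

def count_params(model: nn.Module) -> int:
    """Total number of learnable parameters, a rough proxy for capacity."""
    return sum(p.numel() for p in model.parameters())

# Compare capacity across a few candidate widths before training any of them.
for w in (16, 32, 64):
    print(w, count_params(build_cnn(w)))

logits = build_cnn(16)(torch.randn(2, 3, 224, 224))
print(tuple(logits.shape))  # (2, 10)
```

You would then train the widest variant first, check that it can overfit your training set, and shrink `base_width` (or add regularisation) from there.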