I am using two A6000 GPUs. I made a gist of the code (Simple Distrbuted CNN · GitHub), but if preferred I can post it here. The goal is to create a simple CNN that can detect which light is illuminated on a traffic light. I feel like the training data is not being split across the GPUs; instead, all of the data is being trained on both GPUs simultaneously. Can anyone provide assistance? I am open to criticism if you have any tips on the layout or functionality of the code.
Also, I feel like there could be a general rule of thumb (or at least a rough range) for how many GB of GPU memory an image size of (x, x) and a CNN with a certain number of conv layers (or parameters) will require. What I've noticed before is that when I trained a Magic: The Gathering card classifier with a CNN (distributed with the same code, and it works), I could never build more than 3 conv layers before running out of memory, using the 32/64 input/output channel pattern for the conv layers and an image size of 72x72, which I know is extremely small.
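There isn't a single formula, but a rough back-of-the-envelope estimate can be sketched. The helper below is hypothetical (`estimate_cnn_memory` is not a PyTorch function) and assumes fp32 values, 3x3 convs with padding=1 each followed by ReLU and a 2x2 max-pool, a final Flatten -> Linear, and an Adam-style optimizer that keeps two extra tensors per parameter. It counts weights, gradients and optimizer state plus the conv/pool outputs autograd keeps for backward, so treat it as a lower bound.

```python
def estimate_cnn_memory(image_size, channels, in_channels=3, num_classes=3,
                        batch_size=32, bytes_per_value=4, optimizer_states=2):
    """Rough fp32 memory estimate for a stack of
    [Conv2d(k=3, padding=1) -> ReLU -> MaxPool2d(2)] blocks, then Flatten -> Linear.

    Returns (num_params, param_bytes, activation_bytes), where param_bytes covers
    weights + gradients + optimizer states. Only conv and pool outputs are counted
    as activations; ReLU/BatchNorm copies, cuDNN workspaces and the CUDA context
    are ignored, so the result is a lower bound.
    """
    h = w = image_size
    c_in = in_channels
    params = 0
    act_values = 0
    for c_out in channels:
        params += c_in * c_out * 3 * 3 + c_out      # conv weights + bias
        act_values += batch_size * c_out * h * w    # conv output (padding=1 keeps H, W)
        h, w = h // 2, w // 2                       # 2x2 max-pool halves H and W
        act_values += batch_size * c_out * h * w    # pool output
        c_in = c_out
    flat = c_in * h * w                             # features after Flatten
    params += flat * num_classes + num_classes      # final Linear layer
    # weights + gradients + optimizer states (2 for Adam's moment estimates)
    param_bytes = params * bytes_per_value * (2 + optimizer_states)
    activation_bytes = act_values * bytes_per_value
    return params, param_bytes, activation_bytes
```

The activation term scales with batch_size × H × W at every layer, and the flattened size feeding the Linear layer scales with the H × W of the last feature map, so large inputs with little pooling blow up quickly. For real numbers, checking `torch.cuda.max_memory_allocated()` after one training step is more reliable than any estimate.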
I’m seeing a few things that seem unusual. Are you trying to train separate models or a single model? For the single-model case I would expect you to use DDP (DistributedDataParallel) in conjunction with the DistributedSampler.
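A minimal sketch of the single-model setup, assuming the script is launched with torchrun (which sets RANK, WORLD_SIZE and LOCAL_RANK) and using a hypothetical random stand-in dataset and tiny model rather than the code in the gist. The key points: one process per GPU, every rank builds the same dataset, DistributedSampler hands each rank a disjoint slice, and DDP averages gradients so the replicas stay identical. `shard_indices` is a hypothetical helper only for inspecting the split.

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def shard_indices(n, world_size, rank):
    """Indices DistributedSampler assigns to one rank (shuffle off so it is easy to inspect)."""
    return list(DistributedSampler(range(n), num_replicas=world_size, rank=rank, shuffle=False))


def main(epochs=2):
    # torchrun sets RANK / WORLD_SIZE / LOCAL_RANK; fall back to a single process otherwise.
    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")

    use_cuda = torch.cuda.is_available()
    dist.init_process_group("nccl" if use_cuda else "gloo", rank=rank, world_size=world_size)
    device = torch.device(f"cuda:{local_rank}" if use_cuda else "cpu")
    if use_cuda:
        torch.cuda.set_device(device)

    # Hypothetical stand-in data: 256 random 3x32x32 "images", 3 classes (e.g. red/yellow/green).
    # Seeded so every rank builds the *same* dataset; the sampler then splits it.
    g = torch.Generator().manual_seed(0)
    data = TensorDataset(torch.randn(256, 3, 32, 32, generator=g),
                         torch.randint(0, 3, (256,), generator=g))
    sampler = DistributedSampler(data, num_replicas=world_size, rank=rank, shuffle=True)
    loader = DataLoader(data, batch_size=32, sampler=sampler)  # do not also pass shuffle=True

    model = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 3),
    ).to(device)
    # One replica per process; DDP all-reduces (averages) gradients during backward().
    model = DDP(model, device_ids=[local_rank] if use_cuda else None)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(epochs):
        sampler.set_epoch(epoch)  # new shuffle each epoch, identical across ranks
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
        print(f"rank {rank} epoch {epoch}: {len(sampler)} of {len(data)} samples, "
              f"loss {loss.item():.3f}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Launched as, e.g., `torchrun --nproc_per_node=2 train_ddp.py` (the file name is hypothetical), each rank should report 128 of the 256 samples per epoch; if each rank reports all 256, the data is not being split. Note that the effective batch size becomes 32 × number of GPUs.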
For a reference implementation using these components, I would recommend checking out the ImageNet example:
Additionally, the model used looks highly unusual, in particular:
which is an unusually large linear layer for a classification model.
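To see why the first Linear layer can get so large, it helps to compute the flattened size feeding it. A pure-Python sketch, assuming square inputs, padding=1 convs (so only pooling shrinks H and W) and 2x2 pools; the 72x72 input, 64 channels and 512 hidden units are hypothetical numbers, not taken from the gist.

```python
def linear_in_features(image_size, last_channels, num_pools):
    """Number of features after Flatten, assuming each 2x2 max-pool halves
    H and W (floor division) and the convs use padding=1, so only pooling shrinks them."""
    side = image_size
    for _ in range(num_pools):
        side //= 2
    return last_channels * side * side


def linear_params(in_features, out_features):
    """Weights plus bias of nn.Linear(in_features, out_features)."""
    return in_features * out_features + out_features


if __name__ == "__main__":
    # Hypothetical: 72x72 input, 64 channels in the last conv, Linear(..., 512).
    for pools in (0, 3):
        n = linear_in_features(72, 64, pools)
        print(pools, n, linear_params(n, 512))
    # 0 pools: 331776 features -> 169869824 params; 3 pools: 5184 -> 2654720
    # Global average pooling leaves one value per channel: 64 features -> 33280 params.
    print(linear_params(64, 512))
```

At roughly 170M fp32 weights, that one layer would need about 680 MB for the weights alone, and about 2.7 GB once gradients and two Adam states are included, before any activations.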
I'm trying to train a single model. I was trying to build a model resembling machine-learning-book/ch14_part2.ipynb at main · rasbt/machine-learning-book · GitHub, which uses 4 convolution layers to train a classifier. I see in that example that the linear layer is not nearly as large as the one in the model I created. I switched to using batchnorm instead of dropout, but I'll look into why the linear layer is so large and go from there. Thanks for taking a look.
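A sketch of one common way to keep the classifier small with 4 conv layers: BatchNorm after each conv and global average pooling (`nn.AdaptiveAvgPool2d(1)`) before a single Linear layer. This is not claimed to be the notebook's exact architecture; the channel widths, the 72x72 input and `num_classes=3` (red/yellow/green) are assumptions, and `TrafficLightCNN` is a hypothetical name.

```python
import torch
import torch.nn as nn


def conv_block(c_in, c_out):
    # 3x3 conv keeps H and W (padding=1); BatchNorm + ReLU; 2x2 max-pool halves H and W.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1, bias=False),  # bias is redundant before BatchNorm
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )


class TrafficLightCNN(nn.Module):
    """Hypothetical 4-conv-layer classifier; num_classes=3 assumes red/yellow/green."""

    def __init__(self, num_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(3, 32), conv_block(32, 64), conv_block(64, 128), conv_block(128, 256),
        )
        # Global average pooling: one value per channel regardless of input size,
        # so the classifier sees 256 features instead of 256 * H * W.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(256, num_classes)

    def forward(self, x):
        x = self.pool(self.features(x))
        return self.classifier(torch.flatten(x, 1))


if __name__ == "__main__":
    model = TrafficLightCNN()
    out = model(torch.randn(8, 3, 72, 72))
    print(out.shape)  # torch.Size([8, 3])
    print(sum(p.numel() for p in model.parameters()))
```

With these assumptions the whole model has 389,667 parameters, of which the Linear layer accounts for only 771. Because of the adaptive pooling, the same model also accepts other input sizes (at least 16x16, so four 2x2 pools leave a non-empty feature map).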