I’m stuck on a very strange problem. I’m working with the recently released Scene Graph Benchmark and got it to train on GQA, but I’ve hit one issue. It trains as expected when I use the following command:
CUDA_VISIBLE_DEVICES=0,1,2,3 python tools/relation_train_net.py --config-file "configs/e2e_relation_X_101_32_8_FPN_1x.yaml"
(this uses batch_size = 2). But when I launch the same training through torch.distributed.launch:
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --master_port 10025 --nproc_per_node=1 tools/relation_train_net.py --config-file "configs/e2e_relation_X_101_32_8_FPN_1x.yaml"
I get
RuntimeError: CUDA out of memory
(it still has batch_size = 2). Initially I wanted to train on 4 GPUs with batch_size = 8, but I ran into this problem first. What could be causing it, and what should I do to properly train on 4 GPUs?
My setup has four 2080 Ti cards, so there should be plenty of memory.
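For reference, the 4-GPU run I’m ultimately aiming for would look roughly like this. This is a sketch of my intent, not a command I’ve gotten working: the SOLVER.IMS_PER_BATCH override is the config key used in maskrcnn-benchmark, which I assume this repo inherits — if Scene Graph Benchmark uses a different key, substitute that instead.

```shell
# Intended 4-GPU launch: one process per GPU, global batch_size = 8
# (i.e. 2 images per GPU). SOLVER.IMS_PER_BATCH is assumed to be the
# global-batch-size config key, as in maskrcnn-benchmark.
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch \
    --master_port 10025 \
    --nproc_per_node=4 \
    tools/relation_train_net.py \
    --config-file "configs/e2e_relation_X_101_32_8_FPN_1x.yaml" \
    SOLVER.IMS_PER_BATCH 8
```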