Speed deteriorates when two replicates run with the same random seed


I ran into an interesting problem. When I run two replicates of my model on two GPUs using the CUDA_VISIBLE_DEVICES environment variable, and initialize them with the same random seed, the speed of both training processes deteriorates after several iterations.
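For context, a minimal sketch of the kind of setup being described: each replicate is pinned to one GPU via CUDA_VISIBLE_DEVICES and seeded identically, so both processes draw the same random sequences. The function and constants here are illustrative, not from the original post; with PyTorch you would additionally call `torch.manual_seed(seed)` and `torch.cuda.manual_seed_all(seed)`.

```python
import os
import random

# Pin this replicate to a single GPU before any CUDA library is loaded.
# (Hypothetical: the device id would normally differ per replicate.)
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

def set_seed(seed: int) -> None:
    """Seed the stdlib RNG. In a real PyTorch script you would also seed
    numpy and torch so both replicates execute identical op sequences."""
    random.seed(seed)

# Both replicates use the same seed, so they proceed in lockstep.
set_seed(42)
first_draw = random.random()
```

Because both processes execute the same operations in the same order, any shared resource (such as the PCIe bus) tends to be requested at the same moments by both of them.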

This phenomenon appears when I use the ROI pooling layer implemented in longcw’s Faster R-CNN.

I guess it may be related to the implementation of the ROI pooling layer. When both processes are initialized with the same seed, some GPU operations may conflict with each other.

Does anyone know why?


So if you run a single process it’s OK, but if you start two they run slower? Maybe the GPUs are competing for PCIe bandwidth.

Thank you very much.

Maybe it is due to competition for PCIe bandwidth. But during the first tens of thousands of iterations, there was no such problem.

Do you know what I can do to avoid the problem?

The problem always comes with low GPU utilization.

Hey @yikang-li did you train longcw’s code on a custom dataset? If so what were the changes that you made?