I have studied a lot of forum posts on this topic, but I cannot find a good answer. I can't understand why with batch sizes of 50, 100, and 400 I get the same speed per epoch. I varied num_workers, but that made the run time even longer. I have one video card. I want to buy 2 or 4 video cards, but I don't understand whether I will get a speedup or not. What determines the training speed? I have a lot of data and one epoch takes 25 minutes, which is very long; I need to run at least 5000 epochs.
Hi, if I have understood this correctly, the problem you are facing is that your data I/O is slower than the GPU ops. This could happen for a variety of reasons, including an expensive read operation due to on-the-fly decompression, a slow storage device, or other programs using up CPU resources. You could try to measure the speed of your data I/O alone, without passing any data to your model, to see whether it stays constant across batch sizes.
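A minimal sketch of that check, timing only the data-fetch path with no model involved (the file name, array shape, and slicing scheme here are assumptions for illustration):

```python
import time
import numpy as np

# Synthetic stand-in for the real data file (shape is an assumption).
np.savez("data.npz", arr_0=np.random.rand(1000, 32).astype(np.float32))

def time_loading(batch_size, n_batches=20):
    """Time batch extraction alone, with no forward/backward pass."""
    data = np.load("data.npz")["arr_0"]
    start = time.perf_counter()
    for i in range(n_batches):
        lo = (i * batch_size) % len(data)
        batch = data[lo:lo + batch_size].copy()  # .copy() forces the actual read
    return time.perf_counter() - start

for bs in (50, 100, 400):
    print(bs, time_loading(bs))
```

If these timings stay roughly constant per sample across batch sizes while your training loop does too, the bottleneck is more likely the compute than the I/O.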
Hope this points you in the right direction to solve the problem.
I read the data from RAM: `dataload = np.load('data.npz')`, `dtl = torch.from_numpy(dataload['arr_0'])`.
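For reference, a self-contained sketch of that loading pattern wrapped in a `DataLoader` (the file and array sizes here are made up; `arr_0` is the default key `np.savez` assigns):

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

# Synthetic file standing in for the real data.npz (shape is an assumption).
np.savez("data.npz", arr_0=np.random.rand(500, 16).astype(np.float32))

archive = np.load("data.npz")
features = torch.from_numpy(archive["arr_0"])  # zero-copy view of the in-RAM array

# num_workers=0 is reasonable here: the data already lives in memory,
# so worker processes would only add overhead.
loader = DataLoader(TensorDataset(features), batch_size=100, shuffle=True, num_workers=0)
for (batch,) in loader:
    pass  # feed `batch` to the model here
```

Since `torch.from_numpy` shares memory with the NumPy array, batching from it is essentially free, which is consistent with the I/O not being the bottleneck.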
I put timings on every line of my code. The time is spent only in the forward and backward passes.
Since I am doing reinforcement learning, I first do a forward pass to get the action and reward, then load the data into the second dataloader, then do a forward pass and a backward pass again. But it bothers me that changing the batch size does not speed things up. I think this is related to the size of my network.
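One thing to check: CUDA kernels run asynchronously, so per-line Python timings can be misleading unless you synchronize before reading the clock. A sketch of measuring forward+backward time per batch size with proper synchronization (the network architecture and tensor sizes below are assumptions, not your actual model):

```python
import time
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for the policy network (layer sizes are made up).
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 4)).to(device)
opt = torch.optim.SGD(model.parameters(), lr=1e-3)

def step_time(batch_size, n_steps=20):
    """Average time of one forward+backward+update step at a given batch size."""
    x = torch.randn(batch_size, 128, device=device)
    y = torch.randn(batch_size, 4, device=device)
    if device.type == "cuda":
        torch.cuda.synchronize()  # flush queued async kernels before timing
    start = time.perf_counter()
    for _ in range(n_steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for the GPU to finish before stopping the clock
    return (time.perf_counter() - start) / n_steps

for bs in (50, 100, 400):
    print(bs, step_time(bs))
```

If the per-step time grows roughly linearly with batch size, the GPU is already saturated and larger batches cannot shorten the epoch; if it stays nearly flat, per-step launch overhead dominates and larger batches (fewer steps per epoch) should help.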