Procedural generation dataset: GPU vs CPU?

The dataset is generated by simple code (for, if, random): 2D matrices of 0s and 1s for the input (20 × 10) and target (10 × 2):

[ [0,1,1,...],
  [1,1,1,...], ...]

I train an LSTM (2,778,240 trainable parameters); it finds patterns and the loss (cross-entropy) decreases.

Can you please suggest how to speed up the process?

  1. What should I choose: GPU or CPU?
  2. How should I configure the DataLoader: batch_size and num_workers? batch_size=maximum_size and num_workers=4*os.cpu_count()?
  3. How do I profile to find the bottleneck: data generation or training?

Note: this setup is similar to reinforcement learning, where you spend time collecting data each epoch.
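To separate data generation from training (question 3), one option is to time each phase on its own. Here is a minimal sketch, where `generate_sample` is a hypothetical stand-in for the actual generator logic (the real for/if rules are not shown in the post):

```python
import time
import torch

# Hypothetical stand-in for the poster's generator: random 0/1 matrices
# of shape (20, 10) for inputs and (10, 2) for targets.
def generate_sample():
    x = torch.randint(0, 2, (20, 10), dtype=torch.float32)
    y = torch.randint(0, 2, (10, 2), dtype=torch.float32)
    return x, y

n = 1000
t0 = time.perf_counter()
batch = [generate_sample() for _ in range(n)]
gen_time = time.perf_counter() - t0
print(f"data generation: {gen_time:.4f} s for {n} samples")
```

If generation per epoch takes a small fraction of the time a training epoch takes, the model is the bottleneck and the DataLoader settings matter less.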

I don’t know how complex your data are, but take into account that most of the PyTorch generators (torch.rand, for example) can allocate the data directly on the GPU.

That way you save the time required to move data from the CPU to the GPU.
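For example, passing a `device` argument creates the tensor on the GPU directly, with no host-to-device copy (falling back to CPU here if CUDA is unavailable):

```python
import torch

# Create the tensor on the target device directly instead of
# building it on the CPU and calling .to("cuda") afterwards.
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randint(0, 2, (20, 10), device=device, dtype=torch.float32)
print(x.device)
```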

You may find this useful: How to measure time in PyTorch.
That way you can profile CPU vs. GPU.
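A minimal timing helper along those lines might look like this. The key detail is synchronizing before stopping the clock when timing GPU work, since CUDA kernels launch asynchronously (`time_fn` is a hypothetical helper, not a PyTorch API):

```python
import time
import torch

def time_fn(fn, iters=100, device="cpu"):
    fn()  # warm-up run
    if device == "cuda":
        torch.cuda.synchronize()  # wait for pending GPU work
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    if device == "cuda":
        torch.cuda.synchronize()  # make sure all kernels finished
    return (time.perf_counter() - t0) / iters

cpu_avg = time_fn(lambda: torch.randint(0, 2, (20, 10)), device="cpu")
print(f"CPU generation: {cpu_avg * 1e6:.1f} µs per sample")
```

Running the same helper with `device="cuda"` on a GPU tensor lets you compare the two directly.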


Do you mean: input = torch.randint(0,2, (20,10), device="cuda")?
My dataset is generated by CPU code: 2D matrices of 0s and 1s for the input (20 × 10) and target (10 × 2). There is some logic behind it: for, if,… Should I move this to the GPU? I read that for simple operations the CPU is better.
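Whether the GPU helps often depends on whether the for/if logic can be vectorized. A hedged sketch, assuming a made-up rule ("cell is 1 where a uniform draw exceeds 0.5") in place of the real logic, which generates a whole batch in one tensor op:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Instead of building matrices one at a time with Python for/if loops,
# generate a whole batch at once and express the conditional as a mask.
# The threshold rule below is a hypothetical stand-in for the real logic.
batch_size = 64
probs = torch.rand(batch_size, 20, 10, device=device)
inputs = (probs > 0.5).float()  # the mask replaces the `if`
print(inputs.shape)
```

If the logic cannot be expressed as tensor operations like this, element-by-element Python loops on the GPU will usually be slower than on the CPU.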

I mean, I cannot really answer that for you; I just told you how to profile it so you can see what’s better. But, for example, if you generate tons of data and then have to allocate it, that is also time-consuming.