Hi,
I don't know how complex your data are, but keep in mind that most of the PyTorch generators (torch.rand, for example) can allocate the data directly on the GPU.
That way you save the time required to move data from CPU to GPU.
Do you mean: `input = torch.randint(0, 2, (20, 10), device="cuda")`?
My dataset is generated by CPU code: 2D matrices of 0s and 1s for the input (20 by 10) and the target (10 by 2). There is some logic behind it: for loops, ifs, … Should I move this to the GPU? I read that for simple operations the CPU is better.
I mean, I cannot really give you a definitive answer; I just told you how to profile it to see what's better. For example, if you generate tons of data but then still have to move it to the GPU, that transfer is also time-demanding.
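A minimal profiling sketch along those lines: it times generating a batch on the CPU and moving it versus allocating it directly on the target device. The (20, 10) shape comes from your post; the `time_it` helper and repeat count are just illustrative choices, and it falls back to CPU-only timing when no GPU is available.

```python
import time
import torch

# Pick the GPU if one is available, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

def time_it(fn, repeats=100):
    # Warm up once, then time repeated calls.
    # CUDA kernels are asynchronous, so synchronize before reading the clock.
    fn()
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    if device == "cuda":
        torch.cuda.synchronize()
    return time.perf_counter() - start

# Variant 1: generate on the CPU, then move to the target device.
cpu_then_move = lambda: torch.randint(0, 2, (20, 10)).to(device)
# Variant 2: allocate directly on the target device.
direct = lambda: torch.randint(0, 2, (20, 10), device=device)

print(f"CPU then .to({device}): {time_it(cpu_then_move):.4f}s")
print(f"direct on {device}:     {time_it(direct):.4f}s")
```

For small (20, 10) batches the difference may be negligible either way; the point is to measure on your hardware rather than guess.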