Hi,
I don't know how complex your data are, but keep in mind that most of the PyTorch generators (torch.rand, for example) can allocate the data directly on the GPU.
That way you save the time required to move data from CPU to GPU.
Do you mean: `input = torch.randint(0, 2, (20, 10), device="cuda")`?
My dataset is generated by CPU code: 2D matrices of 0s and 1s for the input (20 by 10) and the target (10 by 2). There is some logic behind it: for loops, ifs, … Should I move this to the GPU? I read that for simple operations the CPU is better.
I mean, I cannot really give you a definitive answer; I just told you how to profile it to see what's better. For example, if you generate tons of data but then still have to move it to the GPU, that transfer is also time-demanding.
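A minimal profiling sketch along those lines: it times generating a batch on the CPU and moving it versus allocating it directly on the target device. The (20, 10) shape comes from your post; the `time_it` helper and repeat count are just illustrative choices, and it falls back to CPU-only timing when no GPU is available.

```python
import time
import torch

# Pick the GPU if one is available, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

def time_it(fn, repeats=100):
    # Warm up once, then time repeated calls.
    # CUDA kernels are asynchronous, so synchronize before reading the clock.
    fn()
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    if device == "cuda":
        torch.cuda.synchronize()
    return time.perf_counter() - start

# Variant 1: generate on the CPU, then move to the target device.
cpu_then_move = lambda: torch.randint(0, 2, (20, 10)).to(device)
# Variant 2: allocate directly on the target device.
direct = lambda: torch.randint(0, 2, (20, 10), device=device)

print(f"CPU then .to({device}): {time_it(cpu_then_move):.4f}s")
print(f"direct on {device}:     {time_it(direct):.4f}s")
```

For small (20, 10) batches the difference may be negligible either way; the point is to measure on your hardware rather than guess.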