Using torch.save() to save a big file is very slow

It takes a lot of time when I save a file with torch.save().
The file is about 3.5 GB, and saving it takes 2 hr 49 min in total. Does that make sense?

I preprocess the data by running preprocess.py from here:
https://github.com/OpenNMT/OpenNMT-py

It doesn’t use the GPU.
Would it run faster if I used the GPU?

Using the GPU probably won’t help, because you’re just writing a file to disk and not performing any transformations (unless you’re transforming some data and then writing it to disk, and measuring the time that takes?)

3.5 GB in 2 hr 49 min is around 350 kilobytes per second, which is slow for a disk. You can check whether the problem is a slow disk by watching the output of iotop while you’re saving.
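If you want a second opinion besides iotop, a quick raw-write benchmark in plain Python (a sketch, independent of PyTorch; the filename is arbitrary) shows what the disk itself can sustain:

```python
import os
import time

# Rough disk-throughput check: write 100 MiB of zeros, fsync,
# and see how many MiB/s the disk manages.
chunk = b"\0" * (1024 * 1024)          # 1 MiB buffer
path = "disk_bench.tmp"
start = time.perf_counter()
with open(path, "wb") as f:
    for _ in range(100):               # 100 MiB total
        f.write(chunk)
    f.flush()
    os.fsync(f.fileno())               # force the data out of the page cache
elapsed = time.perf_counter() - start
mib_per_s = 100 / elapsed
print(f"{mib_per_s:.1f} MiB/s")
os.remove(path)
```

If this reports tens or hundreds of MiB/s, the disk is fine and the bottleneck is elsewhere.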

Also, what version of pytorch are you using?

It is slow with PyTorch 0.2.0.
I updated to 0.4.0, and it is still slow.

I found that it is also slow when I run it on an SSD…

This is the processing code:

I checked iotop.
DISK READ is about 1–2 MB/s and there is no DISK WRITE.
I have no idea why.

I tried running the following:

import torch

x = torch.zeros(3500000000 // 4)   # ~3.5 GB of float32 zeros
torch.save(x, "tmp.txt")

And this is the iotop result:

time python3 tmp.py
result: 1.51s user 12.60s system 5% cpu 4:06.74 total
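The `time` output itself is informative: the process was on the CPU for only a small fraction of the wall-clock time, so it was mostly waiting rather than computing. A quick check of the numbers:

```python
# Interpreting the `time` output above: little CPU time relative to
# wall-clock time means the process mostly waits (on I/O), not computes.
user, system = 1.51, 12.60        # CPU seconds reported by `time`
wall = 4 * 60 + 6.74              # "4:06.74 total" in seconds

cpu_share = (user + system) / wall
print(f"CPU busy {cpu_share:.1%} of the time")   # matches the "5% cpu" figure
```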

Oh okay. So it sounds like your processing code is slow? (The last example shows that saving 3.5 GB by itself isn’t the limiting factor.)

Right. It took 4 minutes to save torch.zeros(3500000000 // 4).

But it is weird: why does it spend most of its time in DISK READ when the program executes torch.save(train, open(opt.save_data + '.train.pt', 'wb'))?

My guess is that the processing from train = onmt.IO.ONMTDataset(...) actually happens lazily, and only runs when torch.save(train, ...) touches the data.
But I’m not sure.
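That guess is testable in miniature. If the dataset object defers its work until the examples are first touched, then pickling it (which torch.save does via the standard pickle machinery) triggers all the reading and processing at save time. A hypothetical sketch — LazyDataset is made up for illustration, not OpenNMT-py’s actual class:

```python
import pickle
import time

class LazyDataset:
    """Hypothetical dataset that defers its work until first access."""

    def __init__(self, source):
        self.source = source          # e.g. a corpus path; nothing read yet
        self._examples = None

    @property
    def examples(self):
        if self._examples is None:    # the expensive part runs lazily
            time.sleep(0.2)           # stand-in for reading/processing the corpus
            self._examples = [line.upper() for line in self.source]
        return self._examples

    def __getstate__(self):
        # Pickling serializes the examples, which forces the lazy load.
        return {"examples": self.examples}

ds = LazyDataset(["hello", "world"])

start = time.perf_counter()
blob = pickle.dumps(ds)               # "saving" is what triggers the processing
elapsed = time.perf_counter() - start
print(f"pickling took {elapsed:.2f} s")
```

If something like this is happening, the DISK READ you see during torch.save would simply be the preprocessing finally running.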