11129
(playma)
December 4, 2017, 1:10pm
1
It takes a very long time for me to save a file with torch.save().
The file is about 3.5 GB, and saving it takes 2 h 49 min in total. Does that make sense?
I preprocess the data by running preprocess.py from here:
https://github.com/OpenNMT/OpenNMT-py
It doesn’t use the GPU.
Would it run faster if I used the GPU?
richard
(Richard Zou)
December 4, 2017, 3:37pm
2
Using the GPU probably won’t help, because you’re just writing a file to disk and not performing any transformations (unless you’re transforming some data and then writing it to disk, and measuring the time that whole pipeline takes?).
3.5 GB in 2 h 49 min is around 350 kilobytes / second, which is slow for a disk. You can check whether the problem is a slow disk by watching the output of iotop while you’re saving.
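Besides watching iotop, you can sanity-check the raw write throughput of the disk directly from Python. This is a stdlib-only sketch (the function name `measure_write_speed` and the test file path are my own, for illustration); if this reports tens or hundreds of MB/s, the disk itself is not the bottleneck:

```python
import os
import time

def measure_write_speed(path, size_mb=100):
    """Write `size_mb` MiB of data and report throughput in MB/s."""
    chunk = b"\0" * (1024 * 1024)  # 1 MiB of zeros
    start = time.time()
    with open(path, "wb") as f:
        for _ in range(size_mb):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # force the data to actually hit the disk
    elapsed = time.time() - start
    os.remove(path)  # clean up the test file
    return size_mb / elapsed

print(f"{measure_write_speed('throughput_test.bin'):.1f} MB/s")
```

The `os.fsync` call matters: without it, the writes may only land in the OS page cache and the measured speed would be unrealistically high.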
richard
(Richard Zou)
December 4, 2017, 3:50pm
3
Also, what version of PyTorch are you using?
11129
(playma)
December 5, 2017, 12:36am
4
It is slow with PyTorch 0.2.0.
I updated to 0.4.0, and it is still slow.
11129
(playma)
December 5, 2017, 8:07am
6
I found that it is also slow when I run it on an SSD…
This is the preprocessing code:
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import argparse
import os
import glob
import sys

import torch

import onmt.io
import opts


def check_existing_pt_files(opt):
    # We will use glob.glob() to find sharded {train|valid}.[0-9]*.pt
    # when training, so check to avoid tampering with existing pt files
    # or mixing them up.
    for t in ['train', 'valid', 'vocab']:
        pattern = opt.save_data + '.' + t + '*.pt'
        # … (rest of the file truncated)
```
11129
(playma)
December 5, 2017, 8:13am
7
I checked iotop.
DISK READ is about 1–2 M/s and there is no DISK WRITE.
I have no idea why.
11129
(playma)
December 5, 2017, 8:26am
8
I tried running the following while watching iotop:

```python
x = torch.zeros(3500000000 // 4)
torch.save(x, "tmp.txt")
```

The timing result:

```
time python3 tmp.py
result: 1.51s user 12.60s system 5% cpu 4:06.74 total
```
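A useful refinement of this benchmark is to separate the serialization cost from the disk-write cost, since torch.save is pickle-based. Here is a stdlib sketch using plain pickle as a stand-in (the function name `time_save` and the payload are illustrative, not from the thread):

```python
import io
import pickle
import time

def time_save(obj, path):
    """Time in-memory serialization and the disk write separately."""
    # Serialize to an in-memory buffer first (no disk involved).
    t0 = time.time()
    buf = io.BytesIO()
    pickle.dump(obj, buf, protocol=pickle.HIGHEST_PROTOCOL)
    t1 = time.time()
    # Then write the ready-made bytes to disk in one go.
    with open(path, "wb") as f:
        f.write(buf.getvalue())
    t2 = time.time()
    return t1 - t0, t2 - t1

data = list(range(1_000_000))  # stand-in payload
ser, wr = time_save(data, "tmp.pkl")
print(f"serialize: {ser:.2f}s  write: {wr:.2f}s")
```

If the serialize phase dominates, the bottleneck is CPU-side pickling of the object, not the disk.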
richard
(Richard Zou)
December 5, 2017, 2:25pm
9
Oh okay. So it sounds like your processing code is slow? (The last example shows that saving 3.5 GB isn’t the limiting factor.)
11129
(playma)
December 5, 2017, 2:27pm
10
Right. It took 4 minutes to save torch.zeros(3500000000 // 4).
11129
(playma)
December 5, 2017, 2:28pm
11
But it is weird: why does it spend most of its time in DISK READ when the program executes torch.save(train, open(opt.save_data + '.train.pt', 'wb'))?
11129
(playma)
December 5, 2017, 2:31pm
12
I guess that when it executes this line (torch.save(train, ...)), it actually runs train = onmt.IO.ONMTDataset(...) and processes the data at that point.
But I’m not sure.
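One way to test that guess is to time each step separately, so you can see whether the time is spent building the dataset or inside torch.save itself. A minimal sketch (the helper name `timed` is my own; the commented usage lines use hypothetical names from preprocess.py for illustration):

```python
import time

def timed(label, fn, *args, **kwargs):
    """Run fn, print how long it took, and return its result."""
    start = time.time()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.time() - start:.1f}s")
    return result

# Hypothetical usage inside preprocess.py (illustrative, not verbatim):
# train = timed("build dataset", onmt.io.ONMTDataset, ...)
# timed("torch.save", torch.save, train, opt.save_data + '.train.pt')
```

If "build dataset" accounts for most of the wall-clock time, the DISK READ you see in iotop would be the dataset construction reading the source files, not torch.save.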