PyTorch: reading tensors from a file of tensors

I have some really big input tensors and I was running into memory issues while building them, so I write them one by one to a .pt file. As the script that generates and saves them runs, the file gets bigger and bigger, so I am assuming that the tensors are being saved correctly. Here is that code:

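with open(a_sync_save, "ab") as f:
    print("saved")
    # The saved expression was garbled in the post; only its tail (", dim=0), dim=0), f)") survives.
    # It was presumably a torch.save call, with the lost tensor expression shown here as "...".
    torch.save(..., f)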

I want to read a certain number of these tensors from the file at a time, because I do not want to run into a memory issue again. When I try to read each tensor saved to the file, I only manage to get the first one.

with open(a_sync_save, "rb") as f:
    for tensor in torch.load(f):
        print(tensor.shape)  # loop body was missing from the post; inferred from the described output

The output is the shape of the first tensor, and then the script exits without an error.
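
A note on why the loop stops early: `for tensor in torch.load(f)` calls `torch.load` once and then iterates over the single object it returned, so anything appended to the file after the first object is never read. The sketch below reproduces that effect with the standard `pickle` module standing in for `torch.save`/`torch.load` and plain lists standing in for tensors. Both stand-ins and the file name are assumptions made for illustration. Whether repeated `torch.load` calls on one file handle behave the same way depends on the PyTorch version.

```python
import os
import pickle
import tempfile

# Hypothetical file; plain lists stand in for tensors.
path = os.path.join(tempfile.mkdtemp(), "records.pkl")

# Append three objects to the same file, one dump call per object.
for record in ([1, 2], [3, 4], [5, 6]):
    with open(path, "ab") as f:
        pickle.dump(record, f)

# One load call returns only the first object; the loop then iterates
# over that object's elements, not over the objects in the file.
with open(path, "rb") as f:
    first_only = [item for item in pickle.load(f)]

# Calling load repeatedly until EOFError visits every stored object.
all_records = []
with open(path, "rb") as f:
    while True:
        try:
            all_records.append(pickle.load(f))
        except EOFError:
            break

print(first_only)   # [1, 2]
print(all_records)  # [[1, 2], [3, 4], [5, 6]]
```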


This should work, yes.
But it is not used very often, and it appears that we broke it in some very recent versions. The issue tracking the fix is here:

I will post my solution for this in two days or so, once it is done. It will involve pickling the tensors, reading them back into a batch, and then running each batch through my model. Saving tensors to a pickle file does bring up a warning that pickle is not going to be supported in 1.5.

def stream_training(filepath, epochs=100):
    """
    :param filepath: file path of pkl file
    :param epochs: number of epochs to run
    """
    def training(train_dataloader, model_obj, criterion, optimizer):
        for j, data in enumerate(train_dataloader, start=0):
            # get the inputs; data is a list of [inputs, labels]
            inputs, labels = data
            inputs, labels = inputs.cuda(), labels.cuda()
            outputs = model_obj(inputs.float())
            outputs = torch.flatten(outputs)
            loss = criterion(outputs, labels.float())
            # zero the parameter gradients, backpropagate, clip, then update
            optimizer.zero_grad()
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model_obj.parameters(), max_norm=1)
            optimizer.step()

    tensors = []
    expected_values = []
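    # training() moves the inputs to the GPU, so the model has to live there too
    model = Model(1000, 1, 256, 1).cuda()
    criterion = nn.BCELoss()
    # this call was cut off in the post; any arguments after weight_decay are unknown
    optimizer = optim.Adam(model.parameters(), lr=0.00001, betas=(0.9, 0.99999), eps=1e-08, weight_decay=0.001)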
    for i in range(epochs):
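        with open(filepath, 'rb') as openfile:
            while True:
                try:
                    data_list = pickle.load(openfile)
                    # Assumption: each record is [tensor, expected_value]; the lines
                    # that unpacked data_list were lost from the post.
                    tensors.append(data_list[0])
                    expected_values.append(data_list[1])
                    if len(tensors) % BATCH_SIZE == 0:
                        # Assumption: the original call was garbled; torch.cat is a guess.
                        tensors = torch.cat(tensors, dim=0)
                        tensors = torch.reshape(tensors, (tensors.shape[0], tensors.shape[1], -1))
                        print("dataset_shape", tensors.shape)
                        train_loader = make_dataset(tensors, expected_values) # makes a dataloader for the batch that comes in
                        training(train_loader, model, criterion, optimizer)  # performs forward and back prop
                        tensors = [] # clears the batch to conserve memory on my computer
                        expected_values = []
                except EOFError:
                    print("This file has finished training")
                    break  # end of file reached; move on to the next epoch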
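
The read-until-EOF loop above can also be pulled out into two small generators, which keeps the file handling separate from the training step. This is only a sketch under assumptions: `iter_pickled`, `iter_batches`, the file name, and the (input, label) record layout are all hypothetical, and plain lists are used instead of tensors so the snippet runs with the standard library alone.

```python
import os
import pickle
import tempfile


def iter_pickled(path):
    """Yield every object in a file written by repeated pickle.dump calls."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return


def iter_batches(records, batch_size):
    """Group an iterable into lists of batch_size items; the last may be shorter."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


# Hypothetical data: five (input, label) pairs appended one at a time.
path = os.path.join(tempfile.mkdtemp(), "train.pkl")
for i in range(5):
    with open(path, "ab") as f:
        pickle.dump(([float(i)] * 3, i % 2), f)

# Only one batch is held in memory at a time.
batch_sizes = [len(b) for b in iter_batches(iter_pickled(path), batch_size=2)]
print(batch_sizes)  # [2, 2, 1]
```

One difference from the loop above: the final, shorter batch is still yielded here, whereas a check of the form `len(tensors) % BATCH_SIZE == 0` never trains on leftover records at the end of the file.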

If you are interested in my model, here it is. Feel free to tell me what I’ve done wrong. I am a recent transplant to PyTorch.

class Model(nn.Module):
    def __init__(self, input_size, output_size, hidden_dim, n_layers):
        super(Model, self).__init__()
        # dimensions
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers

        #Define the layers
        self.gru = nn.GRU(input_size, hidden_dim, n_layers, batch_first=True)
        self.fc1 = nn.Linear(hidden_dim, hidden_dim)
        self.bn1 = nn.BatchNorm1d(num_features=hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)
        self.bn2 = nn.BatchNorm1d(num_features=hidden_dim)
        self.fc3 = nn.Linear(hidden_dim, hidden_dim)
        self.bn3 = nn.BatchNorm1d(num_features=hidden_dim)
        self.fc4 = nn.Linear(hidden_dim, hidden_dim)
        self.bn4 = nn.BatchNorm1d(num_features=hidden_dim)
        self.fc5 = nn.Linear(hidden_dim, hidden_dim)
        self.output = nn.Linear(hidden_dim, output_size)

    def forward(self, x):
        x = x.float()
        x = F.relu(self.gru(x)[1])
        x = x[-1,:,:] # eliminates first dim
        x = F.dropout(x, 0.5)
        x = F.relu(self.bn1(self.fc1(x)))
        x = F.dropout(x, 0.5)
        x = F.relu(self.bn2(self.fc2(x)))
        x = F.dropout(x, 0.5)
        x = F.relu(self.bn3(self.fc3(x)))
        x = F.dropout(x, 0.5)
        x = F.relu(self.bn4(self.fc4(x)))
        x = F.dropout(x, 0.5)
        x = F.relu(self.fc5(x))
        print(" ")
        return torch.sigmoid(self.output(x))

    def init_hidden(self, batch_size):
        hidden = torch.zeros(self.n_layers, batch_size, self.hidden_dim)
        return hidden

The model gets a new data loader for every batch. Do I have to do anything special to continue training from one batch to the next? I ask because I have no code that handles that.


The training code looks good to me. You might want to use torch.save and torch.load to avoid the warning, but it should work otherwise.

Am I totally missing the mark? Is there already some PyTorch utility that does something like this but more efficiently? My fear is that every time a batch finishes, the model doesn’t “remember” the training from the previous batches because this is not a traditional dataset. Is this fear justified?

Oh sorry, I thought you were just asking about loading the data from disk.

As for the learning, it is hard to say.
If your model sees each sample only once, it might overfit on the last samples it has seen.
I am not a specialist in that domain, though, and you will most likely want to look at the literature on this kind of learning and the tricks you can use to ensure things are not “forgotten”.

Hmm, okay, I guess the question I have formed here is:

If I have a dataset that is so big that I cannot load it into my machine's RAM, can I stream it batch by batch to a model object as if the stream were all one dataloader in RAM? Or do models need to be trained off of one giant dataloader?

I’ll go look around.

Oh, for that, the answer is definitely yes for the first and no for the second.
Under the hood, our dataloaders do exactly the same thing in some cases: they load data from disk as it is needed.

So whether the data sits in RAM or is read on the fly does not change at all how the training is going to behave.
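
One way to express this kind of streaming with PyTorch's own data utilities is `torch.utils.data.IterableDataset` wrapped in a `DataLoader`. The following is a minimal sketch, not the poster's code: the `PickleStream` class, the file name, the record layout, and the tiny `Linear` model with `BCEWithLogitsLoss` are all assumptions chosen to keep the example short. The same `model` and `optimizer` objects are reused for every batch, so the parameter updates carry over from one batch to the next.

```python
import os
import pickle
import tempfile

import torch
from torch.utils.data import DataLoader, IterableDataset


class PickleStream(IterableDataset):
    """Hypothetical dataset yielding (input, label) pairs from a pickle file."""

    def __init__(self, path):
        super().__init__()
        self.path = path

    def __iter__(self):
        # Reopen the file on every pass so each epoch starts from the top.
        with open(self.path, "rb") as f:
            while True:
                try:
                    x, y = pickle.load(f)
                except EOFError:
                    return
                yield torch.tensor(x), torch.tensor(y)


# Hypothetical data: six samples of three features each.
path = os.path.join(tempfile.mkdtemp(), "stream.pkl")
for i in range(6):
    with open(path, "ab") as f:
        pickle.dump(([float(i)] * 3, float(i % 2)), f)

model = torch.nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = torch.nn.BCEWithLogitsLoss()
loader = DataLoader(PickleStream(path), batch_size=2)

steps = 0
for epoch in range(2):
    for inputs, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs).squeeze(1), labels)
        loss.backward()
        optimizer.step()  # updates persist in `model` across batches
        steps += 1

print(steps)  # 2 epochs x 3 batches = 6
```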

Great, that answers my question. Thanks for your time! At a glance, does anything look wrong with my model declaration? If you don’t have the time to look, no worries!

From a quick look, it looks fine in terms of having the batchnorm and nonlinearity in the right places.

But the depth and size of the different layers are very task specific, so I don’t know :)

Right, awesome thank you!