How to save trained images without slowing down training

I am training a convolutional neural network in PyTorch and want to save the trained images. I append each trained image in the data loader loop so that all trained images can be saved to a NumPy file (train_pred in the code below), and this works properly. However, it is a huge burden on the network and increases the running time. Is there another way to do this?

for epoch in range(epochs):
    mse_train_losses = []
    N_train = []

    train_pred = []
    model.train()
    for data in train_loader:
        x_train_batch = data[0].to(device, dtype=torch.float)
        y_train_batch = data[1].to(device, dtype=torch.float)

        y_train_pred = model(x_train_batch)
        mse_train_loss = criterion(y_train_batch, y_train_pred, x_train_batch, mse)

        optimizer.zero_grad()
        mse_train_loss.backward()
        optimizer.step()

        mse_train_losses.append(mse_train_loss.item())

        N_train.append(len(x_train_batch))

        train_pred.append(y_train_pred)
        train_pred_de = torch.stack(train_pred).cpu().detach().numpy()

Since you are appending the model output (y_train_pred) directly inside the training loop, the computation graph is stored along with it, so you might want to call detach() on it before appending it to the list.
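To illustrate the point about the computation graph, here is a minimal sketch. The nn.Linear model and the random input are hypothetical stand-ins for the real network and data; only the difference between the raw output and its detached version matters.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the CNN; any module with trainable weights behaves the same.
model = nn.Linear(4, 2)
x = torch.randn(8, 4)

y_pred = model(x)

kept_with_graph = y_pred          # still references the autograd graph
kept_detached = y_pred.detach()   # same values, no graph attached

print(kept_with_graph.grad_fn is not None)  # True
print(kept_detached.grad_fn is None)        # True
print(kept_detached.requires_grad)          # False
```

Appending the detached tensor means the list holds only the values, so the graph of each iteration can be freed once backward() has run.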

Also, the cpu() operation synchronizes your code, which can cause a slowdown.
Unfortunately, there is no workaround for the synchronization itself, since the values must already be computed before they can be pushed to the CPU and converted to a NumPy array.
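One thing that may still help: in the code as posted, the stack and cpu() calls run on every iteration over the whole list collected so far. A possible sketch, assuming the predictions of one epoch fit in device memory, is to append the detached outputs and convert them once after the loop. The Conv2d model, the MSELoss criterion, and the batches list below are hypothetical stand-ins for the real model, custom criterion, and train_loader.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-ins for the real model, loss, and data loader.
model = nn.Conv2d(1, 1, kernel_size=3, padding=1).to(device)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
batches = [(torch.randn(4, 1, 8, 8), torch.randn(4, 1, 8, 8)) for _ in range(2)]
batches.append((torch.randn(2, 1, 8, 8), torch.randn(2, 1, 8, 8)))  # smaller last batch

train_pred = []
model.train()
for x_cpu, y_cpu in batches:
    x_train_batch = x_cpu.to(device, dtype=torch.float)
    y_train_batch = y_cpu.to(device, dtype=torch.float)

    y_train_pred = model(x_train_batch)
    loss = criterion(y_train_pred, y_train_batch)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Keep only the values, not the graph; no transfer to the CPU yet.
    train_pred.append(y_train_pred.detach())

# Single concatenation and single device-to-host copy after the loop.
train_pred_np = torch.cat(train_pred).cpu().numpy()
print(train_pred_np.shape)  # (10, 1, 8, 8)
```

torch.cat is used here instead of torch.stack because stack requires every batch to have the same shape, which fails when the last batch is smaller. If the collected outputs are too large for device memory, moving each detached batch to the CPU inside the loop trades memory for one synchronization per batch.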