How to save nn.Sequential as a model?

Hi, I am trying to split ResNet across three different devices. For this, I would need to save each nn.Sequential as a separate model: train them all together, but be able to load each part separately on its own device.

Does anyone know how to save an nn.Sequential as a model?

If you won’t change your model-device mapping, you can just save your model directly using torch.save and load it with torch.load.
If you really want to save the nn.Sequential on its own, you can also save it directly using torch.save and load it with torch.load.
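
For example, a minimal sketch of saving an nn.Sequential as a whole object (the layer sizes and the file name here are just illustrative):

import torch
import torch.nn as nn

# nn.Sequential is an nn.Module, so the whole object (structure + parameters) can be pickled
layers = nn.Sequential(nn.Linear(5, 5), nn.ReLU())
torch.save(layers, "layers.pt")

# later, or in another process: gives back a ready-to-use nn.Sequential
layers = torch.load("layers.pt")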

But if, for instance, I have a neural network with a structure like this:

layers1 = nn.Sequential(…)
layers2 = nn.Sequential(…)
layers3 = nn.Sequential(…)

And I would like to save the model so that device1 loads layers1, device2 layers2…

For example, suppose you initialize a model like:

class YourModel(nn.Module):
    def __init__(self, dev_list=["cpu", "cuda:1", "cuda:0"]):
        super().__init__()
        self.fc1 = nn.Sequential(nn.Linear(5, 5)).to(dev_list[0])  # on device "cpu"
        self.fc2 = nn.Sequential(nn.Linear(5, 5)).to(dev_list[1])  # on device "cuda:1"
        self.fc3 = nn.Sequential(nn.Linear(5, 5)).to(dev_list[2])  # on device "cuda:0"
        self.dev = dev_list

    def forward(self, x):
        x = self.fc1(x).to(self.dev[1])  # move activations to fc2's device
        x = self.fc2(x).to(self.dev[2])  # move activations to fc3's device
        return self.fc3(x).to("cpu")

then:

torch.save(YourModel(), "model.pt")
model = torch.load("model.pt")

The device mapping will be saved along with your model, so don’t worry about it.


So, if I wanted to map only fc1 to device 1, could I select just that part when loading?

See https://pytorch.org/docs/stable/torch.html?highlight=load#torch.load, especially the map_location part
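
A rough sketch of how map_location is used (the file name and device names are assumptions):

import torch

# remap everything that was saved on cuda:1 onto cuda:0 at load time
model = torch.load("model.pt", map_location={"cuda:1": "cuda:0"})

# or force every tensor onto the CPU
model = torch.load("model.pt", map_location="cpu")

# after loading you can still pick out a single part, e.g. fc1
fc1 = model.fc1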

But in this solution, all the devices must be attached to the same machine and assigned up front. My goal is to deploy the neural network in a distributed way, decomposing each Sequential onto a different device and sending the intermediate results over the network.

Could you please clarify your design a little bit more? Your description of “distributed” and “decomposing” is pretty vague.

Is it a multi-process application? How do you map your GPUs to your processes? How do you map your model to your devices? I am really sorry, but I cannot help you more unless you give a clear definition of the architecture you would like to achieve. It would be better if you could draw a diagram or show your code.

If you just want to split layer1, layer2 and layer3 onto different devices, you can simply save them individually with torch.save and torch.load. torch will take care of pickling whatever is passed to it, including parameters and custom attributes such as the ones set by self.some_attr in __init__.

It’s a fog/edge/cloud architecture. Based on the example above:

class YourModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Sequential(…)
        self.fc2 = nn.Sequential(…)
        self.fc3 = nn.Sequential(…)

I intend to save fc1, fc2 and fc3 separately. That way I could make a first prediction on the device that holds fc1, send the intermediate result on to the second device, and make the next, more accurate prediction there. The third stage would work in the same way.

I see, one more question: will you move the model around, for example to a different machine with a different number of GPUs, or will you load the whole model on the same devices?

If you don’t, and you really want to save them separately to different files, maybe for better inspection or archival purposes, then:

def save(your_model):
    torch.save(your_model.fc1, "fc1.pt")
    torch.save(your_model.fc2, "fc2.pt")
    torch.save(your_model.fc3, "fc3.pt")

If you do, then you will have to decide which device each part of your model will be located on. For example, suppose your training machine has 3 GPUs and your inference machine has 1 GPU:
def save(your_model):
    torch.save(your_model.fc1, "fc1.pt")
    torch.save(your_model.fc2, "fc2.pt")
    torch.save(your_model.fc3, "fc3.pt")

def map(your_model):
    your_model.fc1 = torch.load("fc1.pt", map_location=torch.device('cuda:0'))
    your_model.fc2 = torch.load("fc2.pt", map_location=torch.device('cuda:0'))
    your_model.fc3 = torch.load("fc3.pt", map_location=torch.device('cuda:0'))
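
For the fog/edge/cloud deployment you describe, each node would then load only its own file and forward the intermediate activations to the next node over the network. A rough sketch, reusing the fc1.pt file from save() above; send_to_next_node is a purely hypothetical placeholder for whatever transport you use (sockets, gRPC, …):

import torch

# on the first (edge) node: load only the first stage
fc1 = torch.load("fc1.pt", map_location="cpu")
fc1.eval()

def run_first_stage(x):
    with torch.no_grad():
        h = fc1(x)
    # ship the intermediate activations to the node that holds fc2
    send_to_next_node(h)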

by the way,

Maybe there is a misunderstanding: there is no “connected device” concept in PyTorch. You can perform a complex forward() operation or a simple add() on some input x located on device cuda:[number] or cpu simply because the operands (tensors) are located on the same device; if torch would need to fetch an operand from somewhere else, it will complain and throw an error.
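
A minimal illustration of such a device mismatch (assuming a CUDA device is available):

import torch

a = torch.zeros(3)                    # lives on the CPU
b = torch.zeros(3, device="cuda:0")   # lives on GPU 0

a + b  # RuntimeError: expected all tensors to be on the same device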

About saving the model

There are many ways to save your model. Typically you will want to save the OrderedDict returned by model.state_dict(): the keys are your parameter names, such as “linear.weight” or “linear.bias”, and the values are the tensors holding the parameter data. You may load a state dict into your model like:

def prep_load_state_dict(model: nn.Module,
                         state_dict: dict):
    """
    Load an already-deserialized state dict into ``model``.

    Note:
        Each tensor is first moved to the device of the corresponding
        model parameter, so checkpoints saved on other devices also work.
    """
    for name, param in model.named_parameters():
        # .to() returns a new tensor, so assign the result back into the dict
        state_dict[name] = state_dict[name].to(param.device)
    model.load_state_dict(state_dict)
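
A hypothetical usage, assuming model is an already-constructed module and the checkpoint was saved with torch.save(model.state_dict(), "model_state.pt") on another machine:

import torch

# deserialize onto the CPU first, then let prep_load_state_dict move each
# tensor to the device of the corresponding parameter
state_dict = torch.load("model_state.pt", map_location="cpu")
prep_load_state_dict(model, state_dict)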

About torch.save and torch.load

If you know the pickle concept in Python, then you will understand what torch.save does: pickle serializes an object into a binary string:

import io
import torch

buffer = io.BytesIO()
torch.save(torch.zeros([5]), buffer)
print(buffer.getvalue())

will yield:

b'\x80\x02\x8a\nl\xfc\x9cF\xf9 j\xa8P\x19.\x80\x02M\xe9\x03.\x80\x02}q\x00(X\n\x00\x00\x00type_sizesq\x01}q\x02(X\x03\x00\x00\x00intq\x03K\x04X\x04\x00\x00\x00longq\x04K\x04X\x05\x00\x00\x00shortq\x......

You can serialize whatever you like this way; a CUDA tensor will essentially be saved as “raw data” + “device descriptor cuda:0”.


Thanks a lot, I think the solution is:

def save(your_model):
    torch.save(your_model.fc1, "fc1.pt")
    torch.save(your_model.fc2, "fc2.pt")
    torch.save(your_model.fc3, "fc3.pt")

I’ll try then.

Great! Post your issues if you have any.

I’m also trying to do something similar, but in my scenario I construct the whole model using only nn.Sequential and then I just want to save it. I don’t have a class defined for it, so something like https://stackoverflow.com/questions/42703500/best-way-to-save-a-trained-model-in-pytorch won’t work for me.


My current attempt uses pickle, but I keep getting a warning for using pickle:

FutureWarning: pickle support for Storage will be removed in 1.5. Use `torch.save` instead
  warnings.warn("pickle support for Storage will be removed in 1.5. Use `torch.save` instead", FutureWarning)

I think they just want us to use torch.save and torch.load. I stopped getting the warning when I did that.

My (full) code:


# creating data and running through a nn and saving it

import torch
import torch.nn as nn

from pathlib import Path
from collections import OrderedDict

import numpy as np

import pickle

path = Path('~/data/tmp/').expanduser()
path.mkdir(parents=True, exist_ok=True)

num_samples = 3
Din, Dout = 1, 1
lb, ub = -1, 1

x = torch.distributions.Uniform(low=lb, high=ub).sample((num_samples, Din))

f = nn.Sequential(OrderedDict([
    ('f1', nn.Linear(Din,Dout)),
    ('out', nn.SELU())
]))
y = f(x)

# save data torch to numpy
x_np, y_np = x.detach().cpu().numpy(), y.detach().cpu().numpy()
np.savez(path / 'db', x=x_np, y=y_np)

print(x_np)
# save model
with open('db_saving_seq', 'wb') as file:
    pickle.dump({'f': f}, file)

# load model
with open('db_saving_seq', 'rb') as file:
    db = pickle.load(file)
    f2 = db['f']

# test that it outputs the right thing
y2 = f2(x)

y_eq_y2 = y == y2
print(y_eq_y2)

db2 = {'f': f, 'x': x, 'y': y}
torch.save(db2, path / 'db_f_x_y')

print('Done')

db3 = torch.load(path / 'db_f_x_y')
f3 = db3['f']
x3 = db3['x']
y3 = db3['y']
yy3 = f3(x3)

y_eq_y3 = y == y3
print(y_eq_y3)

y_eq_yy3 = y == yy3
print(y_eq_yy3)

Did you try that? Is there a reason why that’s not enough for you?


cross-posted: https://stackoverflow.com/questions/62923052/how-does-one-save-torch-nn-sequential-models-in-pytorch-properly