Converting a TensorFlow model to PyTorch

Greetings,

My data consists of time-series samples with 100 steps, each containing 2 features. In other words, my data is shaped as (samples, steps, features).

The model I’m currently implementing works in TensorFlow, but I’m having trouble properly implementing it in PyTorch.

import tensorflow as tf
from tensorflow.keras import Model, layers

num_classes = 36  # number of target classes in this example

class KnownDetector(Model):
    def __init__(self):
        super(KnownDetector, self).__init__()
        self.TCP = tf.keras.Sequential([
            layers.Conv1D(filters=32, kernel_size=3, activation="relu", input_shape=(100, 2)),  # 100 packets, 2 features
            layers.MaxPool1D(pool_size=3, padding='same'),
            layers.Conv1D(filters=32, kernel_size=3, activation="relu"),
            layers.MaxPool1D(pool_size=3, padding='same'),
            layers.Conv1D(filters=32, kernel_size=3, activation="relu"),
            layers.Flatten(),
            layers.Dense(128),
            layers.Dense(num_classes, activation='softmax')
        ])

    def call(self, x):
        return self.TCP(x)

fx = KnownDetector()
fx.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(), optimizer='adam', metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])
fx.fit(train_data, train_labels, epochs=30, validation_data=(val_data, val_labels))

My understanding is that the input has to be transposed from (steps, input_dim) to (input_dim, steps), since PyTorch’s Conv1d expects the channel (feature) dimension before the steps.
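A minimal sketch of that transposition, assuming `data` is a float64 NumPy array shaped (samples, steps, features) as described above (the array here is a hypothetical stand-in). `permute` swaps the axes and keeps each step’s feature pair together, whereas `reshape` to the same target shape only reinterprets the memory order and mixes steps with features. The `.float()` call converts NumPy’s default float64 to the float32 used by the layer weights.

```python
import numpy as np
import torch

# Hypothetical stand-in for the real data: 4 samples, 100 steps, 2 features
data = np.arange(4 * 100 * 2, dtype=np.float64).reshape(4, 100, 2)

x = torch.from_numpy(data).float()   # float64 -> float32 to match the layer weights
x_perm = x.permute(0, 2, 1)          # (samples, features, steps): axes swapped
x_resh = x.reshape(4, 2, 100)        # same shape, but values are regrouped incorrectly

# Feature 0 over time for sample 0: permute keeps it, reshape does not
print(torch.equal(x_perm[0, 0], x[0, :, 0]))  # True
print(torch.equal(x_resh[0, 0], x[0, :, 0]))  # False

conv = torch.nn.Conv1d(in_channels=2, out_channels=32, kernel_size=3)
print(conv(x_perm).shape)  # torch.Size([4, 32, 98])
```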

PyTorch equivalent:

import torch

class ModelKnown(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.TCP = torch.nn.Sequential(torch.nn.Conv1d(in_channels=2, out_channels=32, kernel_size=3),
                                       torch.nn.ReLU(),
                                       torch.nn.MaxPool1d(kernel_size=3),
                                       torch.nn.Conv1d(in_channels=32, out_channels=32, kernel_size=3),
                                       torch.nn.ReLU(),
                                       torch.nn.MaxPool1d(kernel_size=3),
                                       torch.nn.Conv1d(in_channels=32, out_channels=32, kernel_size=3),
                                       torch.nn.Flatten(),
                                       torch.nn.Linear(in_features=256, out_features=128),
                                       torch.nn.Linear(in_features=128, out_features=36),
                                       torch.nn.Softmax(1))
    def forward(self, x):
        return self.TCP(x)
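One detail worth checking is `in_features=256`. Keras’ `MaxPool1D(padding='same')` rounds the output length up (ceil(length / pool_size)), while PyTorch’s `nn.MaxPool1d` rounds down by default, so the two stacks flatten to different sizes: 9 steps × 32 channels = 288 in Keras versus 8 × 32 = 256 here. A small sketch that measures the flattened size with a dummy input, assuming `ceil_mode=True` is an acceptable stand-in for `'same'` pooling (it reproduces the output lengths, though the pooling windows may not be aligned identically):

```python
import torch
import torch.nn as nn

def feature_extractor(ceil_mode):
    # Convolutional part of the model, up to and including Flatten
    return nn.Sequential(
        nn.Conv1d(2, 32, 3), nn.ReLU(), nn.MaxPool1d(3, ceil_mode=ceil_mode),
        nn.Conv1d(32, 32, 3), nn.ReLU(), nn.MaxPool1d(3, ceil_mode=ceil_mode),
        nn.Conv1d(32, 32, 3), nn.ReLU(),
        nn.Flatten(),
    )

x = torch.randn(1, 2, 100)  # one dummy sample, channels-first
with torch.no_grad():
    n_floor = feature_extractor(ceil_mode=False)(x).shape[1]  # 100 -> 98 -> 32 -> 30 -> 10 -> 8
    n_ceil = feature_extractor(ceil_mode=True)(x).shape[1]    # 100 -> 98 -> 33 -> 31 -> 11 -> 9
print(n_floor, n_ceil)  # 256 288
```

Either setting can be trained from scratch; the point is only that the first `nn.Linear` must use the size the convolutional stack actually produces.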

PyTorch doesn’t have a compile()/fit() step the way Keras does, so I’ve written the training loop myself, doing my best to follow the PyTorch documentation:

x = torch.from_numpy(data)
y = torch.from_numpy(labels.to_numpy())
x.requires_grad=True

# Construct our model by instantiating the class defined above
model = ModelKnown()

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-6)
for t in range(20):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)
    # Compute and print loss
    loss = criterion(y_pred, y)
    print(t, loss.item())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

This outputs:

0 3.5832407474517822
1 3.5832395553588867
... # and so on
  1. What am I doing incorrectly here?
  2. In TensorFlow, I pass validation data to fit() for evaluation. Is there a way to do the same in PyTorch?

Thanks in advance for any help!

You are missing an nn.ReLU() module after the last conv layer in PyTorch, and you should also remove the nn.Softmax(1) layer, as nn.CrossEntropyLoss expects raw logits and will internally apply F.log_softmax and F.nll_loss.
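To see why the extra `nn.Softmax` hurts, here is a small sketch with 36 classes, as in the original example; the tensors are illustrative. Probabilities fed into `nn.CrossEntropyLoss` go through a second `log_softmax`, so even a perfect softmax output cannot push the loss below log(1 + 35/e) ≈ 2.63, and uniform probabilities give exactly ln(36) ≈ 3.58, close to the values printed above.

```python
import math
import torch
import torch.nn as nn

num_classes = 36
target = torch.tensor([0])
criterion = nn.CrossEntropyLoss()

# "Perfect" softmax output: all probability mass on the correct class
probs = torch.zeros(1, num_classes)
probs[0, 0] = 1.0
loss_probs = criterion(probs, target)  # log_softmax is applied again internally
print(loss_probs.item())  # about 2.63, i.e. log(1 + 35 / e)

# Uniform probabilities give log(36)
uniform = torch.full((1, num_classes), 1.0 / num_classes)
loss_uniform = criterion(uniform, target)
print(loss_uniform.item(), math.log(num_classes))

# The same confident prediction expressed as raw logits gives a loss close to zero
logits = torch.full((1, num_classes), -10.0)
logits[0, 0] = 10.0
loss_logits = criterion(logits, target)
print(loss_logits.item())
```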

Thanks! I added the missing ReLU layer and removed the softmax.
If it does this internally, is there no point in computing the loss myself? I’m a bit confused there. I’ve reduced the number of classes to 8 to check whether the results are accurate (they should be above 95%).

The model now looks like this:

import torch
import torch.nn as nn

class ModelKnown(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.TCP = torch.nn.Sequential(torch.nn.Conv1d(in_channels=2, out_channels=32, kernel_size=3),
                                       torch.nn.ReLU(),
                                       torch.nn.MaxPool1d(kernel_size=3),
                                       torch.nn.Conv1d(in_channels=32, out_channels=32, kernel_size=3),
                                       torch.nn.ReLU(),
                                       torch.nn.MaxPool1d(kernel_size=3),
                                       torch.nn.Conv1d(in_channels=32, out_channels=32, kernel_size=3),
                                       torch.nn.ReLU(),
                                       torch.nn.Flatten(),
                                       torch.nn.Linear(in_features=256, out_features=128),
                                       torch.nn.Linear(in_features=128, out_features=8))

    def forward(self, x):
        return self.TCP(x)
import torch.optim as optim

model = ModelKnown()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.1)

x = torch.from_numpy(data)
y = torch.from_numpy(labels.to_numpy())
# x.requires_grad=True

for epoch in range(5):  # loop over the dataset multiple times

    optimizer.zero_grad()
    outputs = model(x)
    loss = criterion(outputs, y)
    loss.backward()
    optimizer.step()

I’m fairly certain I’m doing something wrong: I’m not getting accurate results, and my softmax values are negative:

preds = model(x)
preds
tensor([[-0.5422, -0.6145,  0.7050,  ...,  0.4405, -0.3194, -0.3350],
        [-0.5422, -0.6145,  0.7050,  ...,  0.4405, -0.3194, -0.3350],
        [-0.5422, -0.6145,  0.7050,  ...,  0.4405, -0.3194, -0.3350],
        ...,

No, nn.CrossEntropyLoss applies F.log_softmax and F.nll_loss internally, so the loss is still computed when you call the criterion:

criterion = nn.CrossEntropyLoss()
...
loss = criterion(output, target)

However, it expects logits (unbounded values) instead of probabilities (outputs of a softmax), since the log probabilities are calculated internally via F.log_softmax.

These are not softmax outputs (probabilities) but logits, so they are expected to contain both positive and negative values.
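If you need probabilities or class predictions at inference time, apply the softmax outside the loss. A minimal sketch with hypothetical logits (in practice these would come from `model(x)`):

```python
import torch
import torch.nn.functional as F

# Hypothetical logits for 3 samples and 4 classes
logits = torch.tensor([[-0.54, -0.61, 0.70, 0.44],
                       [ 2.00, -1.00, 0.00, 0.50],
                       [-3.00, -2.00, -1.00, -0.50]])

probs = F.softmax(logits, dim=1)  # non-negative, each row sums to 1
preds = logits.argmax(dim=1)      # predicted class per sample

print(probs.sum(dim=1))
print(preds)  # tensor([2, 0, 3])
```

Because softmax is monotonic, `argmax` on the logits already gives the predicted class; the softmax is only needed when you want the probability values themselves.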

You are also using a very high learning rate, so I would lower it to the value you are using in TF/Keras.

As I understand it, logits are the raw predictions that are fed into the softmax function, which produces the probability for each class. If so, why is the final output of the model logits?

When you say F.log_softmax is calculated internally, is that done automatically?

I would really appreciate a working example. The classifier example in the docs computes the criterion and loss the same way I did, also using cross-entropy.

Initially, I expected model.train() to train the model, but since the training loop is written manually, I’m having difficulty understanding the process.
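`model.train()` only switches the module into training mode (this matters for layers such as dropout and batch norm); it does not run any optimization, and `model.eval()` switches it back. A minimal sketch of a Keras-style `fit(..., validation_data=...)` loop, assuming random stand-in tensors that are already channels-first, and the 8-class model from this thread rewritten as an `nn.Sequential` (the helper name `evaluate` is hypothetical):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical stand-in data, already channels-first: (samples, features, steps)
num_classes = 8
train_x, train_y = torch.randn(256, 2, 100), torch.randint(0, num_classes, (256,))
val_x, val_y = torch.randn(64, 2, 100), torch.randint(0, num_classes, (64,))

train_loader = DataLoader(TensorDataset(train_x, train_y), batch_size=32, shuffle=True)
val_loader = DataLoader(TensorDataset(val_x, val_y), batch_size=32)

# Same layers as the corrected ModelKnown, ending in raw logits (no softmax)
model = nn.Sequential(
    nn.Conv1d(2, 32, kernel_size=3), nn.ReLU(), nn.MaxPool1d(kernel_size=3),
    nn.Conv1d(32, 32, kernel_size=3), nn.ReLU(), nn.MaxPool1d(kernel_size=3),
    nn.Conv1d(32, 32, kernel_size=3), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(256, 128),
    nn.Linear(128, num_classes),
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)


def evaluate(model, loader):
    """Average loss and accuracy over a loader, without tracking gradients."""
    model.eval()
    total_loss, correct, count = 0.0, 0, 0
    with torch.no_grad():
        for xb, yb in loader:
            logits = model(xb)
            total_loss += criterion(logits, yb).item() * yb.size(0)
            correct += (logits.argmax(dim=1) == yb).sum().item()
            count += yb.size(0)
    return total_loss / count, correct / count


history = []
for epoch in range(3):
    model.train()  # sets training mode only; the loop below does the optimization
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
    val_loss, val_acc = evaluate(model, val_loader)
    history.append((val_loss, val_acc))
    print(f"epoch {epoch}: val_loss={val_loss:.4f}  val_acc={val_acc:.3f}")
```

With real data, the random tensors would be replaced by the permuted float32 inputs and int64 labels; the same `evaluate` function can also be run on a held-out test set after training.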

Edit: I lowered my learning rate.

Your learning rate is too high.
TF/Keras uses 0.001 as the default learning rate for Adam.
You should change your learning rate accordingly.
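A short sketch of matching the Keras defaults, using a hypothetical stand-in model. PyTorch’s `torch.optim.Adam` also defaults to `lr=1e-3`, so omitting `lr` reproduces the Keras `'adam'` learning rate. Keras’ Adam uses `epsilon=1e-7` while PyTorch uses `eps=1e-8`; setting it explicitly brings the two closer, though the implementations are not bit-identical.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # hypothetical stand-in for ModelKnown()

# Default learning rate is 1e-3, the same as Keras' "adam"
optimizer = torch.optim.Adam(model.parameters())
print(optimizer.defaults["lr"])  # 0.001

# Optionally match Keras' default epsilon as well
optimizer_keras_like = torch.optim.Adam(model.parameters(), lr=1e-3, eps=1e-7)
```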

I definitely did; I was just experimenting at the time. I’ve changed it.

You can pick between these approaches:

  • model output as raw logits + nn.CrossEntropyLoss
  • model output as log probabilities (via nn.LogSoftmax or F.log_softmax) + nn.NLLLoss
  • numerically unstable approach by applying torch.log on probabilities (via torch.log(F.softmax(output, dim=1)))

Here is a code snippet for the valid approaches (I skipped the numerically unstable approach, as you shouldn’t use it):

import torch
import torch.nn as nn
import torch.nn.functional as F

# setup
batch_size = 2
nb_classes = 10
output = torch.randn(batch_size, nb_classes) # initialize as logits
target = torch.randint(0, nb_classes, (batch_size,))

# nn.CrossEntropyLoss with logits
criterion = nn.CrossEntropyLoss()
# pass raw logits to the criterion
loss_ce = criterion(output, target)

# nn.NLLLoss with log probabilities
criterion = nn.NLLLoss()
log_probs = F.log_softmax(output, dim=1)
loss_nll = criterion(log_probs, target)

# same but with functional API, which is applied in `nn.CrossEntropyLoss` internally
loss_nll_func = F.nll_loss(F.log_softmax(output, dim=1), target)

# compare
print((loss_ce - loss_nll).abs().max())
# tensor(0.)

print((loss_ce - loss_nll_func).abs().max())
# tensor(0.)

As you can see, the difference is zero since all approaches use the same operations.
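For contrast, a small sketch of the failure mode of the skipped torch.log(F.softmax(...)) approach, using an extreme pair of logits chosen for illustration. The probability of the target class underflows to 0 in float32, so taking the log yields -inf, while F.log_softmax computes the log probability directly.

```python
import torch
import torch.nn.functional as F

# Illustrative logits with a large gap; the target is the low-scoring class
output = torch.tensor([[0.0, 200.0]])
target = torch.tensor([0])

# Unstable: exp(-200) underflows to 0 in float32, so log(0) = -inf
loss_unstable = F.nll_loss(torch.log(F.softmax(output, dim=1)), target)

# Stable: log_softmax computes 0 - logsumexp([0, 200]) = -200 directly
loss_stable = F.nll_loss(F.log_softmax(output, dim=1), target)
loss_ce = F.cross_entropy(output, target)

print(loss_unstable)         # tensor(inf)
print(loss_stable, loss_ce)  # tensor(200.) tensor(200.)
```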
Here you can also see the code showing that nn.CrossEntropyLoss uses F.nll_loss(F.log_softmax(...)) internally (the names differ since it’s written in the C++ backend):

    ret = at::nll_loss_nd(
        at::log_softmax(self, 1, self.scalar_type()),
        target,
        weight,
        reduction,
        ignore_index);

Yes, F.log_softmax is applied automatically in nn.CrossEntropyLoss on the model outputs.

The linked example is correct as it returns the logits from the model (output of the last linear layer without any activation function applied on it) and passes them to nn.CrossEntropyLoss.
Your first example used nn.Softmax as the last activation function, which was wrong.
