Using linear layers? New user transferring from Keras

Hello,

I found in keras a nice multilayer perceptron of the form

model.add(Dense(512, input_shape=(784,)))
model.add(Activation('tanh'))
model.add(Dropout(0.2))
model.add(Dense(512))
model.add(Activation('linear'))
model.add(Dropout(0.2))
model.add(Dense(512))
model.add(Activation('sigmoid'))
model.add(Dropout(0.2))
model.add(Dense(512))
model.add(Activation('linear'))
model.add(Dropout(0.2))
model.add(Dense(10))
model.add(Activation('softmax'))

I was not sure how to do the linear layers in PyTorch. Trying to mimic the tutorial, I have

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.hidden = nn.Linear(784,512)
        self.hidden2 = nn.Linear(512,512)
        self.hidden3 = nn.Linear(512,10)
        self.out = nn.Linear(10,1)

    def forward(self, x):
        x = F.tanh(self.hidden(x))
        x = F.dropout(self.hidden(x),0.2)
        x = F.sigmoid(self.hidden(x))
        x = F.dropout(self.hidden(x),0.2)
        x = F.softmax(self.hidden(x))
        x = self.out(x)

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

But,

  1. How do you create the purely linear layers?
  2. As a follow-up: nn itself has linear and nonlinear modules, but to do softmax etc. I still use linear (per the example). What is the nonlinear layer functionality for?
  3. Can someone describe the purpose/point of the num_flat_features function?
  4. Am I setting up the dropout right?

My apologies if these are basic questions, but I couldn't quite find the right examples. In this case I'm just trying to create a multilayer perceptron with many nonlinear layers, so I'm not always sure whether some of the functionality in the examples is specific to convolutional or more complicated nets.

The basic building blocks of deep networks are of the form: Linear layer + Point-wise non-linearity / activation.
Keras rolls these two into one, called “Dense.”
(I'm not sure why the Keras example you have follows Dense with another activation; that doesn't make sense to me.)
To make a simple multi-layer perceptron in PyTorch, you should stack nn.Linear (a simple linear layer that computes w^T x + b) and nn.ReLU.
If you’d like a softmax followed by cross entropy loss at the end, you can use CrossEntropyLoss (which performs the softmax and the loss in one function for numerical reasons).
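
For example, a rough sketch of such a stack (written against a current PyTorch; the layer sizes are copied from your Keras snippet, with ReLU standing in for the mix of activations there):

import torch
import torch.nn as nn

# Linear layer + point-wise non-linearity, repeated, then a 10-way output.
model = nn.Sequential(
    nn.Linear(784, 512),
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(512, 10),            # raw class scores, no softmax here
)
criterion = nn.CrossEntropyLoss()  # does log-softmax + NLL in one call

x = torch.randn(64, 784)               # fake batch of flattened 28x28 images
target = torch.randint(0, 10, (64,))   # fake class labels in [0, 10)
loss = criterion(model(x), target)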

Thank you for responding

In my setup I would like a stack of linear layers, each followed by a nonlinear but continuously differentiable activation function, so
layer 1: sigmoid(w_1^T x + b_1)
layer 2: softmax(w_2^T y_1 + b_2), and so on.

Am I doing this wrong in the code? Instead of nn.Linear, should I use nn.Sigmoid etc.?
And what should the F. function be in the forward pass for the linear part?

This is what I was going by; it is the only example of a PyTorch multilayer perceptron I came across.

thanks

There is nn.Sequential in PyTorch. You can add modules to it much like in Keras.

1, 2: I don't understand the questions.
3: I don't know.
4: the dropout functional should be used as F.dropout(x, 0.2, self.training), for example:
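
Here is a minimal sketch of that (with a made-up TinyNet, not your exact model), just to show where self.training goes and why it matters:

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    def __init__(self):
        super(TinyNet, self).__init__()
        self.fc = nn.Linear(784, 512)

    def forward(self, x):
        x = torch.tanh(self.fc(x))
        # self.training is True after model.train() and False after model.eval(),
        # so dropout is applied only while training.
        return F.dropout(x, 0.2, self.training)

net = TinyNet()
net.eval()                      # dropout becomes a no-op at test time
out = net(torch.randn(8, 784))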

Well, is this code correct for constructing several layers with the nonlinear activation functions tanh, sigmoid, and softmax?

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.hidden = nn.Linear(784,512)
        self.hidden2 = nn.Linear(512,512)
        self.hidden3 = nn.Linear(512,10)
        self.out = nn.Linear(10,1)

    def forward(self, x):
        x = F.tanh(self.hidden(x))
        x = F.dropout(self.hidden(x),0.2)
        x = F.sigmoid(self.hidden(x))
        x = F.dropout(self.hidden(x),0.2)
        x = F.softmax(self.hidden(x))
        x = self.out(x)

let me know please. thank you

You use the same layer over and over (self.hidden).
The reason why you need to instantiate the layers in the __init__ method is that they have parameters (the weights) that have to be bound to the object.
In the forward method you can then use your layers and apply functions (without parameters) like relu, softmax, or tanh.
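
For example, a sketch along those lines, reusing your layer sizes (just a sketch: I've left out your extra nn.Linear(10, 1) output layer and return the 10 scores from hidden3, which is my own assumption about what you want):

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Each Linear owns its own weights, so each one is created once here.
        self.hidden = nn.Linear(784, 512)
        self.hidden2 = nn.Linear(512, 512)
        self.hidden3 = nn.Linear(512, 10)

    def forward(self, x):
        # Each layer is then applied once; tanh/sigmoid/dropout have no
        # parameters, so they can stay as plain function calls.
        x = F.tanh(self.hidden(x))
        x = F.dropout(x, 0.2, self.training)
        x = F.sigmoid(self.hidden2(x))
        x = F.dropout(x, 0.2, self.training)
        return self.hidden3(x)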

OK, I tried interlacing this model with the MNIST example. It seems the model is not correctly implemented.
The code is:

from __future__ import print_function
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable

parser = argparse.ArgumentParser(description='PyTorch MNIST Example')
parser.add_argument('--batch-size', type=int, default=64, metavar='N',
                    help='input batch size for training (default: 64)')
parser.add_argument('--test-batch-size', type=int, default=1000, metavar='N',
                    help='input batch size for testing (default: 1000)')
parser.add_argument('--epochs', type=int, default=10, metavar='N',
                    help='number of epochs to train (default: 10)')
parser.add_argument('--no-cuda', action='store_true', default=False,
                    help='disables CUDA training')
parser.add_argument('--seed', type=int, default=1, metavar='S',
                    help='random seed (default: 1)')
parser.add_argument('--log-interval', type=int, default=10, metavar='N',
                    help='how many batches to wait before logging training status')
args = parser.parse_args()
args.cuda = not args.no_cuda and torch.cuda.is_available()

torch.manual_seed(args.seed)
if args.cuda:
    torch.cuda.manual_seed(args.seed)

kwargs = {'num_workers': 1, 'pin_memory': True} if args.cuda else {}

train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=args.batch_size, shuffle=True, **kwargs)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=False, transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])),
    batch_size=args.batch_size, shuffle=True, **kwargs)

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.hidden = nn.Linear(784,512)
        self.hidden2 = nn.Linear(512,512)
        self.hidden3 = nn.Linear(512,10)
        self.out = nn.Linear(10,1)

    def forward(self, x):
        x = F.tanh(self.hidden(x))
        #x = F.dropout(self.hidden(x),0.2)
        x = F.sigmoid(self.hidden2(x))
        #x = F.dropout(self.hidden(x),0.2)
        x = F.softmax(self.hidden3(x))
        x = self.out(x)

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

model = Net()
print(model)

if args.cuda:
    model.cuda()

optimizer = optim.SGD(model.parameters(), lr=.01, momentum=0)

def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        if args.cuda:
            data, target = data.cuda(), target.cuda()
        data, target = Variable(data), Variable(target)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % args.log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.data[0]))

def test():
    model.eval()
    test_loss = 0
    correct = 0
    for data, target in test_loader:
        if args.cuda:
            data, target = data.cuda(), target.cuda()
        data, target = Variable(data, volatile=True), Variable(target)
        output = model(data)
        test_loss += F.nll_loss(output, target, size_average=False).data[0] # sum up batch loss
        pred = output.data.max(1)[1] # get the index of the max log-probability
        correct += pred.eq(target.data).cpu().sum()

    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

for epoch in range(1, args.epochs + 1):
    train(epoch)
    test()

the output is:

Net (
  (hidden): Linear (784 -> 512)
  (hidden2): Linear (512 -> 512)
  (hidden3): Linear (512 -> 10)
  (out): Linear (10 -> 1)
)
Traceback (most recent call last):
  File "", line 1, in
  File "mymodel.py", line 145, in
    train(epoch)
  File "mymodel.py", line 116, in train
    output = model(data)
  File "/home/slava/dev/miniconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "mymodel.py", line 85, in forward
    x = F.tanh(self.hidden(x))
  File "/home/slava/dev/miniconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/slava/dev/miniconda2/lib/python2.7/site-packages/torch/nn/modules/linear.py", line 54, in forward
    return self._backend.Linear()(input, self.weight, self.bias)
  File "/home/slava/dev/miniconda2/lib/python2.7/site-packages/torch/nn/functions/linear.py", line 10, in forward
    output.addmm_(0, 1, input, weight.t())
RuntimeError: matrices expected, got 4D, 2D tensors at /b/wheel/pytorch-src/torch/lib/TH/generic/THTensorMath.c:1232

You want this in forward:

def forward(self, x):
    x = F.tanh(self.hidden(x))
    x = F.dropout(x, 0.2)
    x = F.sigmoid(self.hidden2(x))
    x = F.dropout(x, 0.2)
    x = F.softmax(self.hidden3(x))
    x = self.out(x)
    return x

Still getting this error:

Hi,

if your inputs are arranged as pictures and you want to feed them to a linear layer, you want to flatten them first, e.g. x = x.view(-1, 784). Would that help in your case?
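
For example, with a made-up MNIST-sized batch:

import torch

x = torch.randn(64, 1, 28, 28)  # a batch of 64 single-channel 28x28 images
x = x.view(-1, 784)             # flatten everything except the batch dimension
print(x.size())                 # torch.Size([64, 784])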

Best regards

Thomas

Thank you, good find. I put this at the start of the forward call.

Now I am having an error with calculating the loss function:

Traceback (most recent call last):
  File "", line 1, in
  File "mymodel.py", line 147, in
    train(epoch)
  File "mymodel.py", line 119, in train
    loss = F.nll_loss(output, target)
  File "/home/slava/dev/miniconda2/lib/python2.7/site-packages/torch/nn/functional.py", line 501, in nll_loss
    return f(input, target)
  File "/home/slava/dev/miniconda2/lib/python2.7/site-packages/torch/nn/_functions/thnn/auto.py", line 41, in forward
    output, *self.additional_args)
RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at /b/wheel/pytorch-src/torch/lib/THNN/generic/ClassNLLCriterion.c:57

You want to pass a score vector as prediction…

I'm sorry, I'm afraid I do not understand. What is the score vector, and where do I pass it? Note that I took the rest of the code from the MNIST example in the PyTorch package (main.py); only the net I use is different, so the training and testing code should be correct.

You need the model to return F.log_softmax(x).

I changed the softmax to log_softmax, but it does not change the output; I am still having a problem calculating the loss function:

Traceback (most recent call last):
  File "", line 1, in
  File "mymodel.py", line 147, in
    train(epoch)
  File "mymodel.py", line 119, in train
    loss = F.nll_loss(output, target)
  File "/home/slava/dev/miniconda2/lib/python2.7/site-packages/torch/nn/functional.py", line 501, in nll_loss
    return f(input, target)
  File "/home/slava/dev/miniconda2/lib/python2.7/site-packages/torch/nn/_functions/thnn/auto.py", line 41, in forward
    output, *self.additional_args)
RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at /b/wheel/pytorch-src/torch/lib/THNN/generic/ClassNLLCriterion.c:57

It should be set up like in the examples in the PyTorch docs, i.e.:

# input is of size nBatch x nClasses = 3 x 5
input = autograd.Variable(torch.randn(3, 5))

# each element in target has to have 0 <= value < nclasses
target = autograd.Variable(torch.LongTensor([1, 0, 4]))

output = F.nll_loss(F.log_softmax(input), target)
output.backward()

Can you provide a link to the code you have now?

Something like this:

def forward(self, x):
    x = x.view(-1, 784)
    x = F.tanh(self.hidden(x))
    x = F.dropout(x, 0.2)
    x = F.sigmoid(self.hidden2(x))
    x = F.dropout(x, 0.2)
    x = F.softmax(self.hidden3(x))
    x = self.out(x)
    return F.log_softmax(x)

The log_softmax has to go after the output of the last linear layer.
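
i.e. roughly like this (only a sketch; it assumes self.hidden3 is your last linear layer and produces the 10 class scores, so the extra nn.Linear(10, 1) out layer from your code is not used here):

def forward(self, x):
    x = x.view(-1, 784)
    x = F.tanh(self.hidden(x))
    x = F.dropout(x, 0.2, self.training)
    x = F.sigmoid(self.hidden2(x))
    x = F.dropout(x, 0.2, self.training)
    x = self.hidden3(x)          # last linear layer: one raw score per class
    return F.log_softmax(x)      # log-probabilities that F.nll_loss expects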

OK, here is the code now:

from __future__ import print_function
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable

parser = argparse.ArgumentParser(description='PyTorch MNIST Example')
parser.add_argument('--batch-size', type=int, default=64, metavar='N',
                    help='input batch size for training (default: 64)')
parser.add_argument('--test-batch-size', type=int, default=1000, metavar='N',
                    help='input batch size for testing (default: 1000)')
parser.add_argument('--epochs', type=int, default=10, metavar='N',
                    help='number of epochs to train (default: 10)')
parser.add_argument('--no-cuda', action='store_true', default=False,
                    help='disables CUDA training')
parser.add_argument('--seed', type=int, default=1, metavar='S',
                    help='random seed (default: 1)')
parser.add_argument('--log-interval', type=int, default=10, metavar='N',
                    help='how many batches to wait before logging training status')
args = parser.parse_args()
args.cuda = not args.no_cuda and torch.cuda.is_available()

torch.manual_seed(args.seed)
if args.cuda:
    torch.cuda.manual_seed(args.seed)

kwargs = {'num_workers': 1, 'pin_memory': True} if args.cuda else {}

train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=args.batch_size, shuffle=True, **kwargs)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=False, transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])),
    batch_size=args.batch_size, shuffle=True, **kwargs)

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.hidden = nn.Linear(784,512)
        self.hidden2 = nn.Linear(512,512)
        self.hidden3 = nn.Linear(512,10)
        self.out = nn.Linear(10,1)

    def forward(self, x):
        x = x.view(-1, 784)
        x = F.tanh(self.hidden(x))
        x = F.dropout(x,0.2)
        x = F.sigmoid(self.hidden2(x))
        x = F.dropout(x,0.2)
        x = F.softmax(self.hidden3(x))
        x = self.out(x)
        return F.log_softmax(x)

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

model = Net()
print(model)

if args.cuda:
    model.cuda()

optimizer = optim.SGD(model.parameters(), lr=.01, momentum=0)

def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        if args.cuda:
            data, target = data.cuda(), target.cuda()
        data, target = Variable(data), Variable(target)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(F.log_softmax(output), target)
        loss.backward()
        optimizer.step()
        if batch_idx % args.log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.data[0]))

def test():
    model.eval()
    test_loss = 0
    correct = 0
    for data, target in test_loader:
        if args.cuda:
            data, target = data.cuda(), target.cuda()
        data, target = Variable(data, volatile=True), Variable(target)
        output = model(data)
        test_loss += F.nll_loss(output, target, size_average=False).data[0] # sum up batch loss
        pred = output.data.max(1)[1] # get the index of the max log-probability
        correct += pred.eq(target.data).cpu().sum()

    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

for epoch in range(1, args.epochs + 1):
    train(epoch)
    test()

this is the output:

Net (
  (hidden): Linear (784 -> 512)
  (hidden2): Linear (512 -> 512)
  (hidden3): Linear (512 -> 10)
  (out): Linear (10 -> 1)
)
Traceback (most recent call last):
  File "", line 1, in
  File "mymodel.py", line 147, in
    train(epoch)
  File "mymodel.py", line 119, in train
    loss = F.nll_loss(F.log_softmax(output), target)
  File "/home/slava/dev/miniconda2/lib/python2.7/site-packages/torch/nn/functional.py", line 501, in nll_loss
    return f(input, target)
  File "/home/slava/dev/miniconda2/lib/python2.7/site-packages/torch/nn/_functions/thnn/auto.py", line 41, in forward
    output, *self.additional_args)
RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at /b/wheel/pytorch-src/torch/lib/THNN/generic/ClassNLLCriterion.c:57

For comparison, here is the main.py file that I am basing my code on:

from __future__ import print_function
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable

# Training settings
parser = argparse.ArgumentParser(description='PyTorch MNIST Example')
parser.add_argument('--batch-size', type=int, default=64, metavar='N',
                    help='input batch size for training (default: 64)')
parser.add_argument('--test-batch-size', type=int, default=1000, metavar='N',
                    help='input batch size for testing (default: 1000)')
parser.add_argument('--epochs', type=int, default=10, metavar='N',
                    help='number of epochs to train (default: 10)')
parser.add_argument('--lr', type=float, default=0.01, metavar='LR',
                    help='learning rate (default: 0.01)')
parser.add_argument('--momentum', type=float, default=0.5, metavar='M',
                    help='SGD momentum (default: 0.5)')
parser.add_argument('--no-cuda', action='store_true', default=False,
                    help='disables CUDA training')
parser.add_argument('--seed', type=int, default=1, metavar='S',
                    help='random seed (default: 1)')
parser.add_argument('--log-interval', type=int, default=10, metavar='N',
                    help='how many batches to wait before logging training status')
args = parser.parse_args()
args.cuda = not args.no_cuda and torch.cuda.is_available()

torch.manual_seed(args.seed)
if args.cuda:
    torch.cuda.manual_seed(args.seed)

kwargs = {'num_workers': 1, 'pin_memory': True} if args.cuda else {}
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=args.batch_size, shuffle=True, **kwargs)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=False, transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])),
    batch_size=args.batch_size, shuffle=True, **kwargs)

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x)

model = Net()
if args.cuda:
    model.cuda()

optimizer = optim.SGD(model.parameters(), lr=args.lr, momentum=args.momentum)

def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        if args.cuda:
            data, target = data.cuda(), target.cuda()
        data, target = Variable(data), Variable(target)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % args.log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.data[0]))

def test(epoch):
    model.eval()
    test_loss = 0
    correct = 0
    for data, target in test_loader:
        if args.cuda:
            data, target = data.cuda(), target.cuda()
        data, target = Variable(data, volatile=True), Variable(target)
        output = model(data)
        test_loss += F.nll_loss(output, target).data[0]
        pred = output.data.max(1)[1] # get the index of the max log-probability
        correct += pred.eq(target.data).cpu().sum()

    test_loss = test_loss
    test_loss /= len(test_loader) # loss function already averages over batch size
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

for epoch in range(1, args.epochs + 1):
    train(epoch)
    test(epoch)

At first glance I see you've got log_softmax twice.
In def train:
loss = F.nll_loss(F.log_softmax(output), target)

and in the return of forward you've got one.

You can change it to:

loss = F.nll_loss(output, target)
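
(As an aside, the two steps can also be fused: F.cross_entropy does the log_softmax plus nll_loss in one call, like the CrossEntropyLoss mentioned earlier in the thread. A quick sanity check of that, assuming a recent PyTorch:)

import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)            # raw scores: 4 samples, 10 classes
target = torch.tensor([3, 0, 9, 1])    # class indices in [0, 10)

a = F.nll_loss(F.log_softmax(logits, dim=1), target)
b = F.cross_entropy(logits, target)    # log-softmax + NLL fused
print(torch.allclose(a, b))            # True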

Oh, it should be log softmax? Interesting :slight_smile: That information is about 15 minutes too late for the tutorial I just made :smiley:

Edit: using log_softmax does effectively train a ton better :stuck_out_tongue:
