Using linear layers? New user transferring from Keras

Hello,

I found in keras a nice multilayer perceptron of the form

model.add(Dense(512, input_shape=(784,)))
model.add(Activation('tanh'))
model.add(Dropout(0.2))
model.add(Dense(512))
model.add(Activation('linear'))
model.add(Dropout(0.2))
model.add(Dense(512))
model.add(Activation('sigmoid'))
model.add(Dropout(0.2))
model.add(Dense(512))
model.add(Activation('linear'))
model.add(Dropout(0.2))
model.add(Dense(10))
model.add(Activation('softmax'))

I was not sure how to do the linear layers in PyTorch. Trying to mimic the tutorial, I have

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.hidden = nn.Linear(784,512)
        self.hidden2 = nn.Linear(512,512)
        self.hidden3 = nn.Linear(512,10)
        self.out = nn.Linear(10,1)

    def forward(self, x):
        x = F.tanh(self.hidden(x))
        x = F.dropout(self.hidden(x),0.2)
        x = F.sigmoid(self.hidden(x))
        x = F.dropout(self.hidden(x),0.2)
        x = F.softmax(self.hidden(x))
        x = self.out(x)

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

But,

  1. How do you create the purely linear layers?
  2. As a follow-up: nn itself has linear and nonlinear modules, but to do softmax etc. I still use linear (per the example). What is the nonlinear layer functionality for?
  3. Can someone describe the purpose/point of the num_flat_features function?
  4. Am I setting up the dropout right?

My apologies if these are basic questions, but I couldn't quite find the right examples. In this case I'm just trying to create a multilayer perceptron with many nonlinear layers, so I'm not always sure whether some of the functionality in the examples is specific to convolutional or more complicated nets.

The basic building blocks of deep networks are of the form: Linear layer + Point-wise non-linearity / activation.
Keras rolls these two into one, called “Dense.”
(I'm not sure why the Keras example you have follows Dense with another activation; that doesn't make sense to me.)
To make a simple multi-layer perceptron in PyTorch, you should stack nn.Linear (a simple linear layer that computes w^T x + b) and nn.ReLU.
If you’d like a softmax followed by cross entropy loss at the end, you can use CrossEntropyLoss (which performs the softmax and the loss in one function for numerical reasons).
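
For example, a rough sketch of such a stack (written against a current PyTorch; the layer sizes are copied from your Keras snippet, with ReLU standing in for the mix of activations there):

import torch
import torch.nn as nn

# Linear layer + point-wise non-linearity, repeated, then a 10-way output.
model = nn.Sequential(
    nn.Linear(784, 512),
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(512, 10),            # raw class scores, no softmax here
)
criterion = nn.CrossEntropyLoss()  # does log-softmax + NLL in one call

x = torch.randn(64, 784)               # fake batch of flattened 28x28 images
target = torch.randint(0, 10, (64,))   # fake class labels in [0, 10)
loss = criterion(model(x), target)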

Thank you for responding

In my setup I would like a stack of linear layers, each followed by a nonlinear but continuously differentiable activation function, so
layer 1: sigmoid(w_1^T x + b_1)
layer 2: softmax(w_2^T y_1 + b_2), and so on.

Am I doing this wrong in the code? Instead of nn.Linear, should I use nn.Sigmoid etc.?
And what should the F. function be in the forward pass for the linear part?

This is what I was going by; it is the only example of a PyTorch multilayer perceptron I came across.

thanks

There is nn.Sequential in PyTorch. You can add modules to it much like in Keras.

1, 2: I don't understand the questions.
3: I don't know.
4: the dropout functional should be used as F.dropout(x, 0.2, self.training), for example:
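
Here is a minimal sketch of that (with a made-up TinyNet, not your exact model), just to show where self.training goes and why it matters:

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    def __init__(self):
        super(TinyNet, self).__init__()
        self.fc = nn.Linear(784, 512)

    def forward(self, x):
        x = torch.tanh(self.fc(x))
        # self.training is True after model.train() and False after model.eval(),
        # so dropout is applied only while training.
        return F.dropout(x, 0.2, self.training)

net = TinyNet()
net.eval()                      # dropout becomes a no-op at test time
out = net(torch.randn(8, 784))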

Well, is this code correct for constructing several layers with the nonlinear activation functions tanh, sigmoid, and softmax?

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.hidden = nn.Linear(784,512)
        self.hidden2 = nn.Linear(512,512)
        self.hidden3 = nn.Linear(512,10)
        self.out = nn.Linear(10,1)

    def forward(self, x):
        x = F.tanh(self.hidden(x))
        x = F.dropout(self.hidden(x),0.2)
        x = F.sigmoid(self.hidden(x))
        x = F.dropout(self.hidden(x),0.2)
        x = F.softmax(self.hidden(x))
        x = self.out(x)

let me know please. thank you

You use the same layer over and over (self.hidden).
The reason why you need to instantiate the layers in the __init__ method is that they have parameters (the weights) that have to be bound to the object.
In the forward method you can then use your layers and apply functions (without parameters) like relu, softmax, or tanh.
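
For example, a sketch along those lines, reusing your layer sizes (just a sketch: I've left out your extra nn.Linear(10, 1) output layer and return the 10 scores from hidden3, which is my own assumption about what you want):

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Each Linear owns its own weights, so each one is created once here.
        self.hidden = nn.Linear(784, 512)
        self.hidden2 = nn.Linear(512, 512)
        self.hidden3 = nn.Linear(512, 10)

    def forward(self, x):
        # Each layer is then applied once; tanh/sigmoid/dropout have no
        # parameters, so they can stay as plain function calls.
        x = F.tanh(self.hidden(x))
        x = F.dropout(x, 0.2, self.training)
        x = F.sigmoid(self.hidden2(x))
        x = F.dropout(x, 0.2, self.training)
        return self.hidden3(x)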

OK, I tried interlacing this model with the MNIST example. It seems the model is not correctly implemented.
The code is:

from __future__ import print_function
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable

parser = argparse.ArgumentParser(description='PyTorch MNIST Example')
parser.add_argument('--batch-size', type=int, default=64, metavar='N',
                    help='input batch size for training (default: 64)')
parser.add_argument('--test-batch-size', type=int, default=1000, metavar='N',
                    help='input batch size for testing (default: 1000)')
parser.add_argument('--epochs', type=int, default=10, metavar='N',
                    help='number of epochs to train (default: 10)')
parser.add_argument('--no-cuda', action='store_true', default=False,
                    help='disables CUDA training')
parser.add_argument('--seed', type=int, default=1, metavar='S',
                    help='random seed (default: 1)')
parser.add_argument('--log-interval', type=int, default=10, metavar='N',
                    help='how many batches to wait before logging training status')
args = parser.parse_args()
args.cuda = not args.no_cuda and torch.cuda.is_available()

torch.manual_seed(args.seed)
if args.cuda:
    torch.cuda.manual_seed(args.seed)

kwargs = {'num_workers': 1, 'pin_memory': True} if args.cuda else {}

train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=args.batch_size, shuffle=True, **kwargs)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=False, transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])),
    batch_size=args.batch_size, shuffle=True, **kwargs)

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.hidden = nn.Linear(784,512)
        self.hidden2 = nn.Linear(512,512)
        self.hidden3 = nn.Linear(512,10)
        self.out = nn.Linear(10,1)

    def forward(self, x):
        x = F.tanh(self.hidden(x))
        #x = F.dropout(self.hidden(x),0.2)
        x = F.sigmoid(self.hidden2(x))
        #x = F.dropout(self.hidden(x),0.2)
        x = F.softmax(self.hidden3(x))
        x = self.out(x)

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

model = Net()
print(model)

if args.cuda:
    model.cuda()

optimizer = optim.SGD(model.parameters(), lr=.01, momentum=0)

def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        if args.cuda:
            data, target = data.cuda(), target.cuda()
        data, target = Variable(data), Variable(target)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % args.log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.data[0]))

def test():
    model.eval()
    test_loss = 0
    correct = 0
    for data, target in test_loader:
        if args.cuda:
            data, target = data.cuda(), target.cuda()
        data, target = Variable(data, volatile=True), Variable(target)
        output = model(data)
        test_loss += F.nll_loss(output, target, size_average=False).data[0] # sum up batch loss
        pred = output.data.max(1)[1] # get the index of the max log-probability
        correct += pred.eq(target.data).cpu().sum()

    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

for epoch in range(1, args.epochs + 1):
    train(epoch)
    test()

the output is:

Net (
  (hidden): Linear (784 -> 512)
  (hidden2): Linear (512 -> 512)
  (hidden3): Linear (512 -> 10)
  (out): Linear (10 -> 1)
)
Traceback (most recent call last):
  File "", line 1, in
  File "mymodel.py", line 145, in
    train(epoch)
  File "mymodel.py", line 116, in train
    output = model(data)
  File "/home/slava/dev/miniconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "mymodel.py", line 85, in forward
    x = F.tanh(self.hidden(x))
  File "/home/slava/dev/miniconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/slava/dev/miniconda2/lib/python2.7/site-packages/torch/nn/modules/linear.py", line 54, in forward
    return self._backend.Linear()(input, self.weight, self.bias)
  File "/home/slava/dev/miniconda2/lib/python2.7/site-packages/torch/nn/functions/linear.py", line 10, in forward
    output.addmm_(0, 1, input, weight.t())
RuntimeError: matrices expected, got 4D, 2D tensors at /b/wheel/pytorch-src/torch/lib/TH/generic/THTensorMath.c:1232

You want this in forward:

def forward(self, x):
    x = F.tanh(self.hidden(x))
    x = F.dropout(x, 0.2)
    x = F.sigmoid(self.hidden2(x))
    x = F.dropout(x, 0.2)
    x = F.softmax(self.hidden3(x))
    x = self.out(x)
    return x

Still getting this error:

Hi,

if your inputs are arranged as pictures and you want to feed them to a linear layer, you want to flatten them first, e.g. x = x.view(-1, 784). Would that help in your case?
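
For example, with a made-up MNIST-sized batch:

import torch

x = torch.randn(64, 1, 28, 28)  # a batch of 64 single-channel 28x28 images
x = x.view(-1, 784)             # flatten everything except the batch dimension
print(x.size())                 # torch.Size([64, 784])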

Best regards

Thomas

Thank you, good find. I put this at the start of the forward call.

Now I am having an error with calculating the loss function:

Traceback (most recent call last):
  File "", line 1, in
  File "mymodel.py", line 147, in
    train(epoch)
  File "mymodel.py", line 119, in train
    loss = F.nll_loss(output, target)
  File "/home/slava/dev/miniconda2/lib/python2.7/site-packages/torch/nn/functional.py", line 501, in nll_loss
    return f(input, target)
  File "/home/slava/dev/miniconda2/lib/python2.7/site-packages/torch/nn/_functions/thnn/auto.py", line 41, in forward
    output, *self.additional_args)
RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at /b/wheel/pytorch-src/torch/lib/THNN/generic/ClassNLLCriterion.c:57

You want to pass a score vector as prediction…

I'm sorry, I'm afraid I do not understand. What is the score vector, and where do I pass it? Note that I took the rest of the code from the MNIST example in the PyTorch package (main.py); only the net I use is different, so the training and testing code should be correct.

You need the model to return F.log_softmax(x).

I changed the softmax to log_softmax, but it does not change the output; I am still having a problem calculating the loss function:

Traceback (most recent call last):
  File "", line 1, in
  File "mymodel.py", line 147, in
    train(epoch)
  File "mymodel.py", line 119, in train
    loss = F.nll_loss(output, target)
  File "/home/slava/dev/miniconda2/lib/python2.7/site-packages/torch/nn/functional.py", line 501, in nll_loss
    return f(input, target)
  File "/home/slava/dev/miniconda2/lib/python2.7/site-packages/torch/nn/_functions/thnn/auto.py", line 41, in forward
    output, *self.additional_args)
RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at /b/wheel/pytorch-src/torch/lib/THNN/generic/ClassNLLCriterion.c:57

It should be set up like in the examples in the PyTorch docs, i.e.:

# input is of size nBatch x nClasses = 3 x 5
input = autograd.Variable(torch.randn(3, 5))

# each element in target has to have 0 <= value < nclasses
target = autograd.Variable(torch.LongTensor([1, 0, 4]))

output = F.nll_loss(F.log_softmax(input), target)
output.backward()

Can you provide a link to the code you have now?

Something like this:

def forward(self, x):
    x = x.view(-1, 784)
    x = F.tanh(self.hidden(x))
    x = F.dropout(x, 0.2)
    x = F.sigmoid(self.hidden2(x))
    x = F.dropout(x, 0.2)
    x = F.softmax(self.hidden3(x))
    x = self.out(x)
    return F.log_softmax(x)

The log_softmax has to go after the output of the last linear layer.
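
i.e. roughly like this (only a sketch; it assumes self.hidden3 is your last linear layer and produces the 10 class scores, so the extra nn.Linear(10, 1) out layer from your code is not used here):

def forward(self, x):
    x = x.view(-1, 784)
    x = F.tanh(self.hidden(x))
    x = F.dropout(x, 0.2, self.training)
    x = F.sigmoid(self.hidden2(x))
    x = F.dropout(x, 0.2, self.training)
    x = self.hidden3(x)          # last linear layer: one raw score per class
    return F.log_softmax(x)      # log-probabilities that F.nll_loss expects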

OK, here is the code now:

from __future__ import print_function
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable

parser = argparse.ArgumentParser(description='PyTorch MNIST Example')
parser.add_argument('--batch-size', type=int, default=64, metavar='N',
                    help='input batch size for training (default: 64)')
parser.add_argument('--test-batch-size', type=int, default=1000, metavar='N',
                    help='input batch size for testing (default: 1000)')
parser.add_argument('--epochs', type=int, default=10, metavar='N',
                    help='number of epochs to train (default: 10)')
parser.add_argument('--no-cuda', action='store_true', default=False,
                    help='disables CUDA training')
parser.add_argument('--seed', type=int, default=1, metavar='S',
                    help='random seed (default: 1)')
parser.add_argument('--log-interval', type=int, default=10, metavar='N',
                    help='how many batches to wait before logging training status')
args = parser.parse_args()
args.cuda = not args.no_cuda and torch.cuda.is_available()

torch.manual_seed(args.seed)
if args.cuda:
    torch.cuda.manual_seed(args.seed)

kwargs = {'num_workers': 1, 'pin_memory': True} if args.cuda else {}

train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=args.batch_size, shuffle=True, **kwargs)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=False, transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])),
    batch_size=args.batch_size, shuffle=True, **kwargs)

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.hidden = nn.Linear(784,512)
        self.hidden2 = nn.Linear(512,512)
        self.hidden3 = nn.Linear(512,10)
        self.out = nn.Linear(10,1)

    def forward(self, x):
        x = x.view(-1, 784)
        x = F.tanh(self.hidden(x))
        x = F.dropout(x,0.2)
        x = F.sigmoid(self.hidden2(x))
        x = F.dropout(x,0.2)
        x = F.softmax(self.hidden3(x))
        x = self.out(x)
        return F.log_softmax(x)

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

model = Net()
print(model)

if args.cuda:
    model.cuda()

optimizer = optim.SGD(model.parameters(), lr=.01, momentum=0)

def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        if args.cuda:
            data, target = data.cuda(), target.cuda()
        data, target = Variable(data), Variable(target)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(F.log_softmax(output), target)
        loss.backward()
        optimizer.step()
        if batch_idx % args.log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.data[0]))

def test():
    model.eval()
    test_loss = 0
    correct = 0
    for data, target in test_loader:
        if args.cuda:
            data, target = data.cuda(), target.cuda()
        data, target = Variable(data, volatile=True), Variable(target)
        output = model(data)
        test_loss += F.nll_loss(output, target, size_average=False).data[0] # sum up batch loss
        pred = output.data.max(1)[1] # get the index of the max log-probability
        correct += pred.eq(target.data).cpu().sum()

    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

for epoch in range(1, args.epochs + 1):
    train(epoch)
    test()

this is the output:

Net (
  (hidden): Linear (784 -> 512)
  (hidden2): Linear (512 -> 512)
  (hidden3): Linear (512 -> 10)
  (out): Linear (10 -> 1)
)
Traceback (most recent call last):
  File "", line 1, in
  File "mymodel.py", line 147, in
    train(epoch)
  File "mymodel.py", line 119, in train
    loss = F.nll_loss(F.log_softmax(output), target)
  File "/home/slava/dev/miniconda2/lib/python2.7/site-packages/torch/nn/functional.py", line 501, in nll_loss
    return f(input, target)
  File "/home/slava/dev/miniconda2/lib/python2.7/site-packages/torch/nn/_functions/thnn/auto.py", line 41, in forward
    output, *self.additional_args)
RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at /b/wheel/pytorch-src/torch/lib/THNN/generic/ClassNLLCriterion.c:57

For comparison, here is the main.py file that I am basing my code on:

from __future__ import print_function
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable

# Training settings
parser = argparse.ArgumentParser(description='PyTorch MNIST Example')
parser.add_argument('--batch-size', type=int, default=64, metavar='N',
                    help='input batch size for training (default: 64)')
parser.add_argument('--test-batch-size', type=int, default=1000, metavar='N',
                    help='input batch size for testing (default: 1000)')
parser.add_argument('--epochs', type=int, default=10, metavar='N',
                    help='number of epochs to train (default: 10)')
parser.add_argument('--lr', type=float, default=0.01, metavar='LR',
                    help='learning rate (default: 0.01)')
parser.add_argument('--momentum', type=float, default=0.5, metavar='M',
                    help='SGD momentum (default: 0.5)')
parser.add_argument('--no-cuda', action='store_true', default=False,
                    help='disables CUDA training')
parser.add_argument('--seed', type=int, default=1, metavar='S',
                    help='random seed (default: 1)')
parser.add_argument('--log-interval', type=int, default=10, metavar='N',
                    help='how many batches to wait before logging training status')
args = parser.parse_args()
args.cuda = not args.no_cuda and torch.cuda.is_available()

torch.manual_seed(args.seed)
if args.cuda:
    torch.cuda.manual_seed(args.seed)

kwargs = {'num_workers': 1, 'pin_memory': True} if args.cuda else {}
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=args.batch_size, shuffle=True, **kwargs)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=False, transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])),
    batch_size=args.batch_size, shuffle=True, **kwargs)

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x)

model = Net()
if args.cuda:
    model.cuda()

optimizer = optim.SGD(model.parameters(), lr=args.lr, momentum=args.momentum)

def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        if args.cuda:
            data, target = data.cuda(), target.cuda()
        data, target = Variable(data), Variable(target)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % args.log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.data[0]))

def test(epoch):
    model.eval()
    test_loss = 0
    correct = 0
    for data, target in test_loader:
        if args.cuda:
            data, target = data.cuda(), target.cuda()
        data, target = Variable(data, volatile=True), Variable(target)
        output = model(data)
        test_loss += F.nll_loss(output, target).data[0]
        pred = output.data.max(1)[1] # get the index of the max log-probability
        correct += pred.eq(target.data).cpu().sum()

    test_loss = test_loss
    test_loss /= len(test_loader) # loss function already averages over batch size
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

for epoch in range(1, args.epochs + 1):
    train(epoch)
    test(epoch)

At first glance I see you've got log_softmax twice.
In def train:
loss = F.nll_loss(F.log_softmax(output), target)

and in the return of forward you've got one.

You can change it to:

loss = F.nll_loss(output, target)
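
(As an aside, the two steps can also be fused: F.cross_entropy does the log_softmax plus nll_loss in one call, like the CrossEntropyLoss mentioned earlier in the thread. A quick sanity check of that, assuming a recent PyTorch:)

import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)            # raw scores: 4 samples, 10 classes
target = torch.tensor([3, 0, 9, 1])    # class indices in [0, 10)

a = F.nll_loss(F.log_softmax(logits, dim=1), target)
b = F.cross_entropy(logits, target)    # log-softmax + NLL fused
print(torch.allclose(a, b))            # True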

Oh, it should be log softmax? Interesting :slight_smile: That information is about 15 minutes too late for the tutorial I just made :smiley:

Edit: using log_softmax does effectively train a ton better :stuck_out_tongue:
