Confused about tensor dimensions and batches

Hey guys.
So I’m very new to PyTorch and Neural Networks in general, and I’m having some problems creating a Neural Network that classifies names by gender.
I based this off of the PyTorch tutorial for RNNs that classify names by nationality, but I decided not to go with a recurrent approach… Stop me right here if this was the wrong idea!
However, whenever I try to run an input through the network it tells me:
RuntimeError: matrices expected, got 3D, 2D tensors at /py/conda-bld/pytorch_1493681908901/work/torch/lib/TH/generic/THTensorMath.c:1232
I know this has something to do with how PyTorch always expects there to be a batch size or something, and I have my tensor set up that way, but you can probably tell by this point that I have no idea what I’m talking about.
Here’s my code:
from __future__ import unicode_literals, print_function, division
from io import open
import glob
import unicodedata
import string
import torch
import torchvision
import torch.nn as nn
import torch.optim as optim
import random
from torch.autograd import Variable
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

"""------GLOBAL VARIABLES------"""

all_letters = string.ascii_letters + " .,;'"
num_letters = len(all_letters)
all_names = {}
genders = ["Female", "Male"]

"""-------DATA EXTRACTION------"""

def findFiles(path):
    return glob.glob(path)

def unicodeToAscii(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
        and c in all_letters
    )

# Read a file and split into lines
def readLines(filename):
    lines = open(filename, encoding='utf-8').read().strip().split('\n')
    return [unicodeToAscii(line) for line in lines]

for file in findFiles("/home/andrew/PyCharm/PycharmProjects/CantStop/data/names/*.txt"):
    gender = file.split("/")[-1].split(".")[0]
    names = readLines(file)
    all_names[gender] = names


def nameToTensor(name):
    tensor = torch.zeros(len(name), 1, num_letters)
    for index, letter in enumerate(name):
        tensor[index][0][all_letters.find(letter)] = 1
    return tensor

def outputToGender(output):
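    # Right-hand side was lost in the post; assumed to be topk(1), as in the tutorial
    gender, gender_index = output.data.topk(1)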
    if gender_index[0][0] == 0:
        return "Female"
    return "Male"

"""------NETWORK SETUP------"""

class Net(nn.Module):
    def __init__(self, input_size, output_size):
        super(Net, self).__init__()
        #Layer 1
        self.Lin1 = nn.Linear(input_size, int(input_size/2))
        self.ReLu1 = nn.ReLU()
        self.Batch1 = nn.BatchNorm1d(int(input_size/2))
        #Layer 2
        self.Lin2 = nn.Linear(int(input_size/2), output_size)
        self.ReLu2 = nn.ReLU()
        self.Batch2 = nn.BatchNorm1d(output_size)
        self.softMax = nn.LogSoftmax()

    def forward(self, input):
        output1 = self.Batch1(self.ReLu1(self.Lin1(input)))
        output2 = self.softMax(self.Batch2(self.ReLu2(self.Lin2(output1))))
        return output2

NN = Net(num_letters, 2)


def getRandomTrainingEx():
    gender = genders[random.randint(0, 1)]
    name = all_names[gender][random.randint(0, len(all_names[gender])-1)]
    gender_tensor = Variable(torch.LongTensor([genders.index(gender)]))
    name_tensor = Variable(nameToTensor(name))
    return gender_tensor, name_tensor, gender

def train(input, target):
    loss_func = nn.NLLLoss()

    optimizer = optim.SGD(NN.parameters(), lr=0.0001, momentum=0.9)


    output = NN(input)

    loss = loss_func(output, target)

    return output, loss

all_losses = []
current_loss = 0

for i in range(100000):
    gender_tensor, name_tensor, gender = getRandomTrainingEx()
    output, loss = train(name_tensor, gender_tensor)
    current_loss += loss

    if i%1000 == 0:
        print("Guess: %s, Correct: %s, Loss: %s" % (outputToGender(output), gender, loss.data[0]))

    if i%100 == 0:
        current_loss = 0

# plt.figure()
# plt.plot(all_losses)

Please help a newbie out!


The input to a linear layer should be a 2D tensor of size [batch_size, input_size], where input_size matches the number of input features of the first layer in your network (so in your case it’s num_letters).

The problem appears in the line:

tensor = torch.zeros(len(name), 1, num_letters)

which should actually just be:

tensor = torch.zeros(len(name), num_letters)

As an easy example:

input_size = 8
output_size = 14
batch_size = 64

net = nn.Linear(input_size, output_size)
input = Variable(torch.FloatTensor(batch_size, input_size))

output = net(input)

print("Output size:", output.size())

Output size: torch.Size([64, 14])

Hope this helps,


I’ve thought about this, but when I change the input dimensions, I get hit with this batch size error, which I can’t make heads or tails of:
Traceback (most recent call last):
  File "/home/andrew/PyCharm/PycharmProjects/CantStop/Cant-Stop/Python Files/", line 119, in <module>
    output, loss = train(name_tensor, gender_tensor)
  File "/home/andrew/PyCharm/PycharmProjects/CantStop/Cant-Stop/Python Files/", line 108, in train
    loss = loss_func(output, target)
  File "/home/andrew/anaconda3/lib/python3.6/site-packages/torch/nn/modules/", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/andrew/anaconda3/lib/python3.6/site-packages/torch/nn/modules/", line 36, in forward
    return backend_fn(self.size_average, weight=self.weight)(input, target)
  File "/home/andrew/anaconda3/lib/python3.6/site-packages/torch/nn/functions/thnn/", line 41, in forward
    output, *self.additional_args)
RuntimeError: Assertion `THIndexTensor_(size)(target, 0) == batch_size' failed. at /py/conda-bld/pytorch_1493681908901/work/torch/lib/THNN/generic/ClassNLLCriterion.c:50

I’m a bit confused by your code. Previously I thought len(name) was your batch size; is this correct? That error message says the batch sizes of your target and output are different, which is plausible if the batch dimension of your input tensor changes with the length of the name it’s given while the target tensor always holds a single label.

If it helps here is a self-contained example:

from __future__ import print_function
import torch
import torch.nn as nn
from torch.autograd import Variable

batch_size = 10
input_size = 2
output_size = 2

net = nn.Linear(input_size,output_size)
x = Variable(torch.FloatTensor(batch_size,input_size).normal_(0,1))
criterion = nn.NLLLoss()
target = Variable(torch.LongTensor(batch_size).fill_(1))

output = nn.LogSoftmax()(net(x))

loss = criterion(output, target)
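
For contrast, here is a minimal sketch of the mismatch that the assertion is complaining about. It assumes a recent PyTorch (plain tensors instead of Variable, and an explicit dim for LogSoftmax); the 5-letter name length and the tiny network are hypothetical stand-ins for the posted code, not a copy of it.

```python
import torch
import torch.nn as nn

num_letters = 57   # len(string.ascii_letters + " .,;'")
name_length = 5    # hypothetical 5-letter name

# Hypothetical stand-in for the posted network
net = nn.Sequential(nn.Linear(num_letters, 2), nn.LogSoftmax(dim=1))
criterion = nn.NLLLoss()

# One row per letter, so the batch dimension equals the name length
name_tensor = torch.zeros(name_length, num_letters)
# A single label for the whole name, so the batch dimension is 1
target = torch.tensor([1])

output = net(name_tensor)
print(output.shape)  # first dimension is 5
print(target.shape)  # first dimension is 1

try:
    criterion(output, target)
    mismatch_detected = False
except (ValueError, RuntimeError) as err:
    # The exact exception type and wording depend on the PyTorch version
    mismatch_detected = True
    print("Loss rejected the shapes:", err)
```

The loss needs exactly one target entry per row of the output, so 5 rows against 1 label cannot work.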



Also you should move the lines

loss_func = nn.NLLLoss()
optimizer = optim.SGD(NN.parameters(), lr=0.0001, momentum=0.9)

outside of your train() function, as these only need to be set once.
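
A rough sketch of that layout, assuming a recent PyTorch and hypothetical sizes. Note that the zero_grad(), backward() and step() calls are an addition here: the posted train() never calls them, so as written it would compute a loss without updating any weights.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Hypothetical sizes, for illustration only
input_size, output_size, batch_size = 8, 2, 4

model = nn.Sequential(nn.Linear(input_size, output_size), nn.LogSoftmax(dim=1))

# Created once, outside the training function
loss_func = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.0001, momentum=0.9)

def train(inputs, target):
    optimizer.zero_grad()             # clear gradients from the previous step
    output = model(inputs)            # forward pass
    loss = loss_func(output, target)  # negative log-likelihood loss
    loss.backward()                   # compute gradients
    optimizer.step()                  # update the weights
    return output, loss.item()

inputs = torch.randn(batch_size, input_size)
target = torch.ones(batch_size, dtype=torch.long)  # one class index per example
output, loss_value = train(inputs, target)
print(output.shape, loss_value)
```

Keeping the optimizer outside the function also matters for momentum, since a freshly created SGD optimizer starts with empty momentum buffers on every call.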

Yeah, I guess my batch size is equal to the number of letters in the name… From a conceptual standpoint, is that okay?
My idea was that you pass in a tensor representing the name (where every row is a one-hot vector describing a letter), and the network outputs a 1x2 tensor representing the probability that the name is a boy’s or a girl’s. However, I now see the output’s size is batch_size x 2. Does this mean that it gives, for each letter in the name, a probability of that letter being indicative of gender? And is there any way to condense these values into a 1x2 tensor like the current target tensor? Or is this batch_size x 2 output format okay, and I should change what the target tensor looks like instead, in which case how should I model it?
I’m sorry for the barrage of questions, I just really want to wrap my head around this.

Generally, an MLP produces one output vector for every input example. That is, for every name it should produce a single gender label.

In your case you are treating every letter in the name as an input example. This is what it means to set the batch size equal to the length of a name. In this case the network expects to produce an output label for every letter in the input, which is not very helpful.

Instead, you should take your one-hot encodings and concatenate them into a single vector, then treat this as a single input example. Your network would then take a batch of names as input, and produce a batch of labels as output.
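
One way this could look, as a sketch rather than a definitive implementation. A linear layer needs a fixed input size, so the sketch assumes a hypothetical max_len of 12 letters, pads shorter names with zeros and truncates longer ones; the helper name_to_vector and the example names are made up for illustration, and it assumes a recent PyTorch.

```python
import string
import torch
import torch.nn as nn

all_letters = string.ascii_letters + " .,;'"
num_letters = len(all_letters)
max_len = 12  # assumption: fixed maximum name length (pad shorter, truncate longer)

def name_to_vector(name):
    """Concatenate per-letter one-hot vectors into one flat vector."""
    one_hot = torch.zeros(max_len, num_letters)
    for index, letter in enumerate(name[:max_len]):
        position = all_letters.find(letter)
        if position >= 0:  # skip characters outside the alphabet
            one_hot[index][position] = 1
    return one_hot.view(-1)  # shape: (max_len * num_letters,)

names = ["Anna", "Robert", "Li"]  # hypothetical batch of names
batch = torch.stack([name_to_vector(n) for n in names])

net = nn.Sequential(nn.Linear(max_len * num_letters, 2), nn.LogSoftmax(dim=1))
output = net(batch)

print(batch.shape)   # one row per name
print(output.shape)  # one pair of log-probabilities per name
```

With this layout a single name is just a batch of size 1, so the output for it is the 1x2 tensor asked about above.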

I think your confusion lies in the definition of the terms: feature, instance, batch. A feature is a single pixel or a single letter, an instance is a single name (a collection of features), while a batch is a collection of names. How you choose to represent these is up to you, but generally a feature is a scalar value, an instance is a vector, while a batch is a matrix. If you are dealing with images (and convolutional networks) then an instance is a matrix and a batch is a tensor (4 dimensions).
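
In terms of tensor shapes, those terms might map as follows. This is only an illustration with made-up sizes (12 letters per name, 32 names per batch, 28x28 grayscale images), assuming a recent PyTorch.

```python
import torch

# Hypothetical sizes, for illustration only
num_letters, max_len, batch_size = 57, 12, 32

feature = torch.tensor(1.0)                             # a scalar
instance = torch.zeros(max_len * num_letters)           # a vector: one name
batch = torch.zeros(batch_size, max_len * num_letters)  # a matrix: many names

# For images, an instance is a matrix and a batch has 4 dimensions:
# (batch, channels, height, width)
image = torch.zeros(28, 28)
image_batch = torch.zeros(batch_size, 1, 28, 28)

print(feature.dim(), instance.dim(), batch.dim(), image.dim(), image_batch.dim())
# prints: 0 1 2 2 4
```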

Hope this is helpful!

And plenty of questions is fine! Everyone has to learn at some point.


Hey, what does .fill_(1) do in your target? As far as I understand, the target should have a shape of (batch_size, output_size).
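
For what it’s worth, a small sketch of both points, assuming a recent PyTorch: fill_(1) sets every element of the tensor to 1 in place, and NLLLoss takes a target of shape (batch_size,) holding one class index per example, not a (batch_size, output_size) tensor. The sizes below mirror the self-contained example above.

```python
import torch
import torch.nn as nn

batch_size, num_classes = 10, 2

# fill_ works in place: every entry becomes the class index 1
target = torch.LongTensor(batch_size).fill_(1)
print(target)        # ten ones
print(target.shape)  # a 1D tensor with batch_size entries

# The network output is (batch_size, num_classes); the target is (batch_size,)
log_probs = nn.LogSoftmax(dim=1)(torch.randn(batch_size, num_classes))
loss = nn.NLLLoss()(log_probs, target)
print(loss.dim())    # the loss is averaged down to a scalar
```

So in that example every item in the batch is simply labelled as class 1.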