By default all layers work with float tensors internally. Do you really need double precision? It would be more performant to cast your input to float instead of using DoubleTensors.
However, if you really need double precision, you should be able to do so by converting the model itself to double as well.
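For completeness, a minimal sketch of both options (the shapes here are made up, not taken from your code): either cast the input down to float, or convert the model's parameters to double so it accepts DoubleTensors.

import torch
import torch.nn as nn

x = torch.randn(4, 1, dtype=torch.double)  # a DoubleTensor input

# Option 1 (recommended): cast the input to float before feeding it to the model
model = nn.Linear(1, 1)
out = model(x.float())

# Option 2: convert the model's parameters to float64 instead
model.double()
out = model(x)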
I don’t need double precision; it’s just that I don’t understand what PyTorch is asking for when it says the expected type is [torch.DoubleTensor].
I did what you suggested and it says:
Assuming you don’t want double precision, this should work on PyTorch 0.4:
import torch
import torch.nn as nn
from torch.autograd import Variable
import pandas as pd
import numpy as np

class Linear_Regress(nn.Module):
    def __init__(self, input_sz, output_sz):
        super(Linear_Regress, self).__init__()
        self.linear = nn.Linear(input_sz, output_sz)

    def forward(self, x):
        out = self.linear(x)
        return out

model = Linear_Regress(1, 1)
criterion = nn.MSELoss()
lrn_rt = 0.01
optm = torch.optim.SGD(model.parameters(), lr=lrn_rt)

# using numpy to generate random input and output
train_x = np.random.rand(1)
train_y = np.random.rand(1)

# the trick is to cast input and label manually to float, as torch.from_numpy()
# returns them as DoubleTensors (numpy arrays are float64 by default)
inputs = torch.from_numpy(train_x).to(torch.float)
outs = torch.from_numpy(train_y).to(torch.float)

epochs = 100
model(inputs)

for e in range(epochs):
    e += 1
    optm.zero_grad()
    outputs = model.forward(inputs)
    loss = criterion(outputs, outs)
    loss.backward()
    optm.step()
    print('epoch {}, loss {}'.format(e, loss.data[0]))
I made the appropriate changes, ran the code you’ve given in a separate file and it worked. Then I ran my own code, as follows:
import torch
import torch.nn as nn
from torch.autograd import Variable
import pandas as pd
import numpy as np

train = pd.read_csv('train.csv')
train_x = np.array(train['x'])
train_y = np.array(train['y'])

class Linear_Regress(nn.Module):
    def __init__(self, input_sz, output_sz):
        super(Linear_Regress, self).__init__()
        self.linear = nn.Linear(input_sz, output_sz)

    def forward(self, x):
        out = self.linear(x)
        return out

model = Linear_Regress(1, 1)
criterion = nn.MSELoss()
lrn_rt = 0.01
optm = torch.optim.SGD(model.parameters(), lr=lrn_rt)

epochs = 100
x_in = torch.from_numpy(train_x).to(torch.float)
y_out = torch.from_numpy(train_y).to(torch.float)

for e in range(epochs):
    e += 1
    inps = Variable(x_in)
    outs = Variable(y_out)
    optm.zero_grad()
    outputs = model.forward(inps)
    loss = criterion(outputs, outs)
    loss.backward()
    optm.step()
    print('epoch {}, loss {}'.format(e, loss.data[0]))
and it’s giving me this error now:
RuntimeError: size mismatch, m1: [1 x 700], m2: [1 x 1] at c:\programdata\miniconda3\conda-bld\pytorch_1524543037166\work\aten\src\th\generic/THTensorMath.c:2033
So the problem is that you have to introduce an extra batch dimension.
By choosing an appropriate batch size (usually between 1 and 32, but it can also be bigger) the inputs and outputs become of shape (N, 1), where N is the chosen batch size.
E.g. for a batch size of one you could do:
for e in range(epochs):
    for _x, _y in zip(train_x, train_y):
        # wrap each scalar sample in a (1, 1) array: one batch element, one feature
        inps = Variable(torch.from_numpy(np.array([[_x]])).to(torch.float))
        outs = Variable(torch.from_numpy(np.array([[_y]])).to(torch.float))
        optm.zero_grad()
        outputs = model.forward(inps)
        loss = criterion(outputs, outs)
        loss.backward()
        optm.step()
    print('epoch {}, loss {}'.format(e, loss.data[0]))
and for a batch size of 700, i.e. the whole dataset as one batch (not recommended):

for e in range(epochs):
    inps = Variable(x_in).view(-1, 1)
    outs = Variable(y_out).view(-1, 1)
    optm.zero_grad()
    outputs = model.forward(inps)
    loss = criterion(outputs, outs)
    loss.backward()
    optm.step()
    print('epoch {}, loss {}'.format(e, loss.data[0]))
The same concept also applies to other batch sizes, where you should split your array into batched chunks. Alternatively, you could have a closer look at data loaders here and in this tutorial.
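As a rough sketch of the DataLoader route (the batch size of 32 is just a placeholder; this reuses x_in, y_out, model, criterion and optm from your script):

from torch.utils.data import TensorDataset, DataLoader

# view(-1, 1) adds the feature dimension so each batch has shape (N, 1)
dataset = TensorDataset(x_in.view(-1, 1), y_out.view(-1, 1))
loader = DataLoader(dataset, batch_size=32, shuffle=True)

for e in range(epochs):
    for inps, outs in loader:
        optm.zero_grad()
        outputs = model(inps)
        loss = criterion(outputs, outs)
        loss.backward()
        optm.step()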
Note: The network’s parameters are only updated once per batch, so choosing a suitable batch size can be of enormous importance.
The second suggestion was also only a theoretical one; using the whole dataset as one large batch is usually not done anymore. However, this example should work, and it does not make sense to me that it produces NaN.