I have a text file with data (a dataset for prediction).
For example, it looks like this (many rows, three columns):
34, 56, 76
44, 55, 79
45, 79, 87
…
The file is large, about 700 MB.
What is the best way to load this file into PyTorch?
Hi,
You can do the following. Replace file.csv with your file name.
from numpy import genfromtxt
my_data = genfromtxt('file.csv', delimiter=',')
You will get a NumPy array of shape (N, 3). There may be other ways to do this in PyTorch, but this might help.
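To get from the NumPy array to a PyTorch tensor, one possible sketch is below. It assumes the three-column comma-separated layout from the question; the in-memory `sample` is only a stand-in for the real file, and choosing `float32` is an assumption made here to save memory on a large file.

```python
import io

import numpy as np
import torch

# Stand-in for the real file: three comma-separated columns, as in the question.
sample = io.StringIO("34, 56, 76\n44, 55, 79\n45, 79, 87\n")

# Reading as float32 uses half the memory of the float64 default.
data = np.genfromtxt(sample, delimiter=",", dtype=np.float32)

# torch.from_numpy shares memory with the NumPy array instead of copying it.
tensor = torch.from_numpy(data)

print(tensor.shape)
print(tensor.dtype)
```

For the real data, pass the file name (for example 'file.csv') to genfromtxt instead of `sample`.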
Thanks
Thanks, I did just that.
But I was confused by the output.
If I do this:
b = "1.0;0.0;512.50;507.00;508.25;122.0;20.0;0.0;20.0"
c = b.split(';')
r = [float(item) for item in c]
print(r)
[1.0, 0.0, 512.5, 507.0, 508.25, 122.0, 20.0, 0.0, 20.0]
But if I do this:
st = np.genfromtxt('123.txt',delimiter=';')
print(st)
[1.00000e+00 0.00000e+00 5.12500e+02 … 2.00000e+01 0.00000e+00
2.00000e+01]
The commas between the numbers are missing.
So I did this:
t = torch.as_tensor(st, dtype=torch.float32)
Now everything is fine.
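For what it's worth, the missing commas are only how NumPy prints arrays; the stored values are the same as in the Python list. A small sketch, building the array from the string above rather than from 123.txt (an assumption, since the file itself is not shown):

```python
import numpy as np

b = "1.0;0.0;512.50;507.00;508.25;122.0;20.0;0.0;20.0"
st = np.array([float(item) for item in b.split(";")])

# NumPy's own print style separates elements with spaces, not commas.
print(st)

# Converting to a plain Python list brings the commas back.
print(st.tolist())

# Or keep the array and ask for a comma separator in the text form.
as_text = np.array2string(st, separator=", ")
print(as_text)
```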
If I rescale a single number like this:
st = np.interp(3.4, [0,10], [-1,1])
= -0.31999999999999995
And if I do it through np.genfromtxt:
st = np.interp(np.genfromtxt('123.txt',delimiter=';',dtype='d'), [0,10], [-1,1])
= [-0.6 -1. -0.2 -0.6 0.4 0. 0.4 -1. 0.4 0. 0.2 0.9
0.4 0. 0.2 0.95 -1. -1. 0.2 -1. -1. -1. -1. -1.
-1. -1. -0.6 0.6 -1. -1. -0.6 -1. ]
The function seems to round the numbers. But for normalization I need the full-precision values. How can I get them back?
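A possible explanation, offered as a sketch rather than a definitive answer: np.interp does not round anything; printing an array only shortens the display to 8 decimal places by default. The `values` array below is a hypothetical sample, not the contents of 123.txt. Note also that np.interp clamps inputs outside [0, 10] to the end points -1 and 1, which may be why so many entries in the output above are -1.

```python
import numpy as np

# Hypothetical sample values; the last one lies outside the [0, 10] range.
values = np.array([3.4, 2.0, 9.75, 512.5])

scaled = np.interp(values, [0, 10], [-1, 1])

# Default array printing shows at most 8 decimal places per element.
print(scaled)

# The stored float64 values are untouched; raise the print precision to see them.
with np.printoptions(precision=17):
    print(scaled)

# A single element converted to a Python float also prints in full.
print(repr(float(scaled[0])))

# Inputs above 10 are clamped to the upper end point.
print(scaled[3])
```

If the full digits are wanted everywhere, np.set_printoptions(precision=17) changes the display globally; the data itself is the same either way.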