Kernel dead when casting to sp.posit8

rodrilag · April 26, 2021, 7:00pm

Hello, I’m trying to use the Posit<8,1> format in PyTorch with SoftPosit.
To start, I’m trying to load the MNIST dataset and then cast it to posit8. My problem comes when I try to cast the training inputs (X_train) to sp.posit8: the program appears to run fine, but after about 2 minutes it hangs, and some time later a “kernel dead” message appears.

This is the code I use:
Downloading the datasets in PyTorch format:

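import numpy as np
import softposit as sp
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# 60,000 training images / 32 = 1875 batches, the batch count reported later in this thread
BATCH_SIZE = 32

# define transforms
# transforms.ToTensor() automatically scales the images to [0,1] range
# (named `transform` so that it does not shadow the torchvision.transforms module)
transform = transforms.Compose([transforms.Resize((32, 32)),
                                transforms.ToTensor()])

# download and create datasets
train_dataset = datasets.MNIST(root='mnist_data',
                               train=True,
                               transform=transform,
                               download=True)

valid_dataset = datasets.MNIST(root='mnist_data',
                               train=False,
                               transform=transform)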

# define the data loaders
train_loader = DataLoader(dataset=train_dataset, 
                          batch_size=BATCH_SIZE, 
                          shuffle=True)  

valid_loader = DataLoader(dataset=valid_dataset, 
                          batch_size=BATCH_SIZE, 
                          shuffle=False)

Here, I get the X and Y values of the training dataset and put them in NumPy arrays:

X_train = []
Y_train = []
for x,y in train_loader:
    X_train.append(x.numpy())
    Y_train.append(y.numpy())
X_train = np.array(X_train, dtype=sp.posit8)
Y_train = np.array(Y_train, dtype=sp.posit8)

Finally, this is the part where the kernel dies:

aux = np.empty_like(X_train, dtype=sp.posit8)
for i in range(X_train.size):
    aux.flat[i] = sp.posit8(X_train.flat[i])

X_test = aux

Is there any way to fix this “kernel dead” problem?
Thank you

ptrblck · April 27, 2021, 8:44am

I assume you are using a Jupyter notebook, which might not show the error message properly and would instead kill and restart the kernel.
Try running the script in a terminal and check whether you get a valid error message.
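
One way to get more information out of a hard crash when running from a terminal is Python's standard-library `faulthandler` module, which prints the Python traceback when the interpreter receives a fatal signal such as a segmentation fault. This is only a sketch, and it assumes the problem is a crash rather than a freeze: it will not help if the machine simply runs out of memory and starts swapping.

```python
import faulthandler

# Dump the Python traceback of all threads to stderr if the process
# receives a fatal signal (for example SIGSEGV or SIGABRT).
faulthandler.enable()

# Optionally, also dump the traceback every 60 seconds while the script runs,
# which shows where a long-running or stuck loop currently is.
faulthandler.dump_traceback_later(60, repeat=True)

# ... the rest of the script (data loading, casting loop) would go here ...

# Cancel the periodic dump once the slow section has finished.
faulthandler.cancel_dump_traceback_later()
```

The same handler can be switched on without editing the script by starting it with `python -X faulthandler script.py`.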

rodrilag · April 27, 2021, 6:16pm

Hello. Yes, I used Jupyter Notebook. The error message only appeared once. When I ran the code again, the PC froze without any error message and I could not do anything.

As you suggested, I tried running the script in the terminal, but I got the same result: the PC froze.
I don’t know if there is another way of casting the dataset to posit8 format.

rodrilag · April 29, 2021, 5:59pm

Is there another solution available? I’ve tried it in the terminal, and I got the same result.

ptrblck · April 29, 2021, 9:50pm

I haven’t encountered a kernel dead issue in the terminal yet, so I’m unsure how to debug it.
You could most likely check the application with gdb via:

gdb --args python script.py args
...
run
...
bt

and see if you get a better error message.

rodrilag · May 1, 2021, 10:36am

I tried debugging with gdb, but when I ran the script the PC froze again, so I could not get a backtrace.

So I suspect the problem is in the loop where I do the cast, in the following lines:

for i in range(X_train.size):
    aux.flat[i] = sp.posit8(X_train.flat[i])

I checked the size and structure of the training dataset and found that it is a 5-dimensional tensor with this structure: [1875][32][1][32][32], which makes 61,440,000 elements to cast. So, could the huge size of the dataset be the cause of the error?
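
A quick way to test whether sheer size is the problem is to run the same element-by-element conversion on a small slice and extrapolate. The sketch below is only illustrative: `estimate_full_cast` is a hypothetical helper, the data is random rather than real MNIST, and the built-in `float` stands in for `sp.posit8` so that the example runs without SoftPosit installed.

```python
import math
import time

import numpy as np

# Shape reported above: 1875 batches x 32 images x 1 channel x 32 x 32 pixels.
FULL_SHAPE = (1875, 32, 1, 32, 32)
TOTAL_ELEMENTS = math.prod(FULL_SHAPE)  # 61,440,000


def estimate_full_cast(sample, convert, total_elements=TOTAL_ELEMENTS):
    """Time `convert` on every element of `sample` and extrapolate.

    Returns the estimated number of seconds for `total_elements` elements.
    """
    flat = np.asarray(sample).ravel()
    out = np.empty(flat.size, dtype=object)  # one Python object per element
    start = time.perf_counter()
    for i in range(flat.size):
        out[i] = convert(float(flat[i]))
    elapsed = time.perf_counter() - start
    return elapsed / flat.size * total_elements


# Stand-in data: one batch of random "images" instead of real MNIST.
sample = np.random.rand(32, 1, 32, 32).astype(np.float32)
seconds = estimate_full_cast(sample, convert=float)  # swap in sp.posit8 here
print(f"estimated time for the full cast: {seconds:.1f} s")
```

Passing `sp.posit8` as `convert` and watching memory use while the slice is processed should indicate whether the full loop is merely slow or is exhausting RAM.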

I have also tried another way of casting to posit8, using this function:

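def TensorToP8(X_array, Y_array, narray):
    # narray yields (images, labels) batches, e.g. a DataLoader
    for c1, (x_batch, y_batch) in enumerate(narray):
        for c2, image in enumerate(x_batch):
            for c3, channel in enumerate(image):
                for c4, row in enumerate(channel):
                    for c5, value in enumerate(row):
                        # cast each pixel to posit8 and store it at the same index
                        X_array[c1][c2][c3][c4][c5] = sp.posit8(np.double(value))
        # copy the labels of this batch unchanged
        for k, label in enumerate(y_batch):
            Y_array[c1][k] = label.numpy()
    return X_array, Y_array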

I used it with this call:

X_train= np.ndarray([1875,32,1,32,32])
Y_train = np.ndarray([1875,32])
X_validate = np.ndarray([313,32,1,32,32])
Y_validate = np.ndarray([313,32])

X_train, Y_train = TensorToP8(X_train,Y_train,train_loader)

X_validate, Y_validate = TensorToP8(X_validate,Y_validate, valid_loader)

It works, but it doesn’t change the datatype. So I tried creating the ndarrays with the posit8 datatype from the start:

X_train= np.ndarray([1875,32,1,32,32],  dtype=sp.posit8)
X_validate = np.ndarray([313,32,1,32,32],  dtype=sp.posit8)

With this change, the PC freezes again. So, could the SoftPosit datatype combined with the huge amount of data be the reason the kernel cannot complete the cast?
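
One possible explanation, offered as an assumption rather than a confirmed diagnosis: NumPy maps any Python class it does not recognise as a numeric type to the generic `object` dtype. If `sp.posit8` is an ordinary Python class, `dtype=sp.posit8` would therefore create an array of pointers to individual Python objects rather than a compact 8-bit array. The sketch below uses a hypothetical stand-in class, `FakePosit8`, in place of `sp.posit8`, so it runs without SoftPosit.

```python
import sys

import numpy as np


class FakePosit8:
    """Hypothetical stand-in for sp.posit8: a plain Python wrapper class."""

    def __init__(self, value):
        self.value = value


# An arbitrary Python class used as dtype is treated as dtype=object.
arr = np.empty((2, 3), dtype=FakePosit8)
print(arr.dtype)           # object
print(arr.dtype.itemsize)  # size of one pointer, typically 8 bytes

# Each element is a separate Python object living outside the array buffer,
# so the real per-element cost is the pointer plus the object itself.
arr.flat[0] = FakePosit8(0.5)
per_element = arr.dtype.itemsize + sys.getsizeof(arr.flat[0])
print(f"at least {per_element} bytes per element instead of 1")
```

Running `np.dtype(sp.posit8)` on the real class would confirm or rule this out. If it prints `object`, tens of millions of elements would add up to gigabytes of small Python objects.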

ptrblck · May 2, 2021, 5:39am

rodrilag:

So I tried creating the ndarrays with the posit8 datatype from the start:
X_train= np.ndarray([1875,32,1,32,32],  dtype=sp.posit8)
X_validate = np.ndarray([313,32,1,32,32],  dtype=sp.posit8)
With this change, the PC freezes again. So, could the SoftPosit datatype combined with the huge amount of data be the reason the kernel cannot complete the cast?

The array in the posted code snippet would use ~470MB as np.float64; assuming that sp.posit8 uses 8 bits per element, it would need even less, so I don’t think that it’s a huge amount of data.

With that being said, I’m not familiar with sp.posit8 and since the numpy array transformation is already freezing your system, it seems that this transformation is failing.
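
The ~470MB figure can be reproduced with a few lines of arithmetic. The final estimate is a hypothetical comparison only: it assumes the array ends up holding one separate Python object per element, and the 50-byte object size is an assumed figure, not a measurement of `sp.posit8`.

```python
import math

import numpy as np

shape = (1875, 32, 1, 32, 32)
n_elements = math.prod(shape)  # 61,440,000 elements

MIB = 1024 ** 2
float64_mib = n_elements * np.dtype(np.float64).itemsize / MIB
uint8_mib = n_elements * np.dtype(np.uint8).itemsize / MIB

# Assumption: 8-byte pointer + ~50-byte Python object per element.
ASSUMED_OBJECT_BYTES = 8 + 50
object_mib = n_elements * ASSUMED_OBJECT_BYTES / MIB

print(f"float64:        {float64_mib:8.1f} MiB")
print(f"true 8-bit:     {uint8_mib:8.1f} MiB")
print(f"object (guess): {object_mib:8.1f} MiB")
```

Under that assumption the array would need several gigabytes rather than a few hundred megabytes, which would be consistent with a system that freezes while swapping.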