I am pretty new to PyTorch, having only set up a few models to this point.
I am attempting to implement a fully connected ReLU network as seen in this example - https://pytorch.org/tutorials/beginner/examples_nn/two_layer_net_nn.html
I have a decently large dataset (~750k rows and 2k columns), and I am training the model on a cluster with 4 GPUs. When I try to train the model, I get RuntimeError: CUDA error: out of memory.
From the research I have done so far, it seems my options are either to free the cached variables during training or to reduce the batch size.
I attempted to implement some functionality to detach the variables; however, I don't believe I am doing it correctly, as I am still running into the same error. I am also not sure how to implement batching when creating tensors from a numpy array. There is a StackOverflow question on the topic, but it doesn't have a solution - https://stackoverflow.com/questions/46170814/how-to-train-pytorch-model-with-numpy-data-and-batch-size.
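For reference, this is roughly what I was thinking for the batching part, assuming TensorDataset and DataLoader are the right tools for this (the batch size of 1024 is just a guess, and the variable names match my code below), but I'm not sure it's correct:

    import numpy as np
    import torch
    from torch.utils.data import TensorDataset, DataLoader

    # Keep the full dataset in CPU memory and only move one batch at a time to the GPU
    trainDataset = TensorDataset(
        torch.from_numpy(X_trainTransformed).float(),
        torch.from_numpy(np.array(y_train)).float().reshape(-1, 1)
    )
    trainLoader = DataLoader(trainDataset, batch_size=1024, shuffle=True)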
Here is my current code:

    import numpy as np
    import torch
    from sklearn.metrics import r2_score

    device = torch.device('cuda:0')

    # Number of input features (shape[1]), not the full shape tuple
    inputSize = X_trainTransformed.shape[1]
    firstHiddenLayer = 500
    secondHiddenLayer = 250
    thirdHiddenLayer = 125
    outputLayer = 1

    model = torch.nn.Sequential(
        torch.nn.Linear(inputSize, firstHiddenLayer),
        torch.nn.ReLU(),
        torch.nn.Linear(firstHiddenLayer, secondHiddenLayer),
        torch.nn.ReLU(),
        torch.nn.Linear(secondHiddenLayer, thirdHiddenLayer),
        torch.nn.ReLU(),
        torch.nn.Linear(thirdHiddenLayer, outputLayer)
    )

    if torch.cuda.device_count() > 1:
        print('Train using', torch.cuda.device_count(), 'GPUs!')
        model = torch.nn.DataParallel(model)
    model.to(device)

    # The full training and test sets are moved onto the GPU up front
    X_trainTensor = torch.from_numpy(X_trainTransformed).float().to(device)
    y_trainTensor = torch.from_numpy(np.array(y_train)).float().reshape(-1, 1).to(device)
    X_testTensor = torch.from_numpy(X_testTransformed).float().to(device)
    y_testTensor = torch.from_numpy(np.array(y_test)).float().reshape(-1, 1).to(device)

    lossFunction = torch.nn.MSELoss(reduction='sum')  # size_average=False is deprecated
    learningRate = 1e-4
    optimizer = torch.optim.Adam(model.parameters(), lr=learningRate)

    for step in range(2000):
        yPredict = model(X_trainTensor)
        loss = lossFunction(yPredict, y_trainTensor)
        # My attempt at detaching; these return new tensors and don't free anything in place
        yPredict.detach()
        loss.detach()
        model.zero_grad()
        loss.backward()
        optimizer.step()
        if step % 200 == 0:
            testPrediction = model(X_testTensor).cpu()
            # r2_score expects (y_true, y_pred)
            print(step, r2_score(y_test.reshape(-1, 1), testPrediction.detach().numpy()))
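In case it helps to see where I'm heading, this is roughly how I imagine the batched version of the training loop would look, using the trainLoader sketched above together with the model, lossFunction, and optimizer from my code (the epoch count of 20 is just a placeholder):

    for epoch in range(20):
        for xBatch, yBatch in trainLoader:
            # Move only the current batch onto the GPU
            xBatch = xBatch.to(device)
            yBatch = yBatch.to(device)
            yPredict = model(xBatch)
            loss = lossFunction(yPredict, yBatch)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # Evaluate without building a graph so no extra memory is held for gradients
        with torch.no_grad():
            testPrediction = model(X_testTensor).cpu()
        print(epoch, r2_score(y_test.reshape(-1, 1), testPrediction.numpy()))

I assume the test tensor may also need to stay on the CPU (or be evaluated in batches) if it is too large to fit on the GPU alongside the model, but I'm not sure.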
Any advice on how to get this model to train on the GPUs without running out of memory would be greatly appreciated.