I am a beginner in PyTorch. For a project I am taking a VGG16 model (not pretrained) and training it from scratch. I have two seemingly identical pieces of code in Keras and PyTorch.
Keras Code:
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
trdata = ImageDataGenerator()
traindata = trdata.flow_from_directory(directory="Cat_Dog_data/train", target_size=(224,224))
model = tf.keras.applications.VGG16(
    include_top=True,
    weights=None,
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=2,
    classifier_activation="softmax",
)
from tensorflow.keras.optimizers import Adam
opt = Adam(lr=0.001)
model.compile(optimizer=opt, loss=tf.keras.losses.categorical_crossentropy, metrics=['accuracy'])
model.fit(traindata, epochs=100, steps_per_epoch=100)
PyTorch Code:
from torchvision import transforms
from torchvision import datasets
from torch.utils.data import DataLoader
from torch import nn, optim
from tqdm import tqdm
image_transform = transforms.Compose([transforms.Resize(size=(224,224)), transforms.ToTensor()])
tr_dataset = datasets.ImageFolder(root="Cat_Dog_data/train", transform=image_transform)
# I have 22500 images, so to make steps_per_epoch 100 (same as the Keras code) the batch size is set to 225.
tr_dataloader = DataLoader(tr_dataset, batch_size=225, shuffle=True)
from torchvision import models
model = models.vgg16(pretrained=False)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
for t in range(100):
    model.train()
    print(f'Epoch {t}/100')
    for X, y_true in tqdm(tr_dataloader):
        # Forward pass
        y_hat = model(X)
        loss = criterion(y_hat, y_true)
        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
The Keras one trains easily, takes around 27 min per epoch on CPU, and has no dramatic effect on memory. But when I try to run the PyTorch code it immediately takes up the entire memory, and the computer becomes extremely sluggish. I can't even train the PyTorch model on GPU, as I have 4 GB of GPU memory and the code fails with an error demanding 6 GB.
Any suggestions on how to make this work? Are there extra settings that need to be done in PyTorch that I am missing? Thanks in advance.
I can't speak to the CPU portion of this. But to get this working on a GPU, you could use half the batch size you are currently using, which should take around 3 GB and fit on your GPU. You could also reduce the image size. There are ways to ensure that a smaller batch size won't reduce the model's ability to learn, namely gradient accumulation; see the sketch below.
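As a rough sketch (reusing the dataset path and hyperparameters from your question; batch_size=112 and accum_steps=2 are just illustrative values chosen so the effective batch size stays near your original 225), a gradient-accumulation version of the loop could look like this:

import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms
from tqdm import tqdm

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

image_transform = transforms.Compose([transforms.Resize(size=(224, 224)), transforms.ToTensor()])
tr_dataset = datasets.ImageFolder(root="Cat_Dog_data/train", transform=image_transform)
# Half the original batch size, which roughly halves the activation memory per forward pass
tr_dataloader = DataLoader(tr_dataset, batch_size=112, shuffle=True)

model = models.vgg16(weights=None).to(device)  # same as pretrained=False on older torchvision
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

accum_steps = 2  # number of mini-batches to accumulate before each optimizer step

for t in range(100):
    model.train()
    optimizer.zero_grad()
    for i, (X, y_true) in enumerate(tqdm(tr_dataloader)):
        X, y_true = X.to(device), y_true.to(device)
        y_hat = model(X)
        # Scale the loss so the accumulated gradients match one large batch
        loss = criterion(y_hat, y_true) / accum_steps
        loss.backward()
        if (i + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()

Each backward pass only holds activations for the smaller batch, while the optimizer still updates on gradients averaged over roughly the original batch size. (Any leftover partial accumulation at the end of an epoch is simply dropped here to keep the sketch short.)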
Yes, I can do that, but that wasn't the motive of the question. I want to know why, for seemingly identical code, PyTorch is taking up so much more memory than its Keras counterpart. The difference is huge.