I am using an ml.g4dn.2xlarge instance on SageMaker to test the GPT-J 6B model from Hugging Face with the Transformers library.
I am loading it with revision="float16" and low_cpu_mem_usage=True so that the model takes only about 12 GB.
The model downloads successfully, but right after that the kernel suddenly crashes.
Is there a workaround? The instance has 32 GB of memory with 4 vCPUs.
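For reference, the setup described above might look roughly like the sketch below. The model ID "EleutherAI/gpt-j-6B", the parameter count of about 6.05 billion, and the helper names are assumptions for illustration, not taken from the original post. One thing worth checking: if torch_dtype=torch.float16 is not passed, Transformers loads the weights in float32 by default, which roughly doubles the footprint.

```python
def estimate_weights_gb(num_params, bytes_per_param):
    """Rough size of the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumed parameter count for GPT-J 6B (approximate, for illustration only).
GPTJ_PARAMS = 6_050_000_000

fp16_gb = estimate_weights_gb(GPTJ_PARAMS, 2)  # half precision: 2 bytes per weight
fp32_gb = estimate_weights_gb(GPTJ_PARAMS, 4)  # full precision: 4 bytes per weight


def load_gptj_fp16(model_id="EleutherAI/gpt-j-6B"):
    """Hypothetical helper: load the float16 branch without upcasting to float32.

    Imports are kept inside the function so the size estimate above
    runs even where torch and transformers are not installed.
    """
    import torch
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_id,
        revision="float16",         # branch that stores half-precision weights
        torch_dtype=torch.float16,  # keep the weights in fp16 after loading
        low_cpu_mem_usage=True,     # avoid holding a second full copy in RAM
    )
```

The estimate covers the weights only. Peak host memory during loading can be higher, so a 32 GB machine leaves less headroom than the 12 GB figure suggests.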
If the crash is caused by running out of memory on the CPU side while the model is loading, rather than during inference, I would check whether increasing the swap size for the loading step helps.
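To confirm that host memory is the problem, it may help to look at available RAM and swap just before and after the loading call. Below is a minimal sketch that assumes a Linux host exposing /proc/meminfo; the helper names parse_meminfo and read_meminfo are made up for this example.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most entries are reported in kB; a few (such as HugePages_Total)
    are plain counts.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the kernel's memory summary (Linux only)."""
    with open(path, "r", encoding="utf-8") as handle:
        return parse_meminfo(handle.read())
```

Calling read_meminfo() before and after from_pretrained and comparing MemAvailable and SwapFree would show how close the load gets to exhausting memory. A SwapTotal of 0 means no swap is configured at all.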
Are the crashes happening before data is loaded?
One issue I have run into is that tensors can still take up system memory even after you move them to the GPU. If that is the problem here, streaming the data instead of loading it all at once might help. See the torch.utils.data page in the PyTorch 2.0 documentation.
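As a rough illustration of the streaming idea, here is a plain-Python sketch that reads a text file lazily in small batches, so only one batch sits in host memory at a time. The function name stream_batches and the one-example-per-line file layout are assumptions for this example.

```python
from itertools import islice


def stream_batches(path, batch_size):
    """Yield lists of up to batch_size lines without reading the whole file."""
    with open(path, "r", encoding="utf-8") as handle:
        # Generator expression: lines are pulled from disk only when requested.
        lines = (line.rstrip("\n") for line in handle)
        while True:
            batch = list(islice(lines, batch_size))
            if not batch:
                return  # file exhausted
            yield batch
```

The same lazy iteration can be returned from the __iter__ method of a torch.utils.data.IterableDataset subclass so that a DataLoader consumes it.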