Hello everyone,
I am working on a simulation using M3GNet, a machine-learning interatomic potential built on PyTorch. To speed up the simulation I want to use the GPU, and I have already installed the necessary GPU drivers on my system.
The output of nvcc --version is:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
The steps I followed to install LAMMPS (conda-forge), m3gnet, matgl, dgl, and PyTorch are:
conda create -n lammps_m3gnet python=3.11
conda activate lammps_m3gnet
conda install -c conda-forge lammps
pip install m3gnet
pip install matgl
Install cudatoolkit v11.8.0:
conda install -c conda-forge cudatoolkit=11.8.0
Install cudnn v8.9.7:
conda install -c conda-forge cudnn=8.9.7
Install Pytorch:
pip install torch==2.2.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Install dgl:
pip install dgl -f https://data.dgl.ai/wheels/cu118/repo.html
pip install dglgo -f https://data.dgl.ai/wheels-test/repo.html
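Before running LAMMPS I verify from Python that the installed stack actually sees the GPU. This is only a minimal sanity check I run in the lammps_m3gnet environment; the exact versions printed depend on the wheels above:

```python
import torch

# Sanity check that the cu118 PyTorch wheel sees the GPU.
print("torch:", torch.__version__)
print("CUDA build:", torch.version.cuda)   # should report 11.8 for the cu118 wheel
cuda_ok = torch.cuda.is_available()
print("CUDA available:", cuda_ok)
if cuda_ok:
    print("device:", torch.cuda.get_device_name(0))

# dgl is checked separately; the cu118 dgl wheel should load against
# the same toolkit as the PyTorch wheel.
try:
    import dgl
    print("dgl:", dgl.__version__)
except ImportError:
    print("dgl not importable in this environment")
```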
After installing all the requirements, when I run a job with the lmp binary, the GPU monitor shows:
PID USER DEV TYPE GPU GPU MEM CPU HOST MEM Command
3923427 ss 0 Compute 0% 386MiB 2% 100% 6742MiB lmp -in lmp-m
3928976 ss 0 Compute 0% 386MiB 2% 100% 4725MiB lmp -in lmp-m
As you can see, each job uses only 2% of the GPU memory (386 MiB) and 0% GPU compute. My question is: how can I increase the percentage of GPU memory used by each job? Is it necessary to configure PyTorch in order to control how much GPU memory each job uses?
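From what I understand, PyTorch allocates GPU memory lazily, so the ~386 MiB per job may simply be the CUDA context plus the M3GNet model and graph tensors, rather than a hard limit imposed somewhere. A minimal sketch of how I inspect what PyTorch has actually allocated (assuming CUDA is available; set_per_process_memory_fraction only caps usage, it does not raise it):

```python
import torch

# PyTorch grabs GPU memory on demand, so low usage in nvtop is not
# by itself an error: it reflects how much the model actually needs.
if torch.cuda.is_available():
    free_b, total_b = torch.cuda.mem_get_info(0)  # free/total device memory in bytes
    print(f"allocated: {torch.cuda.memory_allocated(0) / 2**20:.0f} MiB")
    print(f"reserved:  {torch.cuda.memory_reserved(0) / 2**20:.0f} MiB")
    print(f"free/total: {free_b / 2**20:.0f} / {total_b / 2**20:.0f} MiB")
    # Optional: cap this process at 50% of device memory (an upper bound only).
    torch.cuda.set_per_process_memory_fraction(0.5, 0)
else:
    print("CUDA not available; nothing to inspect")
```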
Thanks in advance.