I am using PyTorch with the ‘fastai’ library that supports the deep learning course of the same name. My setup is Ubuntu 16.04, a Titan V, and Anaconda with Python 3.6.
Here is what puzzles me:
When I run the following code with no other jobs running, it is significantly slower than when the GPU is busy with another process. (Specifically, it is under heavy load running crypto mining software.) I have repeated the trials numerous times to make sure that no pre-computing or caching was skewing the results, and I have tested this off and on over several weeks with the same outcome. I have used nvidia-smi to verify which jobs are running on the GPU. Here are the times:

with-no-load (deep learning only): 42 seconds
with-load (deep learning AND crypto mining): 17 seconds
I’ve searched around on this forum and others and haven’t found an explanation for what might be taking place. I hope I haven’t made a dumb mistake in my observations.
There are a few possible explanations: (1) the CUDA driver is not loaded in persistence mode, so when another process is running you benefit from the driver already being initialized (but this should only account for a few seconds of difference); (2) the crypto mining load is keeping your GPU’s clocks boosted to their maximum, which makes your model run faster. You can test for this by setting the application clocks to the maximum values yourself with nvidia-smi -ac.
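For example, something along these lines (a sketch, assuming the card is GPU index 0; the clock pair must be one the card actually supports, which you can list with nvidia-smi -q -d SUPPORTED_CLOCKS, and both commands need root):

    sudo nvidia-smi -pm 1                # persistence mode: keep the driver loaded between jobs
    sudo nvidia-smi -i 0 -ac 850,1912    # pin application clocks (memory MHz, graphics MHz)

The 850,1912 pair is only an example; substitute the Max Clocks values that nvidia-smi -q reports for your card, memory clock first and graphics clock second.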
Edward - Thanks very much for your response. I appreciate the advice to search through the various options provided by nvidia-smi. Before trying to change settings, I used nvidia-smi -q to see if I could detect differences. I really couldn’t find anything, but I’m not all that familiar with CUDA and GPU settings. In the hope that you might notice something, here are the results of running with-no-load and with-load. (The former running deep learning only and the latter running both deep learning and crypto mining.)
My GPU is a Titan V.
nvidia-smi -i 0 -q
==============NVSMI LOG==============
Timestamp : Sun Jan 28 19:02:35 2018
Driver Version : 387.34
Attached GPUs : 2
GPU 00000000:01:00.0
Product Name : Graphics Device
Product Brand : GeForce
Display Mode : Enabled
Display Active : Enabled
Persistence Mode : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 1920
Driver Model
Current : N/A
Pending : N/A
Serial Number : 0324917146221
GPU UUID : GPU-a4aaecfb-979f-4ff7-c1b9-96acbc39b5fc
Minor Number : 0
VBIOS Version : 88.00.36.00.01
MultiGPU Board : No
Board ID : 0x100
GPU Part Number : 900-1G500-2500-000
Inforom Version
Image Version : G001.0000.01.04
OEM Object : 1.1
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization mode : None
PCI
Bus : 0x01
Device : 0x00
Domain : 0x0000
Device Id : 0x1D8110DE
Bus Id : 00000000:01:00.0
Sub System Id : 0x121810DE
GPU Link Info
PCIe Generation
Max : 2
Current : 2
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays since reset : 0
Tx Throughput : 2000 KB/s
Rx Throughput : 4000 KB/s
Fan Speed : 35 %
Performance State : P2
Clocks Throttle Reasons
Idle : Not Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
FB Memory Usage
Total : 12057 MiB
Used : 1946 MiB
Free : 10111 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 9 MiB
Free : 247 MiB
Compute Mode : Default
Utilization
Gpu : 2 %
Memory : 2 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending : N/A
Temperature
GPU Current Temp : 49 C
GPU Shutdown Temp : 100 C
GPU Slowdown Temp : 97 C
GPU Max Operating Temp : 91 C
Memory Current Temp : 45 C
Memory Max Operating Temp : 95 C
Power Readings
Power Management : Supported
Power Draw : 36.18 W
Power Limit : 250.00 W
Default Power Limit : 250.00 W
Enforced Power Limit : 250.00 W
Min Power Limit : 100.00 W
Max Power Limit : 300.00 W
Clocks
Graphics : 1200 MHz
SM : 1200 MHz
Memory : 850 MHz
Video : 1080 MHz
Applications Clocks
Graphics : 1200 MHz
Memory : 850 MHz
Default Applications Clocks
Graphics : 1200 MHz
Memory : 850 MHz
Max Clocks
Graphics : 1912 MHz
SM : 1912 MHz
Memory : 850 MHz
Video : 1717 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes
Process ID : 6246
Type : G
Name : /usr/lib/xorg/Xorg
Used GPU Memory : 169 MiB
Process ID : 26771
Type : C
Name : /home/cdaniels/anaconda3/envs/fastai/bin/python
Used GPU Memory : 1764 MiB
diff (between *with-no-load* and *with-load*)
4c4
< Timestamp : Sun Jan 28 19:02:35 2018
---
> Timestamp : Sun Jan 28 19:03:33 2018
54,55c54,55
< Tx Throughput : 2000 KB/s
< Rx Throughput : 4000 KB/s
---
> Tx Throughput : 56000 KB/s
> Rx Throughput : 109000 KB/s
69,70c69,70
< Used : 1946 MiB
< Free : 10111 MiB
---
> Used : 4714 MiB
> Free : 7343 MiB
73,74c73,74
< Used : 9 MiB
< Free : 247 MiB
---
> Used : 13 MiB
> Free : 243 MiB
77,78c77,78
< Gpu : 2 %
< Memory : 2 %
---
> Gpu : 100 %
> Memory : 100 %
132c132
< GPU Current Temp : 49 C
---
> GPU Current Temp : 53 C
136c136
< Memory Current Temp : 45 C
---
> Memory Current Temp : 60 C
140c140
< Power Draw : 36.18 W
---
> Power Draw : 110.65 W
175a176,179
> Process ID : 26975
> Type : C
> Name : ./ethminer
> Used GPU Memory : 2752 MiB
I tried these commands, and nvidia-smi confirmed that the changes took effect. There was, however, no change in performance:
with-no-load (deep learning only): 42 seconds
with-load (deep learning AND crypto mining): 17 seconds
Is it possible that the GPU needs to be under a minimum load before it will shift into a higher clock speed?
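One way I can think of to check this is to watch the performance state and clocks once a second while the model runs, with and without the miner active (this is just a monitoring query; it changes nothing):

    nvidia-smi -i 0 --query-gpu=pstate,clocks.sm,clocks.mem,utilization.gpu --format=csv -l 1

If the SM clock sits well below the 1912 MHz maximum during the deep-learning-only run but climbs when the miner is going, that would point at clock boosting rather than anything in the software stack.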
I keep rerunning these trials and keep getting the same results. Why wouldn’t I get better performance with less running on the GPU? It seems weird that I have to spin up a seemingly unrelated, GPU-intensive process (crypto mining) before getting peak performance from my deep learning models.
There must be something in the crypto mining code that boosts performance, or something in the fast.ai library that is limiting it. Is there standard PyTorch benchmarking code I might use to narrow this down?
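I don’t know of an official benchmark, but a minimal timing script along these lines (my own sketch, not anything from the fastai library) should separate raw GPU throughput from the library: if this plain matmul loop is also faster while the miner runs, the effect is at the driver/clock level and not in fast.ai or my model code.

    import time
    import torch

    # Two largish matrices on the GPU; matmul keeps the SMs busy
    # with almost no host-side overhead.
    a = torch.randn(4096, 4096).cuda()
    b = torch.randn(4096, 4096).cuda()

    # Warm-up so one-time CUDA initialization is not timed.
    for _ in range(10):
        c = torch.matmul(a, b)
    torch.cuda.synchronize()

    # Timed loop; synchronize() is needed because CUDA kernel
    # launches are asynchronous.
    start = time.time()
    for _ in range(100):
        c = torch.matmul(a, b)
    torch.cuda.synchronize()
    elapsed = time.time() - start
    print('100 matmuls: %.3f s (%.1f ms each)' % (elapsed, 10.0 * elapsed))

Running this once by itself and once alongside the miner should show whether the 42 s vs. 17 s gap reproduces outside of fastai.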