Good thing
Professor A gave me the model code (my mistake, it was linked in her repo :0)
Errors
Running python3 -m run_openpose
and ./build/examples/openpose/openpose.bin --video examples/media/video.avi --face --hand -net_resolution 256x192
Check failed: status == CUDNN_STATUS_SUCCESS (1 vs. 0) CUDNN_STATUS_NOT_INITIALIZED
Seen and tried articles
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Verified cuDNN works
- forced to register for the Nvidia developer program to access the sample code
- forced to download FreeImage to run the test
./mnistCUDNN
- test passed so yay. >:(
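Since the sample passes but OpenPose still fails, a smaller probe that only calls cudnnCreate can help separate the two. Below is a minimal sketch using Python's ctypes. It assumes libcudnn is discoverable by the dynamic loader, and probe_cudnn is a made-up helper name, not part of cuDNN or OpenPose. In cuDNN's status enum, 0 is CUDNN_STATUS_SUCCESS and 1 is CUDNN_STATUS_NOT_INITIALIZED, which matches the "(1 vs. 0)" in the check failure above.

```python
import ctypes
import ctypes.util


def probe_cudnn(lib_name=None):
    """Try cudnnCreate and return (version, status, message).

    Returns None when the cuDNN shared library cannot be found or loaded.
    probe_cudnn is a hypothetical helper, not a cuDNN or OpenPose API.
    """
    name = lib_name or ctypes.util.find_library("cudnn")
    if name is None:
        return None
    try:
        lib = ctypes.CDLL(name)
    except OSError:
        return None

    # Declare the C signatures we rely on.
    lib.cudnnGetVersion.restype = ctypes.c_size_t
    lib.cudnnGetErrorString.argtypes = [ctypes.c_int]
    lib.cudnnGetErrorString.restype = ctypes.c_char_p
    lib.cudnnCreate.argtypes = [ctypes.POINTER(ctypes.c_void_p)]
    lib.cudnnCreate.restype = ctypes.c_int
    lib.cudnnDestroy.argtypes = [ctypes.c_void_p]
    lib.cudnnDestroy.restype = ctypes.c_int

    handle = ctypes.c_void_p()
    status = lib.cudnnCreate(ctypes.byref(handle))  # 0 means success
    message = lib.cudnnGetErrorString(status).decode()
    if status == 0:
        lib.cudnnDestroy(handle)
    return lib.cudnnGetVersion(), status, message
```

Calling print(probe_cudnn()) inside the same environment that runs OpenPose shows which cuDNN build actually gets loaded there and whether initialization fails outside of Caffe too.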
Tried downgrading from 11.2 to 11.1
- status still not initialized
Tried running without CUDA
- openpose face and hand examples work
- openpose regular demo vids take 140s -- 8 times slower
- motion reconstruction failed with an out-of-memory error (have 24 GB DRAM :rolling_eyes:)
Tried downgrading all the way to 10.2
- NVCC 10.2 compiler does not support the compute_80 architecture
- Fixed by removing compute_80 and compute_86
- Removed AMPERE from caffe and openpose cmake/Cuda.cmake:
- set(Caffe_known_gpu_archs "${KEPLER} ${MAXWELL} ${PASCAL} ${VOLTA} ${TURING} ${AMPERE}")
+ set(Caffe_known_gpu_archs "${KEPLER} ${MAXWELL} ${PASCAL} ${VOLTA} ${TURING}")
- CUDA 10.2 incompatible with GCC 9.3 and Clang 9.0
- Fixed by changing the GCC version check to 9
#if defined(__GNUC__)
#if __GNUC__ > 9
#error -- unsupported GNU version! gcc versions later than 9 are not supported!
#endif /* __GNUC__ > 9 */
- conclusion: same error.
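The arch removal above boils down to "keep only the architectures this toolkit can target". Here is a rough sketch of that rule. The minimum toolkit versions in the table are taken from NVIDIA's release notes as best I know them (compute_80 arrives with CUDA 11.0, compute_86 with CUDA 11.1), so double-check them for your setup. MIN_TOOLKIT and supported_archs are illustrative names, not anything from Caffe or OpenPose.

```python
# Minimum CUDA toolkit version able to target each compute capability.
# Assumption: values recalled from NVIDIA release notes; verify before relying on them.
MIN_TOOLKIT = {
    "compute_60": (8, 0),   # Pascal
    "compute_61": (8, 0),   # Pascal
    "compute_70": (9, 0),   # Volta
    "compute_75": (10, 0),  # Turing
    "compute_80": (11, 0),  # Ampere
    "compute_86": (11, 1),  # Ampere
}


def supported_archs(toolkit_version, archs):
    """Return the archs from `archs` that the given toolkit version can compile for."""
    major_minor = tuple(int(part) for part in toolkit_version.split(".")[:2])
    return [arch for arch in archs if MIN_TOOLKIT[arch] <= major_minor]


print(supported_archs("10.2", list(MIN_TOOLKIT)))
# ['compute_60', 'compute_61', 'compute_70', 'compute_75']
```

With "10.2" both Ampere entries drop out, which is the same result as deleting ${AMPERE} from Caffe_known_gpu_archs by hand.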
Takeaways
- Explore all failure paths so there are no more ways to fail
- Nvidia is a scam
- Interesting tip: NVCC runtime API version and Driver API version are TWO DIFFERENT THINGS (do not confuse the two)
- Cool to blog. Read my own tutorial :rolling_eyes:
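To make the runtime-vs-driver tip concrete: nvcc --version reports the installed toolkit release, while the header of nvidia-smi reports the highest CUDA version the driver supports. A small sketch that reads both is below. The helper names are made up for illustration, and the regexes assume the usual "release X.Y" and "CUDA Version: X.Y" wording in the tool output, which may differ between versions.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Extract (major, minor) from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_smi_cuda_version(text):
    """Extract (major, minor) from the `nvidia-smi` header, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def run_tool(cmd):
    """Return stdout of cmd, or an empty string if the tool is not installed."""
    if shutil.which(cmd[0]) is None:
        return ""
    return subprocess.run(cmd, capture_output=True, text=True, check=False).stdout


def report():
    """Print both versions side by side and flag a toolkit newer than the driver."""
    toolkit = parse_nvcc_release(run_tool(["nvcc", "--version"]))
    driver = parse_smi_cuda_version(run_tool(["nvidia-smi"]))
    print("toolkit version (nvcc):", toolkit)
    print("max CUDA version supported by driver (nvidia-smi):", driver)
    if toolkit and driver and toolkit > driver:
        print("toolkit is newer than what the driver reports")
```

Running report() after each upgrade or downgrade gives a quick check that the two numbers are the ones you expect.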
Next steps
- Clean all 10.2 revisions
- Go back to 11.2 and debug initialization error again and again and again