OpenPose Trial and Error: Day 3

Nvidia / Nvidia Verify / ML Index / Cuda Index

Good thing

Professor A gave me the model code (my mistake, it was already linked in her repo :0)

Errors

Both python3 -m run_openpose and ./build/examples/openpose/openpose.bin --video examples/media/video.avi --face --hand -net_resolution 256x192 fail with:

Check failed: status == CUDNN_STATUS_SUCCESS (1 vs. 0) CUDNN_STATUS_NOT_INITIALIZED

Seen and tried articles

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Verified cuDNN works

  • forced to register for the Nvidia Developer Program to access the sample code
  • forced to download FreeImage to run the test ./mnistCUDNN
  • test passed so yay. >:(
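For a quicker check than building mnistCUDNN, here is a minimal sketch that loads cuDNN through ctypes and calls cudnnCreate directly, which is the call that reports CUDNN_STATUS_NOT_INITIALIZED. The helper name probe_cudnn and the default library name libcudnn.so are my assumptions; the library may be named or located differently on your machine.

```python
import ctypes


def probe_cudnn(lib_name="libcudnn.so"):
    """Try to create a cuDNN handle and describe the outcome as a string.

    probe_cudnn is a hypothetical helper; lib_name is an assumed default.
    """
    try:
        lib = ctypes.CDLL(lib_name)
    except OSError as exc:
        return f"could not load {lib_name}: {exc}"

    # cudnnGetErrorString(status) returns a C string such as
    # "CUDNN_STATUS_NOT_INITIALIZED".
    lib.cudnnGetErrorString.restype = ctypes.c_char_p
    lib.cudnnGetErrorString.argtypes = [ctypes.c_int]
    lib.cudnnGetVersion.restype = ctypes.c_size_t

    # cudnnCreate fills in an opaque handle and returns a status code.
    handle = ctypes.c_void_p()
    status = lib.cudnnCreate(ctypes.byref(handle))
    name = lib.cudnnGetErrorString(status).decode()
    if status == 0:  # 0 is CUDNN_STATUS_SUCCESS
        lib.cudnnDestroy(handle)
    return f"cuDNN {lib.cudnnGetVersion()}: cudnnCreate -> {name}"
```

Calling print(probe_cudnn()) from the same environment that runs OpenPose shows whether the failure reproduces outside Caffe. The "(1 vs. 0)" in the error above is this same status code: 1 is CUDNN_STATUS_NOT_INITIALIZED, 0 is success.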

Tried downgrading CUDA from 11.2 to 11.1

  • same error: status still CUDNN_STATUS_NOT_INITIALIZED

Tried running without CUDA

  • openpose face and hand examples work
  • openpose regular demo vids take 140s -- 8 times slower
  • motion reconstruction failed with an out-of-memory error (I have 24 GB of RAM :rolling_eyes:)

Tried downgrading CUDA all the way to 10.2

  • NVCC 10.2 compiler does not support compute_80 architecture
  • Fixed by removing compute_80 and compute_86
  • Removed AMPERE from caffe and openpose cmake/Cuda.cmake
-  set(Caffe_known_gpu_archs "${KEPLER} ${MAXWELL} ${PASCAL} ${VOLTA} ${TURING} ${AMPERE}")
+  set(Caffe_known_gpu_archs "${KEPLER} ${MAXWELL} ${PASCAL} ${VOLTA} ${TURING}")
  • CUDA 10.2 is incompatible with GCC 9.3 and Clang 9.0
  • Fixed by changing the GCC version check in CUDA's host_config.h to 9
#if defined(__GNUC__)

#if __GNUC__ > 9

#error -- unsupported GNU version! gcc versions later than 9 are not supported!

#endif /* __GNUC__ > 9 */
  • conclusion: same error.
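The compute_80 / compute_86 problem comes down to which architectures each toolkit's nvcc knows about. Here is a small sketch of that filtering, assuming a hand-written lookup table (MIN_CUDA_FOR_ARCH and supported_archs are hypothetical names of mine, not part of Caffe or OpenPose) that covers only the architectures relevant here.

```python
# Minimum CUDA toolkit release whose nvcc accepts each compute capability.
# Hand-written table, limited to the architectures touched in this post.
MIN_CUDA_FOR_ARCH = {
    "70": (9, 0),   # Volta
    "75": (10, 0),  # Turing
    "80": (11, 0),  # Ampere (compute_80)
    "86": (11, 1),  # Ampere (compute_86)
}


def supported_archs(archs, cuda_version):
    """Keep only the archs that a (major, minor) toolkit can compile for.

    Archs missing from the table are assumed to be supported.
    """
    return [a for a in archs if MIN_CUDA_FOR_ARCH.get(a, (0, 0)) <= cuda_version]


print(supported_archs(["70", "75", "80", "86"], (10, 2)))
print(supported_archs(["70", "75", "80", "86"], (11, 1)))
```

With (10, 2) both Ampere entries are dropped, which matches removing ${AMPERE} from Cuda.cmake by hand.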

Takeaways

  • Explore all failure paths so there are no more ways to fail
  • Nvidia is a scam
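  • Interesting tip: the CUDA runtime (toolkit) version that nvcc --version reports and the driver API version that nvidia-smi reports are TWO DIFFERENT THINGS (do not confuse the two). nvidia-smi shows the highest CUDA version the installed driver supports, not the toolkit you compiled with.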
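To make the runtime vs. driver distinction concrete: cudaRuntimeGetVersion and cudaDriverGetVersion both return a single integer encoded as 1000 * major + 10 * minor. Below is a sketch that decodes that integer and compares the two. The function names are mine, and the comparison is only a rule of thumb that ignores CUDA's compatibility modes.

```python
def decode_cuda_version(raw):
    """Split a CUDA version integer (1000 * major + 10 * minor) into (major, minor)."""
    return raw // 1000, (raw % 1000) // 10


def driver_supports_runtime(driver_raw, runtime_raw):
    """Rule of thumb: the driver's CUDA version should be >= the runtime's."""
    return decode_cuda_version(driver_raw) >= decode_cuda_version(runtime_raw)


# Example values: 11020 encodes CUDA 11.2 and 10020 encodes CUDA 10.2.
print(decode_cuda_version(11020))
print(driver_supports_runtime(11020, 10020))
```

A driver that reports 11.2 can run a 10.2 toolkit, so downgrading the toolkit alone never changes what nvidia-smi prints.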
  • Cool to blog. Read my own tutorial :rolling_eyes:

Next steps

  • Clean up all the CUDA 10.2 changes
  • Go back to 11.2 and debug initialization error again and again and again