This will enable QUDA to query the warp size of devices from different vendors. For now, the AMD side unconditionally returns 64 and the CUDA side unconditionally returns 32.
The macro-guarded approach breaks when this API is used from host code, because the #else branch is erroneously taken.
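
To illustrate the failure mode, here is a minimal host-only sketch, not the code in this PR. The macro `SKETCH_AMD_DEVICE_PASS`, the `Vendor` enum, and both function names are hypothetical and exist only for this example. It assumes the guard macro is left undefined when host code is compiled, which is presumably why the #else branch wins there.

```cpp
#include <cassert>

// Hypothetical stand-in for a macro that only a device compilation pass would define.
// It is deliberately left undefined here to mimic how host code gets compiled.
// #define SKETCH_AMD_DEVICE_PASS

// Macro-guarded variant: the answer is fixed at preprocessing time.
constexpr int guarded_warp_size()
{
#if defined(SKETCH_AMD_DEVICE_PASS)
  return 64;
#else
  return 32; // host code lands here, whichever GPU is actually targeted
#endif
}

// Query-style variant (hypothetical): the vendor is an argument,
// so host code can ask about any device.
enum class Vendor { Nvidia, Amd };

constexpr int warp_size(Vendor vendor)
{
  // Mirrors the values described above: 64 for AMD, 32 for CUDA.
  return vendor == Vendor::Amd ? 64 : 32;
}

// The guarded version cannot report 64 from host code in this sketch.
static_assert(guarded_warp_size() == 32, "host compilation takes the #else branch");
static_assert(warp_size(Vendor::Amd) == 64, "AMD path returns 64");
static_assert(warp_size(Vendor::Nvidia) == 32, "CUDA path returns 32");

// Runtime form of the same checks, callable from ordinary host code.
inline void check_host_view()
{
  assert(guarded_warp_size() != warp_size(Vendor::Amd));
  assert(guarded_warp_size() == warp_size(Vendor::Nvidia));
}
```

Passing the target as a value rather than baking it into the preprocessor is one way to keep the host and device views consistent. The actual API may obtain the value differently.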