Saturday, December 11, 2010

GPGPU High Performance Computing using OpenCL -- Installation and Setup

High Performance Computing (HPC) with general-purpose GPUs (GPGPU) is one of the hottest technologies for processing massive amounts of data.
Compared with the general-purpose CPU, GPGPU adopts new approaches to parallel computing in both software and hardware.

When it comes to programming GPGPUs, OpenCL is the clear choice because it is the open standard for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other existing or future processors. Almost all major GPU vendors, including Nvidia and AMD (ATI), support it, as shown by their recent OpenCL 1.0 driver releases.

I have been studying GPGPU for several months, and now I have a real chance to help some scientists process medical image data. Here is what I have:
  • 64-bit Windows 7 Professional;
  • Dell Precision T7500 tower workstation. It includes two of Intel's latest 64-bit Xeon quad-core processors (a total of 8 physical cores, no Hyper-Threading), 6 GB of DDR3 ECC main memory, a Broadcom NetXtreme 57xx Gigabit NIC, and an ATI FirePro 3D V5800 for the display;
  • Nvidia Tesla C2050 GPGPU. It has 448 CUDA cores and 3 GB of GDDR5 ECC global memory. It is Nvidia's flagship product for building a powerful workstation.
1. Install C2050
You can really build a personal supercomputer (or a mini-cluster) right on your desk using the T7500 and the C2050. Unfortunately, the T7500 and the C2050 came separately, and installing the C2050 in the T7500 is not as straightforward as the installation instructions in the C2050 CD package suggest.

You can basically configure the C2050 in two ways:
  1. Keep the existing ATI FirePro as the display adapter and use the new C2050 as a pure computing co-processor (it does not drive a display).
  2. Use the C2050 as both the display adapter and a computing co-processor.
Configuration 1 is my case, and I believe many people want this configuration too.

The first step in the C2050 installation instructions is to remove the current graphics driver. This step should only apply to configuration 2.
My first attempt strictly followed the instructions. I downloaded and installed the latest Nvidia driver, 263.06, and its GPU Computing SDK 3.2. My monitor was still connected to the FirePro adapter. The Windows Device Manager showed a standard VGA adapter and a Tesla C2050 adapter, both of which were working properly.
However, when I ran the oclDeviceQuery OpenCL example in the SDK, the C2050 was not recognized.
I am not sure whether it would work if I connected the monitor to the C2050 and disabled the standard VGA driver. I didn't put any more effort into it because I really wanted configuration 1.

So my next attempt was to restore the ATI FirePro driver. Unfortunately, my earlier uninstall did some damage, and I couldn't restore the driver even though the installation files still seemed to be under c:\drivers\video\R259137\packages\drivers. So I downloaded the latest FirePro V5800 driver package from AMD's website and finally restored it.
When I ran the same oclDeviceQuery again, I got the same result. In fact, none of the CUDA examples in the SDK worked either.

To get the C2050 recognized, you need to switch it to the so-called TCC (Tesla Compute Cluster) mode instead of WDDM mode. Basically, TCC mode lets the Tesla C2050 work as a computing co-processor alongside a separate display adapter, among other functions.
Nvidia's release document for the C2050 driver is confusing on this point. It says TCC is the default mode for the C2050, which made me think the card would already be in TCC mode after installation. That was not so in my case.

I tried to run nvidia-smi to enable TCC mode, but it complained that no TCC GPU was detected.
To enable TCC mode, you have to manually edit the Windows registry by adding a DWORD entry called "AdapterType" and setting its value to 2. This new entry should be under
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4D36E968-E325-11CE-BFC1-08002BE10318}\0001 (or 0000, or whichever number corresponds to your C2050 adapter).
After the above change, you have to reboot. After the reboot, when you run the same oclDeviceQuery, you will see that the Tesla C2050 is successfully recognized.
Please note that because TCC mode disables the graphics functions, none of the graphical examples will work. But isn't this what you want? TCC mode is faster than WDDM mode for computing, and it leaves graphics to the display adapter.
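
The registry change described above can also be prepared as a .reg file and imported with regedit instead of being typed by hand. The following is only a minimal Python sketch under stated assumptions: the function name tcc_reg_file and the default subkey "0001" are made up for illustration, and you still have to check in the registry which numbered subkey belongs to your C2050 before importing anything.

```python
import re

# Display adapter class key quoted in the text above.
CLASS_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class"
    r"\{4D36E968-E325-11CE-BFC1-08002BE10318}"
)


def tcc_reg_file(adapter_subkey="0001"):
    """Return .reg file text that sets AdapterType to 2 under the given subkey.

    adapter_subkey is the four-digit subkey (0000, 0001, ...) of the C2050.
    The default "0001" is only an assumption; verify it on your own machine.
    """
    if not re.fullmatch(r"\d{4}", adapter_subkey):
        raise ValueError("adapter subkey must be four digits, e.g. '0001'")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[" + CLASS_KEY + "\\" + adapter_subkey + "]",
        '"AdapterType"=dword:00000002',
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Print the file content; save it as e.g. tcc.reg to import it.
    print(tcc_reg_file("0001"))
```

Importing the saved file and rebooting should have the same effect as the manual edit. Exporting the existing key first gives you something to restore if the wrong subkey was chosen.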

2. OpenCL Support By Tesla C2050 Driver
Another confusing point in the C2050 driver release document is that it does not mention OpenCL support; it only mentions CUDA C/C++. Fortunately, testing with the OpenCL examples shows that the driver does support OpenCL 1.0 (in fact, if you examine the files installed by the 263.06 driver package, you can find the opencl.dll library).
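
A quick way to repeat this kind of check without building the SDK samples is to ask the OpenCL runtime directly which devices it exposes. The sketch below reports roughly what oclDeviceQuery does for the device name and OpenCL version, using only Python's ctypes and the standard OpenCL 1.0 entry points clGetPlatformIDs, clGetDeviceIDs, and clGetDeviceInfo. It is a rough sketch, not part of the SDK: the helper names are made up for illustration, and it assumes a 64-bit Python that can find the OpenCL library on the default search path.

```python
import ctypes
import ctypes.util

# Constants from the OpenCL 1.0 header (cl.h).
CL_SUCCESS = 0
CL_DEVICE_TYPE_ALL = 0xFFFFFFFF
CL_DEVICE_NAME = 0x102B
CL_DEVICE_VERSION = 0x102F


def load_opencl():
    """Return a ctypes handle to the OpenCL library, or None if unavailable."""
    path = ctypes.util.find_library("OpenCL")
    if path is None:
        return None
    try:
        # Assumption: 64-bit process. 32-bit Windows would need WinDLL (stdcall).
        cl = ctypes.CDLL(path)
        handle = ctypes.c_void_p
        uint_p = ctypes.POINTER(ctypes.c_uint)
        cl.clGetPlatformIDs.argtypes = [ctypes.c_uint, ctypes.POINTER(handle), uint_p]
        cl.clGetDeviceIDs.argtypes = [handle, ctypes.c_uint64, ctypes.c_uint,
                                      ctypes.POINTER(handle), uint_p]
        cl.clGetDeviceInfo.argtypes = [handle, ctypes.c_uint, ctypes.c_size_t,
                                       ctypes.c_void_p,
                                       ctypes.POINTER(ctypes.c_size_t)]
    except (OSError, AttributeError):
        return None
    return cl


def device_string(cl, device, param):
    """Read one string-valued device property such as CL_DEVICE_NAME."""
    buf = ctypes.create_string_buffer(256)
    if cl.clGetDeviceInfo(device, param, len(buf), buf, None) != CL_SUCCESS:
        return "<unknown>"
    return buf.value.decode("ascii", "replace")


def list_opencl_devices():
    """Return [(name, version), ...] for all devices, or None without OpenCL."""
    cl = load_opencl()
    if cl is None:
        return None
    found = []
    count = ctypes.c_uint(0)
    status = cl.clGetPlatformIDs(0, None, ctypes.byref(count))
    if status != CL_SUCCESS or count.value == 0:
        return found
    platforms = (ctypes.c_void_p * count.value)()
    cl.clGetPlatformIDs(count.value, platforms, None)
    for platform in platforms:
        n = ctypes.c_uint(0)
        status = cl.clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 0, None,
                                   ctypes.byref(n))
        if status != CL_SUCCESS or n.value == 0:
            continue
        devices = (ctypes.c_void_p * n.value)()
        cl.clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, n.value, devices, None)
        for device in devices:
            found.append((device_string(cl, device, CL_DEVICE_NAME),
                          device_string(cl, device, CL_DEVICE_VERSION)))
    return found


if __name__ == "__main__":
    result = list_opencl_devices()
    if result is None:
        print("No OpenCL library found")
    else:
        for name, version in result:
            print(name, "|", version)
```

If the driver exposes the card, it should appear in the output together with its OpenCL version string. An empty list means the library loaded but reported no devices.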

3. Install OpenCL SDK
Nvidia's Developer Zone is also confusing. When I just wanted to set up a computing SDK for GPGPU (not graphics), I was at a loss as to what to download.
Basically, there are two packages to download after you have installed the C2050 driver (the driver package includes both the low-level CUDA driver and the OpenCL driver).
  1. NVIDIA GPU Computing Toolkit. It includes both CUDA and OpenCL SDKs.
  2. NVIDIA GPU Computing SDK 3.2. It includes SDKs for CUDA, OpenCL, and DirectCompute. It also includes samples.
For OpenCL development, you only need to install item 2.
On Windows 7, you probably have Visual Studio 2010. However, the OpenCL example solutions in item 2 above are for Visual Studio 2005 and 2008 only. The good news is that Visual Studio 2010 can successfully convert the 2008 solution. The only major conversion warning is that the TargetName is not consistent with the generated executable file name, but this will not cause you any trouble.

1 comment:

  1. I have Visual Studio 2010, and I installed parallel insight and Visual Studio 2008 as well. I can compile and run the example in parallel studio, but I can't even open the solutions in the SDK (V3.2)... VS 2010 tries to convert them, but the conversion fails somewhere in the .vcproj file. As of now I cannot find a way to compile the examples in the CUDA v3.2 SDK with VS 2010.

    CUDA v3.2 with Visual Studio 2010 = does not work