When it comes to GPGPU compute on Linux, several choices exist:

1. OpenCL: This is the Open Computing Language, an open standard for parallel programming on heterogeneous systems, currently maintained by the Khronos Group. Its advantage over other parallel programming approaches such as CUDA is that it’s vendor-neutral, and as such, will work on any GPU or ASIC that implements the standard. OpenCL code also scales well on x86 multi-core CPUs (currently, running OpenCL 1.2 on the CPU needs an x86 processor with support for SSE3, SSSE3 and beyond). In the same vein, vendor neutrality also means that other architectures, such as ARM, are supported, so do not be surprised to see ARM-powered boards crunching OpenCL code with great speed and power efficiency. The future is going parallel: heterogeneous compute environments will co-exist on the same platform, e.g. a standard CPU and one or more GPUs and/or ASICs working together on a parallel compute load in the same device.

2. NVIDIA CUDA: NVIDIA’s Compute Unified Device Architecture (CUDA) is a proprietary parallel compute platform and architecture created by NVIDIA and currently implemented on their GPUs. CUDA gives direct access to the virtual instruction set and memory of the parallel compute elements in CUDA-enabled GPUs. Since CUDA is a proprietary, vendor-locked parallel compute infrastructure, this article will NOT cover it beyond this description; instead, we’ll delve into OpenCL, the open, vendor-neutral compute standard. CUDA will be covered in an article of its own later on.

TODAY’S TOPIC: OpenCL on Arch Linux with AMD GPUs and APUs.

To get up and running with OpenCL on Arch Linux with an AMD GPU/APU and such combinations (as may apply in your case), here are the requirements:

1. An Arch Linux installation.

2. An x86-64 Intel or AMD CPU with support for SSE3 and above.

3. An AMD GPU from the Evergreen (HD 5000 series) generation or newer. Legacy GPUs that are no longer supported by AMD Catalyst’s mainline driver will NOT be covered here.

4. AMD APP SDK v2.8.x or later (from AUR: https://aur.archlinux.org/packages/amdapp-sdk/) AND amdapp-aparapi (https://aur.archlinux.org/packages/amdapp-aparapi/), which is a prerequisite for installing the AMD APP SDK.

5. AMD Catalyst: Install the catalyst-total package from AUR: https://aur.archlinux.org/packages/catalyst-total/ . This package builds everything you’ll ever need, including the lib32 dependencies. Note: If you’re on an AMD system with PowerXpress (switching between AMD discrete GPUs and AMD integrated GPUs, such as the dual-GPU setup in some APUs and Hybrid CrossFire), please install the catalyst-total-pxp package from AUR instead: https://aur.archlinux.org/packages/catalyst-total-pxp/

6. The ability to follow instructions here and on the Arch Wiki. If you’re not familiar with the Arch Way, read up on it here: https://wiki.archlinux.org/index.php/The_Arch_Way and keep it in mind from here on.
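
Requirement 2 is easy to check from a shell before installing anything. Here is a minimal sketch, assuming the usual Linux /proc/cpuinfo layout, where SSE3 shows up under the flag name pni and SSSE3 as ssse3. The function name has_cpu_flag is made up for this example and is not part of any package:

```shell
#!/bin/bash
# has_cpu_flag FLAG [CPUINFO_FILE]
# Succeeds if FLAG appears as a whole word on the first "flags" line.
# CPUINFO_FILE defaults to /proc/cpuinfo. The function name is hypothetical.
has_cpu_flag() {
    local flag="$1" file="${2:-/proc/cpuinfo}"
    # Split the flags line into one word per line, then match the flag exactly.
    grep -m1 '^flags' "$file" | tr ' \t' '\n\n' | grep -qx -- "$flag"
}

# Example use on the target machine (left commented out on purpose):
#   has_cpu_flag pni   && echo "SSE3 supported"
#   has_cpu_flag ssse3 && echo "SSSE3 supported"
```

The optional second argument only exists so the function can be pointed at a saved copy of cpuinfo from another box.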

THE HEART OF THE MATTER

Note:

Before we dive in, I assume that:

1. You have the hardware set up as above and Arch is already installed on the system.

2. You have the base-devel group installed. If not, do so in an elevated terminal:

pacman -S base-devel

3. Edit the makepkg.conf file (/etc/makepkg.conf) and ensure it resembles the following under the Architecture and Compile Flags section:

#########################################################################
# ARCHITECTURE, COMPILE FLAGS
#########################################################################
#
CARCH="x86_64"
CHOST="x86_64-unknown-linux-gnu"

#-- Compiler and Linker Flags
# -march (or -mcpu) builds exclusively for an architecture
# -mtune optimizes for an architecture, but builds for whole processor family
CPPFLAGS="-D_FORTIFY_SOURCE=2"
CFLAGS="-march=x86-64 -mtune=native -O2 -pipe -fstack-protector --param=ssp-buffer-size=4"
CXXFLAGS="-march=x86-64 -mtune=native -O2 -pipe -fstack-protector --param=ssp-buffer-size=4"
LDFLAGS="-Wl,-O1,--sort-common,--as-needed,-z,relro"
#-- Make Flags: change this for DistCC/SMP systems
MAKEFLAGS="-j4"

Take note of the MAKEFLAGS="-jn" line. The value of n should be the number of logical CPUs on your system. Get this number by running:

grep -c ^processor /proc/cpuinfo

In my case, it returns 4 because I’m on a dual-core system with Hyperthreading: two cores, 4 logical CPUs. Adjust as appropriate.
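
If you’d rather not count by hand, the value can be derived. This is only a sketch: it assumes nproc from coreutils is available (with getconf as a fallback), and the function name makeflags_for_cpus is made up for illustration:

```shell
#!/bin/bash
# makeflags_for_cpus [N]
# Prints a MAKEFLAGS value of the form -jN. With no argument, N is the
# number of logical CPUs reported by nproc, falling back to getconf and
# finally to 1. The function name is hypothetical.
makeflags_for_cpus() {
    local n="${1:-}"
    if [ -z "$n" ]; then
        n=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    fi
    printf -- '-j%s\n' "$n"
}

# Example use (left commented out on purpose):
#   echo "MAKEFLAGS=\"$(makeflags_for_cpus)\""
```

Paste the printed line into /etc/makepkg.conf yourself; the sketch deliberately does not edit the file for you.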

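Also, note that -mtune=generic changes to -mtune=native under ARCHITECTURE, COMPILE FLAGS. This makes GCC, when called by makepkg, tune the generated code (instruction scheduling and the like) for your CPU. Keep in mind that with -march=x86-64 left in place, the binaries still use only the baseline x86-64 instruction set; GCC will use every instruction set your CPU has to offer only if you also set -march=native, at the cost of packages that may not run on other machines.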

The second edit is under the BUILD ENVIRONMENT section.

Go to the BUILDENV array and remove the ! in front of ccache, so that the line looks like:

BUILDENV=(fakeroot !distcc color ccache check !sign)

Save and close /etc/makepkg.conf and follow up by installing ccache:

sudo pacman -S ccache

This will help speed up subsequent builds that rely on the same code since ccache, as the name implies, caches compiled objects, and rebuilds of the same project skip any work units whose source files have not changed. That’s a lot of win right there.
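
To double-check that the BUILDENV edit actually stuck, something along these lines should do. It is a rough sketch that assumes BUILDENV sits on a single line, as in the stock file; the function name ccache_enabled is invented for this example:

```shell
#!/bin/bash
# ccache_enabled [MAKEPKG_CONF]
# Succeeds if the (uncommented) BUILDENV line lists ccache without a
# leading "!". Defaults to /etc/makepkg.conf. The function name is hypothetical.
ccache_enabled() {
    local conf="${1:-/etc/makepkg.conf}"
    grep -E '^[[:space:]]*BUILDENV=' "$conf" | grep -Eq '[( ]ccache[ )]'
}

# Example use (left commented out on purpose):
#   ccache_enabled && echo "ccache is on" || echo "ccache is still disabled"
```

After a couple of builds, running ccache -s prints the cache statistics, which is a quick way to see whether the cache is being hit.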

3. Let’s get to installing catalyst-total (or catalyst-total-pxp) from AUR. Download the PKGBUILD’s tarball and extract it somewhere on your filesystem, e.g. a folder called pkgbuilds under your home directory. Once extracted, it will create a parent folder with the name of the package, e.g. in the case of catalyst-total, the directory (assuming pkgbuilds was the extraction directory) is:

cd ~/pkgbuilds/catalyst-total

Listing the directory reveals its contents, including patches. Note that with AUR, we do NOT package and redistribute binaries of any sort.

To build the package, run:

makepkg -c -s

The -c option tells makepkg to clean up after itself, and the -s option tells makepkg to automatically satisfy any missing dependencies via pacman so you won’t have to hunt them down manually. As such, you may be prompted for your password as pacman is launched to install any missing dependencies. makepkg reads the PKGBUILD in the current directory by default, so there is no need to name it on the command line.

Pro-tip: If a dependency is NOT found by pacman, and cannot be installed, it simply means the package in question resides on the AUR, and as such, you’ll have to build it from source before you install it.

As the package builds, it will output verbose info on the terminal and even offer guides to enabling critical services such as the DKMS Module helper included in the package. Follow the instructions on-screen and you’ll be sorted.

When the process is completed, go on and install the generated pkg.tar.xz package with pacman:

sudo pacman -U *.pkg.tar.xz

Note that we use the -U (upgrade) flag to install a package from a file on the local filesystem. If issues occur (such as being prompted to remove the open-source driver), go on and obey. The two cannot co-exist on the system.
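
Since the same build-then-install routine comes up for every AUR package in this article, it can be wrapped in a small helper. Treat this as a sketch only: the names run, aur_build_install and the DRY_RUN variable are made up here, and the helper assumes the PKGBUILD tarball has already been extracted into the directory you pass in:

```shell
#!/bin/bash
# run CMD...
# Executes CMD, or just prints it when DRY_RUN=1 (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# aur_build_install DIR
# Builds the PKGBUILD found in DIR and installs the resulting package(s).
# makepkg picks up ./PKGBUILD by default. The function name is hypothetical.
aur_build_install() {
    local dir="$1"
    (
        cd "$dir" || exit 1
        run makepkg -c -s || exit 1
        run sudo pacman -U ./*.pkg.tar.xz
    )
}

# Example use (left commented out on purpose):
#   DRY_RUN=1 aur_build_install ~/pkgbuilds/catalyst-total   # preview only
#   aur_build_install ~/pkgbuilds/catalyst-total             # for real
```

Running it with DRY_RUN=1 first shows exactly what would be executed, which fits the habit of reading a PKGBUILD before building it.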

After install, reboot the system in emergency/recovery mode (boot up the recovery kernel) and run:

Xorg -configure

WARNING: Do NOT run aticonfig --initial as its syntax is broken and the resultant xorg.conf file will be broken (to be exact, it breaks the way the PCI device is named). The generated file will look as such, with small variances depending on your monitor setup:

cat /etc/X11/xorg.conf
Section "ServerLayout"
    Identifier     "aticonfig Layout"
    Screen      0  "aticonfig-Screen[0]-0" 0 0
EndSection

Section "Module"
EndSection

Section "Monitor"
    Identifier   "aticonfig-Monitor[0]-0"
    Option       "VendorName" "ATI Proprietary Driver"
    Option       "ModelName" "Generic Autodetecting Monitor"
    Option       "DPMS" "true"
EndSection

Section "Device"
    Identifier  "aticonfig-Device[0]-0"
    Driver      "fglrx"
    BusID       "PCI:1:0:0"
EndSection

Section "Screen"
    Identifier "aticonfig-Screen[0]-0"
    Device     "aticonfig-Device[0]-0"
    Monitor    "aticonfig-Monitor[0]-0"
    DefaultDepth     24
    SubSection "Display"
        Viewport   0 0
        Depth     24
    EndSubSection
EndSection
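
Before rebooting, it does no harm to confirm that the file really names the fglrx driver. The check below is a sketch with an invented function name, xorg_uses_fglrx, and it only looks for the Driver line, nothing more. If the listing was pasted from a web page, typographic quotes may have crept in; the check fails on those too, which is what you want since, as far as I know, Xorg expects plain double quotes:

```shell
#!/bin/bash
# xorg_uses_fglrx [XORG_CONF]
# Succeeds if a Driver line in the file names "fglrx" in plain double quotes.
# Defaults to /etc/X11/xorg.conf. The function name is hypothetical.
xorg_uses_fglrx() {
    local conf="${1:-/etc/X11/xorg.conf}"
    grep -Eq '^[[:space:]]*Driver[[:space:]]+"fglrx"' "$conf"
}

# Example use (left commented out on purpose):
#   xorg_uses_fglrx && echo "xorg.conf points at fglrx"
```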

Now, reboot the system from recovery:

reboot -f

4. Once you have booted up to the fancy, shiny desktop with Catalyst installed successfully, we’ll go into building and installing the AMD APP SDK.

Remember, as stated in the prerequisites, we’ll need amdapp-aparapi from AUR (https://aur.archlinux.org/packages/amdapp-aparapi/) before we can build amdapp-sdk. This is expected. Download and extract the archive. To set up amdapp-aparapi, do:

makepkg -c -s

In the same directory where amdapp-aparapi was extracted. This will build the package and install its dependencies for you in one go. Once done, install amdapp-aparapi with:

sudo pacman -U *.pkg.tar.xz

When done, go on and download the amdapp-sdk PKGBUILD archive from the AUR: https://aur.archlinux.org/packages/amdapp-sdk/

Extract the archive first, then download the AMD APP SDK from AMD’s Website into THAT same directory. Link to AMD APP SDK:  http://developer.amd.com/tools-and-sdks/heterogeneous-computing/amd-accelerated-parallel-processing-app-sdk/downloads/ , and select v.2.9 x64.

When done, run:

makepkg -c -s

The reason we had to download it manually is that AMD’s website is NOT wget-friendly, since it requires an auth token generated after accepting the EULA presented after clicking on the download SDK link.

When the process is complete, run:

sudo pacman -U *.pkg.tar.xz

To install the AMDAPP-SDK on your system.
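
A quick sanity check at this point is to see which OpenCL vendor files are registered. On Linux the OpenCL ICD loader conventionally looks for .icd files under /etc/OpenCL/vendors; I am assuming that location here, and the function name list_opencl_icds is made up for this sketch:

```shell
#!/bin/bash
# list_opencl_icds [VENDOR_DIR]
# Prints each .icd file in VENDOR_DIR followed by the library named on its
# first line. Returns non-zero if no .icd file is found.
# VENDOR_DIR defaults to /etc/OpenCL/vendors. The function name is hypothetical.
list_opencl_icds() {
    local dir="${1:-/etc/OpenCL/vendors}" f found=1
    for f in "$dir"/*.icd; do
        [ -e "$f" ] || continue
        printf '%s -> %s\n' "$(basename "$f")" "$(head -n1 "$f")"
        found=0
    done
    return "$found"
}

# Example use (left commented out on purpose):
#   list_opencl_icds || echo "no OpenCL vendor files registered"
```

If nothing is listed, OpenCL applications will most likely find no platform, so it is worth re-checking the Catalyst and SDK installs before going further.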

5. Post-install instructions: You’ll need to install the opencl-headers package:

sudo pacman -S opencl-headers

When done, go into the same folder and extract the AMD APP SDK archive downloaded from AMD. Then extract the resulting archive ending in lnx64 and browse into the directory it creates:

cd AMD-APP-SDK-v2.9-RC-lnx64

cd into the include directory:

cd include

Now copy ALL the content here to /usr/include:

sudo cp -avr * /usr/include

Once this is done, you’ll have four new directories under /usr/include:

/usr/include/GL

/usr/include/CL

/usr/include/OpenVideo

/usr/include/SDKUtil

Ensure that they exist.
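
Checking the four directories can be scripted as well. A small sketch, with the invented function name check_sdk_headers; the directory names are the ones listed above:

```shell
#!/bin/bash
# check_sdk_headers [INCLUDE_DIR]
# Prints any of the four expected header directories that are missing and
# returns non-zero if at least one is absent.
# INCLUDE_DIR defaults to /usr/include. The function name is hypothetical.
check_sdk_headers() {
    local base="${1:-/usr/include}" d missing=0
    for d in GL CL OpenVideo SDKUtil; do
        if [ ! -d "$base/$d" ]; then
            echo "missing: $base/$d"
            missing=1
        fi
    done
    return "$missing"
}

# Example use (left commented out on purpose):
#   check_sdk_headers && echo "all SDK header directories are in place"
```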

The reason we copy these directories is so that packages that need to build GL and OpenCL code can locate these headers easily without fiddling with CFLAGS to add custom paths, which only complicates things.

Remember, in the Arch way, one of the things we aim for is SIMPLICITY. KISS Simplicity, that simple.

6. Optional:

If you develop OpenCL code AND you’re looking for a great and efficient debugger and profiler, I highly recommend AMDAPP-CodeXL, available on the AUR:

https://aur.archlinux.org/packages/amdapp-codexl/

Download the PKGBUILD archive, extract it and run:

makepkg -s -c

In the same directory to build amdapp-codexl.

When done, run:

sudo pacman -U *.pkg.tar.xz

To install amdapp-codexl on your Arch system.

Now that you’re done, and so far, you have a kick-ass Arch Linux box serving up as an OpenCL development workstation, let us enjoy the fruit of our labor and install apps that leverage OpenCL on Linux.

Till next time, and at your service,

Dennis Mungai.