In the previous post, we wrote a simple vector addition OpenCL program. We compiled the OpenCL kernel program from source code at run-time, so we had to distribute the OpenCL source code to our users.
However, in some cases, we may prefer to pre-compile the OpenCL kernel program. For example:
- It might take too long to compile the kernel functions.
- We would like to debug the kernel functions much earlier.
- We would like to implement an OpenCL program compilation cache.
- There might be some trade secrets in the kernel functions.
Fortunately, there are some OpenCL APIs which can make this possible.
First, we have to compile our OpenCL kernel function:
$ ioc64 -cmd=build -input=vec_add.cl -ir=vec_add.bin
Now, we have converted vec_add.cl into a binary executable for the Intel OpenCL SDK.
Second, we have to load the binaries from our host program. Instead of clCreateProgramWithSource(), we should call clCreateProgramWithBinary(). Here's the listing:
// Create program
unsigned char* program_file = NULL;
size_t program_size = 0;
read_file(&program_file, &program_size, "vec_add.bin");

cl_program program =
    clCreateProgramWithBinary(ctx, 1, &device, &program_size,
                              (const unsigned char **)&program_file,
                              NULL, &err);

err = clBuildProgram(program, 1, &device, NULL, NULL, NULL);

free(program_file);
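The listing calls read_file(), whose definition is not shown in this post. A minimal sketch of such a helper might look like the following, using only the C standard library. The int return value and the error handling are my assumptions, not necessarily what the original helper does; the argument order simply matches the call in the listing.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not from the original post): reads the whole file at
 * `path` into a malloc'ed buffer. On success, returns 0 and stores the buffer
 * and its length in *data and *size; the caller must free(*data).
 * On failure, returns -1 and leaves *data == NULL and *size == 0. */
int read_file(unsigned char **data, size_t *size, const char *path)
{
    *data = NULL;
    *size = 0;

    /* Open in binary mode so the bytes are read back unmodified. */
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    /* Find the file length by seeking to the end, then rewind. */
    if (fseek(fp, 0, SEEK_END) != 0) { fclose(fp); return -1; }
    long len = ftell(fp);
    if (len < 0 || fseek(fp, 0, SEEK_SET) != 0) { fclose(fp); return -1; }

    /* Allocate at least one byte so an empty file still yields a valid pointer. */
    unsigned char *buf = malloc(len > 0 ? (size_t)len : 1);
    if (buf == NULL) { fclose(fp); return -1; }

    size_t n = fread(buf, 1, (size_t)len, fp);
    fclose(fp);
    if (n != (size_t)len) { free(buf); return -1; }

    *data = buf;
    *size = n;
    return 0;
}
```

With a helper like this, it is worth checking the return value before calling clCreateProgramWithBinary(), since a missing vec_add.bin would otherwise pass a NULL binary to OpenCL.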
Please notice that there are two subtle differences between clCreateProgramWithSource() and clCreateProgramWithBinary():
- We have to pass the devices list.
- For each device, we have to specify the corresponding binary.
Although this is troublesome, it is necessary because pre-compiled binaries are inherently not portable.
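Because each binary belongs to one device, a host program that ships or caches several binaries needs some way to tell them apart. One possible approach, sketched below, is to derive the file name from the device name (which the host can query with clGetDeviceInfo() and CL_DEVICE_NAME). The function name binary_path_for_device() and the "base.device.bin" naming scheme are hypothetical choices for illustration only.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: writes "<base>.<sanitized device name>.bin" into `out`.
 * Every character of `device_name` that is not a letter, digit, or '-' is
 * replaced by '_' so the result is safe to use as a file name.
 * Returns 0 on success, or -1 if `out` is too small (its contents are then
 * unspecified). */
int binary_path_for_device(char *out, size_t out_size,
                           const char *base, const char *device_name)
{
    /* Write the base name followed by a dot. */
    int n = snprintf(out, out_size, "%s.", base);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    size_t pos = (size_t)n;

    /* Append the sanitized device name, leaving room for the terminator. */
    for (const char *p = device_name; *p != '\0'; ++p) {
        if (pos + 1 >= out_size)
            return -1;
        unsigned char c = (unsigned char)*p;
        out[pos++] = (isalnum(c) || c == '-') ? (char)c : '_';
    }

    /* Append the extension; snprintf also writes the terminating NUL. */
    n = snprintf(out + pos, out_size - pos, ".bin");
    if (n < 0 || (size_t)n >= out_size - pos)
        return -1;
    return 0;
}
```

The resulting path can then be passed to the file-loading step for the matching device. A real compilation cache would probably also want the driver version (CL_DRIVER_VERSION) in the key, since a driver update may invalidate old binaries.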