Course Outline


Introduction

  • What is OpenCL?
  • OpenCL vs CUDA vs SYCL
  • Overview of OpenCL features and architecture
  • Setting up the Development Environment

Getting Started

  • Creating a new OpenCL project using Visual Studio Code
  • Exploring the project structure and files
  • Compiling and running the program
  • Displaying the output using printf and fprintf


OpenCL API

  • Understanding the role of the OpenCL API in the host program
  • Using OpenCL API to query device information and capabilities
  • Using OpenCL API to create contexts, command queues, buffers, kernels, and events
  • Using OpenCL API to enqueue commands, such as read, write, copy, map, unmap, execute, and wait
  • Using OpenCL API to handle errors and exceptions
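
Because the OpenCL C API reports failures through integer status codes rather than exceptions, host programs usually wrap calls in a small checking helper. The following is a minimal sketch of that idea. The helper names cl_status_name and cl_check are hypothetical, and the handful of status codes are redefined locally only so the sketch compiles without an OpenCL SDK; a real host program would take them from <CL/cl.h>.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-ins for a few status codes from <CL/cl.h>, defined here only so the
   sketch builds without an OpenCL SDK. Real code would include the header. */
#ifndef CL_SUCCESS
#define CL_SUCCESS                          0
#define CL_DEVICE_NOT_FOUND                -1
#define CL_MEM_OBJECT_ALLOCATION_FAILURE   -4
#define CL_OUT_OF_RESOURCES                -5
#define CL_BUILD_PROGRAM_FAILURE          -11
#define CL_INVALID_VALUE                  -30
#endif

/* Hypothetical helper: map a status code to a readable name. */
const char *cl_status_name(int status)
{
    switch (status) {
    case CL_SUCCESS:                       return "CL_SUCCESS";
    case CL_DEVICE_NOT_FOUND:              return "CL_DEVICE_NOT_FOUND";
    case CL_MEM_OBJECT_ALLOCATION_FAILURE: return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case CL_OUT_OF_RESOURCES:              return "CL_OUT_OF_RESOURCES";
    case CL_BUILD_PROGRAM_FAILURE:         return "CL_BUILD_PROGRAM_FAILURE";
    case CL_INVALID_VALUE:                 return "CL_INVALID_VALUE";
    default:                               return "unrecognized OpenCL status";
    }
}

/* Hypothetical helper: report a failed call on stderr.
   Returns 0 when the call succeeded and 1 otherwise. */
int cl_check(int status, const char *what)
{
    if (status == CL_SUCCESS)
        return 0;
    fprintf(stderr, "%s failed: %s (%d)\n", what, cl_status_name(status), status);
    return 1;
}
```

In a host program, the status returned by a call such as clEnqueueNDRangeKernel would be passed to cl_check together with the call's name, and the caller would release its resources and exit when the result is nonzero.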

OpenCL C

  • Understanding the role of OpenCL C in the device program
  • Using OpenCL C to write kernels that execute on the device and manipulate data
  • Using OpenCL C data types, qualifiers, operators, and expressions
  • Using OpenCL C built-in functions, such as math, geometric, relational, etc.
  • Using OpenCL C extensions and libraries, such as atomic, image, cl_khr_fp16, etc.

OpenCL Memory Model

  • Understanding the difference between host and device memory models
  • Using OpenCL memory spaces, such as global, local, constant, and private
  • Using OpenCL memory objects, such as buffers, images, and pipes
  • Using OpenCL memory access modes, such as read-only, write-only, read-write, etc.
  • Using OpenCL memory consistency model and synchronization mechanisms
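
The interplay of local memory and barriers can be illustrated with a tree reduction inside one work-group. The sketch below is a serial emulation in plain C, not OpenCL code: the scratch array stands in for a __local buffer, each inner loop stands in for all work-items running one step, and the comments mark where a kernel would call barrier(CLK_LOCAL_MEM_FENCE). The name work_group_sum and the fixed LOCAL_SIZE are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define LOCAL_SIZE 8  /* work-group size; must be a power of two in this sketch */

/* Sum LOCAL_SIZE values the way one work-group would with a tree reduction. */
float work_group_sum(const float *global_in)
{
    float scratch[LOCAL_SIZE];  /* plays the role of a __local array */

    /* Each work-item copies one element from global to local memory. */
    for (size_t lid = 0; lid < LOCAL_SIZE; lid++)
        scratch[lid] = global_in[lid];
    /* barrier(CLK_LOCAL_MEM_FENCE) would go here in a kernel */

    /* Halve the number of active work-items at every step. */
    for (size_t stride = LOCAL_SIZE / 2; stride > 0; stride /= 2) {
        for (size_t lid = 0; lid < stride; lid++)
            scratch[lid] += scratch[lid + stride];
        /* barrier(CLK_LOCAL_MEM_FENCE) would go here in a kernel */
    }
    return scratch[0];  /* work-item 0 holds the group's total */
}
```

Without the barriers, a work-item on a real device could read a scratch slot before its neighbor has written it, which is the kind of hazard the consistency model and synchronization topics address.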

OpenCL Execution Model

  • Understanding the difference between host and device execution models
  • Using OpenCL work-items, work-groups, and ND-ranges to define the parallelism
  • Using OpenCL work-item functions, such as get_global_id, get_local_id, get_group_id, etc.
  • Using OpenCL work-group functions, such as barrier, work_group_reduce_add, work_group_scan_inclusive_add, etc.
  • Using OpenCL size query functions, such as get_num_groups, get_global_size, get_local_size, etc.
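
The relationship between the work-item index functions listed above can be sketched with a serial emulation in plain C. This is not OpenCL code: the struct work_item_1d and the function run_ndrange_1d are hypothetical stand-ins, and the sketch assumes a one-dimensional ND-range with a zero global offset and a global size that is a multiple of the local size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the values a work-item would read through
   get_global_id(0), get_local_id(0), get_group_id(0), and so on. */
typedef struct {
    size_t global_id, local_id, group_id;
    size_t global_size, local_size, num_groups;
} work_item_1d;

typedef void (*kernel_fn)(const work_item_1d *wi, void *args);

/* Visit every work-item of a 1-D ND-range one after another.
   A real device runs them in parallel and in no guaranteed order. */
void run_ndrange_1d(size_t global_size, size_t local_size,
                    kernel_fn kernel, void *args)
{
    assert(local_size > 0 && global_size % local_size == 0);
    size_t num_groups = global_size / local_size;

    for (size_t group = 0; group < num_groups; group++) {
        for (size_t local = 0; local < local_size; local++) {
            work_item_1d wi;
            wi.global_id   = group * local_size + local;  /* zero offset */
            wi.local_id    = local;
            wi.group_id    = group;
            wi.global_size = global_size;
            wi.local_size  = local_size;
            wi.num_groups  = num_groups;
            kernel(&wi, args);
        }
    }
}

/* Example "kernel": element-wise vector addition, one element per work-item. */
typedef struct { const float *a; const float *b; float *out; } vadd_args;

void vadd_kernel(const work_item_1d *wi, void *p)
{
    vadd_args *args = (vadd_args *)p;
    size_t i = wi->global_id;
    args->out[i] = args->a[i] + args->b[i];
}
```

Calling run_ndrange_1d(8, 4, vadd_kernel, &args) visits two work-groups of four work-items each, and every work-item handles the element selected by its global ID.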


OpenCL Debugging

  • Understanding the common errors and bugs in OpenCL programs
  • Using Visual Studio Code debugger to inspect variables, breakpoints, call stack, etc.
  • Using CodeXL to debug and analyze OpenCL programs on AMD devices
  • Using Intel VTune to debug and analyze OpenCL programs on Intel devices
  • Using NVIDIA Nsight to debug and analyze OpenCL programs on NVIDIA devices


OpenCL Optimization

  • Understanding the factors that affect the performance of OpenCL programs
  • Using OpenCL vector data types and vectorization techniques to improve arithmetic throughput
  • Using OpenCL loop unrolling and loop tiling techniques to reduce control overhead and increase locality
  • Using OpenCL local memory and local memory functions to optimize memory accesses and bandwidth
  • Using OpenCL profiling and profiling tools to measure and improve the execution time and resource utilization
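
Loop tiling, one of the techniques listed above, can be sketched on the host in plain C with a matrix transpose. The tiled version visits the matrix in small square blocks so that reads and writes stay within a limited memory region at a time; in an OpenCL kernel each block would typically be staged in local memory by one work-group. The function names and the TILE value are illustrative assumptions, not tuned settings.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 4  /* illustrative tile edge; a real value is tuned per device */

/* Reference version: row-major rows x cols input, cols x rows output. */
void transpose_naive(const float *in, float *out, size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            out[c * rows + r] = in[r * cols + c];
}

/* Tiled version: same result, but processed one TILE x TILE block at a time. */
void transpose_tiled(const float *in, float *out, size_t rows, size_t cols)
{
    for (size_t r0 = 0; r0 < rows; r0 += TILE) {
        for (size_t c0 = 0; c0 < cols; c0 += TILE) {
            /* Clamp the block at the matrix edges. */
            size_t r_end = (r0 + TILE < rows) ? r0 + TILE : rows;
            size_t c_end = (c0 + TILE < cols) ? c0 + TILE : cols;
            for (size_t r = r0; r < r_end; r++)
                for (size_t c = c0; c < c_end; c++)
                    out[c * rows + r] = in[r * cols + c];
        }
    }
}
```

Both functions produce identical output, so the naive version can serve as a correctness check while different tile sizes are timed.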

Summary and Next Steps


Requirements

  • An understanding of the C/C++ language and parallel programming concepts
  • Basic knowledge of computer architecture and memory hierarchy
  • Experience with command-line tools and code editors


Audience

  • Developers who wish to learn how to use OpenCL to program heterogeneous devices and exploit their parallelism
  • Developers who wish to write portable and scalable code that can run on different platforms and devices
  • Programmers who wish to explore the low-level aspects of heterogeneous programming and optimize their code performance

Duration

  28 Hours
