Course Outline

Introduction


  • What is GPU programming?
  • Why use GPU programming?
  • What are the challenges and trade-offs of GPU programming?
  • What are the frameworks for GPU programming?
  • Choosing the right framework for your application

OpenCL


  • What is OpenCL?
  • What are the advantages and disadvantages of OpenCL?
  • Setting up the development environment for OpenCL
  • Creating a basic OpenCL program that performs vector addition
  • Using the OpenCL API to query device information, allocate and deallocate device memory, copy data between host and device, launch kernels, and synchronize work-items
  • Using the OpenCL C language to write kernels that execute on the device and manipulate data
  • Using OpenCL built-in functions, variables, and libraries to perform common tasks and operations
  • Using OpenCL memory spaces, such as global, local, constant, and private, to optimize data transfers and memory accesses
  • Using the OpenCL execution model to control the work-items, work-groups, and ND-ranges that define the parallelism
  • Debugging and testing OpenCL programs using tools such as CodeXL
  • Optimizing OpenCL programs using techniques such as coalescing, caching, prefetching, and profiling

CUDA


  • What is CUDA?
  • What are the advantages and disadvantages of CUDA?
  • Setting up the development environment for CUDA
  • Creating a basic CUDA program that performs vector addition
  • Using the CUDA API to query device information, allocate and deallocate device memory, copy data between host and device, launch kernels, and synchronize threads
  • Using the CUDA C/C++ language to write kernels that execute on the device and manipulate data
  • Using CUDA built-in functions, variables, and libraries to perform common tasks and operations
  • Using CUDA memory spaces, such as global, shared, constant, and local, to optimize data transfers and memory accesses
  • Using the CUDA execution model to control the threads, blocks, and grids that define the parallelism
  • Debugging and testing CUDA programs using tools such as CUDA-GDB, CUDA-MEMCHECK, and NVIDIA Nsight
  • Optimizing CUDA programs using techniques such as coalescing, caching, prefetching, and profiling

ROCm


  • What is ROCm?
  • What are the advantages and disadvantages of ROCm?
  • Setting up the development environment for ROCm
  • Creating a basic ROCm program that performs vector addition
  • Using the ROCm API to query device information, allocate and deallocate device memory, copy data between host and device, launch kernels, and synchronize threads
  • Using HIP, the C++ kernel language of ROCm, to write kernels that execute on the device and manipulate data
  • Using ROCm built-in functions, variables, and libraries to perform common tasks and operations
  • Using ROCm memory spaces, such as global, local, constant, and private, to optimize data transfers and memory accesses
  • Using the ROCm execution model to control the threads, blocks, and grids that define the parallelism
  • Debugging and testing ROCm programs using tools such as ROCm Debugger and ROCm Profiler
  • Optimizing ROCm programs using techniques such as coalescing, caching, prefetching, and profiling

Comparison


  • Comparing the features, performance, and compatibility of OpenCL, CUDA, and ROCm
  • Evaluating GPU programs using benchmarks and metrics
  • Learning the best practices and tips for GPU programming
  • Exploring the current and future trends and challenges of GPU programming

Summary and Next Steps

Requirements


  • An understanding of C/C++ language and parallel programming concepts
  • Basic knowledge of computer architecture and memory hierarchy
  • Experience with command-line tools and code editors

Audience


  • Developers who wish to learn how to use different frameworks for GPU programming and compare their features, performance, and compatibility
  • Developers who wish to write portable and scalable code that can run on different platforms and devices
  • Programmers who wish to explore the trade-offs and challenges of GPU programming and optimization

Duration: 28 Hours
