A graphics processing unit (GPU) is similar to a set of vector processors sharing hardware: the multiple SIMD processors in a GPU act as independent MIMD cores, just as a vector computer has multiple vector processors. The main difference is multithreading, which is fundamental to GPUs and missing from most vector processors.
Set of vector processors
Multiple SIMD processors
Act like independent MIMD cores
Multithreading
Programming for the GPU
Compute Unified Device Architecture (CUDA)
It is a C-like programming language developed by NVIDIA to program its GPUs. CUDA produces C/C++ code for the system processor and a C/C++ dialect for the GPU; in this setup, the system processor is known as the “host” and the GPU as the “device”.
Characteristics
Developed by NVIDIA
C-like programming language
Setup
Host
System processor
C/C++ code
Device
GPU
C/C++ dialect
CUDA thread
Lowest level of parallelism
Single Instruction, Multiple Threads (SIMT)
Thread block
Threads are executed
together in blocks
Multithreaded SIMD processor
It is the hardware that executes a whole block of threads.
Modifiers
Function modifiers
CUDA functions can carry the modifiers __device__, __global__, or __host__ (see the sketch after this list).
__device__
Executed on the device, launched from the device.
__global__
Executed on the device, launched from the host.
__host__
Executed on the host, launched from the host.
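A minimal sketch of how the three modifiers combine in one source file (function names such as scale, scale_kernel, and run_scale are illustrative, not from the original notes):

    // __device__: executed on the device, callable only from device code.
    __device__ float scale(float a, float x) {
        return a * x;
    }

    // __global__: executed on the device, launched from the host (a kernel).
    __global__ void scale_kernel(float a, float *x, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] = scale(a, x[i]);
    }

    // __host__: executed on the host, called from the host (the default).
    __host__ void run_scale(float a, float *d_x, int n) {
        int threadsPerBlock = 256;
        int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
        scale_kernel<<<blocks, threadsPerBlock>>>(a, d_x, n);
    }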
Variable modifiers
CUDA variables can also carry modifiers, such as __device__ (see the sketch below).
__device__
A variable declared with this modifier is allocated in GPU memory and is accessible by all multithreaded SIMD processors.
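A short sketch of a __device__ variable, assuming the host sets it through cudaMemcpyToSymbol (the names d_alpha and scale_by_alpha are illustrative):

    #include <cuda_runtime.h>

    // __device__ variable: allocated in GPU memory, visible to every thread
    // on every multithreaded SIMD processor.
    __device__ float d_alpha;

    __global__ void scale_by_alpha(float *x, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] *= d_alpha;   // all threads read the same device variable
    }

    // Host-side setup: copy a value from host memory into the device symbol.
    void set_alpha(float alpha) {
        cudaMemcpyToSymbol(d_alpha, &alpha, sizeof(float));
    }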
CUDA-specific terms
Code examples
Ex. Y = a*X + Y
Conventional C code
CUDA corresponding version
This code (sketched below) launches n threads, one per vector element, with 256 threads per thread block on a multithreaded SIMD processor. The GPU function begins by computing the element index i from the block ID, the number of threads per block, and the thread ID. The multiply-and-add is performed only if the index i falls within the array.
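The code referred to above is not reproduced in these notes; the sketch below shows what the two versions typically look like for Y = a*X + Y, assuming a function named daxpy operating on double-precision vectors:

    // Conventional C code: a sequential loop over all n elements.
    void daxpy(int n, double a, double *x, double *y) {
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }
    // Invoke: daxpy(n, 2.0, x, y);

    // CUDA corresponding version: one thread per element.
    __global__ void daxpy_gpu(int n, double a, double *x, double *y) {
        // Element index from the block ID, threads per block, and thread ID.
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];   // only within the array bounds
    }
    // Invoke with 256 threads per thread block:
    //   int nblocks = (n + 255) / 256;
    //   daxpy_gpu<<<nblocks, 256>>>(n, 2.0, d_x, d_y);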
Ex. A = B * C
Multiply 2 vectors with
8192 elements each
Grid (Vectorized loop)
The GPU code that performs the whole 8192-element multiply is called a grid.
A grid is composed of thread blocks (the bodies of the vectorized loop);
in this case each thread block handles up to 512 elements (16 SIMD threads/block × 32 elements/SIMD thread).
Each SIMD instruction executes 32 elements at a time.
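A sketch of how the 8192-element multiply maps onto a grid, assuming an element-wise kernel named vec_mul and device pointers d_a, d_b, d_c:

    // Element-wise vector multiply A = B * C: one thread per element.
    __global__ void vec_mul(double *a, double *b, double *c) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        a[i] = b[i] * c[i];
    }

    void launch_vec_mul(double *d_a, double *d_b, double *d_c) {
        // 8192 elements / 512 elements per thread block = 16 thread blocks (the grid).
        // Within each block, the 512 threads run as 16 SIMD threads (warps)
        // of 32 elements, so each SIMD instruction covers 32 elements at a time.
        vec_mul<<<16, 512>>>(d_a, d_b, d_c);
    }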
Open Computing Language (OpenCL)
The Open Computing Language (OpenCL) is a programming language roughly similar to CUDA. Several companies are developing OpenCL to offer a vendor-independent language for multiple platforms, in contrast to CUDA.
Vendor independent
Multiple Platforms
Extended function call
Components
dimGrid
Specifies the dimensions of the grid, in terms of thread blocks.
dimBlock
Specifies the dimensions of a block, in terms of threads.
Parameter list
Built-in identifiers
blockIdx
It is the identifier/index of the block within the grid.
threadIdx
It is the identifier/index of the current thread within its block.
blockDim
It is the number of threads per block, which comes from the dimBlock parameter.
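Putting the components together, a sketch of the extended call syntax name<<<dimGrid, dimBlock>>>(parameter list) and the built-in identifiers (the kernel fill and its arguments are illustrative):

    __global__ void fill(int *out, int value) {
        // blockIdx  : index of this thread's block within the grid
        // blockDim  : number of threads per block (comes from dimBlock)
        // threadIdx : index of this thread within its block
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        out[i] = value;
    }

    void launch_fill(int *d_out) {
        dim3 dimGrid(32);     // grid dimensions: 32 thread blocks
        dim3 dimBlock(256);   // block dimensions: 256 threads per block
        fill<<<dimGrid, dimBlock>>>(d_out, 7);   // extended function call
    }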