Anything about GP-GPU


What is GP-GPU?
General-purpose computing on graphics processing units (GPGPU, also referred to as GPGP and to a lesser extent GP²) is the technique of using a GPU, which typically handles computation only for computer graphics, to perform computation in applications traditionally handled by the CPU. It is made possible by the addition of programmable stages and higher precision arithmetic to the rendering pipelines, which allows software developers to use stream processing on non-graphics data.

Source: Wikipedia

kcpoliran (not verified)
GPU programming concepts

Computational resources
There are a variety of computational resources available on the GPU:

Programmable processors – vertex, primitive, and fragment pipelines allow the programmer to run kernels on streams of data
Rasterizer – creates fragments and interpolates per-vertex constants such as texture coordinates and color
Texture unit – read-only memory interface
Framebuffer – write-only memory interface

In fact, the programmer can substitute a write-only texture for output instead of the framebuffer. This is done through Render-To-Texture (RTT), Render-To-Backbuffer-Copy-To-Texture (RTBCTT), or the more recent stream-out.
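
Writing to a texture instead of the framebuffer matters because the result of one pass can then be read as the input of the next. The following is only a CPU-side sketch of that idea, not real graphics API code: the two plain arrays stand in for textures, and the names run_pass and run_passes, as well as the "add 1" work done per element, are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One "rendering pass": reads only from src, writes only to dst.
 * The per-element work (adding 1) is a placeholder. */
static void run_pass(const float *src, float *dst, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        dst[i] = src[i] + 1.0f;
    }
}

/* Runs `passes` passes over two buffers, swapping their roles after each
 * pass so the previous output becomes the next input.
 * Returns the buffer that holds the final result. */
float *run_passes(float *buf_a, float *buf_b, size_t count, int passes)
{
    assert(buf_a != NULL && buf_b != NULL && buf_a != buf_b && passes >= 0);
    float *src = buf_a;
    float *dst = buf_b;
    for (int p = 0; p < passes; p++) {
        run_pass(src, dst, count);
        float *tmp = src;  /* swap roles: the output becomes the input */
        src = dst;
        dst = tmp;
    }
    return src;  /* after the last swap, src points at the newest output */
}
```

Within a single pass the input is never written and the output is never read, which mirrors the read-only texture unit and the write-only output target described above.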

Textures as stream
The most common form for a stream to take in GPGPU is a 2D grid because this fits naturally with the rendering model built into GPUs. Many computations naturally map into grids: matrix algebra, image processing, physically based simulation, and so on.

Since textures are used as memory, texture lookups are then used as memory reads. Because of this, the GPU's texture hardware can do certain operations automatically, such as filtering between neighboring texels or clamping out-of-range coordinates.
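
To make "texture lookup as memory read" concrete, here is a rough CPU-side sketch. The Texture2D struct and the texture_fetch function are hypothetical names, not part of any graphics API, and the clamp-to-edge addressing is just one behavior that texture hardware commonly offers; on a real GPU it would happen in the texture unit rather than in user code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a 2D texture: a flat, row-major grid. */
typedef struct {
    const float *texels;  /* read-only data, like a texture bound for sampling */
    int width;
    int height;
} Texture2D;

/* Clamp an integer coordinate into [0, size - 1]. */
static int clamp_coord(int v, int size)
{
    if (v < 0) return 0;
    if (v >= size) return size - 1;
    return v;
}

/* A "texture lookup": a memory read with clamp-to-edge addressing. */
float texture_fetch(const Texture2D *tex, int x, int y)
{
    assert(tex != NULL && tex->width > 0 && tex->height > 0);
    int cx = clamp_coord(x, tex->width);
    int cy = clamp_coord(y, tex->height);
    return tex->texels[(size_t)cy * (size_t)tex->width + (size_t)cx];
}
```

A kernel that reads the neighbors of a cell can call texture_fetch with x - 1 or y + 1 at the border of the grid without any special-case code, since the out-of-range coordinate is clamped to the nearest edge texel.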

Kernels can be thought of as the body of loops. For example, if the programmer were operating on a grid on the CPU they might have code that looked like this:

// Input and output grids have 10000 x 10000 or 100 million elements.

float do_some_hard_work(float value);  // per-element work, defined elsewhere

void transform_10k_by_10k_grid(float in[10000][10000], float out[10000][10000])
{
    for (int x = 0; x < 10000; x++) {
        for (int y = 0; y < 10000; y++) {
            // The next line is executed 100 million times
            out[x][y] = do_some_hard_work(in[x][y]);
        }
    }
}

On the GPU, the programmer only specifies the body of the loop as the kernel and what data to loop over by invoking geometry processing.
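
The split between "kernel" and "what to loop over" can be imitated on the CPU with a function pointer. This is only a sketch under that analogy: run_kernel plays the role of the GPU, square_kernel is a made-up placeholder for do_some_hard_work, and neither name comes from a real API.

```c
#include <assert.h>
#include <stddef.h>

/* A kernel: the per-element loop body, with no loop of its own. */
typedef float (*kernel_fn)(float);

/* Hypothetical example kernel; stands in for do_some_hard_work. */
float square_kernel(float value)
{
    return value * value;
}

/* Stand-in for the GPU: applies the kernel to every element of a
 * width x height row-major grid. On real hardware these iterations
 * would run in parallel and in no guaranteed order. */
void run_kernel(kernel_fn kernel, const float *in, float *out,
                size_t width, size_t height)
{
    assert(kernel != NULL && in != NULL && out != NULL);
    for (size_t i = 0; i < width * height; i++) {
        out[i] = kernel(in[i]);
    }
}
```

The programmer writes only square_kernel and chooses the grid size. Because each output element depends only on its own input element, the order of iteration does not matter, which is what lets the GPU process many elements at once.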

Flow control
In sequential code, the flow of the program is controlled with if-then-else statements and various forms of loops. Such flow control structures have only recently been added to GPUs.[3] Before that, conditional writes could be accomplished using a properly crafted series of arithmetic/bit operations, but looping and conditional branching were not possible.
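
As an illustration of a conditional write built from arithmetic, here is a small sketch. The names select_arith and clamp_negative_to_zero are invented for this example. It assumes finite inputs, since multiplying an infinity or NaN by zero does not give zero. Whether a C compiler emits an actual branch for the comparison is up to the compiler; the point is only that the selection itself is expressed as arithmetic.

```c
#include <assert.h>

/* Branch-free select: returns a when cond is 1 and b when cond is 0.
 * cond must be exactly 0 or 1; a and b are assumed to be finite. */
float select_arith(int cond, float a, float b)
{
    assert(cond == 0 || cond == 1);
    float mask = (float)cond;
    return mask * a + (1.0f - mask) * b;
}

/* Conditional write without an if: negative inputs are written as zero,
 * all other inputs are copied through unchanged. */
void clamp_negative_to_zero(const float *in, float *out, int count)
{
    for (int i = 0; i < count; i++) {
        int is_negative = in[i] < 0.0f;  /* a comparison yields 0 or 1 in C */
        out[i] = select_arith(is_negative, 0.0f, in[i]);
    }
}
```

Both candidate values are always computed and the mask picks one of them, so the cost is that of both sides of the "branch" every time.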

Recent GPUs allow branching, but usually with a performance penalty. Branching should generally be avoided in inner loops, whether in CPU or GPU code. When hardware support does not exist, techniques such as static branch resolution, pre-computation, and Z-cull[4] can be used to achieve branching.
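
Of the techniques named above, static branch resolution is the easiest to show in CPU code. The sketch below reflects my understanding of the idea: when the condition is known before the loop starts, decide once and run a branch-free loop for that case. The function names and the negate/scale work are hypothetical. On a GPU the two loops would roughly correspond to two separate kernels, each invoked on its own part of the data.

```c
#include <assert.h>
#include <stddef.h>

/* Branch inside the inner loop: the same test is repeated per element. */
void scale_or_negate_branchy(const float *in, float *out, size_t count,
                             int negate, float factor)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < count; i++) {
        if (negate) {
            out[i] = -in[i];
        } else {
            out[i] = in[i] * factor;
        }
    }
}

/* Branch resolved statically: the decision is made once, outside the
 * loop, and each loop body contains no condition at all. */
void scale_or_negate_hoisted(const float *in, float *out, size_t count,
                             int negate, float factor)
{
    assert(in != NULL && out != NULL);
    if (negate) {
        for (size_t i = 0; i < count; i++) {
            out[i] = -in[i];
        }
    } else {
        for (size_t i = 0; i < count; i++) {
            out[i] = in[i] * factor;
        }
    }
}
```

Both functions produce the same output. The second one simply moves the test out of the part of the code that runs once per element.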

Source: Wikipedia