Benjamin Eitzen - GpuPy: accelerating numpy with a GPU - graphics processing unit: provide hardware acceleration used in OpenGL or DirectX - GPU structure - vertex shaders (transforms): 8 on GeForce 7800 GTX - fragment shaders (most floating point): 24 on GeForce 7800 GTX - GPU strengths - highly parallel, - native trigonometry - rapid CPU update cycle - GPU weakness - single-precision floats - data to be copied to GPU memory - programming conceptually difficult - programmability - newer GPUs can execute programs written in high-level languages - "shaders" written in Cg, GLSL, HLSL - execute a program once for each pixel - coordinates index an element of an array - support one-to-four D vectors of floating points - GpuPy extends NumPy so that it can take advantage of a GPU - overrides default implementations of Float32 operations - GpuPy can be transparent - no doubles - speed - simple operations aren't faster than software (copying overhead) - some (pow, arccosh) get faster for large enough arrays - trigs are much faster - reductions can be performed in parallel - more complex operations - correlation, convolution - FFTs - C library gets arrays, compiles (and caches) programs, executes them - future work - minimize copying by avoid transfer of intermediate values - basically, lazy evaluation - http://eecs.wsu.edu/~eitzenb/gpupy