AltiVec Fractal Carbon Demonstration Software
High-performance parallel software utilizing:
AltiVec, a.k.a. the Velocity Engine
Multiprocessing ("MP")
TCP/IP, via MacMPI_X
Carbon (OS 9 & X)
Single Node Performance of the AltiVec Fractal Carbon Demo
4-Node Parallel Performance of the AltiVec Fractal Carbon Demo using Pooch
AltiVec Fractal Carbon v1.3 (116 kB) - A numerically-intensive parallel graphics application that
uses the Velocity Engine, a.k.a. AltiVec, multiple processors (MP), and cluster computing (via MPI)
for its computations.
This code also computes correctly on single-processor machines and on Macs without a G4 or G5.
It uses the MacMPI_X.c library for communications and requires Mac OS X 10.2 or later, or
Mac OS 8.5 or later with CarbonLib 1.2 or later, and Pooch.
New in version 1.3:
This app's parallelization has been reorganized and optimized for large clusters of PowerPC G5s,
enabling it to achieve over 1.21 TeraFlops!
This code has achieved:
Your results may vary. Click on the above links for benchmark
information.
For comparison, AltiVec Fractal Carbon v1.2 is also available.
Note: The name of this program has been deprecated.
As a Universal Application, its new name is Power Fractal,
to which you will be redirected in 60 seconds.
To run this app on a single Mac running OS 9 or X, no
additional software is necessary.
To run this in parallel on OS 9 and X, you will need
Pooch.
Version 1.1 includes automated "Computational Grid" launching
on a Mac cluster running version
1.1 of Pooch.
See the Pooch Quick Start for instructions on configuring your Macs for parallel computing.
For information about writing your own parallel applications, see
the Pooch Software Development Kit and
the Compiling MPI page.
Behind the Scenes: What the AltiVec Fractal source code looks like
The AltiVec Fractal Carbon demo contains code for both the z -> z^2 + c
and z -> z^4 + c Mandelbrot-style iteration methods. Although the z^4 case is
selected by default, the z^2 case is shown below for simplicity.
The innermost loop of the iteration for the regular FPU case is:
//12 flops per iteration (counted in the assembly language)
{
    float tr;
    count++;
    tr=ca+za*za-zb*zb;
    zb=cb+zb*za+zb*za;
    za=tr;
    tr=tr*tr+zb*zb;
    if (tr>sqrmaxrad) { }
}
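To try the scalar loop outside the demo, it can be wrapped in a small function. The following is a minimal sketch, not the demo's actual source: the function name escape_count, the max_iter parameter, the z = 0 starting point, and the break on escape are assumptions added for illustration. Only the arithmetic in the loop body comes from the excerpt above.

```c
#include <assert.h>

/* Hypothetical wrapper around the scalar z -> z^2 + c loop.
   (ca, cb) are the real and imaginary parts of c; sqrmaxrad is the
   squared escape radius. Returns the number of iterations performed
   before |z|^2 exceeded sqrmaxrad, or max_iter if it never did. */
int escape_count(float ca, float cb, float sqrmaxrad, int max_iter)
{
    float za = 0.0f, zb = 0.0f; /* assumed starting point z = 0 */
    int count = 0;

    assert(max_iter > 0);

    while (count < max_iter) {
        float tr;
        count++;
        tr = ca + za*za - zb*zb;  /* new real part */
        zb = cb + zb*za + zb*za;  /* new imaginary part: 2*za*zb + cb */
        za = tr;
        tr = tr*tr + zb*zb;       /* |z|^2 */
        if (tr > sqrmaxrad) {
            break;                /* assumed exit action */
        }
    }
    return count;
}
```

For example, escape_count(0.0f, 0.0f, 4.0f, 100) runs all 100 iterations because c = 0 never escapes, while c = 3 exceeds the squared radius 4 on the first iteration.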
To rewrite the above code for AltiVec, the multiplications, additions, and
subtractions must be reexpressed in AltiVec macros, such as vec_madd and
vec_nmsub. The comparison operations must be performed in vector form as well,
here using vec_cmpgt, vec_and, and vec_any_ne. Note that AltiVec has no
standalone floating-point multiply; every multiply is fused with an add, as in
vec_madd. A plain multiply is therefore written as a multiply-add with zero,
which is why the iteration loop includes an add to zero.
The corresponding loop, converted to AltiVec, is:
//13 flops - 1 add to zero = 12 flops per iteration
{
    vector float tr;
    count++;
    tr=vec_madd(za,za,vec_nmsub(zb,zb,ca));
    zb=vec_madd(zb,za,vec_madd(zb,za,cb));
    za=tr;
    tr=vec_madd(zb,zb,vec_madd(tr,tr,zero));
    {
        vector bool int exitMask=vec_and(vec_cmpgt(tr, sqrmaxrad),loopMask);
        if (vec_any_ne(exitMask,zeroInt)) { }
    }
}
This is the primary difference between the AltiVec and non-AltiVec versions of
the z^2 code. The remaining differences involve preparing the vector variables
with the appropriate data prior to the iteration loop and saving the final data
out into a regular array of floating-point numbers for later conversion to pixel
data.
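The lane-by-lane behavior of the vector loop, together with the setup and save steps, can be imitated in portable C. The following is a rough sketch under stated assumptions, not the demo's source and not AltiVec code: four-element arrays stand in for vector float registers, the active array plays the role of loopMask, and the names escape_count4, LANES, active, and counts are hypothetical. As a vector unit would, it keeps computing every lane on each pass and uses the mask only to record the first escape of each lane.

```c
#include <assert.h>

#define LANES 4 /* an AltiVec vector float holds four 32-bit floats */

/* Hypothetical four-lane version of the z -> z^2 + c loop.
   ca/cb hold the real and imaginary parts of c for each lane.
   counts receives, per lane, the iteration at which |z|^2 first
   exceeded sqrmaxrad, or max_iter if that never happened. */
void escape_count4(const float ca[LANES], const float cb[LANES],
                   float sqrmaxrad, int max_iter, int counts[LANES])
{
    float za[LANES], zb[LANES];
    int active[LANES];     /* stands in for loopMask */
    int remaining = LANES; /* lanes that have not escaped yet */
    int lane, iter;

    assert(max_iter > 0);

    /* Prepare the per-lane variables before the iteration loop. */
    for (lane = 0; lane < LANES; lane++) {
        za[lane] = 0.0f;
        zb[lane] = 0.0f;
        active[lane] = 1;
        counts[lane] = max_iter; /* kept by lanes that never escape */
    }

    for (iter = 1; iter <= max_iter && remaining > 0; iter++) {
        for (lane = 0; lane < LANES; lane++) {
            /* Same arithmetic as the scalar loop, applied to every lane,
               including lanes that have already escaped. */
            float tr = ca[lane] + za[lane]*za[lane] - zb[lane]*zb[lane];
            zb[lane] = cb[lane] + zb[lane]*za[lane] + zb[lane]*za[lane];
            za[lane] = tr;
            tr = tr*tr + zb[lane]*zb[lane];

            /* Comparison combined with the lane mask, so that each
               lane's escape is recorded only once. */
            if (tr > sqrmaxrad && active[lane]) {
                counts[lane] = iter; /* save into a regular array */
                active[lane] = 0;
                remaining--;
            }
        }
    }
}
```

Escaped lanes may overflow to infinity or NaN while the others finish; under default floating-point settings this is harmless here because their results are already masked out.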
For a slightly more complex example of vectorized code, compare sections of
the DoSqueezeToLetterboxEffect() routine in the source code of
the iMovie Squeeze to Letterbox plug-in on the Other Software page.