A quick evaluation of the machines available at DMA with HINT
Update: This page was written in 1998 and describes the hardware we had at that time. Some of the tests were repeated in 1999 with the new hardware, and the results have led us to the same conclusions.

Roadmap
The hardware
The benchmark: HINT
Results and comparisons
A word about parallelism
Applications and libraries
Conclusion

Past and current hardware
The hardware reviewed here is a selection of the machines currently available at the Mathematics Department of EPFL, or available for immediate purchase. Most of the comments are comparisons with the Indigo2, since this is the machine we are used to working with.
The benchmark: HINT
Unlike most "famous" benchmarks (SPECint_9x, SPECfp_9x, LINPACK, Peak FLOPS, ...), HINT aims at benchmarking machines over a wide range of problem sizes. Although it does also return a single number (in QUIPS, QUality Improvements Per Second), HINT gives a curve of the performance as a function of time or, equivalently, memory usage. Find out more about the ideas behind HINT, and see this nice tutorial on Understanding HINT Graph. The HINT benchmark was run several times on each platform with different data types: INT32, INT64, FLOAT and DOUBLE.

Results and comparisons
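As a reminder of what the curves in this section plot, the following toy sketch illustrates the idea of "quality improvements per second". It is not the HINT source code: the integrand, the uniform subdivision and all function names are assumptions chosen for illustration. Quality is taken here as the reciprocal of the gap between an upper and a lower bound on an integral, and the rating at each step is that quality divided by the elapsed time.

```python
import time


def f(x):
    # Illustrative integrand (an assumption); it is decreasing on [0, 1].
    return (1.0 - x) / (1.0 + x)


def bounds(n):
    """Lower and upper Riemann bounds of the integral of f on [0, 1], n equal strips."""
    h = 1.0 / n
    # f is decreasing, so right endpoints underestimate and left endpoints overestimate.
    lower = sum(f((i + 1) * h) for i in range(n)) * h
    upper = sum(f(i * h) for i in range(n)) * h
    return lower, upper


def quality(n):
    """Reciprocal of the gap between the two bounds: a tighter answer is a better one."""
    lower, upper = bounds(n)
    return 1.0 / (upper - lower)


def quips_curve(doublings=10):
    """Return (elapsed_seconds, quality, quality_per_second) for n = 1, 2, 4, ..."""
    samples = []
    start = time.perf_counter()
    n = 1
    for _ in range(doublings):
        q = quality(n)
        # Guard against a zero reading on very coarse timers.
        elapsed = max(time.perf_counter() - start, 1e-9)
        samples.append((elapsed, q, q / elapsed))
        n *= 2
    return samples


if __name__ == "__main__":
    for elapsed, q, rate in quips_curve():
        print(f"{elapsed:.6f} s  quality={q:.1f}  rate={rate:.1f}")
```

Plotting the third column against the first gives a curve of the same general kind as the figures discussed here, although an interpreted loop like this one will not expose cache effects the way the compiled benchmark does.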
A quick look at these figures, taking the Indigo2 as reference, indicates that the new 400 MHz Pentium II performs very well on small integers, but still lacks floating-point power. The Sun has a problem with non-optimized 64-bit integers, but does fairly well on floating point. The Alpha chip does not have 32-bit integers but otherwise beats everyone everywhere. Let us now take a closer look at each machine:

The PC-PII results
Intel makes no secret that the Pentium's design aims at the business market, and not at big, number-crunching scientific applications. So it is no surprise to see the best results for 32-bit integer computations. Unfortunately, this data type is too small to allow HINT to test larger problem sizes. The 64-bit integer and 64-bit double curves both show the memory structure of the machine very well: the best performance is achieved until the 16 KB on-chip L1 cache is saturated, but the rate remains steady up to the limit of the 512 KB L2 cache. This cache is not on-chip, but sits on the same daughter card and is accessed at half the processor clock speed, 200 MHz in our case. By today's standards, 512 KB of cache is not enough, but this is mitigated by a very efficient main memory and a fast 100 MHz bus.

The Indigo2 results
With the Indigo2, we leave the business world and get closer to what scientists expect: floating-point performance. Again the various memory regimes (32 KB L1, 1 MB L2, 96 MB main memory) are clearly visible, but the drops between them are bigger.

The Ultra 10 results
The integer performance of this machine is surprisingly low, with an obvious problem on 64-bit integers. It is interesting to note the high memory bandwidth on large problems using DOUBLEs.

The T0 results
The Alpha chip of the T0 is a true 64-bit chip. It loses against the Pentium II for small integers. The high MFlops advertised can be seen in the left part of the blue curve, but they cannot be sustained when the CPU is not fed from the internal cache. Even a 3-level caching system (8 KB, 96 KB, 1 MB) does not help. Finally, the larger memory (256 MB) avoided the final drop, where all other systems started to swap to disk.

Comparison of the machines for various data types

The INT32 results
For small problems based on 32-bit integers, the Pentium II can be up to twice as fast as the MIPS and the Alpha, and four times faster than the Ultra. Again, the Alpha was tested on a wider range because its small integers are in fact 64 bits wide.

The INT64 results
When longer integers are needed, the Alpha takes the lead again thanks to its 64-bit architecture. One interesting thing to note is that the Indigo2 is better than the PC for small problem sizes, but when main memory starts to play a dominant role, the Pentium II's 100 MHz bus and memory are able to provide a higher bandwidth.

The FLOAT results
It is interesting to note that as long as the 32-bit FLOAT data type provides enough precision, the PC, Indigo2 and Ultra 10 give similar performance despite their different clock frequencies.

The DOUBLE results
The DOUBLE data type is probably the most commonly used. For small, cache-resident problem sizes, the T0 remains ahead. For larger problems, the gap between the systems is smaller. The Ultra 10 and the PC both suffer from their small L2 cache, but benefit from better memory bandwidth.

A word about parallelism
This study covers exclusively the single-CPU performance of the machines, although some are "parallel machines":

The PC is a Dual-Pentium, and with the SMP support of Linux, it can be seen as a parallel machine, offering multithreading on shared memory as well as message passing (MPI).

The Swiss T0 prototype is the first of a suite of parallel machines that aims at providing "Teraflops" performance. But, however efficient the interconnection network is, single-CPU efficiency is always a prerequisite.

Our results measured on the Indigo2 were matched by those of an SGI Origin2000 server based on the same CPU, apart from the improvement due to the bigger L2 cache (4 MB).

Although the HINT benchmark can measure it, parallel performance, including scalability, communication latency and bandwidth, was not addressed at all in this study.

Applications and libraries
Heavy users of CPU power at the DMA have various needs. Some rely on high-level packages (Cplex, Splus, Mathematica, ...) that may or may not be available on every platform (CPU or OS). Some rely on basic linear algebra (BLAS) and need highly optimized versions of these tools, typically offered by UNIX vendors; the status and degree of optimization of these tools on the various Pentium chips is a key issue for them. Most users develop their own codes in Fortran, C or C++ and need efficient compilers. Traditional UNIX vendors offer solid compilers, optimized for their respective architectures, but Linux usually ships with the generic GNU gcc. More efficient alternatives are available either from third-party compiler vendors or from the upcoming pgcc and egcs projects.
Conclusion
This quick review is intended to help people decide whether a cheap PC can replace a workstation, not just as an X terminal but to actually run CPU-intensive applications. There is still no easy answer, even if, for the first time, the PC outperforms the workstations in some areas. The enhancements due to the new 100 MHz motherboards and memory result in very efficient systems. The current cache size of 512 KB is too small, but future versions of the Pentium II will accept up to 2 MB.
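The cache-size remark in the conclusion can be illustrated with a small sketch. Given the cache and memory sizes quoted in this review, it reports which level of the memory hierarchy a working set of a given size would fit into. This is a deliberately crude model: it treats each capacity as a hard threshold and ignores associativity, code size and the operating system. The function name, the table layout and the `FUTURE_PII` entry (which keeps the 16 KB L1 and only enlarges L2 to 2 MB) are assumptions made for illustration.

```python
KB = 1024
MB = 1024 * KB

# Sizes in bytes, as quoted in the text; levels are listed from fastest to slowest.
MACHINES = {
    "PC-PII": [("L1 cache", 16 * KB), ("L2 cache", 512 * KB)],
    "Indigo2": [("L1 cache", 32 * KB), ("L2 cache", 1 * MB), ("main memory", 96 * MB)],
    "T0": [
        ("L1 cache", 8 * KB),
        ("L2 cache", 96 * KB),
        ("L3 cache", 1 * MB),
        ("main memory", 256 * MB),
    ],
}

# Hypothetical future Pentium II: same L1 (assumed), L2 raised to 2 MB.
FUTURE_PII = [("L1 cache", 16 * KB), ("L2 cache", 2 * MB)]


def memory_regime(levels, working_set_bytes):
    """Return the name of the first level large enough to hold the working set."""
    for name, capacity in levels:
        if working_set_bytes <= capacity:
            return name
    return "beyond listed levels"


if __name__ == "__main__":
    size = 1 * MB
    print("PC-PII today:", memory_regime(MACHINES["PC-PII"], size))
    print("PC-PII with 2 MB L2:", memory_regime(FUTURE_PII, size))
```

With a 1 MB working set, the current PC falls out of its L2 cache while the hypothetical 2 MB version would still hold it, which is the regime change the HINT curves make visible.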