Performance benchmarks

It would be ideal for performance if a computer system were tailored to a specific application. However, computers today are designed to run a wide range of applications, so computer designers and users need a set of tools for evaluating and predicting the performance of applications on various systems. You have just learned the most common metrics used to compare computer performance; now it is time to learn about the tools we use to obtain those metrics. We call these tools performance benchmarks.

A benchmark is a software program used to measure the execution performance of a computer system under a particular workload. The most representative workloads are the real programs that users will run to solve their problems. However, it is difficult to use the performance results of real programs to predict the performance of other applications. Therefore, many different types of performance benchmarks have been developed. Some benchmarks stress computational operations, while others focus on measuring the data transfer performance of computer systems.

Much effort has been put into extracting a few key operations that are the lowest common denominators among popular applications in order to evaluate system performance. Micro-benchmarks (also known as kernels) are small programs used to measure the performance of individual components or features of a computer system. Results obtained from these micro-benchmarks are often used to estimate the performance of real programs, and to explain the performance differences observed when real programs run on different computer systems.
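
As a rough illustration of the idea, the following Python sketch (a hypothetical example, not one of the established kernel benchmarks) times two tiny kernels in isolation using the standard timeit module: a floating-point reduction and a list copy. The data size and repeat counts are arbitrary choices, and the minimum of several repeats is reported because it is the least disturbed by other activity on the machine.

    import timeit

    # Hypothetical micro-benchmark sketch: time two small kernels in isolation.
    # The data size and repeat counts are arbitrary illustrative choices.
    N = 1_000_000

    # Kernel 1: floating-point reduction (stresses the arithmetic path).
    fp_time = min(timeit.repeat(
        stmt="s = sum(x * 1.0001 for x in data)",
        setup="data = [float(i) for i in range(N)]",
        globals={"N": N}, repeat=5, number=1))

    # Kernel 2: list copy (stresses memory traffic rather than computation).
    copy_time = min(timeit.repeat(
        stmt="dst = src[:]",
        setup="src = list(range(N))",
        globals={"N": N}, repeat=5, number=1))

    print(f"float reduction: {fp_time:.4f} s")
    print(f"list copy      : {copy_time:.4f} s")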

Personal computers and servers are used for a wide variety of applications today, and these applications stress different components of the system. Therefore, a single micro-benchmark result can rarely be used to estimate the overall performance of a system. A benchmark suite is a collection of micro-benchmarks that attempts to capture a particular set of workloads. The NAS Parallel Benchmarks suite, developed by the NASA Advanced Supercomputing (NAS) Division, is one example. The suite contains a set of eight computational fluid dynamics (CFD) kernels used to evaluate the performance of parallel supercomputers. These kernels are representative of routines used frequently in fluid dynamics simulation and modelling applications, such as simulating the physical fluid phenomena in weather systems and around hypersonic aerospace vehicles.

In addition, we often use a weighted average, such as the weighted arithmetic mean, to reflect the relative importance of each application. For example, more weight would be given to graphics rendering benchmarks on a system designed primarily for home entertainment use. The weighted arithmetic mean of a set of benchmark results is calculated using the following formula:

Weighted arithmetic mean = (w1 × t1) + (w2 × t2) + ... + (wn × tn)

where ti is the execution time of benchmark i, wi is its weight (the fraction of the workload that benchmark represents, with the weights summing to 1), and n is the total number of benchmarks in the suite.

Example

The following table shows the execution times of a benchmark suite containing programs X and Y on machines A and B. Calculate the weighted arithmetic mean for each machine if program X accounts for 20% of the workload and program Y for 80%. Which machine performs better?

            Machine A      Machine B
Program X   20 seconds     40 seconds
Program Y   65 seconds     10 seconds

Answer: The weighted arithmetic mean for machine A is (0.2 × 20) + (0.8 × 65) = 56 seconds, and for machine B it is (0.2 × 40) + (0.8 × 10) = 16 seconds. Since a lower mean execution time is better, machine B performs better overall.
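
The same calculation is easy to express in code. The short Python sketch below defines a weighted_mean helper (an illustrative name, not a standard library function) and applies it to the numbers in the table above.

    # Illustrative sketch of the weighted arithmetic mean calculation above;
    # weighted_mean and the variable names are not a standard API.
    def weighted_mean(weights, times):
        # Sum of weight_i * time_i over all benchmarks in the suite.
        return sum(w * t for w, t in zip(weights, times))

    weights   = [0.2, 0.8]    # program X is 20% of the workload, program Y is 80%
    machine_a = [20, 65]      # execution times in seconds
    machine_b = [40, 10]

    print(weighted_mean(weights, machine_a))   # 56.0 seconds
    print(weighted_mean(weights, machine_b))   # 16.0 seconds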

You have probably read performance results many times in computer magazines and wondered how editors evaluate the performance of newly released systems. The following are some common benchmarks used by system vendors and users today to compare the performance of various systems.

Standard Performance Evaluation Corporation (SPEC)

SPEC is a non-profit organization that establishes standards for measuring the performance of high-performance computer systems. SPECint and SPECfp are the two performance results most frequently reported by system vendors. The SPEC benchmark suites have gone through numerous revisions; SPEC CPU2000 is a suite designed to measure the performance of CPU-intensive tasks, and it comprises SPEC CINT2000, which measures the performance of integer-intensive operations, and SPEC CFP2000, which measures the performance of floating-point-intensive operations.

Linear Algebra Package (LINPACK and LAPACK)

LINPACK and LAPACK are libraries of subroutines frequently used by scientific applications to solve systems of linear equations. They are real programs, often linked into scientific applications to perform various linear algebra functions, and were originally developed to help scientific application writers produce programs that are portable across different computer systems. These subroutines are computation intensive, and some also require good memory bandwidth. Many system vendors today provide the performance results of these subroutines to compete for scientific application customers. The TOP500 Supercomputer Sites list uses results produced by LINPACK as the basis for its performance comparison.
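
As a rough sketch of what a LINPACK-style measurement looks like, the Python snippet below uses NumPy's linalg.solve, which calls LAPACK routines underneath, to solve a dense random system and report an approximate floating-point rate. The matrix size and the (2/3)n^3 operation count are the usual conventions for LU factorization, but this is only an illustration, not the official HPL benchmark used for the TOP500 list.

    import time
    import numpy as np

    # Rough LINPACK-style sketch: solve a dense random system Ax = b with a
    # LAPACK-backed routine and estimate the floating-point rate.
    # Illustration only; this is not the official HPL benchmark.
    n = 2000
    rng = np.random.default_rng(0)
    A = rng.random((n, n))
    b = rng.random(n)

    start = time.perf_counter()
    x = np.linalg.solve(A, b)          # LU factorization plus triangular solves
    elapsed = time.perf_counter() - start

    flops = (2.0 / 3.0) * n ** 3       # leading-order cost of the LU factorization
    print(f"n = {n}: {elapsed:.3f} s, about {flops / elapsed / 1e9:.2f} GFLOP/s")
    print("residual norm:", np.linalg.norm(A @ x - b))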

Transaction Processing Performance Council (TPC)

TPC is a non-profit organization founded to standardize performance benchmarks, and the review and monitoring of performance results, for the database transaction processing community. Four major benchmarks are in use today, each representing a different class of database usage. Since databases perform large numbers of disk read and write operations, TPC benchmarks are good performance indicators for the I/O subsystem.
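
To give a flavour of what a transaction-throughput measurement involves, the toy Python sketch below commits a series of small inserts to an on-disk SQLite database (from the standard library) and reports transactions per second. It is only a sketch in the spirit of such measurements and is not one of the actual TPC workloads.

    import os
    import sqlite3
    import tempfile
    import time

    # Toy transaction-throughput sketch (not an actual TPC workload): commit
    # many small inserts to an on-disk database and report transactions per
    # second. Each commit forces the data out to disk.
    n_transactions = 1000

    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(os.path.join(tmp, "bench.db"))
        conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")

        start = time.perf_counter()
        for i in range(n_transactions):
            conn.execute("INSERT INTO accounts VALUES (?, ?)", (i, 100.0))
            conn.commit()
        elapsed = time.perf_counter() - start
        conn.close()

    print(f"about {n_transactions / elapsed:.0f} transactions per second")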

Online Resources

Here are some online resources where you can find more information on popular benchmarks used by industrial vendors and reviewers to evaluate system performance:

Additional resources for the benchmarks described in this section: