The weighted arithmetic mean calculation gives a single performance index, which does not show how performance is distributed among the micro-benchmarks used. Consider the following example:
A benchmark suite consists of two micro-benchmarks (X and Y), with 80% of the weight given to benchmark X and 20% to benchmark Y. Assume the suite is used to compare machines A and B, with the following results:
| | Machine A | Machine B |
|---|---|---|
| Benchmark X | 10 seconds | 30 seconds |
| Benchmark Y | 100 seconds | 10 seconds |
| Weighted arithmetic mean | 28 | 26 |
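The means follow directly from the weights: for machine A, 0.80 × 10 + 0.20 × 100 = 28 seconds, and for machine B, 0.80 × 30 + 0.20 × 10 = 26 seconds.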
From the weighted arithmetic mean, we would conclude that machine B performs better. But is the mean a fair representation of machine A's performance on benchmark X? The problem arises when the workload mix and weight distribution selected by the benchmark suite are only an approximation of the actual usage. Suppose user Z wants to use the published results for a workload mix that comprises 95% programs similar to benchmark X and only 5% programs similar to benchmark Y. The following table shows the problem with the weighted arithmetic mean calculation:
| | Machine A (old weights) | Machine B (old weights) | Machine A (new weights) | Machine B (new weights) |
|---|---|---|---|---|
| Benchmark X | 10 seconds | 30 seconds | 10 seconds | 30 seconds |
| Benchmark Y | 100 seconds | 10 seconds | 100 seconds | 10 seconds |
| Weighted arithmetic mean | 28 | 26 | 14.5 | 29 |
Since the benchmark suite uses a weight distribution reasonably close to user Z's usage (80:20 versus 95:5), user Z will select machine B based on the published results. However, machine A actually performs better for user Z's workload, which cannot be concluded from the weighted arithmetic mean results published by the benchmark suite.
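The sensitivity of the ranking to the chosen weights can be reproduced with a short calculation. The sketch below (Python, with an illustrative `weighted_mean` helper that is not part of any benchmark suite) recomputes the means from the run times above under both weight distributions:

```python
def weighted_mean(times, weights):
    """Weighted arithmetic mean of per-benchmark run times (seconds)."""
    return sum(t * w for t, w in zip(times, weights))

# Run times in seconds for benchmarks (X, Y) on each machine, taken from the tables above.
machine_a = (10, 100)
machine_b = (30, 10)

weight_sets = {
    "suite weights (80:20)": (0.80, 0.20),  # weights published by the benchmark suite
    "user Z's mix (95:5)": (0.95, 0.05),    # user Z's actual workload mix
}

for label, weights in weight_sets.items():
    a = weighted_mean(machine_a, weights)
    b = weighted_mean(machine_b, weights)
    winner = "A" if a < b else "B"          # lower run time is better
    print(f"{label}: A = {a:.1f}s, B = {b:.1f}s -> machine {winner} looks faster")

# Expected output:
# suite weights (80:20): A = 28.0s, B = 26.0s -> machine B looks faster
# user Z's mix (95:5): A = 14.5s, B = 29.0s -> machine A looks faster
```

The same raw run times rank the machines differently as soon as the weights change, which is exactly the information a single published index hides.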