Use your own benchmarks for Java execution time

Vincent Perrier, Java Applications, Product Manager, Wind River Systems Inc., Oakland, Calif.

4/1/2002 7:58 AM EST

When considering a benchmark to determine the overall performance of a Java application, the impact of byte code execution, graphics and native code execution varies depending on the nature of the specific application: what the application does, how much of it is byte code vs. native code and how much use it makes of graphics. How well a Java virtual machine (JVM) will perform for a given application depends on how the unique mix of these three functional areas maps onto its capabilities.

Given these variables, the best way to benchmark a JVM is against your own application. Since that's not possible before the application has been written, you must find those benchmarks that are most relevant to the application you intend to write.

Sorting through Java benchmarks to find the ones that are relevant for embedded applications can be confusing. SpecJVM98, for example, provides a relatively complete set of benchmarks that test diverse aspects of the JVM. That sounds good, but SpecJVM98 runs in a client/server environment and requires a minimum of 48 Mbytes of RAM on the client side for the JVM, which rules it out for most embedded applications. In addition, the benchmark can't be used with precompiled classes.

Other benchmarks have different pitfalls. VolanoMark, for example, is a chat server implementation and is therefore relevant only for benchmarking applications with the same set of requirements as chat servers. JMark, meanwhile, assumes that the application includes the applet viewer and a full implementation of Java's Abstract Windowing Toolkit (AWT). That makes it irrelevant for the many embedded applications that have no graphics, or that have limited graphics that don't require full AWT support, such as devices running a PersonalJava minimal-AWT implementation.

Embedded CaffeineMark (ECM) is the embedded version of the CaffeineMark benchmark from Pendragon Software, with the graphics tests removed. It is easy to run on any embedded JVM because it requires support for basic Java core classes only and doesn't need a large amount of memory. More important, there's a high correlation between good scores on this benchmark and improved byte code performance in embedded applications.

To get the most meaningful results from ECM, however, you must use exactly the same hardware when testing different JVMs. You must also pay attention to implementation differences among the JVMs you're testing. If, for example, you're comparing a JVM with a just-in-time compiler against a JVM without one, it's important to run the JVM that has the JIT with the "java -nojit" option on the command line to ensure an apples-to-apples comparison.
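A same-hardware comparison of this kind can be sketched with a small timing harness. The class below is a hypothetical illustration, not part of ECM: the names `TimingHarness`, `bestOfNanos` and `sampleWorkload` are assumptions, and the placeholder workload would be replaced with code from your own application. It assumes a JVM whose class library provides `System.nanoTime` and lambdas (Java 8 or later).

```java
/** Hypothetical harness: runs a workload several times and keeps the fastest run. */
final class TimingHarness {

    // Written by the placeholder workload so the loop result stays live.
    static volatile long sink;

    /** Runs the workload 'runs' times and returns the fastest wall-clock time in nanoseconds. */
    static long bestOfNanos(Runnable workload, int runs) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be >= 1");
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            workload.run();
            long elapsed = System.nanoTime() - start;
            best = Math.min(best, elapsed);
        }
        return best;
    }

    /** Placeholder workload (an assumption); substitute code from the real application. */
    static void sampleWorkload() {
        long sum = 0;
        for (int i = 0; i < 2_000_000; i++) {
            sum += i % 7;
        }
        sink = sum;
    }

    public static void main(String[] args) {
        long best = bestOfNanos(TimingHarness::sampleWorkload, 5);
        // Record which JVM produced the number so results are not mixed up later.
        System.out.println("java.vm.name = " + System.getProperty("java.vm.name"));
        System.out.println("best of 5 runs: " + best / 1_000_000.0 + " ms");
    }
}
```

The same class file is then run on each JVM under test, on the same board. On a HotSpot-based JVM, `java TimingHarness` and `java -Xint TimingHarness` would give compiled and interpreter-only figures; other JVMs use their own switch, such as the `-nojit` option mentioned above.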

ECM will typically make any JVM that uses compilation look good, no matter the type of compilation, because the benchmark includes a very small set of classes and always repeats the same small set of instructions. Dynamic compilers simply cache the complete translation of the Java code in RAM and execute subsequent iterations of the tests in native code. Ahead-of-time compilers can easily optimize the loops and algorithms used in ECM, too.
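The effect is easy to observe with a tiny kernel that is run repeatedly. The sketch below is a hypothetical example in the spirit of a small loop test; it is not code from ECM, and the names `WarmupProbe`, `countPrimes` and `timeRounds` are assumptions. It times identical rounds of a prime sieve so that early and late rounds can be compared.

```java
/** Hypothetical small-loop kernel, timed round by round. Not CaffeineMark code. */
final class WarmupProbe {

    /** Counts the primes strictly below 'limit' with a sieve of Eratosthenes. */
    static int countPrimes(int limit) {
        if (limit < 2) {
            return 0;
        }
        boolean[] composite = new boolean[limit];
        int count = 0;
        for (int i = 2; i < limit; i++) {
            if (!composite[i]) {
                count++;
                // Use long arithmetic so i * i cannot overflow for large limits.
                for (long j = (long) i * i; j < limit; j += i) {
                    composite[(int) j] = true;
                }
            }
        }
        return count;
    }

    /** Runs the same kernel 'rounds' times and returns the elapsed nanoseconds of each round. */
    static long[] timeRounds(int rounds, int limit) {
        long[] nanos = new long[rounds];
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            int primes = countPrimes(limit);
            nanos[r] = System.nanoTime() - start;
            if (primes < 0) {
                throw new IllegalStateException("unreachable; keeps the result live");
            }
        }
        return nanos;
    }

    public static void main(String[] args) {
        long[] nanos = timeRounds(10, 200_000);
        for (int r = 0; r < nanos.length; r++) {
            System.out.println("round " + r + ": " + nanos[r] / 1_000 + " us");
        }
    }
}
```

On a JVM with a dynamic compiler, later rounds would typically be faster than the first ones, because the whole kernel fits easily in the compiled-code cache. A real application with many classes and code paths gives the compiler far less repetition to exploit.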

To address such variability, the EEMBC consortium (www.eembc.org) is currently working with industry leaders to define and implement a realistic embedded Java benchmark that should cover all aspects of Java performance.

Existing benchmarks also may not measure other aspects of your application code. Tuning Java applications to meet performance goals may require addressing many program functions besides byte code execution. Some of those functions — for example, thread management, synchronization, method-to-method calls, class resolution, object allocation and heap management (including garbage collection), calls to native methods, byte code verification and exception handling — occur within the JVM.

Because few if any benchmarks address such functions, it falls to you, the designer, to conduct an in-depth study of a JVM's internals to understand how its design may affect crucial aspects of your application.

Writing special programs that exercise critical aspects of a JVM can help you evaluate it for the application. If, for example, your application uses a heavy mix of Java and C code, you can benefit by writing a program that tests native method call performance. Other functions, including native code execution and such factors as network latency, may occur outside the JVM.
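Such special-purpose programs can be very small. The following is a minimal sketch, under stated assumptions, of three probes for functions that occur within the JVM: uncontended synchronization, object allocation and exception handling. The class and method names (`JvmProbes`, `synchronizedIncrements`, `allocations`, `exceptionsCaught`, `report`) are hypothetical. A native method call probe would follow the same pattern but needs a JNI library written in C, so it is not shown here.

```java
import java.util.function.IntUnaryOperator;

/** Hypothetical micro-probes for operations that are handled inside the JVM. */
final class JvmProbes {
    private static final Object LOCK = new Object();
    private static int counter;
    // Holds the most recent allocation so the allocating loop has a visible effect.
    static volatile Object sink;

    /** Enters and leaves an uncontended monitor n times; returns the final counter value. */
    static int synchronizedIncrements(int n) {
        counter = 0;
        for (int i = 0; i < n; i++) {
            synchronized (LOCK) {
                counter++;
            }
        }
        return counter;
    }

    /** Allocates n small arrays to exercise the allocator and garbage collector; returns total bytes requested. */
    static int allocations(int n) {
        int totalLength = 0;
        for (int i = 0; i < n; i++) {
            byte[] block = new byte[64];
            sink = block;
            totalLength += block.length;
        }
        return totalLength;
    }

    /** Throws and catches n exceptions; returns how many were caught. */
    static int exceptionsCaught(int n) {
        int caught = 0;
        for (int i = 0; i < n; i++) {
            try {
                throw new IllegalStateException("probe");
            } catch (IllegalStateException e) {
                caught++;
            }
        }
        return caught;
    }

    /** Times one probe and prints the average cost per operation. */
    static void report(String name, IntUnaryOperator probe, int n) {
        long start = System.nanoTime();
        int result = probe.applyAsInt(n);
        long elapsed = System.nanoTime() - start;
        System.out.println(name + ": " + (elapsed / n) + " ns/op (result " + result + ")");
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        report("synchronized block", JvmProbes::synchronizedIncrements, n);
        report("object allocation", JvmProbes::allocations, n);
        report("exception throw/catch", JvmProbes::exceptionsCaught, n);
    }
}
```

Each probe isolates one function, so a large difference between two JVMs on a single line of output points at the part of the JVM's design worth studying further. The figures are only rough indicators; a compiler may optimize simple loops like these in ways it could not in a real application.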

What if your application includes graphics? Two major factors affect graphics performance in Java applications. First, does the application's graphics display driver use graphics coprocessor hardware acceleration? Second, is the application configured with a lightweight (faster) or a heavyweight (slower) implementation of the Abstract Windowing Toolkit? In addition, like any other high-level Java service, graphics performance is affected by the way that the graphics services integrate with lower-level native libraries.

You also need to consider the performance of your CPU. To help identify CPU-bound performance, you should supplement simple benchmarks by running real-world applications that exercise large amounts of different, complex Java code. Such test code must meet a number of requirements. It should contain a large number of classes, reflecting an estimate of the real application; 20-plus is a good ballpark. It must also be large, thousands of lines at least, and have no file system access and no graphics.

Some existing programs meet all those criteria. The GNU regular expression package, regexp, for example, comprises about 3,000 lines of code and more than 21 classes, providing a large number of expressions to parse and match. Another program, the Bean Shell interpreter running a simple prime-number sieve, has 70 classes and several thousand lines of code. JavaCodeCompact, Sun Microsystems Inc.'s PersonalJava ROM-izing tool, also would make a good test program.
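A driver for a regular-expression workload might look like the sketch below. It is only a stand-in: it uses the standard `java.util.regex` classes rather than the GNU regexp package, and the class name `RegexWorkload`, the patterns and the inputs are all hypothetical. It touches no files and no graphics, in line with the requirements above, but a real test would need far more patterns and input text.

```java
import java.util.regex.Pattern;

/** Hypothetical regex workload; uses java.util.regex as a stand-in for a regex package. */
final class RegexWorkload {

    // Illustrative patterns only; a realistic test would use many more.
    static final String[] PATTERNS = {
        "[a-z]+@[a-z]+\\.com",
        "\\d{3}-\\d{4}",
        "(foo|bar)+baz",
        "^\\s*#.*$"
    };

    static final String[] INPUTS = {
        "user@example.com",
        "555-1234",
        "foobarbaz",
        "  # comment",
        "no match here"
    };

    /** Compiles every pattern and matches it against every input, 'repeats' times; returns the number of full matches. */
    static int run(int repeats) {
        int matches = 0;
        for (int r = 0; r < repeats; r++) {
            for (String regex : PATTERNS) {
                // Compile inside the loop so parsing is measured as well as matching.
                Pattern pattern = Pattern.compile(regex);
                for (String input : INPUTS) {
                    if (pattern.matcher(input).matches()) {
                        matches++;
                    }
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int matches = run(10_000);
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(matches + " matches in " + elapsedMillis + " ms");
    }
}
```

Because both parsing and matching run through many classes and branches, a workload of this shape gives a compiler much less repetition to exploit than a single tight loop does.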

Running these programs as test cases illustrates the wide variance in the meaning of benchmark scores. For example, a JVM using a JIT compiler may run Embedded CaffeineMark up to 30 times faster than when the "nojit" option is turned on (thus running in pure interpretation mode), but the same JVM runs the Bean Shell and "regexp" tests only about one and a half times faster when using the JIT compiler.
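The arithmetic behind that comparison is simple. The sketch below uses hypothetical timings, chosen only to reproduce the ratios quoted above (about 30 times on ECM, about one and a half times on the larger programs); they are not measurements. The class and method names are likewise assumptions.

```java
/** Hypothetical speedup arithmetic; the timings are illustrative, not measured. */
final class Speedup {

    /** Returns how many times faster the compiled run is than the interpreted run. */
    static double speedup(double interpretedMillis, double compiledMillis) {
        if (interpretedMillis <= 0 || compiledMillis <= 0) {
            throw new IllegalArgumentException("timings must be positive");
        }
        return interpretedMillis / compiledMillis;
    }

    public static void main(String[] args) {
        // Assumed timings: the small benchmark drops from 3000 ms to 100 ms with the JIT,
        // while the larger program drops only from 3000 ms to 2000 ms.
        double benchmark = speedup(3000.0, 100.0);
        double application = speedup(3000.0, 2000.0);
        System.out.println("benchmark speedup:   " + benchmark);
        System.out.println("application speedup: " + application);
        System.out.println("benchmark overstates the gain by a factor of " + benchmark / application);
    }
}
```

With these assumed numbers, the benchmark would overstate the benefit of the JIT compiler by a factor of 20.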

The difference in results clearly demonstrates that high benchmark scores may not translate into a commensurate level of performance improvement in real-world applications. While they do suffer from the limitations discussed above, SpecJVM98 and JMark yield results that most closely approximate those for real-world applications.




