Writing a hexdump program seems to be just about the first serious program lots of us do when learning a new language. As an experienced programmer in that language, revisiting the problem can bring many of the language features (and misfeatures, often) into focus. Regardless of the language, high or low level, there's always plenty of room for optimisation.
Presented on this page is my own fairly elementary effort at hexdump. The program itself isn't very interesting - a competent java programmer should be able to throw something better together in half an hour. What is interesting, however, is the degree of improvement possible when one really tries. The first version of the program seems fine, and there's no obvious problems in the code - but see the benchmarks section below to discover how much faster it could be made.
|v1||Jun 13th 1999||source||Initial version|
|v2||Feb 15th 2000||source||Cleanup, optimised|
|v3||Mar 20th 2003||source||Dramatic speedup - make each line a single println call rather than lots of single char prints - now nine times faster than v2.|
|v4||Mar 23rd 2003||source||Added printing of 8 digit offset. Changed output code to store all output in a giant StringBuffer and then print it all in one go. This makes things a little faster.|
|v5||Mar 28th 2003||source||Changed read code to read in a larger block at one time - a buffer of 1K seems to be the sweet spot. Also made output StringBuffer bounded (periodically flushed) so we can handle huge files (which crashed v4), and also makes us another 10% faster. Compared with v3, this version is 20% faster for small files and twice the speed for very large files. v5 is also speed compatable with od -c for small files, and twice its speed for very large files.|
|v5multi||Apr 9th 2003||source||This is a multithreaded version of v5, with seperate threads for reading data from the source stream and for writing out to the output stream. A queue is used to transmit data from one to the other. This wasn't successful - there wasn't any speed improvement.|
Benchmarks for each version of the java Hexdump program are shown, dumping both a reasonably small file, where startup time should predominate, and for a pretty large one, where only raw IO performance should be important. For comparison, the results of the same benchmarks run by the cygwin
version of "od -c" are shown. This is no slight on
od, which is a more general purpose program with lots of features that the java hexdump lacks, and which hasn't been so monomanically engineered to a single benchmark suite. The comparison is intended, however, to show how differing
versions of the java hexdump compare with a well written, sensible C program running on a fairly efficient platform.
The results of running these benchmarks against each revision of java hexdump (and the reference cygwin od) are shown in the following table - times are shown in the following format
program benchmark1 benchmark2 od -c 0:00.5 0:48 hexdump v1 0:06.4 7:20 hexdump v2 0:06.4 7:20 hexdump v3 0:00.9 0:42 hexdump v4 0:00.7 [failed] hexdump v5 0:00.5 0:21
V4 was something of a gamble - it creates a single string buffer for the entire output file in one in-memory buffer. This is great for small files, but suicidal for large ones (in my defense, the original testfile I used for benchmark 2 was a lot smaller). V4 fails on benchmark2 as it can't allocate the RAM necessary for that large file. Sure, I could run the JVM with a bigger heap, but that's only postponing the problem.
On older Java platforms, JVMs typically featured a sizeable delay between being started and actually starting to run the target program. This initialisation gap was mostly due to classloading and quickening of dozens or hundreds of individual classes needed to get even the most elementary Java environment running. This was one of the major reasons why writing unix-style single-function command line utilities in java was a problem (where each invocation would need around ten seconds to get going). Later JDK versions, particularly JDK1.4, start up very quickly.
This also shows that non-synchronised StringBuffers, and economical use of the synchronised
out objects, can make for major improvements in I/O performance. Lastly, it shows that, with modern JVM software, there's no reason why Java software has to be slow.