Working with unbuffered streams

October 15, 2012 »common-mistakes

When working with I/O in Java, you can normally choose from a variety of Stream and Reader classes to handle all the “dirty” work for you. But sometimes, for example when dealing with binary data, you have to get your hands dirty and do it yourself.

Being the buffer

When reading/writing binary data, for example from a Socket or a file on the filesystem, you have to handle the buffering yourself. Of course, this presents the opportunity to introduce errors into your code.

Can you spot the problem with the following snippet?

// This code is intended to be incorrect. DON'T COPY THIS!
byte[] buffer = new byte[1024];
while (input.read(buffer, 0, buffer.length) != -1) {
    output.write(buffer, 0, buffer.length);
}

Don’t feel bad if you can’t spot the problem. Most people don’t see it until the written files turn out to be corrupted. So, why is that?

The “full buffer” lie

If I told you that the problem with the above code was in line 4, the one that reads output.write(buffer, 0, buffer.length), would you find it? Probably not.

The problem is that this code assumes the buffer is entirely full after every read, which is not necessarily the case. The documentation for InputStream.read(byte[], int, int) states:

Reads up to len bytes of data from the input stream into an array of bytes. An attempt is made to read as many as len bytes, but a smaller number may be read. The number of bytes actually read is returned as an integer. […]

So, the buffer is not guaranteed to be full. This becomes a problem when we use the OutputStream.write(byte[], int, int) method to write the bytes we read to the output stream. Its documentation reads:

Writes [exactly] len bytes from the specified byte array starting at offset off to this output stream. […]

Here, it’s the other way around. When we call the method with a len-parameter of our byte-array size (which is and will always be 1024 in this example), the method will write exactly 1024 bytes.

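Now, if the read() method only read 300 bytes into the buffer (which is possible) and we tell the write() method to write exactly 1024 bytes, the remaining 724 bytes come from whatever the buffer happened to hold: null bytes if that part of the buffer was never filled, or stale data left over from an earlier, larger read. Either way, the extra bytes corrupt the output file.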
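To see the effect without a real socket, here is a minimal sketch that reproduces it. ChunkedInputStream is a hypothetical helper written only for this demonstration (it is not part of the JDK); it hands out at most 300 bytes per call, the way a network stream might. brokenCopy is the flawed loop from above, wrapped in a method.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

class ShortReadDemo {

    /** Hypothetical test stream: returns at most maxChunk bytes per bulk read. */
    static final class ChunkedInputStream extends InputStream {
        private final InputStream source;
        private final int maxChunk;

        ChunkedInputStream(byte[] data, int maxChunk) {
            this.source = new ByteArrayInputStream(data);
            this.maxChunk = maxChunk;
        }

        @Override
        public int read() throws IOException {
            return source.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            // Deliberately deliver fewer bytes than requested.
            return source.read(b, off, Math.min(len, maxChunk));
        }
    }

    /** The flawed loop: ignores how many bytes were actually read. */
    static void brokenCopy(InputStream input, OutputStream output) throws IOException {
        byte[] buffer = new byte[1024];
        while (input.read(buffer, 0, buffer.length) != -1) {
            output.write(buffer, 0, buffer.length);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[700];
        Arrays.fill(data, (byte) 1);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        brokenCopy(new ChunkedInputStream(data, 300), out);

        System.out.println("input bytes:  " + data.length); // 700
        System.out.println("output bytes: " + out.size());  // 3072
    }
}
```

The 700 input bytes arrive in three reads (300, 300 and 100 bytes), and each read triggers a full 1024-byte write, so the output grows to 3072 bytes. The third write also repeats bytes left in the buffer by the second read.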

Doing it right

So, what can we do about this? Turns out, it’s a rather easy problem to solve. Quoting again from the InputStream.read(byte[], int, int)-documentation:

[…] a smaller number may be read. The number of bytes actually read is returned as an integer.

In the above example, we already checked that very integer to see whether we were at the end of the stream. Now, we’ll store it and use it as the len parameter for the write() method:

byte[] buffer = new byte[1024];
int readCount;
while ((readCount = input.read(buffer, 0, buffer.length)) != -1) {
    output.write(buffer, 0, readCount); // Now writes the correct number of bytes
}

This writes exactly the number of bytes that were read from the input stream to the output stream.
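For reuse, the corrected loop can be wrapped in a small helper. The following is only a sketch: the class name StreamCopy and the byte-count return value are choices made for this example, not something the original snippet prescribes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

class StreamCopy {

    /** Copies everything from input to output and returns the number of bytes copied. */
    static long copy(InputStream input, OutputStream output) throws IOException {
        byte[] buffer = new byte[1024];
        long total = 0;
        int readCount;
        while ((readCount = input.read(buffer, 0, buffer.length)) != -1) {
            // Write only the bytes that the last read() actually delivered.
            output.write(buffer, 0, readCount);
            total += readCount;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // 2500 is not a multiple of 1024, so the last read is a partial one.
        byte[] data = new byte[2500];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(data), out);

        System.out.println(copied);                                 // 2500
        System.out.println(Arrays.equals(data, out.toByteArray())); // true
    }
}
```

The helper does not close either stream; whoever opened them, ideally in a try-with-resources block, stays responsible for that.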

Conclusion

  • Keep track of the number of bytes read from your input stream.
  • Make sure you write exactly that number of bytes to your output stream.
  • Use existing Reader/Writer implementations where possible.
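As an illustration of the last point for binary data: assuming Java 9 or newer, InputStream.transferTo(OutputStream) runs the read/write loop for you and returns the number of bytes transferred. The class name TransferToDemo below is made up for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class TransferToDemo {

    /** Delegates the copy loop to the JDK (requires Java 9 or newer). */
    static long copy(InputStream input, OutputStream output) throws IOException {
        return input.transferTo(output);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[2500];
        ByteArrayOutputStream out = new ByteArrayOutputStream();

        long copied = copy(new ByteArrayInputStream(data), out);

        System.out.println(copied);     // 2500
        System.out.println(out.size()); // 2500
    }
}
```

When both ends are files, java.nio.file.Files.copy may be an even simpler fit.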

Posted by Lukas Knuth
