Working with unbuffered streams

Originally published on October 15th, 2012 (Last updated on July 3rd, 2020)

When working with I/O in Java, you can normally choose from a variety of Stream and Reader or Writer classes to handle all the "dirty" work for you. But what happens under the hood? And why is this stuff so error-prone?

Being the buffer

When reading or writing binary data, for example from a Socket or a file, you would normally wrap the stream in a BufferedInputStream or BufferedOutputStream. But what if you couldn't?

Let's implement a simple binary copy ourselves:

// This code is intended to be incorrect. DON'T COPY THIS!
byte[] buffer = new byte[1024];
while (input.read(buffer, 0, buffer.length) != -1) {
    output.write(buffer, 0, buffer.length);
}

Can you spot the problem in the snippet? It's subtle. What makes it worse is that, under the right conditions, the result can be perfectly correct. Nevertheless, this code has a bug!

The "full buffer" lie

The problem with the code above is on line 4, the one that reads output.write(buffer, 0, buffer.length). The code assumes that the buffer is always completely filled, which is not necessarily the case! The documentation for InputStream.read(byte[], int, int) states:

Reads up to len bytes of data from the input stream into an array of bytes. An attempt is made to read as many as len bytes, but a smaller number may be read. The number of bytes actually read is returned as an integer. […]

So, the buffer is not guaranteed to be full. This becomes a problem when we use the OutputStream.write(byte[], int, int) method to write the bytes we read to the output stream. Its documentation reads:

Writes [exactly] len bytes from the specified byte array starting at offset off to this output stream. […]

Here, it's the other way around. When we call the method with a len parameter equal to the size of our byte array (which is and always will be 1024 in this example), the method will write exactly 1024 bytes.

Now, if read() only placed 300 bytes into the buffer and we tell write() to write exactly 1024 bytes, the remaining 724 bytes of the buffer are written as well. On the first pass, those are the zero bytes the array was initialized with. Even worse, if a previous call read 700 bytes into the buffer and the next call to read() only overwrote the first 300, the 400 stale bytes left over from the previous call are written out again (along with another 324 null bytes). Either case leads to corrupted output.
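To make the 700/300 scenario concrete, here is a minimal sketch that runs the buggy loop against an input that delivers its data in two short reads. The class name FullBufferLieDemo and the helpers filled and corruptedOutput are made up for this illustration. It assumes that SequenceInputStream serves each read() call from only one of its substreams, which is what its documentation describes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

class FullBufferLieDemo {

    // The buggy loop from the snippet above: it ignores how many bytes were read.
    static void buggyCopy(InputStream input, OutputStream output) throws IOException {
        byte[] buffer = new byte[1024];
        while (input.read(buffer, 0, buffer.length) != -1) {
            output.write(buffer, 0, buffer.length);
        }
    }

    // Hypothetical helper: an array of the given length filled with one value.
    static byte[] filled(int length, char value) {
        byte[] bytes = new byte[length];
        Arrays.fill(bytes, (byte) value);
        return bytes;
    }

    // Feeds 700 'a' bytes, then 300 'b' bytes, through the buggy loop.
    // Assumption: the first read() returns 700 and the second returns 300.
    public static byte[] corruptedOutput() {
        InputStream input = new SequenceInputStream(
                new ByteArrayInputStream(filled(700, 'a')),
                new ByteArrayInputStream(filled(300, 'b')));
        ByteArrayOutputStream output = new ByteArrayOutputStream();
        try {
            buggyCopy(input, output);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return output.toByteArray();
    }

    public static void main(String[] args) {
        byte[] out = corruptedOutput();
        System.out.println("bytes in: 1000, bytes out: " + out.length);
        System.out.println("out[1024] = " + (char) out[1024]); // fresh data from the second read
        System.out.println("out[1324] = " + (char) out[1324]); // stale data from the first read
        System.out.println("out[1724] = " + out[1724]);        // zero byte that was never overwritten
    }
}
```

With this input, 1000 bytes go in and 2048 come out: the second block consists of 300 fresh bytes, 400 stale bytes from the first read and 324 zero bytes.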

Doing it right

So, how do we know how many bytes were read into the buffer? Quoting again from the InputStream.read(byte[], int, int) documentation:

[…] a smaller number may be read. The number of bytes actually read is returned as an integer.

In the above example, we already checked the return value to see if we were at the end of the stream. Now, we'll store it and use it as the len parameter for the write() method:

byte[] buffer = new byte[1024];
int readCount;
while ((readCount = input.read(buffer, 0, buffer.length)) != -1) {
    output.write(buffer, 0, readCount); // Now writes the correct number of bytes
}

This writes exactly the number of bytes read from the input stream to the output stream.
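For reference, the corrected loop can be wrapped into a small, runnable helper. This is only a sketch: the class name StreamCopy and the methods copy and copyBytes are names chosen for this example, and the in-memory streams stand in for whatever Socket or file streams you actually use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class StreamCopy {

    // Copies everything from input to output and returns the number of bytes copied.
    public static long copy(InputStream input, OutputStream output) throws IOException {
        byte[] buffer = new byte[1024];
        long total = 0;
        int readCount;
        while ((readCount = input.read(buffer, 0, buffer.length)) != -1) {
            output.write(buffer, 0, readCount); // only the bytes that were actually read
            total += readCount;
        }
        return total;
    }

    // Convenience wrapper for trying the loop out with in-memory data.
    public static byte[] copyBytes(byte[] source) {
        ByteArrayOutputStream output = new ByteArrayOutputStream();
        try {
            copy(new ByteArrayInputStream(source), output);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return output.toByteArray();
    }

    public static void main(String[] args) {
        byte[] source = new byte[1324]; // deliberately not a multiple of the buffer size
        byte[] copied = copyBytes(source);
        System.out.println("bytes in: " + source.length + ", bytes out: " + copied.length);
    }
}
```

On Java 9 and later, InputStream.transferTo(OutputStream) does this kind of copy for you, so a hand-written loop is mostly needed on older runtimes or when you want to do something with each chunk.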

Conclusion

  • Keep track of the number of bytes read from your input stream.
  • Make sure you write the correct number of bytes to your output stream.
  • Use BufferedInputStream and BufferedOutputStream for binary data.
  • Use existing Reader/Writer implementations when handling String data.
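The last point can be sketched as follows. The example decodes text through an InputStreamReader instead of converting byte buffers by hand, and the same "use the returned count" rule applies to the char[] buffer. The class name TextReadSketch and the method readText are invented for this illustration, and UTF-8 is an assumed encoding; use whatever your data is actually encoded in.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class TextReadSketch {

    // Reads the whole stream as UTF-8 text (assumed encoding) and returns it as a String.
    public static String readText(InputStream input) {
        try (Reader reader = new InputStreamReader(input, StandardCharsets.UTF_8)) {
            StringWriter writer = new StringWriter();
            char[] buffer = new char[1024];
            int readCount;
            while ((readCount = reader.read(buffer, 0, buffer.length)) != -1) {
                writer.write(buffer, 0, readCount); // same rule: only what was actually read
            }
            return writer.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] encoded = "h\u00e9llo w\u00f6rld".getBytes(StandardCharsets.UTF_8);
        String text = readText(new ByteArrayInputStream(encoded));
        System.out.println(text.length() + " characters decoded from " + encoded.length + " bytes");
    }
}
```

The Reader takes care of characters that span more than one byte, which is easy to get wrong when decoding fixed-size byte chunks yourself.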

Posted by Lukas Knuth

You can reach me over at @knuth_dev or send me an Email.