I work for a company that shall remain unnamed, developing an application with soft real-time requirements. This application helps human operators perform their work more safely. Human lives depend on them, so it’s crucial that the software works well. Some days ago we had a small problem that caught my attention and I’m sharing it here. It’s not something we can learn much from, but it’s entertaining.
This software has pieces of code written in Ada and some others written in C and C++. While the parts in Ada have their fair share of problems too, this one belonged to a piece of code that was written in C++. Let’s start with a question about rounding: how would you round a floating point number whose fractional part is exactly 0.5? There are two classic answers to this question.
Some people will argue that you should round those “up”, that is, away from zero. There is a reason for that. When rounding away from zero, you are dividing the interval [X, X+1) (let’s suppose positive numbers) into two smaller intervals of equal size [X.0, X.5) and [X.5, X+1). Each of those intervals has a length of 0.5 units. The first interval is rounded “down” and the second interval is rounded “up”. This is the desired behavior in many applications.
However, there’s another way to look at this problem, from a statistical point of view. People sharing this view will correctly argue that, for a halfway case, it doesn’t matter whether you round the number “up” or “down”: either way the rounding introduces an “error”, that is, a distance between the original number and the rounded value, and that error has an absolute value of 0.5. The distance to both integers is the same. Going further, while the distance is the same in absolute value, its sign is not. Rounding one way gives an error of +0.5 and rounding the other way gives -0.5. Some applications would like to compensate for this, and make half the cases round with an error of +0.5 and the other half with -0.5. This way the rounding of halfway cases doesn’t introduce any bias, which makes this strategy the better one in many other situations. It’s usually implemented by rounding to the nearest even number.
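To make the bias argument concrete, here is a minimal sketch of both strategies, assuming a C++11 compiler. The names `round_half_away`, `round_half_even` and `total_error` are made up for this illustration; the second strategy is written by hand so it does not depend on any library rounding mode.

```cpp
#include <cassert>
#include <cmath>

// First strategy: halfway cases go away from zero.
double round_half_away(double x) { return std::round(x); }

// Second strategy: halfway cases go to the nearest even integer.
double round_half_even(double x) {
    double lower = std::floor(x);
    double diff = x - lower;
    if (diff < 0.5) return lower;
    if (diff > 0.5) return lower + 1.0;
    // Exact tie: keep the lower neighbour if it is even, else take the upper one.
    return (std::fmod(lower, 2.0) == 0.0) ? lower : lower + 1.0;
}

// Sum of (rounded - original) over the halfway cases 0.5, 1.5, ..., (n - 1) + 0.5.
double total_error(double (*rounder)(double), int n) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        double x = i + 0.5;
        sum += rounder(x) - x;
    }
    return sum;
}

// Usage example: the errors pile up with the first strategy and cancel with the second.
void bias_example() {
    assert(total_error(round_half_away, 100) == 50.0);
    assert(total_error(round_half_even, 100) == 0.0);
}
```

Every value involved is exactly representable in binary floating point, so the sums come out exact: 100 halfway cases accumulate an error of +50 under the first strategy and 0 under the second.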
Back to our application, we had to round a number. As this was code in C++, the proposed strategy was to use “rint”. If you can, you should read the man page for “rint” at this point. If you don’t have a Unix system, you can find it with a web search. The decision to choose “rint” was a quick one. If you read its man page, you will see it mentions that it rounds the numbers using the “current rounding direction”. Depending on your system, it may also mention that you should read the man page for “fesetround”, but it doesn’t matter. The important thing here is that most people, including me (and I was not the one who made the decision to use “rint”), will interpret that text as “rint” rounding either “up” or “down”, whatever the current direction is, but consistently: always up or always down unless you change the rounding direction. Well, that’s wrong! In the default rounding direction (round to nearest), “rint” rounds to the nearest integer, and to the nearest even number in halfway cases. In other words, the second strategy I described above. As for why “rint” was picked: in C++ and C89 (not C99), you basically get “floor” and “ceil” from the standard, plus “rint” as a common Unix extension, so “rint” was picked.
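What “current rounding direction” means can be seen with a small sketch, assuming a C++11 compiler and a platform that defines the `FE_UPWARD` and `FE_DOWNWARD` macros in `<cfenv>` (mainstream ones do). The helper `rint_in_mode` is hypothetical, not something from our code base.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Hypothetical helper: evaluate rint(x) under the given rounding direction
// (FE_TONEAREST, FE_UPWARD, FE_DOWNWARD, ...) and restore the previous one.
double rint_in_mode(double x, int mode) {
    const int saved = std::fegetround();
    std::fesetround(mode);
    // volatile keeps the compiler from evaluating rint at compile time
    // or moving the call across the fesetround calls.
    volatile double input = x;
    volatile double result = std::rint(input);
    std::fesetround(saved);
    return result;
}

// Usage example.
void rint_mode_example() {
    // Default direction: to nearest, halfway cases to the even neighbour.
    assert(rint_in_mode(2.5, FE_TONEAREST) == 2.0);
    assert(rint_in_mode(3.5, FE_TONEAREST) == 4.0);
    // Only after changing the direction does rint go consistently up or down.
    assert(rint_in_mode(2.5, FE_UPWARD) == 3.0);
    assert(rint_in_mode(2.5, FE_DOWNWARD) == 2.0);
}
```

Unless some part of the program calls “fesetround”, the direction stays at `FE_TONEAREST`, which is why the halfway cases end up on even numbers.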
That number we had to round was going directly to the user screen. They would see whichever integer number “rint” came up with. Since this is a critical piece of software, soon after deploying a new version we got a complaint from users: sometimes they would see weird cases in which the numbers were “always even”, and they couldn’t explain why. The number originated from a hardware sensor that was supposed to give more or less continuous readings, so it didn’t make sense that all the numbers appeared to be even from time to time. More importantly, this issue was not present in the previous software version. We started investigating it.
Fortunately for us, we “soon” saw what was going on. In previous software versions, the sensor readings were sent to the application through a network connection by another application, using an IDL structure. The number was stored in a floating point variable and sent to our software. On the other end, we would read the value and store it in a floating point variable of the same type. However, in the new version, this information traveled in an industry-standard format. This format specified that the number would be transmitted in a field that represented quarters of the given unit. In practice, this meant that we could only receive numbers that were X.0, X.25, X.5 and X.75. As you can imagine, the bias toward even numbers can be very noticeable in this situation: one in every four possible values is a halfway case. Although the number goes directly from the network to the user screen, changing only in the call to “rint”, the software is big and the path is not obvious. Several people spent several hours studying the code, verifying that the number was not being modified along the way through the many layers of functions and methods, from the point at which we receive it to the point at which we print it. They concluded that, apparently, the only transformation would be through “rint”, but they couldn’t be 100% sure (the code is complex). The first thing they did was to read the “rint” man page.
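How noticeable the bias gets with quarter-unit inputs can be estimated with a short sketch, assuming a C++11 compiler and the default rounding direction. The struct `ParityCount` and the helper `count_parity` are invented for this illustration; it simply feeds every quarter value in a range through a rounding function and counts how many displayed results are even or odd.

```cpp
#include <cassert>
#include <cmath>

struct ParityCount {
    int even;
    int odd;
};

// Hypothetical helper: round every quarter value in [0, units) and
// count how many of the displayed integers are even or odd.
ParityCount count_parity(double (*rounder)(double), int units) {
    ParityCount counts = {0, 0};
    for (int q = 0; q < 4 * units; ++q) {
        double shown = rounder(q / 4.0);
        if (std::fmod(shown, 2.0) == 0.0) {
            ++counts.even;
        } else {
            ++counts.odd;
        }
    }
    return counts;
}

double with_rint(double x) { return std::rint(x); }
double with_round(double x) { return std::round(x); }

// Usage example over the 400 quarter values in [0, 100).
void parity_example() {
    ParityCount biased = count_parity(with_rint, 100);
    assert(biased.even == 250 && biased.odd == 150);
    ParityCount fair = count_parity(with_round, 100);
    assert(fair.even == 200 && fair.odd == 200);
}
```

With “rint”, each even integer collects five of the possible inputs (for example 1.5, 1.75, 2.0, 2.25 and 2.5 all show as 2) while each odd integer collects only three, which matches the kind of “always even” streaks the users reported.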
Obviously, the “rint” man page doesn’t mention this behavior directly at all. If it had mentioned, explicitly, that it rounds to the nearest even number in halfway cases, the game would have been over at that point. I take credit here for being the one who created a 10-line program that tested what “rint” did, just to be sure it was not the problem. But it was!
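I no longer have that program, but a test along these lines would do the job. This is a sketch, assuming a C++11 compiler and the default rounding direction; the function names are invented here.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>

// Print what rint does with a handful of halfway cases.
void print_rint_halfway_cases() {
    const double samples[] = {0.5, 1.5, 2.5, 3.5, 4.5};
    for (double x : samples) {
        std::printf("rint(%.1f) = %.0f\n", x, std::rint(x));
    }
}

// Returns true if rint sent every halfway case 0.5, 1.5, ..., (count - 1) + 0.5
// to an even integer.
bool rint_ties_are_even(int count) {
    for (int i = 0; i < count; ++i) {
        double rounded = std::rint(i + 0.5);
        if (std::fmod(rounded, 2.0) != 0.0) {
            return false;
        }
    }
    return true;
}

// Usage example.
void rint_check_example() {
    print_rint_halfway_cases();
    assert(rint_ties_are_even(1000));
}
```

If “rint” rounded consistently up or consistently down, the check would fail on the very first or second sample.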
I wanted to solve the problem by replacing the call to “rint” with a call to “round”, another standard function. The man page for “round” says that it rounds halfway cases away from zero. In other words, it follows the first strategy I described above. This is what we wanted. X.0 and X.25 would be rounded to X, and X.5 and X.75 would be rounded to X+1. No bias toward even numbers, and a sensible choice because most X.5 numbers we receive probably come from X.5something numbers that were truncated to X.5. But, as I discovered, our C compiler and our C++ compiler didn’t have “round” available. “round” is a standard library function introduced in C99, and compilers that don’t support C99 may not provide it. You can read the man page for “round” to confirm this.
This just goes to show my lack of experience. I had never faced a rounding problem in my life and I had always used “round” when I needed to round a number because it’s present in my system and it’s the obvious solution. When you wonder how to round a number in C, you type “man round” and its man page pops up. The question still open is how to round a number in C the way “round” does but without using “round”. Does the lack of such a rounding function in C89 surprise you? It surprised me, but the definitive solution was provided by an experienced programmer in the company, who explained to us that he had always rounded numbers by adding 0.5 and truncating. It is almost obvious why this behaves like “round”, at least for non-negative numbers (for negative ones, truncation goes toward zero, so you would have to subtract 0.5 instead). Experienced programmers can laugh at me now. That’s right, thank you. Yes, that was the first time I saw the trick. More laughs. Anyway, we ended up taking advantage of the number being received in quarters of a unit, and saved some casts and floating point operations by adding two to the raw quarter count and dividing by four with integer division. Got it?
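Both tricks fit in a few lines. The sketch below assumes non-negative inputs and a quarter count held in an unsigned integer; the function names and types are assumptions for illustration, not our production code. It uses C++11’s `std::lround` only as a reference to compare against.

```cpp
#include <cassert>
#include <cmath>

// The classic trick: add 0.5 and truncate. Only valid for non-negative x,
// because the cast truncates toward zero.
long round_add_half(double x) {
    return static_cast<long>(x + 0.5);
}

// Integer-only variant for a value received as a count of quarter units:
// adding 2 quarters is adding 0.5 units, and integer division by 4 truncates.
unsigned round_quarters(unsigned quarters) {
    return (quarters + 2u) / 4u;
}

// Usage example: both agree with round-half-away-from-zero on every
// quarter value in [0, 100).
void quarters_example() {
    for (unsigned q = 0; q < 400; ++q) {
        double value = q / 4.0;
        long expected = std::lround(value);
        assert(round_add_half(value) == expected);
        assert(round_quarters(q) == static_cast<unsigned>(expected));
    }
}
```

For example, a raw field of 6 quarters (1.5 units) gives (6 + 2) / 4 = 2, and 5 quarters (1.25 units) gives (5 + 2) / 4 = 1.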