503 Service Unavailable

2010-12-19

Frustration with Yahoo! and RFC 5965

Filed under: Programming — rg3 @ 14:05

In the past, I mentioned several times that I don’t receive spam. It’s not completely true, but it’s mostly true: my normal level of spam is about one message each month. I have achieved this by using approaches like Yahoo! AddressGuard, and by translating that same scheme to GMail when I moved there. My e-mail address is publicly accessible by anyone and exposed in this blog (“Contact me” in the right column). If you take a look at the source of that page, you’ll notice I wrote it so you can copy-paste the address into your e-mail client while keeping spammers at bay.

When one of my addresses is compromised, I can change it right away, yet I prefer not to change them too often to avoid annoying the people who want to contact me from time to time. For this reason, when I receive a spam message to one of my accounts, the first thing I do is report it. If I received 100 spam messages a day, I couldn’t do this, but as I receive one a month, I don’t mind spending 5 minutes reporting the message. Only if the spam doesn’t stop, or apparently increases, do I change that address.

Reporting a message is quite straightforward, but I haven’t automated it. I could, but I haven’t bothered yet. Basically, I view the source of the message and look for “Received” headers. I find the first one, in chronological order, that appears to be a valid public SMTP server that people should trust. Then, I run “whois” on that IP address, find the ISP or organization owning that network block, and report the message to the abuse address they provide as part of the “whois” reply. If they don’t provide an abuse address, I usually send it to the technical contact that appears in the “whois” reply, and also to the “abuse@” address of the company’s main domain, just in case it actually exists and is being read.
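Just to give you an idea, the first steps could be scripted with Python’s standard library along these lines. This is only a rough sketch: the function names are mine, the header heuristic is deliberately naive (deciding which hop is the first trustworthy public SMTP server still takes a human eye), and it assumes a Unix system with a “whois” client installed.

    import email
    import re
    import subprocess

    def first_received_ip(raw_message):
        # "Received" headers are prepended by each hop, so the
        # chronologically first one is the last in the list.
        msg = email.message_from_string(raw_message)
        for header in reversed(msg.get_all("Received") or []):
            m = re.search(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]", header)
            if m:
                return m.group(1)
        return None

    def abuse_candidates(ip):
        # Run the system whois client and grep anything that looks
        # like an e-mail address out of the reply.
        reply = subprocess.check_output(["whois", ip])
        return re.findall(r"[\w.+-]+@[\w.-]+",
                          reply.decode("utf-8", "replace"))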

In my e-mail client I have a template for reporting spam. I fire a new message from the template, fill the “To” field with the addresses just mentioned and copy-paste the full spam message source at the end of my message, which consists of a very brief note to whoever may be reading it, saying I received a spam message apparently coming from their network block. As I said, this takes 5 minutes and could be automated.

Sometimes the spam message comes from a Yahoo! account, using their servers, and I follow the same procedure, emailing abuse@yahoo.com. This was the case with the latest spam message I received, two days ago. I proceeded to report the spam as I always do and received a reply from Yahoo! with the following contents.

Thank you for your email, but this address now only accepts messages in
Abuse Reporting Format (http://tools.ietf.org/html/rfc5965)

To report abuse manually (or to get help with security or abuse related
issues), please go to Yahoo! Abuse:
http://abuse.yahoo.com

For questions about using Yahoo! services, please visit Yahoo Help:
http://help.yahoo.com

Thank you,
– Yahoo! Customer Care

Note: Please do not reply to this email as replies will not be answered.

A quick Google search revealed a few people upset by this. Apparently, Yahoo! has been applying this policy since the beginning of December. The RFC they mention in that first paragraph is from August. People are upset for several reasons. The RFC is so recent that there are almost no tools yet to create or handle reports in that format, so requiring it cuts people out of the loop. The second option is going to their website and reporting the spam message there. This means two things: that you have to treat Yahoo! in a special way when reporting spam, and that you have to put up with their web form. It’s annoying because the landing page has no direct form to report spam. As of today, you have to click on “I want to report spam” (this opens a new window or tab), then paste, in separate boxes, the full email headers and the message contents. Fantastic. So you can’t simply upload the message or paste its full contents into a single form. No, no. You have to carefully select the message headers first, then copy them, then paste them into the form, then copy the message body, then paste it into the form, then pass a captcha.

I was also a bit upset by this, so I read RFC 5965 a little. It looked simple if you only wanted to fill in the required parts, and it had a simple report example at the end, so I searched for a tool that would convert an e-mail message into a report in that format. I didn’t find one immediately. I realized Python has a very comprehensive and easy-to-use package for handling e-mail messages, so I investigated a little and decided to spend the rest of the evening trying to create such a tool. The result has been uploaded to github as the spamreport repository, but don’t try to use it just yet. I have some bad news. Python’s e-mail library is amazingly simple and, in the end, including all the code to check program options and such, the program is exactly 100 lines long, so it’s very short and straightforward, and should work perfectly. However, it doesn’t work.
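For the curious, stripped of option parsing, the core of the tool boils down to something like the following sketch (a simplification written for this post, not the exact code in the repository):

    import email
    from email.mime.base import MIMEBase
    from email.mime.message import MIMEMessage
    from email.mime.multipart import MIMEMultipart
    from email.mime.text import MIMEText

    def build_report(spam_source, notification_text):
        # First part: a human-readable description of the report.
        human = MIMEText(notification_text, "plain", "us-ascii")

        # Second part: the machine-readable report. Its body is a set
        # of header-like fields; these three are the required ones.
        machine = MIMEBase("message", "feedback-report")
        machine.set_payload("Feedback-Type: abuse\n"
                            "User-Agent: spamreport\n"
                            "Version: 1\n")

        # Third part: the original message, attached unmodified.
        original = MIMEMessage(email.message_from_string(spam_source))

        # Container: multipart/report; report-type=feedback-report.
        report = MIMEMultipart("report")
        report.set_param("report-type", "feedback-report")
        for part in (human, machine, original):
            report.attach(part)
        return report

From there you only need to add the usual To, From, Date and Subject headers and send the result.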

I have tried submitting an abuse report to Yahoo! in that format several times, tweaking my program here and there between attempts, and every time the report has been rejected. Yahoo! does not explain in its reply why the report is being rejected, which, by the way, goes a bit against the RFC itself. Section 4:

When an agent that accepts and handles ARF messages receives a
message that purports (by MIME type) to be an ARF message but
syntactically deviates from this specification, that agent SHOULD
ignore or reject the message. Where rejection is performed, the
rejection notice (either via an SMTP reply or generation of a
DSN) SHOULD identify the specific cause for the rejection.

As they are replying via SMTP with a rejection, they SHOULD explain the reason, but they’re not doing it, and that’s why this is so frustrating. At first, I thought GMail was mangling the reports, so I sent one to an account of mine at another e-mail provider, and it came out unmangled on the other end. GMail is not manipulating the reports. Just so you get an idea, here’s a screenshot from a test case. I took the simple report example given in the RFC and attempted to create a similar report with my tool, using the same spam message and the same notification text, just to see what the differences were.

As you can see, apparently the only differences are:

  1. The header order for “To”, “From”, “Date” and “Subject” differs (this should be irrelevant).
  2. The value “feedback-report” is quoted in my output because Python writes it that way. This should also be irrelevant.
  3. The MIME boundary markers differ (irrelevant; they are generated randomly for each message).
  4. The string “us-ascii” is lowercase in my output. Python writes it in lowercase even if I specify it in uppercase, and this should be irrelevant too.
  5. The User-Agent string changes (obviously).

Yet Yahoo! keeps rejecting the reports. I’m puzzled at this point and won’t tag the release as 1.0.0 until the reports are accepted or proved to be correct, but I don’t know what else to check. I suspect there’s a minor flaw I haven’t detected. If you spot it, please let me know. The code is on the net.

https://github.com/rg3/spamreport/

2010-11-06

youtube-dl has moved to github.com

Filed under: Programming,Software — rg3 @ 12:13

Some days ago, youtube-dl, my most popular project, moved from Mercurial at bitbucket.org to Git at github.com. Since the move, I’ve been meaning to write something about it. I’ve also been wanting to rewrite the program partly or completely every time I look at its source code, but that’s a different matter. Back to the main topic.

I should start by apologizing to anyone who thinks this is a bad move, either because they may have to rebase all their work onto a new repository, bringing all their changes over, or because they registered at bitbucket.org simply to follow the project. It currently has 17 forks and 100 followers, and I’m pretty sure some of them registered there just to follow youtube-dl, so the move to github.com is, if anything, a problem: they would have to create an account somewhere else to continue following the project. Again, apologies to anyone for whom the move has practical consequences.

That said, I’d like to explain why I made the move. You may recall I wrote an article some time ago about Mercurial vs. Git. Apart from explaining what I considered the main differences between the two, I also wanted to express my indecision about which one was better. While I think Mercurial is and was great, the balance has been leaning towards Git for some time now, and I tend to use Git for all my personal projects. Many of the reasons, if not all of them, have been expressed by other people in the past. It’s a good moment to quote a very well known blog post by Theodore Ts’o, written in 2007 when he was still planning to migrate e2fsprogs from Mercurial to Git:

The main reason why I’ve come out in favor of git is that I see its potential as being greater than hg, and so while it definitely has some ease-of-use and documentation shortcomings, in the long run I think it has “more legs” than hg, [...]

That paragraph describes my own position with great accuracy. In the medium and long run, Git’s problems almost vanish. Its documentation was a bit poor back then, but people have been writing more and more about Git, and there are now a few very good resources for learning its internals and basic features. Furthermore, once you have a simple idea of its internals and use it daily, you no longer need that much documentation. If you’re not sure how to do something, chances are a simple web search will tell you how to achieve it.

Also, as many people know, Mercurial was and is mostly about not modifying the project’s history, while Git has a lot of commands that do exactly that. With time, I’ve come to realize that modifying the project’s history is simply more practical in many cases, and in a range of situations it leads to less confusion. In my day job, we are slowly moving from CVS to Subversion to manage the sources of a very old and important project, which has existed since about 1984. At the same time, we are modifying our work flow here and there to take advantage of Subversion, and we’re heavily using branching and merging despite the fact that, as you may know, that’s not one of Subversion’s greatest strengths. That’s giving us some problems, and it’s amazing how many times I’ve caught myself thinking “this would be much easier if we were using git, because we would simply do this and that and job done”. Many of those actions would modify the project’s history and clean it up. I repeat: in real situations, with a lot of people working on something and not doing everything exactly as it should be done, it’s only a matter of time before you miss a Git feature.

The only thing I don’t like about Git is its staging area. From a technical perspective, the staging area makes a lot of sense, and you can build many neat features on top of it. However, having a staging area is one thing; exposing it to end users is another. I think you can have a staging area and all the features it provides while hiding it from users in their most common work flows. Still, it’s something you get used to, and everybody knows that, when your project is a bit mature, you spend far more time browsing the source code, debugging, running it and testing it than actually committing changes to the source tree. The staging area is not a big issue, and “git commit -a” covers the most common cases.

Apart from Git itself, the move was partly motivated by the site, github.com. When I started using bitbucket.org I liked it a bit more than github.com, but things have changed slowly. github.com fixed a rendering bug that hid part of the project top bar, got rid of its Flash-based file uploader and gained an internal issue tracker with a web interface that works really, really well. The site is very nice, and the “pages” feature, which allows you to set up a simple web page for the project, is still not provided by bitbucket.org as far as I know. In addition, with the arrival of Firesheep, it quickly moved to using SSL for everything. It’s fantastic.

bitbucket.org was recently bought by Atlassian and their plans are indeed better. For me, however, the number of private repositories and private collaborators is not an issue, because all the projects I host on github.com are public. Still, it’s fair to mention their plans because it could be a deciding factor for some people.

I wouldn’t like to close this article without mentioning the big improvement that both sites bring to the typical free and open source software developer. I still host a few projects on sourceforge.net, and I can tell you I’m not going back to it, despite the great service they have provided for years, for which I thank them sincerely.

It’s been months since I last used it, so I apologize if things have changed without me noticing, but back then it was very hard to get your code onto sourceforge.net. You didn’t perceive it as hard because there was no github.com. Once you try github.com or bitbucket.org, you realize how much the process can be simplified. Two key aspects stand out. First, the project name doesn’t have to be globally unique. It only needs to be unique among your own projects, which simplifies choosing a name a lot. Second, once the project is created and has a basic description, without filling in any forms and without having to wait for anything, you are only a few commands away from uploading your code to the Internet. It can literally take less than 5 minutes to create a project and have your code publicly available, and that’s fantastic and motivating. You don’t need to find time to upload your code or wonder whether the process is worth it for the size of the project. You simply do it. That’s good news for everyone.

Let me finish by apologizing again for any inconvenience created by the move. I sincerely hope this will remain the project’s location for many years to come.

2009-12-15

When your hobby becomes a job: reflections on the em28xx driver situation

Filed under: Programming,Software — rg3 @ 21:59

More than one year ago I bought a TV USB stick to be able to watch analog and digital TV on my computer running Linux. It was not an easy task. As you may know, it’s usually not hard to find hardware that is supported by Linux. Sometimes, however, while there are multiple supported devices that would serve your purposes, the trouble is locating a store or site that actually has one of those models available to buy. This was my case. I printed a list of supported digital and/or analog TV tuner USB devices and went to most computer stores and malls in my area trying to locate at least one of them and compare prices, and I went back home empty-handed.

I had to change strategy: get the list of devices I could buy, and then search for them on the Internet, trying to find out whether any of them were supported by an out-of-tree driver or something similar. After a couple of returns, thanks to some manufacturers changing devices internally while keeping the product name unchanged, I finally arrived home with a working hybrid TV USB stick, the Pinnacle PCTV Hybrid Pro Stick, sold in some countries as model 330e. It cost just over 100 euros.

My main target being digital TV, I quickly got it working with an out-of-tree driver by Markus Rechberger. This driver was part of a project that aimed to make user-space tuners for TV cards possible. While I’m in no position to judge whether that’s a good or bad idea, it was different enough that it never made it into the main kernel tree. The author then appeared to change approach and created a different out-of-tree driver called “em28xx-new”, based on the in-kernel “em28xx” driver he had already contributed. This driver used a more traditional approach, and worked like a charm too. Unfortunately, it never made it into the vanilla kernel either, for whatever reasons.

I contacted Markus Rechberger a couple of times, if I recall correctly. I thanked him for the effort and time he put into creating the driver, asked a couple of questions, and some time later sent him a patch for the build scripts. I don’t recall if the patch was applied or not. He was always very nice and polite.

However, one day I had just compiled a new kernel and was about to build the driver for it. Before doing that, I always downloaded the latest copy of the driver source code from its Mercurial repository. This time, Mercurial exited with a confusing error message, saying the remote tree was not the same repository I had on my hard drive. I assumed the author had created a new repository for the driver, so I cloned it to a new directory. It turned out the repository contained only a README file. I opened it and… uh oh. It was a note saying the old driver had been pulled from the Internet, giving a URL that led to the web site of a TV card manufacturer offering products that were supposedly supported by Linux. The equivalent USB stick cost about 100 euros, like the one I had. But, of course, it was too late for me to return the one I had bought. I had been using the device for months.

I searched the Internet again trying to find the reason the driver had been pulled, and all I found was the site of an Arch Linux user who uploaded the latest version he got from the repository and even offers some patches to make the code work with more recent kernels. However, as of the time I’m writing this, the latest patch is for kernel 2.6.30 and the driver does not compile for the recently released kernel 2.6.32. So the status of this device is that it works, but only if you have a specific kernel version. At the top of that page, you can see a huge banner that reads like this:

DISCLAIMER: Don’t bother me or the original author, Markus Rechberger, with any questions about problems with this driver, because Markus Rechberger deleted it because of these questions and because I just host these files.

I thought the driver might have been pulled from the Internet for some kind of legal reason, but the disclaimer suggests otherwise. I’m not sure the stated reason is entirely credible, but there’s no point in assuming those words are false. Markus Rechberger, for all we know, got burned out maintaining the driver and decided not to maintain it any longer.

A story published months ago at lwn.net explains this case in more detail. The situation for people owning this device and wanting to use it under a recent kernel is that you are supposed to use the in-kernel em28xx driver. However, as the linuxtv.org page for the device says, the difficulty in supporting digital TV on it stems from the Micronas DRX3975D DVB-T chipset it features. This chipset already has an in-kernel driver, which can be located at Device Drivers > Multimedia support > DVB/ATSC adapters > Customize the frontend modules to build > Customize DVB Frontends > Micronas DRX3975D/DRX3977D based. The location may change in the future (2.6.32 as I’m writing this).

Unfortunately, the driver cannot be used for now. As its help text mentions, it needs external firmware which currently cannot be obtained. In a section marked as “TODO”, the help text tells you to run “<kerneldir>/Documentation/dvb/get_dvb_firmware drx397xD”. But, if you try, you’ll get an error saying that drx397xD is not a known component.

It’s an appropriate moment to thank and encourage the developers who are working on this, the last missing piece. Devin Heitmueller has done a good job of keeping people up to date with information on the progress and the difficulties encountered. The last comment on that blog post is from December 6 and says:

Unfortunately, at this point the answer is “not right now”. I’m waiting for the DVB generator to arrive, at which point I should be able to complete the work.

Again, thanks for working on this, keep up the good work and we’re eager to make our 330e USB cards work again with recent kernels, Devin!

While reflecting on the driver situation and putting together the different pieces of this soap opera, it all reminded me of the situation we professional programmers face from time to time while maintaining open source software. Many of us really love programming and have successfully made it our job. There’s a difference, however, when you go from student to professional programmer.

When you are a student, you have a lot of time on your hands. Going to college is a wonderful experience: learning new things every day, buying books, reading about different languages and technologies, and the amount of spare time to learn and have fun programming is incredible. Later, however, you turn professional and start working for a company in a full-time job. You leave home before dawn every day and, at least in winter and in my case, you arrive home after sunset. It’s incredibly depressing if you think about it. You spend the day coding, fixing issues in programs, debugging, testing, etc. This kind of life doesn’t make it impossible to enjoy programming again, but if you arrive home and find that you have a popular open source program on your hands, with users reporting bugs and requesting new features, you may feel as if you were still at the office.

My advice here is obvious. Don’t stop coding in your spare time, but do it for fun. If you don’t feel like adding a new feature someone requested, don’t add it. It’s very important to say “no” often so your program will still be your program, the product you wanted. If a user or group of users is still in the fortunate position of being a student with a lot of spare time, they can always fork your code. That is the beauty of free and open source software.

I couldn’t care less if some people would like to use all this text above to attack FOSS and say bad things about it: non-working drivers, unresponsive maintainers, lack of documentation, user unfriendliness. The mental health of the people writing the code is more important. Don’t burn out. Produce something and let others make better things out of it if you don’t have the time. Start new projects all the time. Hand maintenance of old projects over to new people. Have fun. Enjoy. Code. Help others. Submit patches with bug reports if possible. Appreciate the effort of others and thank them for the work they provide you. Try to be kind and explain to your users the reasons behind your “noes”.

2009-10-09

GDB now supports stopping on system calls

Filed under: Programming,Software — rg3 @ 20:41

One of the best moments in my professional career, from a purely personal perspective, came about 6 weeks ago, when I managed to find the cause of a memory leak one of our programs was suffering from, and it turned out to be a problem in a standard library function completely unrelated to memory management. I am very proud of that moment because it took me a lot of time to find the problem and I had to apply a good amount of knowledge I never expected to apply as a professional. When I could finally prove the memory leak was in the library function, using a short (20-line) demonstration program, I felt simply happy. The drama was over.

Our program has soft real-time requirements and is written mostly in Ada, with a few calls to C library functions via pragmas. Due to the nature of the program, it avoids memory allocation as much as possible, using static-sized arrays for its data structures and O(1) algorithms whenever possible, so as to behave properly in the context in which it is used.

I will completely skip the part of the story dealing with our vain analysis of the source code to find where the program was leaking memory, but I can tell you it took a lot of time and did not give any positive results, making us quite angry and desperate. We would never have found the problem that way. As I said before, the leak was in a standard library function from an expensive software development kit for Ada, and had nothing to do with memory allocation functions. To be more precise, the memory leak was in Text_IO.Reset, a procedure that resets the state of a text file, very similar to rewind in C. I will skip ahead to the final steps, which are what I consider interesting.

The program runs on Solaris, so we monitored the process using pmap. This gave us precise information and told us clearly that the memory region that was growing was the heap, where memory allocation happens. I thought that, if our program was barely doing any memory allocation operations, and normally should be doing none according to the code, we had a good chance of catching it leaking memory. When a program on Unix needs more memory, it calls either mmap, brk or sbrk; I could not come up with any other system calls that allocate memory. Normally, when you program in C or C++ you use malloc, free, new and delete. These language operators or library functions manage memory blocks themselves, but request more memory from the operating system with the system calls I just mentioned. It is explained in many books and tutorials on the Internet, but I would say, and maybe I am wrong, that it is not exactly common knowledge.

My first approach, which did not work, involved creating a shared library, loaded using LD_PRELOAD, that intercepted calls to sbrk, brk and mmap. When a call was intercepted, it would run pstack on the current process (a program that prints the call stack of any process given its pid), save the call stack to a text file and proceed with the normal system call. Hackish and clever, I thought, laughing like a maniac while coding it. Well, that did not work, I repeat. While the program was indeed calling sbrk, as confirmed by truss (for Linux users, truss is very similar to strace), it was apparently not calling any function named sbrk that I could intercept. I created a test program to see if my library worked, and it did, but it did not produce any stack trace for the program in question.

Still, I had already started using truss to verify the program was allocating memory with sbrk, so I dived into the truss manual to see if I could use it for something else. This way I discovered that truss was able to stop the program’s execution whenever it made any system call I specified. My new approach, then, was tracing the program with truss, stopping on sbrk, calling pstack on the PID and then telling the program to continue running. This almost worked. The printed stack did not have any symbols, probably because the Ada compiler did not populate the executable file with debugging information the way the C compiler did. So close, yet so far. Our programs were indeed compiled with debugging information, and a minor change to the strategy was enough: instead of printing the stack with pstack, I would attach the Ada debugger to the program and print the call stack. This way, I finally witnessed the program leaking memory in what seemed to be a call to Text_IO.Reset.

I thought this could be wrong, so I created a test program that read a file over and over again, calling Text_IO.Reset on reaching EOF. The test program did indeed leak memory at an alarming rate. Case closed, smile and surprise on my face. Well, to be honest, we replaced the calls to Text_IO.Reset with something else and tested again, to confirm the program had stopped leaking memory. But I already knew the problem had been found after running the test program.

When I came home I wondered if I could have done something similar on my Linux system. I read the man page for strace to see if it could stop programs when a specific system call was made, but found no way of doing so. Apparently, the solution and strategy I had employed were Solaris (or maybe UNIX) specific. Several Google searches did not give me any clue about doing the same on Linux.

Yesterday, however, GDB 7.0 was released. I took a look at the list of new features and found this little sentence near the end:

* New command to stop execution when a system call is made

According to the documentation, you only need to set a catchpoint for the program, like “catch syscall sbrk”, to achieve the same. Two months ago, I would have read the features and forgotten them five minutes later. But, yesterday, I again smiled, then laughed like a maniac and shouted “BEGONE, MEMORY LEAKS!!!”.

2009-05-14

Rounding anecdote

Filed under: Programming — rg3 @ 21:01

I work for a company that shall remain unnamed, developing an application with soft real-time requirements. This application helps human operators perform their work more safely; human lives depend on them, and it’s crucial that the software works well. Some days ago we had a small problem that caught my attention, and I’m sharing it. It’s not something we can learn much from, but it’s entertaining.

This software has pieces of code written in Ada and some others written in C and C++. While the parts in Ada have their fair share of problems too, this one belonged to a piece of code that was written in C++. Let’s start with a question about rounding: how would you round a floating point number whose fractional part is exactly 0.5? There are two classic answers to this question.

Some people will argue that you should round those “up”, that is, away from zero. There is a reason for that. When rounding away from zero, you are dividing the interval [X, X+1) (let’s suppose positive numbers) into two smaller intervals of equal size [X.0, X.5) and [X.5, X+1). Each of those intervals has a length of 0.5 units. The first interval is rounded “down” and the second interval is rounded “up”. This is the desired behavior in many applications.

However, there’s another way to look at this problem, from a statistical point of view. People sharing this view will correctly argue that it doesn’t matter if you round the number “up” or “down”, because the rounding will introduce an “error”, that is, a distance between the original number and the rounded value. That error has an absolute value of 0.5 no matter if you round the number “up” or “down”. The distance to both integers is the same. Going further, while the distance is the same in absolute value, it’s not the same in sign: for one type of rounding the error is +0.5, and for the other it’s -0.5. Some applications would like to compensate for this, and make half the cases round with an error of +0.5 and the other half with -0.5. This way the rounding doesn’t introduce any bias, and this strategy is better in many other situations. It’s usually implemented by rounding to the nearest even number.

Back to our application: we had to round a number. As this was C++ code, the proposed strategy was to use “rint”. If you can, you should read the man page for “rint” at this point. If you don’t have a Unix system, you can find it with a web search. The decision to choose “rint” was a quick one. If you read its man page, you will see it mentions that it rounds numbers using the “current rounding direction”. Depending on your system, it may also mention that you should read the man page for “fesetround”, but it doesn’t matter. The important thing here is that most people, including me (and I was not the one who took the decision to use “rint”), will interpret that text as meaning “rint” rounds either “up” or “down”, whatever the current direction is, but consistently so: either always up or always down unless you change the rounding direction. Well, that’s wrong! Under the default rounding mode, “rint” rounds halfway cases to the nearest even number, and everything else to the nearest integer. In other words, the second strategy I described above. In C++ and C89 (not C99), you basically get “floor”, “ceil” and “rint”, so “rint” was picked.
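You can see both strategies side by side without touching C. Here is a small illustrative script; note that Python 3’s built-in round() happens to use the same round-half-to-even rule, so it serves as a stand-in for what “rint” does in its default mode:

    import math

    def round_half_away(x):
        # First strategy: halfway cases away from zero, like C99's round().
        return math.floor(x + 0.5) if x >= 0 else math.ceil(x - 0.5)

    for q in range(8):
        x = q / 4.0          # values in quarters: 0.0, 0.25, ..., 1.75
        print(x, round(x), round_half_away(x))

    # Only the halfway cases differ: round(0.5) == 0 and round(1.5) == 2
    # (always even), while round_half_away() gives 1 and 2.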

The number we had to round went directly to the user’s screen: they would see whichever integer “rint” came up with. After we deployed a new version of the software, this being a critical piece of software, we soon got a complaint from users describing that, sometimes, they would see weird cases in which the numbers were “always even”, and they couldn’t explain why. The number originated from a hardware sensor that was supposed to give more or less continuous readings, so it didn’t make sense that all the numbers appeared to be even from time to time. More specifically, this noticeable issue was not present in the previous software version. We started investigating.

Fortunately for us, we “soon” saw what was going on. In previous software versions, the sensor readings were sent to the application over a network connection by another application, using an IDL structure. The number was stored in a floating point variable and sent to our software. On the other end, we would read the value and store it in a floating point variable of the same type. However, in the new version, this information traveled in an industry-standard format, which specified that the number would be transmitted in a field representing quarters of the given unit. In practice, this meant we could only receive numbers of the form X.0, X.25, X.5 and X.75. As you can imagine, the bias towards even numbers can be very noticeable in this situation. Despite the number going directly from the network to the user’s screen, only changing in the call to “rint”, the software is big and the path is not obvious. Several people spent several hours studying the code, verifying that the number was not being modified along the way through the many layers of functions and methods between the point where it is received and the point where it is printed. They concluded that, apparently, the only transformation was through “rint”, but they couldn’t be 100% sure (the code is complex). The first thing they did was read the “rint” man page.

Obviously, the “rint” man page doesn’t directly mention this behavior at all. If it mentioned, explicitly, that it rounds halfway cases to the nearest even number, the game would have been over at that point. I take credit here for being the one who wrote a 10-line program that tested what “rint” did, just to be sure it was not the problem. But it was!

I wanted to solve the problem by replacing the call to “rint” with a call to “round”, another standard function. The man page for “round” says that it rounds halfway cases away from zero. In other words, it follows the first strategy I described above. This is what we wanted. X.0 and X.25 would be rounded to X, and X.5 and X.75 would be rounded to X+1. No bias towards even numbers, and a good decision because most X.5 numbers we would receive would probably come from X.5-something numbers that had been truncated to X.5. As I discovered, neither our C compiler nor our C++ compiler had “round” available. “round” is a standard library function in C99, but is not present in some compilers unless they attempt to cover C99. You can read the man page for “round” to confirm this.

This just goes to show my lack of experience. I had never faced a rounding problem in my life, and I had always used “round” when I needed to round a number, because it’s present in my system and it’s the obvious solution. When you wonder how to round a number in C, you type “man round” and its man page pops up. The question still open is how to round a number in C the way “round” does, but without using “round”. Does the lack of such a rounding function in C89 surprise you? It surprised me, but the definitive solution was provided by an experienced programmer in the company, who explained that he had always rounded numbers by adding 0.5 and truncating. It is almost obvious why this works exactly like “round”, at least for non-negative numbers. Experienced programmers can laugh at me now. That’s right, thank you. Yes, that was the first time I saw the trick. More laughs. Anyway, we ended up taking advantage of the number being received in quarters of a unit, and saved some casts and floating point operations by rounding it with integer arithmetic: adding two and dividing by four. Got it?
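In case that last trick reads too fast: the value arrives as an integer count of quarter units, so “add 0.5 and truncate” becomes “add 2 quarters and do an integer division by 4”. A sketch of the idea, in Python for brevity:

    def round_quarters(q):
        # The reading is q/4 units. floor(q/4 + 1/2) == floor((q + 2)/4),
        # which is plain integer division. Assumes non-negative readings.
        return (q + 2) // 4

    # 0.0, 0.25 -> 0; 0.5, 0.75, 1.0, 1.25 -> 1; 1.5, 1.75 -> 2
    assert [round_quarters(q) for q in range(8)] == [0, 0, 1, 1, 1, 1, 2, 2]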
