A couple of days ago I had some spare time and rewrote youtube-dl from scratch. This now slightly popular script started its life as a simple, quick and dirty way of downloading videos from YouTube. I originally wrote it because, back then (it’s been almost two years since the first version was released to the public), there were no real working alternatives. There was a Firefox VideoDownloader extension and a Greasemonkey script, but neither of them was working at that moment. They probably work now. Another solution was to use a video downloader site, but I didn’t think it was a good idea in the long run.

Using a video downloader site meant that if the downloader site was down or blocked or something bad was happening at the moment, I couldn’t download the video. Option discarded. Using a Firefox extension was great and probably multiplatform, but I was using Konqueror (since then, I’ve moved back to Firefox). My proposal was a multiplatform command line program that mimicked what the web browser was doing to download the videos. Being command-line meant that it wasn’t a program for the masses, but it also meant batch downloads were easy, and that the program would run on almost any system, so I coded it in vanilla Python. I never ever intended it to be a popular program. It was quick and dirty and straight to the point. I didn’t waste a single second thinking about its design. It was a “do this, do that and download this URL” program. Just in case someone else could find it useful, I put it in my webspace and added it to freshmeat.net. If I had known what was coming, I’d have created a project in SourceForge.

It became very popular for a single reason: it was featured in linux.com. Joe Barr (recently deceased, rest in peace) wrote an article about it which also found its way to the front page of digg.com (798 diggs). Well, not exactly popular. However, I know it has more than one thousand users. That’s popular enough for me. It took me a day to notice the events. Suddenly, I received a handful of emails about youtube-dl. I don’t remember how many, but probably 5 in a row. As surprising as it sounds, I had never received 5 emails in a row about any of my programs, and especially not about a Python script that was like 100 lines back then. It had become an instant and surprising success. It was not until later that day or the next day that I opened my web browser to read a few news sites, and noticed that my program was on the front page of linux.com. Jaw to the ground. That explained the emails, I supposed. Then, a friend of mine who reads digg.com congratulated me. I asked “Oh, you also read the article at linux.com?” and he replied “I don’t remember where the article is located, but it’s on the front page of digg.com”. My bad.

As time passed, it had more users and many of them wrote to me about possible improvements and reported bugs. That’s the great thing about having many users. I don’t use it that much, but when YouTube changes something and my program breaks, I immediately get a couple of people complaining by email, and it can be fixed in less than 24 hours. Due to its internal no-design, just-do-the-job structure, some of the features that were requested needed a lot of hacking and modifying and tweaking the script until it became a real mess. Also, making the program download from more sites apart from YouTube didn’t feel right because of the limited amount of code that could be shared. That’s why I created separate programs and shared the code internally only.

In the last months I felt that sooner or later I’d have to rewrite it. As Agent Smith would put it, “it’s inevitable”. At some point someone was going to request an interesting feature and it was going to require hacking the script beyond recognition. Finally, some days ago, I had some spare time and thought for some minutes about a possible design that could be used if I rewrote the program. I’m no design genius, but I believe any rational programmer can come up with a good way of solving a problem and future-proof parts of their code if they think about it for some minutes, so that’s what I did.

The new code was released two days ago in the usual place. Internally it’s now object oriented and I tried to make it very easy to integrate new sites and features into it. If it falls short in some aspect it could be tweaked and refactored further. It’s also easier than ever to integrate the code into a bigger program, in my humble opinion, as it could be imported as a module.

In the near future, my goals are to make it possible to download playlists with it and integrate the code from metacafe-dl so it’s able to download from metacafe.com too. My intention is that, if anyone wants support for more sites or meta-sites (like YouTube playlists), they could inherit from a class and reimplement 2 or 3 methods. I can then put that class into the program code and hook it into the main file downloader. I am going to maintain the YouTube code and the metacafe code. If anyone submits code for more sites, I think I should require more responsibility and ask people to fix that code if it ever breaks. Along the same lines, it wouldn’t make sense for someone to submit code to support 10 more sites because they went on a hacking spiral if they are not really going to use that code in the future. In addition, take into account the program has changed from the MIT license to public domain. Code submitted would have to be put in the public domain, of course.

A quick guide on how to add more sites is: inherit from InfoExtractor, reimplement the suitable() method to define which URLs are suitable for it (using regular expressions or something like that), reimplement the _real_initialize() method to initialize the information extractor (authentication, confirmation, filling initial forms, etc) and the _real_extract() method to return a list of dictionaries with information about the content. If in doubt, have a look at the code of the YoutubeIE class taking it as an example.
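The steps above can be sketched as follows. This is only an illustration, not the actual youtube-dl code: the InfoExtractor class below is a simplified stand-in written for this example, suitable() is assumed to be a static method, and the site videos.example.com, the ExampleSiteIE class and the dictionary keys are all made up.

```python
import re


class InfoExtractor:
    """Simplified stand-in for the real base class (assumption for this sketch)."""

    def __init__(self):
        self._ready = False

    @staticmethod
    def suitable(url):
        """Return True if this extractor can handle the URL."""
        return False

    def initialize(self):
        """Run the one-time setup exactly once."""
        if not self._ready:
            self._real_initialize()
            self._ready = True

    def extract(self, url):
        """Initialize if needed, then delegate to the subclass."""
        self.initialize()
        return self._real_extract(url)

    def _real_initialize(self):
        pass

    def _real_extract(self, url):
        raise NotImplementedError


class ExampleSiteIE(InfoExtractor):
    """Extractor for a hypothetical site, videos.example.com."""

    _VALID_URL = re.compile(r'^https?://videos\.example\.com/watch/(\w+)$')

    @staticmethod
    def suitable(url):
        # Decide with a regular expression which URLs belong to this site.
        return ExampleSiteIE._VALID_URL.match(url) is not None

    def _real_initialize(self):
        # Authentication, confirmation, initial forms, etc. would go here.
        pass

    def _real_extract(self, url):
        video_id = self._VALID_URL.match(url).group(1)
        # A real extractor would fetch the page and locate the media URL.
        # The keys used here are illustrative, not the program's real ones.
        return [{
            'id': video_id,
            'url': 'http://videos.example.com/media/%s.flv' % video_id,
            'title': 'Video %s' % video_id,
        }]
```

With something like this in place, the downloader would only need to ask each extractor whether a URL is suitable() and then call extract() on the first one that says yes.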

Please read the program webpage, especially the new sections, and happy hacking!

fsync and fdatasync

A few days ago someone asked, in the ##slackware IRC channel at FreeNode, if there was a simple way to encrypt a small piece of data on a hard drive. Instead of trying to apply encryption to the whole filesystem and other measures, I recommended using GnuPG. Generating a pair of keys and using them to encrypt one file is a very short and easy process. You only have to be careful when choosing the private key password, and when deleting the original file. When you tell GnuPG to encrypt a file, it creates a new one with the encrypted data, but the original file stays on disk. If you don’t want anyone to see those contents, you have to be careful when removing that file. A simple file deletion is not enough, because the data blocks remain on disk and can be recovered until they are overwritten. Fortunately, there’s a perfect tool for this problem, called shred, distributed as part of GNU coreutils. shred takes the file and, by default, overwrites its contents with random data 25 times. Optionally, you can make shred do a final pass overwriting the file contents with zeros and, also optionally, delete the file. This has the purpose of making the original data unavailable to any forensic analysis tool, given that it has been proven that it’s possible to recover the original contents of a hard drive even when they have been overwritten a few times. Up to here, that’s what I knew. However, a person going by the nickname fred pointed out a flaw in this process, and I have to thank him for making me try to find out more information about the matter. fred argued that, even if shred calls fsync or fdatasync after each write pass (which it does; it calls fdatasync), the data would only be passed to the device’s write cache, which is normally enabled. When fdatasync returns, the data has not really been written to the disk surface. Despite sounding horrible, the story is true. 
As of today, under many operating systems and, for sure, under Linux, fdatasync does not request the data to be flushed from the device write cache. When it returns, your data may not have reached the disk surface yet, contrary to what I think any programmer would expect, even though the manpage for fsync and fdatasync on my system mentions this issue in the notes section. This is a problem for shred. Even if there is no power failure in the process, after shred has finished its job, you can’t really guarantee anything other than that, in the very near future, the data on the disk surface will have been overwritten once, at least for small files. I don’t know if this holds when the file contents are bigger than the disk cache. Granted, this is a pessimistic approach and probably the data has been overwritten more times, but the danger is there, and it is surprising that the kernel doesn’t even try to send a flush cache command to the hard drive. There are rumours, and I must make it clear that they are only rumours as I could not verify anything, that some drives happen to ignore the flush cache requests in order to look better during benchmarks, which would make the problem even worse. But, at least, the kernel should probably try to ask the disk to flush the write cache before returning from fsync and fdatasync, even if the current behaviour is, strictly speaking, allowed by the POSIX standard. That’s no excuse, in my humble opinion. Those two system calls are the only ones available in POSIX if you don’t want to use system specific functions, if available. They are the key to creating applications that can guarantee data consistency. The manpage reads like “these system calls are part of the POSIX standard and the only portable way of making sure your critical data in this file is already on disk before the program continues… (by the way, they may not work at all)”. They are of no good use if they don’t work as expected and flush the write cache.
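To make the discussion more concrete, here is a minimal Python sketch of an overwrite-and-sync loop in the spirit of what shred does. It is not shred’s actual algorithm: the function name and the default number of passes are made up for this example, and it assumes a small regular file. The point is only to show where the fdatasync call sits and what it can and cannot promise according to the text above.

```python
import os


def overwrite_in_place(path, passes=3):
    """Overwrite a small file with random bytes, syncing after each pass.

    Simplified illustration only, NOT a replacement for shred.
    """
    size = os.path.getsize(path)
    # fdatasync is not available on every platform; fall back to fsync.
    sync = getattr(os, 'fdatasync', os.fsync)
    fd = os.open(path, os.O_WRONLY)
    try:
        for _ in range(passes):
            os.lseek(fd, 0, os.SEEK_SET)
            os.write(fd, os.urandom(size))
            # Ask the kernel to push the data out to the device. As discussed
            # above, the data may still be sitting in the drive's write cache
            # when this call returns.
            sync(fd)
    finally:
        os.close(fd)
```

If the drive merges the passes in its cache, several of these writes may end up as a single write on the disk surface, which is exactly the concern described above.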

I am not alone in thinking so, even if it comes at a performance cost, and our shred problem is nothing compared to the problem faced by the database software out there that calls fsync or fdatasync to try to keep the database consistent. Linux is, after all, being used in production servers holding databases with critical data, and you would expect that when someone calls fdatasync, the data is sent to the disk surface and the function doesn’t return until the write operation has been performed. At the moment, I think the only way to request a write cache flush from userspace in Linux is to call hdparm -F /disk/device. This, in turn, seems to call ioctl with some Linux-specific arguments, but this requires root access and ioctl isn’t POSIX either. Also, you can enable, disable and see if the write cache is active with hdparm -W. Disabling the write cache permanently on a hard drive is not recommended. While it may improve performance in some very specific situations and loads, in general your computer will become much, much slower, and the life of your hard drive will be severely shortened. Still, before using shred, you may want to disable the write cache and enable it later so as to try to make sure the data is really being overwritten 25 times. Let’s see an example with a small, around 4k, file called test. First, with the disk write cache enabled:

$ time shred test

real 0m0.074s
user 0m0.014s
sys 0m0.017s

Then, after disabling the write cache:

$ time shred test

real 0m1.327s
user 0m0.011s
sys 0m0.022s

An amazing difference, as you can see. I don’t know how many times the data is being overwritten with the write cache enabled, but this test suggests it is not 25. The write cache problem affects security applications, like shred, and the consistency of data in other applications. When investigating this topic I reached a thread started on the Linux Kernel Mailing List (LKML) by Jamie Lokier on February 26, 2008, in which he proposes a solution. As I didn’t know if there had been any progress in this area, even though the shred test doesn’t suggest so, I emailed him directly and asked. In his reply, he says:

As far as I know, there has been no progress on code, but it’s nice to see higher awareness.

This situation, surprising for some of us, makes you give much more importance to UPSs. If you have a server with critical data, it’s very important to have a UPS in place, because it lets you at least shut the machine down cleanly, making sure the data present in the write cache is written to the disk surface before the computer loses power. If the power is lost right after a database system has called fdatasync, you face a possible nasty database corruption. The database can’t guarantee consistency despite calling fdatasync. If you perform some writes, call fdatasync, perform more writes and then call fdatasync a second time, in a short period of time, there’s no guarantee that the first writes are on disk when the second set of writes begins. If this process is interrupted, you may get inconsistent data. This introduces the concept of write ordering, also very important for journaled filesystems.

Write ordering

The write cache also affects the order in which data is written to your hard drive, and this is important. Its only purpose is to make the hard drive work better. If there were no write cache, data would be written as soon as you sent it to the disk, and the order of those writes would be the order in which you passed the data to the hard drive. Many times, this would mean the disk would be rotating all the time and the disk head would jump from one position to another one. With the write cache in place, the disk can first accumulate an amount of data to be written and, once it has reached a critical mass or some time has passed and no new data has been received, it can proceed to write it to the disk surface optimizing the order, so as to minimize how much the disk has to rotate and how much the head has to move. This makes it work much faster and, as an additional bonus, last longer. However, this conflicts with an aspect of journaled filesystems, like Linux’ ext3. I’m sorry for the lies I may tell in the following sentences. You can think of a journaled filesystem as writing data in three steps. First, it writes a journal entry indicating what it’s going to do. It then proceeds. Finally, it puts a mark in the journal indicating it has finished doing what it said it was going to do. It is very important that those steps are performed in that order, and that no step starts before the previous one has finished. If this process is interrupted, the system would be able to verify if the first journal entry has been written completely. If not, it knows no other steps were performed and it can discard the journal entry. If the entry is complete but there is no confirmation that the work has been done, it can do it because all the modifications are described in the journal.
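The recovery logic described above can be sketched in a few lines of Python. This is a toy model of the simplified three-step description, not how ext3 actually stores or replays its journal; the dictionary keys and the action names are made up for the example.

```python
def recover(journal):
    """Decide what to do with each journal entry after an interruption.

    Each entry is a dict with two flags (names invented for this sketch):
      'complete':  the description of the work was fully written (step 1)
      'committed': the final "done" mark was written (step 3)
    """
    actions = []
    for entry in journal:
        if not entry['complete']:
            # Step 1 never finished, so nothing else was started.
            actions.append('discard')
        elif not entry['committed']:
            # The work is fully described but not confirmed: do it (again).
            actions.append('replay')
        else:
            # All three steps finished before the interruption.
            actions.append('nothing')
    return actions
```

Note that this reasoning only works if the three steps reach the disk surface in order. If the write cache lets the “done” mark land before the work itself, an entry can look committed when it is not, which is the problem discussed here.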

Until now, we know the kernel doesn’t try to flush the write cache when using fsync and fdatasync, but does it use something when working with the journal of some filesystems? Yes, from kernel 2.6.9 this process was sanitized with the creation of the so-called barriers. Barriers try to guarantee that a write process cannot continue until all the pending data has been written on the hard drive. This is perfect for the conditions we mentioned before. So the only remaining question is: are the barriers enabled by default in journaled filesystems? The answer is “no”. They are not used by default. As sad and surprising as it sounds, they are not used and the disk write cache, by default, can interfere in the write order of those three critical steps. There are a few practical factors that, fortunately this time, make it unlikely that an ext3 filesystem is corrupted during a power loss and that’s why the filesystem corruption reports are few and far between for ext3, but the danger is there and there are artificial test programs which are able to corrupt an ext3 filesystem with a high success rate if interrupted while running. From what I have read, SuSE has been shipping kernels patched to enable barriers by default for some time. In the future, maybe barriers will be enabled by default. As of today, they are not. They can be enabled, though, with a mount option in the fstab or passed to the mount command. The exact option depends on the filesystem type. For ext3, it’s barrier=1. This is mentioned in a Gentoo guide I read at some point while finding information, but more prominently it’s also mentioned in the Wikipedia entry for ext3. Enabling barriers is said to cause, in some situations, a reduction of 30% in the hard drive’s performance (comment from Alan Cox in one of the LKML threads). For a desktop or laptop computer holding important personal data you don’t want to lose, even if that was a constant performance reduction, I’d enable barriers. 
Performance is often the main reason these features are not enabled by default.
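As a small practical aid, the following Python sketch lists the ext3 entries of an fstab-style text that do not carry the barrier=1 option mentioned above. The function name and the sample entries (devices and mount points) are invented for the example, and it only looks at the text it is given, not at how the filesystems are really mounted at the moment.

```python
def ext3_entries_without_barriers(fstab_text):
    """Return the mount points of ext3 entries whose options lack barrier=1."""
    missing = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        device, mount_point, fs_type, options = fields[:4]
        if fs_type == 'ext3' and 'barrier=1' not in options.split(','):
            missing.append(mount_point)
    return missing


# Hypothetical sample, only for illustration.
SAMPLE_FSTAB = """\
# device     mount point  type  options              dump pass
/dev/sda1    /            ext3  defaults,barrier=1   1 1
/dev/sda2    /home        ext3  defaults             1 2
/dev/sda3    swap         swap  defaults             0 0
"""

print(ext3_entries_without_barriers(SAMPLE_FSTAB))
```

With the sample above, only /home is reported, as it is the only ext3 entry without the option.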

Conclusion

Remember to enable barriers on your personal computer, disable the write cache when using shred and, if possible, use your laptop battery or a UPS in more critical environments, and check it works. Your data may not be as safe as you thought due to the device write cache. Here are some significant links I used to obtain information about the write cache problems.

Long lines of code

2008-05-20

One of the subjects of possibly heated discussion among programmers is the limit on the length of lines of code. Are long lines a problem? Should we impose a limit such as the traditional 80 columns?

Long lines clearly have disadvantages. Many programming environments and tools don’t wrap long lines and instead force you to use horizontal scrollbars to see the full text, which is obviously clumsy and makes it a bit harder to follow the code structure, as when you scroll to the right, you lose the ability of seeing at a glance the nesting level you are in. Visual diff tools, for example, usually present you unwrapped lines and split the screen in two columns. Short lines make it much easier to spot the differences.

However, they also have some advantages. As long as you don’t surpass the column limit of your screen (nowadays usually over 100 columns), long lines allow you to write code more naturally and, in fact, can increase the clarity of the code being written. In addition, the fewer lines a statement takes, the more statements you can fit in a screen of text. This can make the difference between being able to view a whole method or function without scrolling, hence having a clear visual perception of its structure, or having to scroll (or scroll a lot more) to achieve the same.

The attitude of some people defending arbitrarily short limits on the length of lines of code is a bit childish, and many of them fail to understand at least three fundamental concepts, making it reasonable to doubt their programming and reasoning ability.

First, that the average length of a line of code varies with the language. If you impose the same limit on every language, you are failing to understand that the languages are different and code written in them looks different. In C, most lines of code are part of functions. This means that the typical obligatory indentation level of a line of code is 1. However, in C++, if you are creating a well-designed piece of code, you could be writing an inline method of a class inside a namespace. The initial indentation level is 3 or 4, depending on whether you indent the public, private and protected keywords too. If you use tab characters and are displaying tabs as 8-column-wide gaps, that means 16 or 24 columns less for something that may be fundamentally the same code. Furthermore, in C, standard functions have names rarely surpassing 8 characters while, again as an example, in C++ you could be using std::transform, which is 14 characters by itself. Many modern languages use more verbose function and method names. For them, 80 columns is simply not a reasonable limit.

Second, a basic logic mistake: they are confusing the problem with its symptoms. Long lines of code may indicate bad code because it may mean you are using too many nested expressions, making the code hard to follow due to the increased cyclomatic complexity. However, a long line may not be long due to this problem. It may be long because it has to be long. For example, if it’s a printf of a slightly long message. What if it has 90 columns? Nothing. It’s perfectly fine. If, on top of that, you believe that if a line is short, you are writing good code, you are not placing yourself in a good position. What makes code good or bad is how well, efficiently and clearly the problem is being solved, how well you are dividing your application into independent modules and into simple functions and how it makes the initial problem look easy. It is perfectly possible to write an ugly application using short lines. Remember, long lines are most times not a problem, but in some languages they may indicate a problem. It’s not the same.

Third, the solutions they propose usually don’t fix anything. If you have a long line in which you call a function with four parameters and it turns out the line is too long, they tell you to put each parameter on its own line. Woah. What a radical change! The code would be so much better structured that way, if it were not for the fact that it’s the same code. People using a random length limit will, many times, surpass that limit, and their solution is to split the expression. The expression is the same. You are not making the code better by splitting the line. You are not changing your approach to the problem, or the structure of your program, or decreasing the cyclomatic complexity, or anything. You are only splitting the line. Plus, as mentioned previously, you are pushing one line below the limits of your screen. If you have space on your screen, use it.
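The “it’s the same code” point can even be checked mechanically. As a small illustration in Python (the function and variable names in the two snippets are made up and never executed, only parsed), both versions of the call produce exactly the same syntax tree:

```python
import ast

# The same call, written on one line...
ONE_LINE = "result = compute_total(price, quantity, tax_rate, discount)\n"

# ...and with each parameter on its own line.
SPLIT = (
    "result = compute_total(price,\n"
    "                       quantity,\n"
    "                       tax_rate,\n"
    "                       discount)\n"
)


def same_structure(first, second):
    """True if both sources parse to the same syntax tree.

    ast.dump leaves out line and column positions by default, so only
    the structure of the code is compared, not its layout.
    """
    return ast.dump(ast.parse(first)) == ast.dump(ast.parse(second))


print(same_structure(ONE_LINE, SPLIT))
```

The comparison reports that both sources are structurally identical: splitting the line changes the layout and nothing else.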

My advice is: try to structure your program as best as you can. Create good code. Write simple functions. If, when doing that, you notice you write an unusually long line, it’s fine if it fits in your screen and it’s readable. Focusing on keeping the line length below an arbitrary limit is only a distraction. You won’t be a better programmer. Be practical. If you are working with people whose screen resolution is lower and they tell you they usually have problems reading your long lines, be a good work mate and try to make them shorter. If they are working in the same application but their lines are much shorter, maybe there’s a problem with your code.

PS: Usability studies have shown short lines are easier to read because it’s easier to track the line you are reading and jump to the beginning of the next line when you reach the end of the current one. Sometimes this is used as an argument in favour of short lines of code, but it fails to notice that the studies apply to unstructured normal text, not code. Code is easier to read in this regard because it follows a structure, the lines are usually different from each other and are divided into short blocks instead of long paragraphs. In my humble opinion, this argument is void when talking about source code.

Loading times

2008-03-06

Since the invention of the personal computer, the speed and capacity of those machines have been improving a lot and they are now much faster and better than they were many years ago. These performance improvements have affected several components. The most important ones for general usage nowadays are the CPU and the memory system, which includes the different CPU cache memory levels, the RAM memory and the hard drive. The increase in performance has been bigger in the components closer to the CPU. Hard drives have had many speed improvements in access, read and write performance, but it’s next to nothing compared to the improvements in the CPU speed.

If we don’t take into account the latest MS Windows version and some special types of programs like 3D games, in my opinion computers have been good enough for a few years now. Any computer built, let’s say, in the last four years, is able to run Windows XP or a decent Linux distribution with an office suite, a communications suite and several other programs without any problems. You can have them running at the same time on your computer and they don’t bring it to its knees. Right now I’m typing this from a laptop computer with a single-core Sempron CPU, 1 GiB of RAM and a 60 GB hard drive. The majority of the hard drive space is being taken by personal data like pictures and videos, not the system itself, the CPU is barely being used at this moment and I have a Slackware Linux system running with the latest KDE 3.x, Kontact (a communications suite with calendar, to-do list, notes, e-mail, etc), Mozilla Firefox and several other programs. The amount of memory being used by the applications, according to the KDE Information Center and the “free” tool, is about 160 MiB. I could perfectly run OpenOffice.org too if I wanted and still not use any swap memory. There’s plenty of memory available. And that’s what most computers are being used for. Be it at home or in offices, they are used to browse the web, communicate using email or IM programs, send and receive files, view pictures and videos and compose documents.

All these improvements have allowed us to create more complex systems, normally with the single purpose of making other people’s lives easier, starting with the people who create the software itself. In the early days you couldn’t do this, but now you can have libraries, and more libraries on top of them, and a third library layer and a lot of abstractions and a complete API so writing an application becomes much easier. This means that the application is easier to maintain and easier to create, and those two tasks take less time and effort, allowing the software to be cheaper and easier to use, because once you have a solid framework to work fast, you can concentrate on other types of improvements like making the application easier to use or more intuitive.

So we know that, for now, we don’t need much more computing power in most cases, and we can guess there is not much room for making a word processor or web browser better. They are what they are. You can point and click, save bookmarks, type seeing how the document will be on paper, correct mistakes before printing it on paper, save a lot of time by creating and applying styles, make it create and update a table of contents automatically… so where can we improve? Is there anything in which we haven’t seen a real improvement? Something we should have paid a little bit of attention to and we didn’t? Some unavoidable problem derived from all of this that we could try to solve? In my opinion, yes: the loading times. There’s a system with several layers of libraries and other software in the middle and at the top of it there’s an application. That complex base makes it easier to write and maintain the application, but it also means it’s bigger byte-wise, more things need to be initialized and read from disk, and while the systems we have are perfectly capable of running everything we need to run at the same time, it still takes too much time to get the program started. Once it is running, fine. While loading, not so fine.

The time it takes to boot an operating system today is worse than it was in the early days, due to the number of systems that need to be initialized and the incredible variety of the hardware it needs to detect and support, among other factors. The first Unix systems booted in seconds. Now they may take minutes. This is an obvious problem, and people have been trying to solve it and improve the situation in recent years. Windows XP takes less time to boot than Windows 98, normally. Ubuntu Linux, for example, is now trying to move away from the classic SysV init (and others) and replace it with “upstart”, to boot more efficiently. Someone invented “suspend to RAM” and “hibernate”. In the first case that means booting in a few seconds to a completely usable system with everything started and ready to run, as we had it running before. Windows has SuperFetch, based on the standard Windows Prefetcher, which can be tweaked through the registry in Windows XP, for example. Linux has Preload and Prelink, and Mac OS X had Prebinding, which according to Wikipedia has now been abandoned because it didn’t really give noticeable performance improvements.

So, yes, people have been well aware of the loading time problems and have been trying to create hacks and find ways of circumventing the problems derived from having a complex base system that makes our lives easier. For some types of applications, loading times are not a problem. For example, I can start an XTerm with Mutt almost instantly once the files are in the disk cache (in memory). Yet KMail takes several seconds to load, even when it’s already cached. Which one will I use? While Mutt is no worse than great, and I say it from my own experience, it doesn’t have many very useful features KMail has (God bless the quick search bar), and I’d prefer to use KMail if possible. Firefox takes four seconds to start on this machine when cached, and OpenOffice.org… let’s not go there. Once they are running, no problem, I have plenty of memory available. Like I said, the problem lies in the loading time.

The approach I use to minimize this problem is relatively simple. You may have your own, here’s mine. I use suspend-to-RAM as much as possible. It takes a little bit of power but, let’s face it, “booting” in 5 seconds to a completely prepared system is a very spectacular and real improvement. On top of that, I noticed some time ago that KMail had an option to place itself in the system tray, invisible until you clicked on it (Windows users can take uTorrent as an example of this behaviour), and when you use it that way and want to write an email, you just click on the system tray icon and it comes back instantly. When you close the window, it goes back to the system tray instantly. Instant-on and instant-off. This is the obvious solution. If we have a lot of memory and running all the applications at the same time is not a problem, yet the loading time for many of them is a problem, you do the obvious: keep all of them running all the time. If you use suspend-to-RAM too, they can run for ages, only stopping at the eventual reboot that comes with some system upgrades (like kernel upgrades in the Linux case). As I launch them when I enter my session, their total launch time is counted as part of the boot time, but real reboots are only needed once in a blue moon. Some applications have a built-in system to do this, like the already mentioned uTorrent and KMail, but OpenOffice.org for Windows (and maybe Linux, I don’t remember) also has a pre-load system, and Firefox had (or has?) a similar system under Windows. Some of them don’t have it, but they are not a problem provided you have tools like “ksystraycmd”, a KDE tool that will allow you to place any windowed program in the system tray. Thanks to it, Firefox starts and stops instantly too, as well as other applications (I could use it with OpenOffice.org Writer and Calc if I used them frequently). Some of you may be wondering what’s the difference between this and keeping the application minimized. 
The answer is that while the application sits on the system tray, it doesn’t appear in the task bar and it’s not present, for example, in the Alt+Tab menu. So when you send it to the system tray, you can effectively forget about it, and it doesn’t interfere in your usage of other programs you may be running at a given time. When you bring them back from the system tray you can minimize them like other applications, or keep the window open, yet you can at any time say “I’m done for now with it, so put it somewhere else until I need it again in 15 minutes”. This separation of concepts between “minimization” and “iconification” (to the system tray in this case) also marks the difference between this tool and other iconification systems like the ones from CDE or WindowMaker, where “iconification” is only an aesthetic difference to “minimization”, in the sense that iconified windows are still considered active, and iconifying a window is the only way of hiding it.

[Screenshot: KDE system tray and applets (kde-systray-and-applets.png)]

There is also a very interesting application called Yakuake, which is a Quake-console-style terminal application for KDE. It sits invisible at the top of the screen until you press a hotkey or key combination, and then it comes down. It’s a very efficient way to keep a terminal permanently running. Some time ago the main Yakuake developer was considering extending this functionality beyond terminals, making Yakuake handle any type of application, so you could bring any application down with the appropriate hotkey and keep them all running all the time. When I read about it I didn’t give it much importance, but now I do, and I think it was indeed a great idea that probably won’t be implemented any time soon. With the arrival of KDE4, maybe it will now be easier to write such an application. I imagine having a very thin bar at the top of the screen holding several applications that can notify me of activity through that bar (like KMail getting new mail) and that can only be brought down using a key combination, without interfering with the other active applications in the taskbar. I think that would be very nice, practical and a good way to have instant on and off for a defined set of applications. Who knows. Maybe some decent programmer (or me if that fails) will create such an application in the future.

Keeping an application running all the time only has one problem: sometimes the application is not very nice to the system memory or has memory leaks (there are many claims about Firefox having serious memory leak problems, but I haven’t noticed any real ones; I use the Flash and MPlayer plug-ins and the Flashblock extension). To keep an eye on that, KDE has a very nice applet that lets you check CPU and memory usage at a glance from time to time. Remember: your computer has a lot of memory, so make good use of it. Nowadays, when someone buys a new computer they expect very short loading times and everything starting in a fraction of a second (like in the movies). There are a few tools and tricks which can help you achieve this goal most of the time, so give them a try.

Introduction

Writing programs that support multiple platforms is sometimes complicated. The bigger the program is, the closer to the system it is and the more libraries it uses, the more complicated it is to keep it multiplatform. To create such a program, special care must be taken when writing the code, often regardless of the programming language, and also when creating the build system for the program. Apart from the code itself, if you write it in a language that needs to be compiled, like C or C++, you must (or at least should, if you don’t want to lose your mind) specify somehow the build rules, that is, how the program needs to be compiled and linked to obtain a runnable program or usable library, or both.

When someone realized that typing the compiler commands by hand every time they wanted to rebuild a project was tiresome at best, they had to do something. Probably, one of the first things they did was to create shell scripts to run the compiler commands, saving quite a lot of time. However, some time later they also realized that compiling code took a lot of time. With modern computers, this problem has partly disappeared. Still, it takes a good number of minutes to completely compile a big project like the Linux kernel or KDE, and even some smaller programs have unusually long compile times (LyX comes to mind). To the developers of those programs it’s important to be able to compile them efficiently. This is because sometimes you make a small change in one of the source files and want to test if your change works as expected and you haven’t introduced an idiotic bug. So you want to rebuild that source file and integrate it into the final binary, but only that. You don’t want to rebuild the whole thing, taking minutes, again, and you don’t want to type the two long command lines by hand either.

Makefiles

So this was probably the starting point for “make” tools. These tools are a wonderful invention. In them, you specify the rules to build your project. You create a file named “Makefile” in the directory holding the source files, containing the names of the target files (program and library names and object files, among others) and, for each target, you indicate two things. First, the other files that the target depends on, if any. Second, how to obtain the target file from the files it depends on. For example, the target program “hello” depends on “hello.c” and is obtained by running “gcc -o hello hello.c”. You can obtain targets by running any command, not only compiler calls. A second good thing about Makefiles is that you can assign values to variables, and use them in the file. For example, you can define a variable named “COMPILER” and assign to it the value “gcc”, and later use “$(COMPILER)” throughout the whole file instead of “gcc”, so if you want to change the compiler, a one-line change is enough. On top of that, “make” tools, the ones that read Makefiles and run the specified commands, only regenerate a target when it does not exist yet or when one of its dependencies has a more recent modification time than the target itself. So if you call “make” on a source directory in which you have made a small change to a source file, only the files that depend on it (and the ones that depend on the ones that depend on it, recursively) will be rebuilt, saving a lot of time.
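
The rebuild decision described above can be sketched in a few lines of Python. This is only an illustration of the timestamp rule, not how any real “make” tool is implemented; the function names and the “hello”/“hello.c” files created in a temporary directory are made up for the example.

```python
import os
import tempfile


def needs_rebuild(target, dependencies):
    """Return True if target is missing or older than any dependency."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)


def demo():
    """Simulate building "hello" and then editing "hello.c" afterwards."""
    with tempfile.TemporaryDirectory() as workdir:
        source = os.path.join(workdir, "hello.c")
        target = os.path.join(workdir, "hello")
        with open(source, "w") as handle:
            handle.write("int main(void) { return 0; }\n")

        # The target does not exist yet, so it has to be built.
        missing = needs_rebuild(target, [source])

        # Pretend we built the target after the source was last modified.
        with open(target, "w") as handle:
            handle.write("pretend binary\n")
        os.utime(source, (1000, 1000))
        os.utime(target, (2000, 2000))
        fresh = needs_rebuild(target, [source])

        # Pretend the source was edited after the build.
        os.utime(source, (3000, 3000))
        edited = needs_rebuild(target, [source])
    return missing, fresh, edited
```

Calling demo() exercises the three cases in order: no target yet, target newer than its source, and source edited after the build. A real tool applies the same check to every target in the dependency chain, which is where the recursive rebuilds come from.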

This could have been the end of the chain. Every time you start a project, you write the source files and a Makefile, and use the Makefile to specify how the project is to be built. But it’s not. This is because for multiplatform programs, even within the same family of platforms (BSD and SunOS, for example), there may be differences in the compiler, in the include directories, in the library directories, in the linker, in the underlying architecture. Enough that sometimes you need to have different Makefiles for each platform. I first noticed this in some college programs I had to create and test under several different platforms. They used sockets and, under SunOS, you had to specify a couple of extra linker flags for the program to be able to link successfully. These extra linker flags were not needed on other platforms. In fact, using them on other platforms resulted in linker errors. Sometimes the differences are very small and easy to detect, and you can write a single Makefile. Some other times you really have to write different Makefiles because it’s easier, or use nested Makefiles where the global Makefile is the platform-specific one, defining stuff to be used by the rest of the Makefiles. As complicated as you want or need. Not to mention that if you want to support Windows natively, the available software is usually different and maybe there is no “make” tool.
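
To make the SunOS case concrete, here is a small Python sketch of the kind of decision a platform-specific Makefile encodes. The post does not name the extra flags; the ones below (-lsocket and -lnsl) are the ones commonly needed for socket programs on SunOS/Solaris and are an assumption here, as are the function name and the “server” file names.

```python
import platform

# Assumption: the original text does not say which flags were needed.
# On SunOS/Solaris, socket code is commonly linked with these two libraries.
EXTRA_LINK_FLAGS = {
    "SunOS": ["-lsocket", "-lnsl"],
}


def link_command(system, output, objects, compiler="gcc"):
    """Build the linker command line for the given platform name."""
    flags = EXTRA_LINK_FLAGS.get(system, [])
    return [compiler, "-o", output, *objects, *flags]


if __name__ == "__main__":
    # platform.system() reports the name of the platform we are running on.
    print(" ".join(link_command(platform.system(), "server", ["server.o"])))
```

On any platform not listed in the table the command is left alone, which matters because, as said above, passing those flags elsewhere makes the link step fail.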

automake and autoconf

automake and autoconf are the names of some tools that help you create a build system for your software package, trying to overcome some of the problems mentioned above. They are very popular in the world of free or open source software. When you download a source package and the first step to build it is to run a shell script named “configure”, it’s probably using automake and autoconf. These tools, using template files and shell scripts, are able to detect the specific details of your platform and generate, from a template, a suitable Makefile for your specific system. After running the “configure” script, a Makefile is generated, and you can then proceed to run “make” and use the generated Makefile to build the project. This is quite impressive, and they support a lot of platforms and are used by thousands of projects. Their age also speaks for their stability and correctness.

However, they do have a number of detractors. While I learned about their existence many years ago, I never really dug into them. For my small projects, Makefiles were everything I needed, and I had my doubts about learning automake and autoconf. This was because critics always mentioned how hard automake and autoconf are to understand. It is a well-known joke that only a handful of people really understand automake and autoconf, and the rest just copy each other when they need to do something. Back to reality, there really were many people who understood automake and autoconf, and some of them were not very kind when they talked about them. The KDE developers, while planning KDE4, decided that, with these tools, it was too hard to, for example, add a new source file to the project due to its size. automake and autoconf didn’t scale very well to projects of that size, according to them. Apart from that, there are a number of notable projects that, in fact, avoid using them. vsftpd, the FTP daemon, and Postfix, the mail transfer agent, come to mind. If I recall correctly (I may be wrong here), Postfix uses a set of Makefiles, one for each platform it supports, and vsftpd uses a quite clever mechanism to detect what it needs to detect using some relatively simple shell scripts. I recommend you have a look at the vsftpd build system because it’s very interesting. Given the source code quality of vsftpd and the cleverness of its build scripts, I get the impression that the programmer behind it is an experienced man, and that its build scripts are a clever solution the author has probably been simplifying and fine-tuning over the years. They are very insightful.

Apart from the problems just mentioned (hard to understand, do not scale very well), autoconf and automake also have some other problems related to their architecture. All the templates and shell scripts you have to include in your project for it to use these tools add considerable weight to the final source tarball, and they are a bit slow. To overcome the problems of these tools, some alternatives to them have been created, and one which is going to stick is CMake, if only because it’s being used by the KDE project for KDE4 and it’s being adopted by more people due to this fact.

CMake

For the already mentioned reasons, I learned CMake directly instead of going the autoconf and automake route. I don’t know how complicated they are, but CMake is simple. I had no problems understanding it. Instead of writing a Makefile, you create a file named CMakeLists.txt. While the syntax of this file doesn’t have anything in common with a Makefile, its purpose is very similar, it’s very easy to understand, and small projects will have very simple build files. I first noticed this simplicity in the build system files when QtCurve, a theme for Qt4, Qt3 and GTK2, changed its build system to CMake. The tarball was a few hundred kibibytes before and, when it switched to CMake, it started to weigh under one hundred kibibytes. If the project is much bigger, the proportional impact on the tarball size will be much smaller, of course.

CMake is also significantly faster than autoconf and automake. This is because CMake is a program compiled to native machine code, and it directly interprets the build files, like “make” does with Makefiles. This contrasts with the “configure” shell script. In big projects, the build time can be significantly reduced if you switch to CMake, as the KDE developers reported. One last difference I particularly like about CMake is that it encourages using different places for holding the source files and holding the build environment. After running “configure” from a source directory, you end up with a lot of generated files, and you then run “make” to build the project. This generates more files, and you usually have to type “make distclean” so the build files and generated files are completely deleted and you end up with the original source directory again. CMake avoids this by recommending the use of build directories. You create an empty directory and instruct CMake to configure, there, the project that has its source files in some other directory. CMake then uses that directory to generate a lot of files and a Makefile, and to store the build results, leaving the source directory intact. This is handy to keep different builds with different compiler flags at the same time, among other things.
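
The build directory workflow just described boils down to two commands run from an empty directory: “cmake” pointed at the source directory, then “make”. Here is a hedged Python sketch that drives them; the function name and its “run” parameter are made up for this example, and it assumes “cmake” and “make” are installed and that CMake generates Makefiles (its usual default on Unix-like systems).

```python
import os
import subprocess


def out_of_source_build(source_dir, build_dir, run=subprocess.run):
    """Configure and build a CMake project without touching source_dir."""
    # The build directory starts out empty; all generated files go here.
    os.makedirs(build_dir, exist_ok=True)
    # Configure: "cmake <path-to-source>", run from inside the build directory.
    run(["cmake", os.path.abspath(source_dir)], cwd=build_dir, check=True)
    # Build using the Makefile that CMake generated in the build directory.
    run(["make"], cwd=build_dir, check=True)
```

The “run” parameter only exists so the commands can be inspected without executing them. Calling the function twice with two different build directories is one way to keep, say, a debug and an optimized build side by side from the same sources.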

CMake also has a generally simpler approach to the problem. You create the source files and CMakeLists.txt somewhere, and that’s all. That’s what you distribute. You don’t need to run anything to generate a “configure” script or update some metadata files. You distribute the source files, and CMake, on the user’s system, will run and use CMakeLists.txt to give them the needed Makefile. This is another advantage over autoconf and automake. Those tools try to be at least a bit self-contained. In other words, the source tarball tries to contain everything or almost everything you need to build the project, except the “make” tool and compilers. With CMake, the destination machine needs to have CMake installed to parse CMakeLists.txt. While this can be considered a problem, it solves more problems than it creates, because CMake does not need to run so many tests when configuring a project. When you install CMake on your system, it can perform the common checks and get information about your system at that moment, and it doesn’t have to rerun those tests again and again for every project from a “configure” script. This is a second factor making CMake faster. So give CMake a try for your C and C++ projects. I’m sure you’ll think it’s a fine tool. Did I mention it works on Windows too, out of the box, and has an optional GUI/curses interface to configure the project or make changes in the project configuration easily?