A couple of days ago I had some spare time and rewrote youtube-dl from scratch. This now slightly popular script started its life as a simple, quick and dirty way of downloading videos from YouTube. I originally wrote it because, back then (it’s been almost two years since the first version was released to the public), there were no real working alternatives. There was a Firefox VideoDownloader extension and a Greasemonkey script, but neither of them was working at that moment. They probably work now. Another solution was to use a video downloader site, but I didn’t think it was a good idea in the long run.
Using a video downloader site meant that if that site was down or blocked or something bad was happening at the moment, I couldn’t download the video. Option discarded. Using a Firefox extension was great and probably multiplatform, but I was using Konqueror (since then, I’ve moved back to Firefox). My proposal was a multiplatform command-line program that mimicked what the web browser was doing to download the videos. Being command-line meant that it wasn’t a program for the masses, but it also meant batch downloads were easy, and that the program would run on almost any system, so I coded it in vanilla Python. I never ever intended it to be a popular program. It was quick and dirty and straight to the point. I didn’t waste a single second thinking about its design. It was a “do this, do that and download this URL” program. Just in case someone else could find it useful, I put it in my webspace and added it to freshmeat.net. If I had known what was coming, I’d have created a project in SourceForge.
It became very popular for a single reason: it was featured in linux.com. Joe Barr (recently deceased, rest in peace) wrote an article about it which also found its way to the front page of digg.com (798 diggs). Well, not exactly popular. However, I know it has more than one thousand users. That’s popular enough for me. It took me a day to notice the events. Suddenly, I received a handful of emails about youtube-dl. I don’t remember how many, but probably 5 in a row. As surprising as it sounds, I had never received 5 emails in a row about any of my programs, and especially not about a Python script that was like 100 lines back then. It had become an instant and surprising success. It was not until the next day or later that day that I opened my web browser to read a few news sites, and noticed that my program was on the front page of linux.com. Jaw to the ground. That explained the emails, I supposed. Then, a friend of mine who reads digg.com congratulated me. I asked “Oh, you also read the article at linux.com?” and he replied “I don’t remember where the article is located, but it’s on the front page of digg.com”. My. bad.
As time passed, it had more users and many of them wrote to me about possible improvements and reported bugs. That’s the great thing about having many users. I don’t use it that much, but when YouTube changes something and my program breaks, I immediately get a couple of people complaining by email, and it can be fixed in less than 24 hours. Due to its internal no-design, just-do-the-job structure, some of the features that were requested needed a lot of hacking and modifying and tweaking the script until it became a real mess. Also, making the program download from more sites apart from YouTube didn’t feel right because of the limited amount of code that could be shared. That’s why I created separate programs and shared the code internally only.
In the last few months I felt that sooner or later I’d have to rewrite it. As Agent Smith would put it, “it’s inevitable”. At some point someone was going to request an interesting feature and it was going to require hacking the script beyond recognition. Finally, a few days ago, I had some spare time and thought for a few minutes about a possible design that could be used if I rewrote the program. I’m no design genius, but I believe any rational programmer can come up with a good way of solving a problem and future-proof parts of their code if they think about it for a few minutes, so that’s what I did.
The new code was released two days ago in the usual place. Internally it’s now object oriented and I tried to make it very easy to integrate new sites and features into it. If it falls short in some aspect it could be tweaked and refactored further. It’s also easier than ever to integrate the code into a bigger program, in my humble opinion, as it could be imported as a module.
In the near future, my goals are to make it possible to download playlists with it and integrate the code from metacafe-dl so it’s able to download from metacafe.com too. My intention is that, if anyone wants support for more sites or meta-sites (like YouTube playlists), they could inherit from a class and reimplement 2 or 3 methods. I can then put that class into the program code and hook it into the main file downloader. I am going to maintain the YouTube code and the metacafe code. If anyone submits code for more sites, I think I should require more responsibility and ask people to fix that code if it ever breaks. Along the same lines, it wouldn’t make sense for someone to submit code to support 10 more sites because they went on a hacking spiral if they are not really going to use that code in the future. In addition, take into account that the program has changed from the MIT license to public domain. Code submitted would have to be put in the public domain, of course.
A quick guide on how to add more sites: inherit from InfoExtractor, reimplement the suitable() method to define which URLs are suitable for it (using regular expressions or something like that), reimplement the _real_initialize() method to initialize the information extractor (authentication, confirmation, filling initial forms, etc.) and the _real_extract() method to return a list of dictionaries with information about the content. If in doubt, have a look at the code of the YoutubeIE class and take it as an example.
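To make the steps above more concrete, here is a rough sketch of what such a class could look like. Take it with a grain of salt: the InfoExtractor base class below is a simplified stand-in written for this example and not the real youtube-dl code, and ExampleVideoIE, the video.example.com site, the initialize() and extract() wrappers and the dictionary keys are all made up for illustration. Only the three method names come from the guide.

```python
import re


class InfoExtractor:
    """Stand-in base class (an assumption, not the real youtube-dl code)."""

    def __init__(self):
        self._ready = False

    def suitable(self, url):
        """Return True if this extractor can handle the URL."""
        return False

    def initialize(self):
        """Run the one-time setup the first time it is needed."""
        if not self._ready:
            self._real_initialize()
            self._ready = True

    def extract(self, url):
        """Initialize if necessary, then extract information from the URL."""
        self.initialize()
        return self._real_extract(url)

    def _real_initialize(self):
        pass

    def _real_extract(self, url):
        raise NotImplementedError


class ExampleVideoIE(InfoExtractor):
    """Hypothetical extractor for a made-up site, video.example.com."""

    _VALID_URL = re.compile(r'^https?://video\.example\.com/watch/(?P<id>\w+)$')

    def suitable(self, url):
        # Decide with a regular expression which URLs belong to this site.
        return self._VALID_URL.match(url) is not None

    def _real_initialize(self):
        # Authentication, confirmation pages, initial forms, etc. would go here.
        self.logged_in = True

    def _real_extract(self, url):
        video_id = self._VALID_URL.match(url).group('id')
        # A real extractor would fetch and parse the page; this sketch only
        # builds a made-up media URL from the video id.
        return [{
            'id': video_id,
            'url': 'http://video.example.com/media/%s.flv' % video_id,
            'title': 'Video %s' % video_id,
        }]


if __name__ == '__main__':
    ie = ExampleVideoIE()
    url = 'http://video.example.com/watch/abc123'
    if ie.suitable(url):
        for info in ie.extract(url):
            print(info['title'], info['url'])
```

The idea is that the downloader only ever asks "is this URL yours?" and "give me the information", so everything specific to a site stays inside its own class.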
Please read the program webpage, especially the new sections, and happy hacking!