Rounding anecdote
2009-05-14
I work for a company that shall remain unnamed, developing an application with soft real-time requirements. The application helps human operators perform their work more safely. Human lives depend on those operators, so it’s crucial that the software works well. Some days ago we had a small problem that caught my attention and I’m sharing it here. It’s not something we can learn much from, but it’s entertaining.
This software has pieces of code written in Ada and some others written in C and C++. While the parts in Ada have their fair share of problems too, this one belonged to a piece of code that was written in C++. Let’s start with a question about rounding: how would you round a floating point number whose fractional part is exactly 0.5? There are two classic answers to this question.
Some people will argue that you should round those “up”, that is, away from zero. There is a reason for that. When rounding away from zero, you are dividing the interval [X, X+1) (let’s suppose positive numbers) into two smaller intervals of equal size [X.0, X.5) and [X.5, X+1). Each of those intervals has a length of 0.5 units. The first interval is rounded “down” and the second interval is rounded “up”. This is the desired behavior in many applications.
However, there’s another way to look at this problem, from a statistical point of view. People sharing this view will correctly argue that it doesn’t matter if you round the number “up” or “down”, because the rounding will introduce an “error”, that is, a distance between the original number and the rounded value. That error has an absolute value of 0.5 no matter if you round the number “up” or “down”. The distance to both integers is the same. Going further, while the distance is the same in absolute value, its sign is not. Rounding one way gives an error of +0.5 and rounding the other way gives -0.5. Some applications would like to compensate for this, and make half the cases round with an error of +0.5 and the other half with -0.5. This way the rounding doesn’t introduce any bias, which makes this strategy the better one in many other situations. It’s usually implemented by rounding halfway cases to the nearest even number.
Back to our application, we had to round a number. As this was code in C++, the proposed strategy was to use “rint”. If you can, you should read the man page for “rint” at this point. If you don’t have a Unix system, you can find it with a web search. The decision to choose “rint” was a quick one. If you read its man page, you will see it mentions that it rounds the numbers using the “current rounding direction”. Depending on your system, it may also mention that you should read the man page for “fesetround”, but it doesn’t matter. The important thing here is that most people, including me (and I was not the one who made the decision to use “rint”), will interpret that text as “rint” rounding either “up” or “down”, whatever the current direction is, but consistently: always up or always down unless you change the rounding direction. Well, that’s wrong! With the default rounding direction, “rint” rounds to the nearest even number in halfway cases, and to the nearest integer in the rest of the cases. In other words, the second strategy I described above. In C++ and C89 (not C99), you basically get “floor”, “ceil” and “rint”, so “rint” was picked.
That number we had to round was going directly to the user’s screen. Users would see whichever integer “rint” came up with. This being a critical piece of software, soon after deploying a new version we got a complaint from users who, sometimes, noticed weird cases in which the numbers were “always even” and couldn’t explain why. The number originated from a hardware sensor that was supposed to give more or less continuous readings, so it didn’t make sense that all the numbers appeared to be even from time to time. More importantly, this noticeable issue was not present in the previous software version. We started investigating it.
Fortunately for us, we “soon” saw what was going on. In previous software versions, the sensor readings were sent to the application through a network connection by another application, using an IDL structure. The number was stored in a floating point variable and sent to our software. On the other end, we would read the value and store it in a floating point variable of the same type. However, in the new version, this information traveled in an industry-standard format. This format specified that the number would be transmitted in a field that represented quarters of the given unit. In practice, this meant that we could only receive numbers of the form X.0, X.25, X.5 and X.75, so one in every four possible values is a halfway case. As you can imagine, the bias to even numbers can be very noticeable in this situation. Although the number goes directly from the network to the user’s screen, changing only in the call to “rint”, the software is big and the path is not obvious. Several people spent several hours studying the code to verify that the number was not being modified along the many layers of functions and methods between the point where it is received and the point where it is printed. They concluded that, apparently, the only transformation would be through “rint”, but they couldn’t be 100% sure (the code is complex). The first thing they did was to read the “rint” man page.
Obviously, the “rint” man page doesn’t mention this behavior directly at all. Had it mentioned, explicitly, that it rounds to the nearest even number in halfway cases, the game would have been over at that point. I take credit here for being the one who wrote a 10-line program that tested what “rint” did, just to be sure it was not the problem. But it was!
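The 10-line program itself is not reproduced here. As a rough stand-in, the following Python sketch (not the original C++ test) compares both tie-breaking rules on readings in quarter-unit steps. Python’s built-in round() breaks ties toward the even integer, the same rule “rint” follows under the default rounding direction. The names round_half_away and even_share, and the sample readings, are made up for this illustration.

```python
import math


def round_half_even(x):
    """Tie-breaking rule of rint() under the default rounding direction."""
    return round(x)  # Python's built-in round() sends ties to the even integer


def round_half_away(x):
    """Tie-breaking rule of C99 round(): halfway cases go away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def even_share(rounder, readings):
    """Fraction of readings that end up displayed as an even integer."""
    shown = [rounder(x) for x in readings]
    return sum(1 for v in shown if v % 2 == 0) / len(shown)


# Hypothetical sensor readings in quarter-unit steps: 0.0, 0.25, ..., 7.75
readings = [n / 4 for n in range(32)]

for x in readings[:8]:
    print(x, round_half_even(x), round_half_away(x))

print(even_share(round_half_even, readings))  # 0.625
print(even_share(round_half_away, readings))  # 0.5
```

With evenly spread quarter-unit readings, ties-to-even shows an even number 62.5% of the time instead of 50%. A reading that hovers between, say, 1.5 and 2.5 would be displayed as 2 every single time, which fits the “always even” complaint.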
I wanted to solve the problem by replacing the call to “rint” with a call to “round”, another standard function. The man page for “round” says that it rounds halfway cases away from zero. In other words, it follows the first strategy I described above. This is what we wanted. X.0 and X.25 would be rounded to X, and X.5 and X.75 would be rounded to X+1. There would be no bias to even numbers, and it was a good decision because most X.5 numbers we would receive would probably come from X.5something numbers that had been truncated to X.5. As I discovered, our C compiler and our C++ compiler didn’t have “round” available. “round” is a standard library function in C99, but it is missing from compilers that don’t attempt to cover C99. You can read the man page for “round” to confirm this.
This just goes to show my lack of experience. I had never faced a rounding problem in my life and I had always used “round” when I needed to round a number because it’s present in my system and it’s the obvious solution. When you wonder how to round a number in C, you type “man round” and its man page pops up. The question still open is how to round a number in C the way “round” does but without using “round”. Does the lack of such a rounding function in C89 surprise you? It surprised me, but the definitive solution was provided by an experienced programmer in the company, who explained to us that he had always rounded numbers by adding 0.5 and truncating. It is almost obvious why this works exactly like “round”, at least for non-negative numbers. Experienced programmers can laugh at me now. That’s right, thank you. Yes, that was the first time I saw the trick. More laughs. Anyway, we ended up taking advantage of the number being received in quarters of a unit and saved some casts and floating point operations by rounding it with integer arithmetic: adding two and dividing by four. Got it?
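Both tricks are small enough to check by hand. The sketch below is in Python rather than C, and it assumes the received field is a non-negative integer count of quarter units. The function names are invented for the example.

```python
def round_add_half(x):
    """Add 0.5 and truncate. Matches round-half-away only for x >= 0."""
    return int(x + 0.5)  # int() truncates toward zero, like a C cast to int


def round_quarters(q):
    """Round a non-negative count of quarter units to whole units, integers only."""
    return (q + 2) // 4


# Both approaches agree on every quarter-step value from 0.0 to 24.75
for q in range(100):
    assert round_quarters(q) == round_add_half(q / 4)

print([round_quarters(q) for q in range(8)])  # [0, 0, 1, 1, 1, 1, 2, 2]
```

Adding two quarters is the same as adding 0.5 units, and the integer division by four plays the role of the truncation. For negative inputs the add-and-truncate trick no longer matches “round” (truncation moves toward zero, so -2.7 would become -2), so a sign check would be needed there.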
Mercurial vs Git
2009-04-07
Edit from 2009-04-10: as time passes and I receive feedback, this article is being refined and modified subtly to remove typos, make concepts more clear, or clarify when something is not really needed. Special thanks to Martin Geisler for pointing out my mistakes related to Mercurial.
There are many blog posts and articles all over the Internet providing comparisons between Git and Mercurial. Most of them only briefly describe the main differences and then try to decide which one is better. However, I didn’t find many articles explaining the differences in detail from a neutral point of view and that’s what I’ll try to do here, also providing links to relevant documentation. For simple uses like a single user managing a private project, Git and Mercurial are equivalent. Their workflow differs a little due to the underlying differences, but their usage doesn’t seem to be very far apart. However, those differences start being noticeable when you collaborate with more users and, the more complex the project is, the more you will notice them. Hopefully, this information will be useful to Mercurial users wanting to know how Git works and vice versa, as well as to novice users who are not using either one yet. In that case, I suggest you experiment a little bit with both instead of trying to make a theoretical decision based on what you read in the documentation or in articles like this one. I will focus on four main aspects.
- The repository structure, that is, how each one of them records changes and history.
- The noticeable differences in how they manage the branching process.
- Documentation.
- Their two popular hosting sites, GitHub and Bitbucket.
Repository structure
Mercurial and Git differ a lot in the way they store changes and history. Some of those differences are irrelevant from the user’s perspective while others are not. I won’t provide a very verbose explanation of either one. The specific details are well covered in chapter 3 of “Mercurial: The Definitive Guide” for Mercurial and the “Git Object Model” chapter in the Git Community Book. A very brief description of each one: Git does not store differences between different versions of the same file. For each new version of a file, Git stores a copy of that version. Mercurial, on the other hand, stores differences between file versions for a limited number of revisions and a new full copy from time to time, in order to bound the time needed to reconstruct a particular revision. Also, it uses a binary comparison algorithm that works for both binary and text files. Knowing that, I think it’s possible to read any of those book chapters and understand the details. If you read both chapters, you will notice that the model in Git looks simpler, while the way Mercurial stores changes is not trivial.
A consequence of this underlying model is that Git repositories, without repacking, tend to use more disk space than Mercurial repositories. Git can pack and compress objects to change this, but this means that you need to run a command from time to time. Also, this process can take quite a lot of time depending on the size of the repository and the number of unpacked objects. However, Git’s model is probably easier to understand once you have access to a proper explanation.
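To make the contrast concrete, here is a toy Python sketch of both ideas. It is not how either tool is implemented. The class names, the prefix/suffix delta format and the fixed “full copy every four revisions” rule are assumptions made for the illustration, and Mercurial’s real criterion for storing a full copy is different.

```python
import hashlib


class SnapshotStore:
    """Git-like idea: every version is stored whole, keyed by a hash of its content."""

    def __init__(self):
        self.objects = {}

    def add(self, text):
        # Simplified: real Git also hashes a small header, not just the content
        key = hashlib.sha1(text.encode()).hexdigest()
        self.objects[key] = text
        return key


def make_delta(old, new):
    """Describe `new` as (common prefix length, common suffix length, new middle)."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return prefix, suffix, new[prefix:len(new) - suffix]


def apply_delta(old, delta):
    prefix, suffix, middle = delta
    return old[:prefix] + middle + old[len(old) - suffix:]


class DeltaLog:
    """Mercurial-like idea: deltas, plus a full copy every `full_every` revisions."""

    def __init__(self, full_every=4):
        self.full_every = full_every
        self.entries = []  # ("full", text) or ("delta", (prefix, suffix, middle))

    def add(self, text):
        if len(self.entries) % self.full_every == 0:
            self.entries.append(("full", text))
        else:
            previous = self.get(len(self.entries) - 1)
            self.entries.append(("delta", make_delta(previous, text)))
        return len(self.entries) - 1

    def get(self, rev):
        base = rev
        while self.entries[base][0] != "full":  # walk back to the nearest full copy
            base -= 1
        text = self.entries[base][1]
        for _, delta in self.entries[base + 1:rev + 1]:
            text = apply_delta(text, delta)
        return text


log = DeltaLog()
versions = ["a\nb\n", "a\nb\nc\n", "a\nX\nc\n", "a\nX\nc\nd\n", "a\nX\n"]
for v in versions:
    log.add(v)

print([kind for kind, _ in log.entries])  # ['full', 'delta', 'delta', 'delta', 'full']
print(log.get(2) == versions[2])          # True
```

The periodic full copy is what keeps reconstruction cheap: get() never has to replay more than a handful of deltas. The snapshot store needs no reconstruction at all, at the price of storing every version whole until something packs it.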
They also share similarities. In both of them, history is represented as a sequence of commits, each commit being identified by a character sequence that turns out to be the SHA-1 hash of, well, something. Each commit, apart from the very first one, has one or two parent commits, allowing for different branches to exist. This connection is what creates the concept of history in the repository (things were in a given state, then a change happened from that point and we ended up with this new state). The exact wording people use to describe this is “history is a directed acyclic graph”. However, many people won’t know what exactly a directed acyclic graph is, and things can only get worse when it’s called a DAG. I’m not sure the Git or Mercurial creators thought “I’m going to make the project history be a directed acyclic graph”. Being a DAG is probably more of a consequence than a starting point in the design of either tool. Knowing it’s a DAG probably won’t help you understand anything really.
Differences in branching
In both Git and Mercurial, a given state (using Git names, a commit; using Mercurial names, a changeset) can be the parent for more than one other state. This means that the project history has the notion of branches in the sense that history can diverge at some point. Using an ASCII diagram:
        C ···
       /
A --- B
       \
        D ···
State B can be the starting point for states C and D. C and D would, somehow, indicate that the parent state is state B. In both Git and Mercurial, too, a state can have more than one parent. This means that it is possible to join branches that were once separated:
··· V --- W
           \
            Z ···
           /
··· X --- Y
Z would somehow indicate that its parent states would be W and Y.
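The two diagrams can be modelled with very little code. In this Python sketch (a conceptual model, not the data structure of either tool), the State class and the ancestors function are names invented for the example. Each state only records its parents, and history is whatever you reach by following those links.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    name: str
    parents: tuple = ()  # zero, one or two parent states


def ancestors(state):
    """Names of all states reachable by following parent links."""
    seen = set()
    stack = list(state.parents)
    while stack:
        current = stack.pop()
        if current.name not in seen:
            seen.add(current.name)
            stack.extend(current.parents)
    return seen


a = State("A")
b = State("B", (a,))
c = State("C", (b,))     # history diverges after B ...
d = State("D", (b,))
z = State("Z", (c, d))   # ... and is joined again by a state with two parents

print(sorted(ancestors(z)))  # ['A', 'B', 'C', 'D']
```

Nothing in a state points forward to its children, which is why adding C and D on top of B never requires touching B.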
And despite sharing all that in common, the way they expose this functionality to the user is quite different, and workflows differ a lot in how you are expected to proceed in different situations. The most common expression you will hear about this is “in Git, heads or branches are explicit, while in Mercurial they are implicit”. But what does this really mean? I read the previous expression in several places and didn’t fully understand what it meant until I read a very useful document that appeared in Hacker News. It’s “Understanding Git Conceptually”. This tutorial or explanation of Git fails to explain properly the object model described in the Git book chapter I mentioned above, but the Git book lacks, as of the time I’m writing this, an explanation of Git branching as good as the one in this tutorial. I recommend reading it at some point after having read the chapter, but not too late. It is better to understand that from the beginning.
Briefly, Mercurial doesn’t have the notion of a branch or head as something by itself (I know I’m lying, but read until the end). In Mercurial, a branch is present because history diverged at some point, and a head is a history state that has no children. A head is not explicitly marked as a head, and there is no data structure called a branch. It’s implicit by looking at the project history, more or less. However, in Git, this information is explicit. There is a type of data structure present in the repository called a “head”. As the tutorial mentioned above explains, a head is like an arrow pointing to a specific commit or state in the history of the project, giving it a name. It’s possible to have several heads in one repository, and one of them is always the currently active one you’re working on. When you commit changes, the current active head points to the commit that will be the parent of the new one you are creating. After it has been created, the arrow is moved “forward” to point to the new commit. Very important too, these arrows always have a name, be it “master” or “experimental” or “testingsomething” or “whoohoo”. Finally, these arrows or heads are created or destroyed as needed. When you branch, you create a new one. When you merge, you no longer need one of the arrows, but you can keep it if you want to.
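The “arrow” picture can be sketched in a few lines of Python. This is a conceptual toy and not the API of either tool. The Repo class and its methods are invented for the example. Explicit heads are a name-to-commit table that moves on every commit, while implicit heads are computed by looking for commits without children.

```python
class Repo:
    def __init__(self):
        self.parents = {}       # commit id -> tuple of parent ids
        self.heads = {}         # Git-style explicit heads: name -> commit id
        self.active = "master"  # the head currently being worked on

    def commit(self, commit_id):
        parent = self.heads.get(self.active)
        self.parents[commit_id] = (parent,) if parent is not None else ()
        self.heads[self.active] = commit_id  # the arrow moves forward

    def branch(self, name):
        """Create a new arrow at the current commit and make it active."""
        self.heads[name] = self.heads[self.active]
        self.active = name

    def checkout(self, name):
        self.active = name

    def implicit_heads(self):
        """Mercurial-style view: commits that have no children."""
        with_children = {p for ps in self.parents.values() for p in ps}
        return sorted(set(self.parents) - with_children)


repo = Repo()
repo.commit("A")
repo.commit("B")
repo.branch("experimental")
repo.commit("C")
repo.checkout("master")
repo.commit("D")

print(repo.heads)             # {'master': 'D', 'experimental': 'C'}
print(repo.implicit_heads())  # ['C', 'D']
```

Both views describe the same history. The difference is only whether the tips carry names that are stored somewhere, or are simply whatever falls out of the graph.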
Going back to Mercurial, you don’t need to do anything special to create a new branch. If someone (or even you) starts working on the repository from a point in time, and different commits are created starting at that point, two anonymous branches will automagically appear implicitly. At this point they reside in two different repositories. If you then pull changes from one of them to the other, the two anonymous branches will also automagically (implicitly) appear in the same repository. At that point, Mercurial suggests that you merge both to continue committing changes. You can still force Mercurial to go to any point in history and create a commit there. The commit can create a new branch or extend an existing one. However, working this way can be a bit confusing unless you use tags, named branches or a GUI to see the project history graphically.
In Git, however, you cannot do that. One of the branches is going to need a new name, like “work_on_feature_X”. Actually, you can create anonymous branches like in Mercurial but it’s not the common practice and it would produce an error in old versions. So, for the rest of this document, we will suppose it still produces the error. Why? On the one hand, you worked on the default branch and moved the head forward. On the other hand, someone did the same in parallel and ended up with the head somewhere else. If you then try to pull changes, the head would either have to point to your last commit, or to their last commit. That’s why you need to give names to heads (or branches, at this point you can guess both terms are nearly equivalent). The fact that you can (and probably should) remove one of the heads after merging work means that your branch name doesn’t really have to be unique or special. Also, in Git you can pull changes without being forced or advised to merge. After all, one of the heads is the active one and the fact that you have another head with another name pointing to another branch does not prevent you from committing more changes to your active head and moving it forward. You can merge whenever you want to and change the active head at will. The usual practice in Git of having named branches for everything makes it a bit easier to jump to any head without confusion and to get an idea of what you were doing in that branch.
While I still have to mention Mercurial’s named branches, these differences already translate to different workflows for both Git and Mercurial.
In Mercurial, branching is simpler, as you don’t need to do anything special. You simply clone the repository and start working. When you’re done, you (or someone else) pull and merge. Cloning is also needed when you want to branch your own repository. Let’s suppose you have a repository which is the official branch and want to start working on an experimental feature just like any other developer would, in its own branch so as not to mess up your main copy. To do that, you clone the repository to some other directory on your hard drive. In addition, as pulls and merges are a bit tied, you should keep different branches in different repository clones until you’re ready to merge.
In Git, the process is not so simple but it is more flexible. When working on a feature, you should create a new named branch explicitly. This is especially important to remember if you’re a Mercurial user. So if you want to contribute something to a project, you clone the repository but do not immediately start working. First, you should create a branch for your work. The advantage is that you are likely to work with the same directory all the time. Also, the explicit heads allow you to continue working after a pull, before merging, and the names of the heads help you track what you were doing. In Mercurial, this is usually achieved by giving useful names to your repository clones. If your main repository sits at directory “foo”, you will usually clone that repository to another one called “foo-new-feature” to know what you were working on in that branch.
Despite everything I said above about Mercurial branches being anonymous, in Mercurial there are also named branches. However, they are a different concept. In Mercurial, the name of the branch is stored with each commit. This means that branches cannot be deleted as they are in Git (in Git, deleting a branch simply meant deleting the arrow). You can create new named branches and commit changes to them, and can merge work from one branch to another like you merge anonymous branches. However, the inability to delete a named branch means that they should have unique names, and that they should be used for a different purpose. My impression is that, in Git, there is no difference between a short-term branch (a branch created to implement a new feature or fix a bug) and a long-term branch (like creating a branch for a stable release and only putting commits in that branch to fix security issues or annoying bugs). In Mercurial, anonymous branches are used for the first case and named branches are used for the second case.
Documentation
The documentation quality varies from one project to another. Mercurial has always had its documentation well organized. Its manpages are good and describe the features in a comprehensible way. Some time ago, and still in some aspects, the documentation of Git was not very good. It has improved a lot, however, and many unofficial books and tutorials have appeared to fill the gaps in the official documentation, the most notable example being the community book I mentioned above. As you can see, though, its documentation is still not fully unified. For example, for an aspect as important as branching, I would have expected the book to have a clear and comprehensive explanation such as the one present in the tutorial “Understanding Git Conceptually”, also mentioned above. However, the book doesn’t have one, and the tutorial doesn’t have a beautiful and easy graphical explanation of the object model like the one in the book. So, to fully understand Git when you’re still a novice user, I think it’s still partly true that you cannot read one single document. You need to get information from various sources.
This changes when you already know the principles of Git and are looking for specific information about a command. The manpages of Git are very well written and provide lots of clear examples, even with ASCII diagrams like the ones I used above.
GitHub and Bitbucket
GitHub and Bitbucket are two sites that provide hosting for projects using Git or Mercurial, respectively, to manage source code. They both have nonfree plans that give projects additional hosting services or higher limits in several aspects, but they both have free plans so any developer can host projects on them. GitHub appeared first and is probably responsible for a good amount of Git’s popularity. It provided an easy way to do distributed development. Any user can register on the site and host their own repositories, fork existing ones and communicate with other projects easily by requesting pulls, etc. Bitbucket appeared later as a clone of GitHub for people preferring Mercurial to Git. Nowadays, they differ a bit in some aspects but they allow you to do exactly the same things: upload a repository and get a project wiki you can use to write news, documentation, FAQs or whatever you want. Basically, your project can live by itself being hosted completely in GitHub or Bitbucket, both the source code and its documentation.
Despite sharing a lot of things in common, there are things you can miss in GitHub from Bitbucket and vice versa. Some examples:
- GitHub has statistics.
- GitHub is more popular.
- Bitbucket has an integrated issue tracker.
- Bitbucket’s wiki is versioned (you can work on it from your local hard drive and then push changes).
Conclusion
I hope this comparison is easy enough to understand for novice users of both programs, and I hope to have been specific enough for experts in one of the two programs wanting to use or know the other one. I would say that both Git and Mercurial seem to be converging over time in many of the additional features, while keeping the base a bit different. For example, some months ago I checked Git and it didn’t have a feature I loved from Mercurial: the ability to create a bundle (a file containing a group of commits/changesets) that could be sent by email or using a USB stick. Nowadays, it can do that too. From Git, I loved the ability to put your local changes aside (the git-stash operation) to do something else, and the ability to select specific difference chunks in files to be committed (you don’t need to commit all the changes inside a single file). Nowadays, Mercurial can do that too. And I suppose the same goes for their respective hosting sites GitHub and Bitbucket. I feel both will converge to provide the same functionality.
I have not talked about the Git staging area intentionally, leaving it for the end. It is well described in most tutorials and documents you can read about it, but let’s only mention that Git introduced a new concept that other tools lack (for good or for bad), which is the staging area. In Git, when you want to commit something you have to “prepare” the commit first, indicating which content you want to commit. This is performed with the “add” command. With it, you will take snapshots of the current state of files and record them in an area ready to be committed. This provides a lot of flexibility when creating commits and performing some operations, but it also has its dangers. For example, if you modify a file, prepare it to be committed with “git add file” and then modify the file again to make a minor correction, you need to remember to add it again, or you will commit the file as it was before the correction. The commit operation has the “-a” option to minimize the probability of making this mistake.
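The danger described above is easy to reproduce in a toy model. The Python sketch below is only a conceptual illustration and does not reflect how Git implements its index. The class and attribute names are invented, and the all_tracked flag is a stand-in for what “commit -a” does for files that are already tracked.

```python
class ToyRepo:
    def __init__(self):
        self.worktree = {}  # path -> content currently on disk
        self.index = {}     # staging area: path -> snapshot taken by add()
        self.commits = []   # each commit is a copy of the index at that time

    def add(self, path):
        """Snapshot the file as it is right now."""
        self.index[path] = self.worktree[path]

    def commit(self, all_tracked=False):
        if all_tracked:  # re-snapshot every file already in the index first
            for path in list(self.index):
                self.add(path)
        self.commits.append(dict(self.index))


repo = ToyRepo()
repo.worktree["file"] = "first version"
repo.add("file")
repo.worktree["file"] = "first version, with a minor correction"

repo.commit()                  # the correction was never added
print(repo.commits[-1]["file"])  # first version

repo.commit(all_tracked=True)  # snapshots are refreshed before committing
print(repo.commits[-1]["file"])  # first version, with a minor correction
```

The first commit records the snapshot taken by add(), not what is on disk, which is exactly the mistake the “-a” option helps to avoid.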
Another important comment I need to make before closing this article is that Git has a lot of operations to manipulate the project’s history (e.g. git-rebase), while Mercurial does not. In Mercurial, the concept of history is a bit more immutable. Some people will like the immutability of Mercurial, while some others will prefer to be able to mutate history like Git allows you to do.
The only opinion I’m going to give over all of this is that I think both Git and Mercurial are great.
Some time later…
Events that happened, or things I found out, after writing the article above.
- Mercurial has a rebase extension.
- GitHub got an in-site issue tracker.
- Google Code added support for Mercurial (but not Git).
- The Pro Git book was published.
- A guide to branching in Mercurial was written by Steve Losh.
New project was born: halrv
2009-02-18
Some days ago I started a new small project called halrv. It is a very simple Python script that allows you to manage removable volumes from the command line using HAL. This would be a command line equivalent of the typical GUI programs that let you do the same thing. In modern desktops, when you plug a USB memory stick into your computer, you get a dialog asking you if you want to open its contents in an explorer window, and they also put an icon on your desktop for similar purposes. You can see which devices are connected to your computer by looking at your desktop. If you click on the icon, an explorer window opens showing its contents, and the device may have been mounted along the way. When you’re done, you close the window and right-click on the device icon, and then you select the option to “safely remove” the device.
However, if you don’t run X or if you run a minimalistic desktop environment or window manager like fluxbox, chances are that, as a user, you don’t have any program running to do the same trick. Many times these users run their systems with a few static entries in the /etc/fstab file to perform this operation. Now you can do the same from the command line using HAL, hence only needing to be part of the “plugdev” group. No need for static entries in /etc/fstab. Some of you may be thinking “didn’t this type of program exist already?”. Maybe it did, but I didn’t find any other than pmount-hal, and it wasn’t working when I downloaded a copy (I wasn’t able to mount my memory stick and the error message appeared in several Google search results that didn’t give me a solution). You can still use plain pmount instead of pmount-hal, but then you are not using HAL and lose its advantages. You have to create a pmount config file where you whitelist devices that users can mount with it. In practice this works well, like static /etc/fstab entries. However, HAL is more flexible and, if you can use it, why not use it? Why not use the same service many desktop environments are using?
So here we go with halrv. In a mere 300-line script, with short functions and commented code, you can have the following functionality, similar to a GUI and following the same steps:
$ # I am part of the plugdev group
$ groups
users lp audio video cdrom plugdev power
$
$ # Let's see which removable volumes are present
$ halrv
Device      Label     UUID        Mount point
$
$ # None, the table is empty
$ # I plug my SanDisk memory stick, labeled "SANDISK", and wait a few seconds
$ # ...
$ # Let's check again
$ halrv
Device      Label     UUID        Mount point
/dev/sdb1   SANDISK   4026-23C0
$ # There we go! So let's mount it
$
$ halrv mount /dev/sdb1
/dev/sdb1 mounted on /media/SANDISK
$
$ # Great! Let's see the contents
$ ls /media/SANDISK
sz3fen.pdf*
$
$ # And let's see how HAL mounted the device
$ mount
/dev/sda3 on / type ext4 (rw,relatime,barrier=1,data=ordered)
/proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
none on /proc/bus/usb type usbfs (rw)
/dev/sda4 on /boot type ext2 (rw)
none on /dev/shm type tmpfs (rw,noexec,nosuid,nodev)
/dev/sdb1 on /media/SANDISK type vfat (rw,nosuid,nodev,uhelper=hal,uid=1000)
$ # You can see the last entry above
$
$ # Now let's unmount it and eject the device (safely remove)
$ halrv eject /dev/sdb1
$ # No error messages, which is a success
$ # Let's confirm it
$
$ mount
/dev/sda3 on / type ext4 (rw,relatime,barrier=1,data=ordered)
/proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
none on /proc/bus/usb type usbfs (rw)
/dev/sda4 on /boot type ext2 (rw)
none on /dev/shm type tmpfs (rw,noexec,nosuid,nodev)
$ halrv
Device      Label     UUID        Mount point
$
$ # The device has been unmounted and ejected!
Apart from the basics above, the program also uses a personal config file (per user) where you can put specific mount options for some filesystems or even per-volume identified by its UUID. I’d say it’s quite flexible and useful. I hope you think the same. Don’t forget to try it and report back any problems or bugs you find!
Notes on Huawei E220 and mobile connections
2008-11-16
I’d like to start this post with an apology for not writing anything for several months. My job and other issues have kept me busy and I didn’t find a good moment to write a long post about something interesting.
As I have written in the past, I live in a place with very unusual circumstances from the telecommunications point of view, for several reasons. I can’t get DSL or cable here, and we’re almost in 2009, and the only proper way to get Internet access until some months ago has been a dial-up connection.
You may remember some time ago I became interested in the possibility of getting mobile internet access using a 3G modem. At that moment I went to a Vodafone shop, which was the only available option, and asked if I had 3G coverage in my area, and a nice lady told me it wasn’t available instead of talking me into signing a contract and paying.
Fortunately for me, due to my job I had the possibility of testing one of those Vodafone modems by myself and seeing if it worked or not, and it turned out it worked. I tested both the Huawei E220 modem and the Huawei E172 modem. Both worked without any problems and gave me good connection speeds, at least compared to the dial-up connection I had been using! Knowing Vodafone worked, I studied other providers, and eventually settled on Orange if only because, for the same price as the other major operators here in Spain, the traffic limit is 5GB per month instead of 1GB, which is what most other providers are offering. I’m close to finishing my second month with them and I can tell you last month I downloaded about 2GB of data, so the choice has been mostly correct and my prediction that 1GB wouldn’t be enough came true.
Prices and connection quality
However, I’d like to share my opinion and observations now that I’ve been using it for two months. The first thing I’d like to share is that the prices are steep with most providers. You end up paying 45 euros a month for a connection that usually gives you around 1.x Mbps instead of the advertised 3.6 or 7.2 Mbps, and on top of that you are limited to 5GB a month. Compared to a DSL connection it’s very expensive, even if you can carry your connection around with you.
In my own experience with Orange (this may not apply to other providers or even areas of the country), connection quality varies a lot depending on the time of the day. Many times I start my connection at 20:00 or 21:00 and it works without any problems. However, from Monday to Friday and without moving the modem or touching it or doing anything special, the connection degrades a lot when it’s between 23:00 to 00:00, probably because at that moment many people connect to the Internet using the service. I know several people who, after a busy day, connect for some minutes at 23:00 to check their email and browse the web a bit before going to bed. That would explain why the service quality drops at that time, so I avoid downloading large files in those hours. The same happens during weekends as lunch time approaches. Early in the morning the connection is fine. Yes, this is a shame and upsetting, but I’d like to remind you that I need more than 1 GB and there’s no other method to get broadband here, so I’m stuck.
I also noticed that in some cases, and mine in particular, the exact position of the modem and its orientation make a lot of difference in the link quality. Unfortunately, these modems are usually distributed with one very short USB-to-miniUSB cable, or even no cable at all in the case of the E172. That makes it very difficult to move the modem around looking for a better signal. That’s why I bought a USB extension cable 3 meters long and, with a little bit of do-it-yourself, I managed to keep the modem in a high position close to a window. With this setup, I usually get four or five signal bars (out of five) instead of one or two if I put the modem on the floor next to the computer.
I took a picture of the final result. I used a long, extensible pole (2 meters high), stuck a cube of cork on top, and used adhesive tape to attach the modem to it. Cheap and ugly, but it works wonders. Thank you very much to the guy who gave me that advice. The extension cable cost me about 3 euros and if you have link quality problems it’s probably worth trying. With adhesive tape you can stick the modem to the window pane temporarily and test whether it improves your signal before using other, more aggressive methods.
Linux
The modem works flawlessly in Linux, as it has been widely reported around the web. It’s true that many of these modems work better with the PIN disabled. I had no problem with the PIN set and modem E220, but modem E172 works better without it. With the PIN enabled, it only established a connection once every several attempts. My personal advice is, then, to disable the PIN to make sure it works if you don’t mind. Take into account you should then make sure your modem is not stolen or anyone could use it for free.
It can be configured to work with wvdial, kppp and many other dialing tools. I set it up using bare pppd and sample scripts I found on the web. For most of these modems, the scripts are all the same and you only need to change the username and password, which vary from provider to provider and are usually the provider name (in my case, orange/orange), and the so-called “Internet APN”, which varies from provider to provider too. In my case and some others, it’s “internet”, but in Vodafone Spain it’s “ac.vodafone.es”, for example. You’ll have to find out with a web search.
As an example, I’m going to post my two pppd files.
/etc/ppp/peers/3gmodem:
/dev/ttyUSB0
460800
crtscts
modem
noauth
#usepeerdns
defaultroute
noipdefault
debug
#noccp
#nobsdcomp
#novj
#mtu 500
user "orange"
password "orange"
connect '/usr/sbin/chat -f /etc/ppp/chat-3gmodem'
/etc/ppp/chat-3gmodem:
ABORT BUSY
ABORT ERROR
ABORT 'NO CARRIER'
REPORT CONNECT
TIMEOUT 10
"" "ATZ"
OK AT+CGDCONT=1,"ip","internet"
OK "ATE1V1&D2&C1S0=0+IFC=2,2"
OK "AT+IPR=115200"
OK "ATE1"
TIMEOUT 60
"" "ATD*99***1#"
CONNECT \c
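Since the APN is the only thing that changes in the chat script from provider to provider, the file above could also be generated from a small template. This is only a sketch: render_chat_script is a hypothetical helper name, and the quoting simply copies the script shown above.

```python
def render_chat_script(apn):
    """Return the chat script from this post with the given Internet APN.

    Hypothetical helper: only the APN changes, everything else is copied
    from the chat-3gmodem file shown above.
    """
    lines = [
        "ABORT BUSY",
        "ABORT ERROR",
        "ABORT 'NO CARRIER'",
        "REPORT CONNECT",
        "TIMEOUT 10",
        '"" "ATZ"',
        'OK AT+CGDCONT=1,"ip","%s"' % apn,  # the provider-specific line
        'OK "ATE1V1&D2&C1S0=0+IFC=2,2"',
        'OK "AT+IPR=115200"',
        'OK "ATE1"',
        "TIMEOUT 60",
        '"" "ATD*99***1#"',
        r"CONNECT \c",
    ]
    return "\n".join(lines) + "\n"
```

For Vodafone Spain, for example, render_chat_script("ac.vodafone.es") would produce the same file with only the AT+CGDCONT line changed.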
I then connect by running pppd call 3gmodem nodetach as root, or maybe set up a loop to make it redial when the connection is lost.
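The redial loop could look roughly like the sketch below. It assumes pppd is started with nodetach, so it stays in the foreground and only returns when the connection drops. The function name redial, the 5 second delay and the max_attempts limit are my own choices for illustration, not something pppd requires.

```python
import subprocess
import time


def redial(command=("pppd", "call", "3gmodem", "nodetach"),
           delay_seconds=5, max_attempts=None,
           run=subprocess.call, sleep=time.sleep):
    """Run the dialer in the foreground and start it again when it exits.

    With max_attempts=None the loop never ends (stop it with Ctrl-C).
    The run and sleep parameters exist only to make the loop testable.
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        exit_code = run(list(command))  # blocks until pppd exits
        print("dialer exited with code %s (attempt %d)" % (exit_code, attempts))
        sleep(delay_seconds)  # give the modem a moment before redialing
    return attempts
```

Calling redial() as root with no arguments would keep redialing forever. Passing max_attempts gives up after that many connections.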
youtube-dl is dead, long live… youtube-dl
2008-07-24
A couple of days ago I had some spare time and rewrote youtube-dl from scratch. This now slightly popular script started its life as a simple, quick & dirty way of downloading videos from YouTube. I originally wrote it because, back then (it’s been almost two years since the first version was released to the public), there were no real working alternatives. There was a Firefox VideoDownloader extension and a Greasemonkey script, but neither of them was working at that moment. They probably work now. Another solution was to use a video downloader site, but I didn’t think it was a good idea in the long run.
Using a video downloader site meant that if the downloader site was down or blocked or something bad was happening at the moment, I couldn’t download the video. Option discarded. Using a Firefox extension was great and probably multiplatform, but I was using Konqueror (since then, I’ve moved back to Firefox). My proposal was a multiplatform command line program that mimicked what the web browser was doing to download the videos. Being command-line meant that it wasn’t a program for the masses, but it also meant batch downloads were easy, and that the program would run on almost any system, so I coded it in vanilla Python. I never ever intended it to be a popular program. It was quick and dirty and straight to the point. I didn’t waste a single second thinking about its design. It was a “do this, do that and download this URL” program. Just in case someone else could find it useful, I put it in my webspace and added it to freshmeat.net. If I had known what was coming, I’d have created a project in SourceForge.
It became very popular for a single reason: it was featured on linux.com. Joe Barr (recently deceased, rest in peace) wrote an article about it which also found its way to the front page of digg.com (798 diggs). Well, not exactly popular. However, I know it has more than one thousand users. That’s popular enough for me. It took me a day to notice the events. Suddenly, I received a handful of emails about youtube-dl. I don’t remember how many, but probably 5 in a row. As surprising as it sounds, I had never received 5 emails in a row about any of my programs, and especially not about a Python script that was like 100 lines back then. It had become an instant and surprising success. It was not until later that day or the next day that I opened my web browser to read a few news sites, and noticed that my program was on the front page of linux.com. Jaw to the ground. That explained the emails, I supposed. Then, a friend of mine who reads digg.com congratulated me. I asked “Oh, you also read the article at linux.com?” and he replied “I don’t remember where the article is located, but it’s on the front page of digg.com”. My. bad.
As time passed, it gained more users and many of them wrote to me about possible improvements and reported bugs. That’s the great thing about having many users. I don’t use it that much, but when YouTube changes something and my program breaks, I immediately get a couple of people complaining by email, and it can be fixed in less than 24 hours. Due to its internal no-design, just-do-the-job structure, some of the features that were requested needed a lot of hacking and modifying and tweaking the script until it became a real mess. Also, making the program download from more sites apart from YouTube didn’t feel right because of the limited amount of code that could be shared. That’s why I created separate programs and shared the code internally only.
In the last months I felt that sooner or later I’d have to rewrite it. As Agent Smith would put it, “it’s inevitable”. At some point someone was going to request an interesting feature and it was going to require hacking the script beyond recognition. Finally, some days ago, I had some spare time and thought for some minutes about a possible design that could be used if I rewrote the program. I’m no design genius, but I believe any rational programmer can come up with a good way of solving a problem and future-proof parts of their code if they think about it for some minutes, so that’s what I did.
The new code was released two days ago in the usual place. Internally it’s now object oriented and I tried to make it very easy to integrate new sites and features into it. If it falls short in some aspect it could be tweaked and refactored further. It’s also easier than ever to integrate the code into a bigger program, in my humble opinion, as it could be imported as a module.
In the near future, my goals are to make it possible to download playlists with it and integrate the code from metacafe-dl so it’s able to download from metacafe.com too. My intention is that, if anyone wants support for more sites or meta-sites (like YouTube playlists), they could inherit from a class and reimplement 2 or 3 methods. I can then put that class into the program code and hook it into the main file downloader. I am going to maintain the YouTube code and the metacafe code. If anyone submits code for more sites, I think I should require more responsibility and ask people to fix that code if it ever breaks. Along the same lines, it wouldn’t make sense for someone to submit code to support 10 more sites because they went on a hacking spiral if they are not really going to use that code in the future. In addition, take into account the program has changed from the MIT license to public domain. Code submitted would have to be put in the public domain, of course.
A quick guide on how to add more sites is: inherit from InfoExtractor, reimplement the suitable() method to define which URLs are suitable for it (using regular expressions or something like that), reimplement the _real_initialize() method to initialize the information extractor (authentication, confirmation, filling initial forms, etc) and the _real_extract() method to return a list of dictionaries with information about the content. If in doubt, have a look at the code of the YoutubeIE class taking it as an example.
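To make those steps a bit more concrete, here is a rough sketch of what such a class could look like. Take it with a grain of salt: the InfoExtractor below is only a minimal stand-in so the example runs on its own, ExampleSiteIE and video.example.com are made up, and the method signatures and dictionary keys are assumptions that may not match the real program. The YoutubeIE class remains the reference.

```python
import re


class InfoExtractor(object):
    """Minimal stand-in for the real base class (assumed shape, not the original)."""

    def __init__(self):
        self._ready = False

    def initialize(self):
        # Run the site-specific initialization only once.
        if not self._ready:
            self._real_initialize()
            self._ready = True

    def extract(self, url):
        self.initialize()
        return self._real_extract(url)


class ExampleSiteIE(InfoExtractor):
    """Hypothetical extractor for an imaginary site, video.example.com."""

    _VALID_URL = re.compile(r'^https?://video\.example\.com/watch/(\d+)$')

    def suitable(self, url):
        # Decide whether this extractor knows how to handle the URL.
        return self._VALID_URL.match(url) is not None

    def _real_initialize(self):
        # Authentication, confirmation or initial forms would go here.
        pass

    def _real_extract(self, url):
        # Return a list of dictionaries describing the content.
        match = self._VALID_URL.match(url)
        if match is None:
            return []
        video_id = match.group(1)
        return [{
            'id': video_id,  # hypothetical keys, for illustration only
            'url': 'http://video.example.com/files/%s.flv' % video_id,
        }]
```

With this in place, ExampleSiteIE().suitable(url) tells the downloader whether to use the class, and extract(url) returns the information it needs.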
Please read the program webpage, especially the new sections, and happy hacking!
