<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>503 Service Unavailable</title>
	<atom:link href="http://rg03.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://rg03.wordpress.com</link>
	<description>This is my blog&#039;s tagline. There are many like it, but this one is mine.</description>
	<lastBuildDate>Thu, 15 Dec 2011 06:13:19 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='rg03.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>503 Service Unavailable</title>
		<link>http://rg03.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://rg03.wordpress.com/osd.xml" title="503 Service Unavailable" />
	<atom:link rel='hub' href='http://rg03.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Frustration with Yahoo! and RFC 5965</title>
		<link>http://rg03.wordpress.com/2010/12/19/frustration-with-yahoo-and-rfc-5965/</link>
		<comments>http://rg03.wordpress.com/2010/12/19/frustration-with-yahoo-and-rfc-5965/#comments</comments>
		<pubDate>Sun, 19 Dec 2010 12:05:28 +0000</pubDate>
		<dc:creator>rg3</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://rg03.wordpress.com/?p=234</guid>
		<description><![CDATA[In the past, I mentioned several times that I don&#8217;t receive spam. It&#8217;s not completely true, but it&#8217;s very true. My normal level of spam messages is about one each month. I have achieved this by using approaches like Yahoo! AddressGuard, and translating that same scheme to GMail when I moved to GMail. My e-mail [...]]]></description>
			<content:encoded><![CDATA[<p>In the past, I mentioned several times that I don&#8217;t receive spam. It&#8217;s not completely true, but it&#8217;s very nearly true. I normally receive about one spam message a month. I have achieved this by using approaches like Yahoo! AddressGuard, and <a href="http://rg03.wordpress.com/2007/01/11/imitating-yahoos-addressguard-using-gmail/">translating that same scheme to GMail</a> when I moved to GMail. My e-mail address is publicly accessible by anyone and exposed in this blog (&#8220;Contact me&#8221; in the right column). If you take a look at the source of that page, you&#8217;ll notice how I wrote it so you can copy-paste the address to your e-mail client while keeping spammers at bay.</p>
<p>When one of my addresses is compromised, I can change it right away, yet I prefer not to change them too often, to avoid annoying people who want to contact me from time to time. For this reason, when I receive a spam message at one of my accounts, the first thing I do is report it. If I received 100 spam messages a day, I couldn&#8217;t do this, but as I receive one a month, I don&#8217;t mind spending 5 minutes reporting the message. Only if the spam doesn&#8217;t stop, or appears to increase, do I change that address.</p>
<p>Reporting a message is quite straightforward, but I don&#8217;t have it automated. I could, but I haven&#8217;t bothered yet. Basically, I view the source of the message and look for the &#8220;Received&#8221; headers. I find the first one, in chronological order, that appears to be a valid public SMTP server that people should trust. Then, I run &#8220;whois&#8221; on its IP address and find the ISP or organization owning that network block, and report the message to the abuse address they provide as part of the &#8220;whois&#8221; reply. If they don&#8217;t provide an abuse address, I usually send it to the technical contact that appears in the &#8220;whois&#8221; reply, and also to the &#8220;abuse@&#8221; address of the company&#8217;s main domain, just in case it actually exists and is being read.</p>
<p>In my e-mail client I have a template to report spam. I start a new message from the template, fill the &#8220;To&#8221; field with the addresses just mentioned and paste the full spam message source at the end. My own text is a very brief note to whoever may be reading it, saying I received a spam message apparently coming from their network block. As I said, this takes 5 minutes and could be automated.</p>
<p>Sometimes, the spam message comes from a Yahoo! account, using their servers, and I follow the same procedure, emailing abuse@yahoo.com. This was the case with the latest spam message I received, two days ago. I proceeded to report the spam as I always do and received a reply from Yahoo! with the following contents.</p>
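<p>As a rough idea of how the header-hunting step could be automated, here is a minimal sketch using Python&#8217;s standard email package. It is not what I actually run: the helper name, the sample message and the assumption that relays write their IPv4 address in square brackets are all made up for illustration. Each server prepends its own &#8220;Received&#8221; header, so the list is reversed to get the oldest hop first.</p>

```python
import email
import re

# Assumption: relays record their IPv4 address in square brackets,
# which is common but not guaranteed.
IPV4_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def received_hops(raw_message):
    """Return the IPv4 addresses found in each Received header, oldest hop first."""
    msg = email.message_from_string(raw_message)
    hops = []
    # Headers are prepended by each server, so the last one is the earliest.
    for header in reversed(msg.get_all("Received", [])):
        hops.append(IPV4_IN_BRACKETS.findall(header))
    return hops


# Hypothetical message source using documentation-only addresses.
SAMPLE = (
    "Received: from mx.example.net (mx.example.net [192.0.2.10])\n"
    "\tby mail.example.org; Fri, 17 Dec 2010 10:00:05 +0000\n"
    "Received: from unknown (HELO pc) ([198.51.100.7])\n"
    "\tby mx.example.net; Fri, 17 Dec 2010 10:00:01 +0000\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

print(received_hops(SAMPLE))
```

<p>Deciding which hop is the first trustworthy one, and running &#8220;whois&#8221; on it, would still be a manual step in this sketch.</p>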
<blockquote><p>Thank you for your email, but this address now only accepts messages in <br />
Abuse Reporting Format (http://tools.ietf.org/html/rfc5965)</p>
<p>To report abuse manually (or to get help with security or abuse related <br />
issues), please go to Yahoo! Abuse: <br />
http://abuse.yahoo.com</p>
<p>For questions about using Yahoo! services, please visit Yahoo Help: <br />
http://help.yahoo.com</p>
<p>Thank you,<br />
 &#8211; Yahoo! Customer Care</p>
<p>Note: Please do not reply to this email as replies will not be answered.</p></blockquote>
<p>A quick Google search revealed a few people upset by this. Apparently, Yahoo! has been applying this policy since the beginning of December. The RFC they mention in that first paragraph is from August. People are upset for several reasons. The RFC is so recent there are almost no tools to handle or create reports in that format yet, so Yahoo! is effectively cutting people out of the loop. The second option is going to their website and reporting the spam message there. This means two things: that you have to treat Yahoo! in a special way when reporting spam and that you have to put up with their web form to report spam. It&#8217;s annoying because the landing page has no direct form to report spam. As of today, you have to click on &#8220;I want to report spam&#8221; (this opens a new window or tab), then paste the full email headers into one box and the message contents into another one. Fantastic. So you can&#8217;t simply upload the message or paste the full contents into a form. No, no. You have to carefully select the message headers first, then copy them, then paste them into the form, then copy the message body, then paste it into the form, then pass a captcha.</p>
<p>I was also a bit upset by this, so I read RFC 5965 a little bit. It looked simple if you only wanted to fill the required parts, and had a simple report example at the end, so I searched for a tool that would convert an e-mail message to a report based on these parameters. I didn&#8217;t find any tool immediately. I realized Python has a very comprehensive and easy to use package to handle e-mail messages, so I investigated a little bit and decided to spend the rest of the evening trying to create such a tool. The result has been uploaded to github as the <a href="https://github.com/rg3/spamreport/">spamreport</a> repository, but don&#8217;t try to use it immediately. I have some bad news. Python&#8217;s e-mail library is amazingly simple and, in the end, including all the code to check program options and such, the program is exactly 100 lines long, so it&#8217;s very short and straightforward, and should work perfectly. However, it doesn&#8217;t work.</p>
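<p>To give an idea of the overall shape of such a report, here is a minimal sketch built with Python&#8217;s standard email package. It is not the code in the spamreport repository, just an illustration of the three-part structure the RFC describes (a human-readable note, a message/feedback-report part with the required Feedback-Type, User-Agent and Version fields, and the original message). The function name, the addresses, the User-Agent value and the sample spam are all hypothetical.</p>

```python
import email
from email.message import Message
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.utils import formatdate


def build_arf_report(spam_source, sender, recipient):
    """Build a minimal multipart/report message in the style of RFC 5965."""
    # Underscores in parameter names become dashes: report-type="feedback-report"
    report = MIMEMultipart("report", report_type="feedback-report")
    report["From"] = sender
    report["To"] = recipient
    report["Subject"] = "Abuse report"
    report["Date"] = formatdate()

    # Part 1: human-readable explanation.
    note = "This is an email abuse report for a message I received.\n"
    report.attach(MIMEText(note, "plain", "us-ascii"))

    # Part 2: machine-readable fields (only the required ones).
    fields = Message()
    fields["Content-Type"] = "message/feedback-report"
    fields.set_payload(
        "Feedback-Type: abuse\n"
        "User-Agent: ExampleReporter/0.1\n"  # hypothetical agent name
        "Version: 1\n"
    )
    report.attach(fields)

    # Part 3: the original message, attached as message/rfc822.
    report.attach(MIMEMessage(email.message_from_string(spam_source)))
    return report


# Hypothetical spam message and addresses, for illustration only.
SPAM = "From: spammer@example.net\nSubject: Buy now\n\nCheap pills\n"
report = build_arf_report(SPAM, "reporter@example.com", "abuse@example.org")
print(report.as_string())
```

<p>Note that the library quotes the report-type parameter on its own, which is one of the differences discussed further down.</p>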
<p>I have tried submitting an abuse report to Yahoo! in that format several times, making minor changes to the code, tweaking my program here and there, and every time the report has been rejected. Yahoo! does not explain why the report is being rejected in their reply, which, by the way, is a bit against the RFC itself. Section 4:</p>
<blockquote><p>
   When an agent that accepts and handles ARF messages receives a<br />
   message that purports (by MIME type) to be an ARF message but<br />
   syntactically deviates from this specification, that agent SHOULD<br />
   ignore or reject the message.  Where rejection is performed, the<br />
   rejection notice (either via an SMTP reply or generation of a<br />
   DSN) SHOULD identify the specific cause for the rejection.
</p></blockquote>
<p>As they are replying via SMTP with a rejection, they SHOULD explain the reason but they&#8217;re not doing it, and that&#8217;s why this is so frustrating. At first, I thought GMail was mangling the reports so I sent one to my own accounts at another e-mail provider, and it came out unmangled on the other end. GMail is not manipulating the reports. Just so you get an idea, here&#8217;s a screenshot from a test case. I took the simple report example they give in the RFC and attempted to create a similar report with my tool, using the same spam message and the same notification text, just to see what the differences were. Click on the image to view it in full size.</p>
<p><a href="http://rg03.files.wordpress.com/2010/12/spamdiff.png"><img src="http://rg03.files.wordpress.com/2010/12/spamdiff.png?w=300&#038;h=243" alt="" title="spamdiff" width="300" height="243" class="aligncenter size-medium wp-image-238" /></a></p>
<p>As you can see, apparently the only differences are:</p>
<ol>
<li>The header order for &#8220;To&#8221;, &#8220;From&#8221;, &#8220;Date&#8221; and &#8220;Subject&#8221; differs (this should be irrelevant).</li>
<li>The words &#8220;feedback-report&#8221; are quoted in my output because Python writes them that way. This should also be irrelevant.</li>
<li>The MIME boundary markers differ (irrelevant, as they are generated randomly for each message).</li>
<li>The words &#8220;us-ascii&#8221; are in lowercase in my output. Python writes them in lowercase even if I put them in uppercase, and this should be irrelevant too.</li>
<li>The User-Agent string changes (obviously).</li>
</ol>
<p>Yet the reports are being rejected by Yahoo! I&#8217;m puzzled at this moment and won&#8217;t tag the release as 1.0.0 until the reports are accepted or proved to be correct, but I don&#8217;t know what more to check. I suspect there&#8217;s a minor flaw I haven&#8217;t detected. If you spot it, please let me know. The code is on the net.</p>
<p><a href="https://github.com/rg3/spamreport/">https://github.com/rg3/spamreport/</a></p>
]]></content:encoded>
			<wfw:commentRss>http://rg03.wordpress.com/2010/12/19/frustration-with-yahoo-and-rfc-5965/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">rg03</media:title>
		</media:content>

		<media:content url="http://rg03.files.wordpress.com/2010/12/spamdiff.png?w=300" medium="image">
			<media:title type="html">spamdiff</media:title>
		</media:content>
	</item>
		<item>
		<title>Disabling antialiasing for a specific font with freetype</title>
		<link>http://rg03.wordpress.com/2010/11/20/freetype/</link>
		<comments>http://rg03.wordpress.com/2010/11/20/freetype/#comments</comments>
		<pubDate>Sat, 20 Nov 2010 17:39:43 +0000</pubDate>
		<dc:creator>rg3</dc:creator>
				<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://rg03.wordpress.com/?p=218</guid>
		<description><![CDATA[In the following paragraphs I&#8217;ll describe how to disable antialiasing for a specific font with freetype. The individual pieces that need to be put together to achieve this are well documented, but a Google search didn&#8217;t turn up many relevant results regarding this specific topic, so I hope anyone else searching for quick instructions will [...]]]></description>
			<content:encoded><![CDATA[<p>In the following paragraphs I&#8217;ll describe how to disable antialiasing for a specific font with freetype. The individual pieces that need to be put together to achieve this are well documented, but a Google search didn&#8217;t turn up many relevant results regarding this specific topic, so I hope anyone else searching for quick instructions will find the following text useful and on the first page of a web search.</p>
<p>As you may know, freetype rendering is normally configured through fontconfig, by creating files in /etc/fonts/conf.avail and creating symlinks to those files in /etc/fonts/conf.d. Separating each configuration parameter or parameter group into individual files lets you easily enable and disable specific font-rendering features by creating and destroying symlinks. One of these configurable features, usually enabled in any distribution, is to parse the file ~/.fonts.conf to allow every user to set their own font rendering parameters. For example, when KDE configures the font rendering features from the &#8220;System Settings&#8221; panel, it overwrites your ~/.fonts.conf. If you want to disable antialiasing for a specific font in freetype, you can either create a new config file in /etc/fonts/conf.avail and link to it in /etc/fonts/conf.d, setting it for every user, or add the setting to your own ~/.fonts.conf. If you do the latter, be sure to back the file up somewhere, because fiddling with the font settings in your Desktop Environment may overwrite the file.</p>
<p>Going to specific details, I recently installed the Tahoma font from my Windows installation and wanted to use it with the bytecode interpreter and without antialiasing in the GUI, so it would look like this:</p>
<p><a href="http://rg03.files.wordpress.com/2010/11/tahoma-no-antialiasing.png"><img src="http://rg03.files.wordpress.com/2010/11/tahoma-no-antialiasing.png" alt="KDE Style System Settings Windows showing Tahoma without antialiasing" title="tahoma-no-antialiasing" width="624" height="464" class="alignnone size-full wp-image-220" /></a></p>
<p>However, the rest of the fonts look ugly with those settings, so I wanted to disable antialiasing for the Tahoma font only, and only in sizes of 10 points or less. For bigger sizes, antialiasing would be enabled. Long story short, here are the settings that need to be integrated into your personal ~/.fonts.conf or put in an individual file in /etc/fonts/conf.{avail,d}. I&#8217;ll explain the contents next.</p>
<pre>
&lt;?xml version='1.0'?&gt;
&lt;!DOCTYPE fontconfig SYSTEM 'fonts.dtd'&gt;
&lt;fontconfig&gt;
  &lt;match target="font"&gt;
    &lt;test qual="any" name="family"&gt;
      &lt;string&gt;Tahoma&lt;/string&gt;
    &lt;/test&gt;
    &lt;!-- pixelsize or size --&gt;
    &lt;test compare="more_eq" name="size" qual="any"&gt;
      &lt;double&gt;1&lt;/double&gt;
    &lt;/test&gt;
    &lt;test compare="less_eq" name="size" qual="any"&gt;
      &lt;double&gt;10&lt;/double&gt;
    &lt;/test&gt;
    &lt;edit mode="assign" name="antialias"&gt;
      &lt;bool&gt;false&lt;/bool&gt;
    &lt;/edit&gt;
    &lt;edit name="autohint" mode="assign"&gt;&lt;bool&gt;false&lt;/bool&gt;&lt;/edit&gt;
  &lt;/match&gt;
  &lt;match target="font"&gt;
    &lt;test qual="any" name="family"&gt;
      &lt;string&gt;Tahoma&lt;/string&gt;
    &lt;/test&gt;
    &lt;!-- pixelsize or size --&gt;
    &lt;test compare="more_eq" name="pixelsize" qual="any"&gt;
      &lt;double&gt;1&lt;/double&gt;
    &lt;/test&gt;
    &lt;test compare="less_eq" name="pixelsize" qual="any"&gt;
      &lt;double&gt;14&lt;/double&gt;
    &lt;/test&gt;
    &lt;edit mode="assign" name="antialias"&gt;
      &lt;bool&gt;false&lt;/bool&gt;
    &lt;/edit&gt;
    &lt;edit name="autohint" mode="assign"&gt;&lt;bool&gt;false&lt;/bool&gt;&lt;/edit&gt;
  &lt;/match&gt;
&lt;/fontconfig&gt;
</pre>
<p>I don&#8217;t want to go into specific details about the rules above. There is an XML header that needs to be present in any configuration file, followed by a &#8220;fontconfig&#8221; section. Inside that section, you can put any number of &#8220;match&#8221; sections among other things, and we need two. One specifies the rules in terms of point size and another one in terms of pixel size. Both are needed for some reason.</p>
<p>The matches look for fonts named Tahoma and disable antialiasing and autohinting for them in some specific sizes. The exact point and pixel sizes depend on your X server and/or Xft settings. Most people set the DPI value to 75, 96 or 100. In KDE, you can override the current setting from the style configuration window. DPI stands for &#8220;Dots Per Inch&#8221;. In this case, pixels per inch. Normally it should really match your monitor. That is, if you have a 22&#8243; screen with a specific resolution in pixels, you&#8217;d specify a DPI setting that would match the real DPI. However, like I said, most people use 75, 96 or 100 (I set it to 96 myself) and it DOES NOT match the real DPI. Depending on the DPI setting, your fonts will look bigger or smaller at the same size in points. In my case, I was interested in sizes of 10 points or less. Hence the match you can read above.</p>
<p>To write the pixel size match you need to calculate the equivalent of those point-values in pixels. This is easily calculated knowing two constants: the DPI value you&#8217;re currently using and knowing that an inch has exactly 72 points. So the equivalent in pixels of a 10-point distance in a 96 DPI screen is the following:</p>
<p>10 points, in inches: 10 / 72 = 0.1389</p>
<p>With 96 pixels per inch, those are: 0.1389 * 96 = 13.33, or 14 pixels rounding the number up, which is what you see in the config file I pasted above.</p>
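<p>If you use a different DPI setting, the same arithmetic can be sketched in a few lines of Python. The function name is made up for this example, and rounding up is simply the choice I made above.</p>

```python
import math


def points_to_pixels(points, dpi):
    """Convert a size in points to pixels at the given DPI, rounding up."""
    # One inch is 72 points, so points / 72 is the size in inches;
    # multiplying by the DPI (pixels per inch) gives pixels.
    return math.ceil(points / 72 * dpi)


# The three DPI values most people use, for a 10-point font.
for dpi in (75, 96, 100):
    print(dpi, points_to_pixels(10, dpi))
```

<p>The value printed for your own DPI is the one to put in the &#8220;pixelsize&#8221; test.</p>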
]]></content:encoded>
			<wfw:commentRss>http://rg03.wordpress.com/2010/11/20/freetype/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">rg03</media:title>
		</media:content>

		<media:content url="http://rg03.files.wordpress.com/2010/11/tahoma-no-antialiasing.png" medium="image">
			<media:title type="html">tahoma-no-antialiasing</media:title>
		</media:content>
	</item>
		<item>
		<title>youtube-dl has moved to github.com</title>
		<link>http://rg03.wordpress.com/2010/11/06/youtube-dl-has-moved-to-github-com/</link>
		<comments>http://rg03.wordpress.com/2010/11/06/youtube-dl-has-moved-to-github-com/#comments</comments>
		<pubDate>Sat, 06 Nov 2010 10:13:48 +0000</pubDate>
		<dc:creator>rg3</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://rg03.wordpress.com/?p=212</guid>
		<description><![CDATA[Some days ago, youtube-dl, my most popular project, moved from being managed using Mercurial at bitbucket.org to being managed using Git at github.com. Since the move, I&#8217;ve been wanting to write something about it. I&#8217;ve also been wanting to rewrite the program partly or completely every time I look at its source code, but that&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p>Some days ago, <a href="http://rg3.github.com/youtube-dl/">youtube-dl</a>, my most popular project, moved from being managed using Mercurial at bitbucket.org to being managed using Git at github.com. Since the move, I&#8217;ve been wanting to write something about it. I&#8217;ve also been wanting to rewrite the program partly or completely every time I look at its source code, but that&#8217;s a different matter. Back to the main topic.</p>
<p>I should start by apologizing to anyone who thinks this is a bad move, either because they may have to rebase all their work onto a new repository, bringing all their changes back, or because they registered at bitbucket.org simply to follow the project. It currently has 17 forks and 100 followers, and I&#8217;m pretty sure some of them registered there just to follow youtube-dl. For them the move to github.com is, if anything, a problem, because they would have to create an account somewhere else to continue following the project. Again, apologies to anyone for whom the move brings no practical benefit.</p>
<p>That said, I&#8217;d like to explain why I made the move. You may recall I wrote an article some time ago about <a href="/2009/04/07/mercurial-vs-git/">Mercurial vs. Git</a>. Apart from explaining what I considered were the main differences between the two, I also wanted to express my indecision about which one was better. While I think Mercurial is and was great, the balance has been leaning towards Git for some time now, and I tend to use Git for all my personal projects. Many of the reasons, if not all of them, have been expressed by other people in the past. It&#8217;s a good moment to quote a very well known <a href="http://tytso.livejournal.com/29467.html">blog post</a> from <a href="http://en.wikipedia.org/wiki/Theodore_Tso">Theodore Tso</a>, written in 2007 when he was still planning to migrate e2fsprogs to Git from Mercurial:</p>
<blockquote><p>
The main reason why I&#8217;ve come out in favor of git is that I see its potential as being greater than hg, and so while it definitely has some ease-of-use and documentation shortcomings, in the long run I think it has &#8220;more legs&#8221; than hg, [...]
</p></blockquote>
<p>I think that paragraph describes with great accuracy what I think too.  In the medium and long run, Git&#8217;s problems almost vanish. Its documentation was a bit poor back then, but people have been writing more and more about Git and there are a few very good resources to learn its internals and basic features. Furthermore, once you have a simple idea about its internals and use it daily, you no longer need that much documentation. If you&#8217;re not sure how to do something, chances are a simple web search will tell you how to do what you wanted to achieve.</p>
<p>Also, as many people know, Mercurial was and is mostly about not modifying the project&#8217;s history, while Git has a lot of commands that directly modify the project&#8217;s history. With time, I&#8217;ve come to realize that modifying the project&#8217;s history is simply more practical in many cases and in a range of situations it leads to less confusion. In my day job, we are slowly moving from CVS to Subversion to manage the sources of a very old and important project, which has existed since about 1984. At the same time, we are modifying our work flow here and there to take advantage of Subversion, and we&#8217;re heavily using branching and merging despite the fact that&#8217;s not one of Subversion&#8217;s greatest strengths, as you may know. That&#8217;s giving us some problems and it&#8217;s amazing how many times I caught myself thinking &#8220;this would be much easier if we were using git, because we would simply do this and that and job done&#8221;. Many of those actions would modify the project&#8217;s history and clean it up. I repeat, in real situations with a lot of people working on something and not doing everything exactly as it should be done, it&#8217;s only a matter of time before you miss a Git feature.</p>
<p>The only thing I don&#8217;t like about Git is its staging area. From a technical perspective, the staging area makes a lot of sense, and you can build many neat features based on it. However, <em>having</em> a staging area is one thing and <em>exposing it</em> to end users is another. I think you can have a staging area and all the features it provides while hiding it from users in their most common work flows. Still, it&#8217;s something you get used to and everybody knows that, when your project is a bit mature, you spend way more time browsing the source code, debugging, running it and testing it than actually committing changes to the source tree. The staging area is not a big issue and &#8220;git commit -a&#8221; covers the most common cases.</p>
<p>Apart from Git itself, the move was partly motivated by the site, github.com. When I started using bitbucket.org I liked it a bit more than github.com, but things have changed slowly. github.com fixed a rendering bug that hid part of the project top bar, got rid of its Flash-based file uploader and got an internal issue tracker with a web interface that works really really well. The site is very nice and the &#8220;pages&#8221; feature, which allows you to set up a simple web page for the project, is still not provided by bitbucket.org as far as I know. In addition, with the arrival of Firesheep, it quickly moved to using SSL for everything. It&#8217;s fantastic.</p>
<p>bitbucket.org was recently bought by Atlassian and their <a href="https://bitbucket.org/plans">plans</a> are indeed better. For me, however, the number of private repositories and private collaborators is not an issue, because all the projects I host on github.com are public.  Still, it&#8217;s fair to mention their plans because it could be a deciding factor for some people.</p>
<p>I wouldn&#8217;t like to close this article without mentioning the big improvement that both sites bring to the typical free and open source software developer. I still host a few projects on sourceforge.net, and I can tell you I&#8217;m not going back to it despite the great service they have provided for years for which I thank them sincerely.</p>
<p>It&#8217;s been months since I last used it so I apologize if things have changed without me noticing, but back then it was very hard to get your code on sourceforge.net. You didn&#8217;t perceive it was hard because there was no github.com. Once you try github.com or bitbucket.org, you realize how much the process can be simplified. Two key aspects to note. First, the project name doesn&#8217;t have to be globally unique. It only needs to be unique among your own projects, which makes choosing the project name a lot simpler. Second, once the project is created and has a basic description, without filling in any form and without having to wait for anything, you are only a few commands away from uploading your code to the Internet. It can literally take less than 5 minutes to create a project and have your code publicly available, and that&#8217;s fantastic and motivating. You don&#8217;t need to find time to upload your code or wonder whether the process is worth it for the size of the project.  You simply do it. That&#8217;s good news for everyone.</p>
<p>Let me finish by apologizing again to everyone for the inconvenience created by the move. I sincerely hope this will remain the project location for many years to come.</p>
]]></content:encoded>
			<wfw:commentRss>http://rg03.wordpress.com/2010/11/06/youtube-dl-has-moved-to-github-com/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">rg03</media:title>
		</media:content>
	</item>
		<item>
		<title>iptables rules for desktop computers</title>
		<link>http://rg03.wordpress.com/2010/04/01/iptables-rules-for-desktop-computers/</link>
		<comments>http://rg03.wordpress.com/2010/04/01/iptables-rules-for-desktop-computers/#comments</comments>
		<pubDate>Thu, 01 Apr 2010 11:30:05 +0000</pubDate>
		<dc:creator>rg3</dc:creator>
				<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://rg03.wordpress.com/?p=201</guid>
		<description><![CDATA[Today I will show you the iptables rules I set on my main personal computer, with detailed comments about why I came to use these rules after several years of Linux desktop usage. The rules I use now have been simplified as much as I could and are based on common rules and advice that [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=rg03.wordpress.com&amp;blog=654644&amp;post=201&amp;subd=rg03&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Today I will show you the iptables rules I set on my main personal computer, with detailed comments about why I came to use these rules after several years of Linux desktop usage. The rules I use now have been simplified as much as I could and are based on common rules and advice that can be found on the network and also on input I got from experienced network administrators. I&#8217;ve been using them unmodified for a few years. They are designed for desktop users either directly connected to the Internet or behind a router. They are a bit restrictive in some aspects but we&#8217;ll see you can easily create a few holes for specific purposes. So here they are:</p>
<pre>
# iptables -v -L
Chain INPUT (policy DROP 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
 663K  905M ACCEPT     all  --  any    any     anywhere             anywhere            state RELATED,ESTABLISHED
  105  6300 ACCEPT     all  --  lo     any     anywhere             anywhere
    0     0 ACCEPT     icmp --  any    any     anywhere             anywhere            icmp destination-unreachable
    0     0 ACCEPT     icmp --  any    any     anywhere             anywhere            icmp time-exceeded
    0     0 ACCEPT     icmp --  any    any     anywhere             anywhere            icmp source-quench
    0     0 ACCEPT     icmp --  any    any     anywhere             anywhere            icmp parameter-problem
    0     0 DROP       tcp  --  any    any     anywhere             anywhere            tcp flags:!FIN,SYN,RST,ACK/SYN state NEW

Chain FORWARD (policy DROP 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
</pre>
<p>We&#8217;ll start with the most obvious rules. The FORWARD chain has a policy of &#8220;DROP&#8221; and no specific rules. A desktop computer isn&#8217;t usually employed as a router or to share an Internet connection, so there&#8217;s no reason to allow forwarding.</p>
<p>The OUTPUT chain has a policy of &#8220;ACCEPT&#8221; and no rules. Basically, we are allowing everything going out of our computer. While this is far from the most secure policy, it&#8217;s usually enough for a desktop computer. More paranoid people would not let everything out. For example, to prevent their computers from being used to send spam due to a mistake somewhere else, some people forbid sending traffic from source port 25, or in general from source ports below 1024, where most common services are. We could do that, but I think it&#8217;s not really needed for a desktop computer. We&#8217;ll put more effort into blocking incoming traffic, and we can keep a relaxed policy on outgoing traffic.</p>
<p>Finally, the guts of the rules. The INPUT chain has a policy of DROP. That is, everything not explicitly allowed will be forbidden. If a packet goes through all the rules without matching any of them, it will be silently discarded.</p>
<p>The rules in the INPUT chain are sorted according to the typical frequency of hits. &#8220;Popular&#8221; and frequent traffic will be quickly accepted without first being checked against many other rules. That&#8217;s why the first rule is to allow RELATED and ESTABLISHED traffic, for <em>any</em> protocol. The <em>any</em> part is important. This is the rule that, basically, allows us to receive replies and normal traffic for connections we start ourselves. For example, when we open a web page with our web browser, we&#8217;ll send traffic one way and when we receive the reply, the connection will be ESTABLISHED and we&#8217;ll see the reply. This first rule is the most important one because, thanks to it alone, we can use the computer &#8220;normally&#8221;.</p>
<p>The stateful packet firewall in Linux is quite clever and understands established connections even when the underlying protocol has no notion of connections. For example, that first rule allows us to receive DNS replies from queries we made ourselves, using the UDP protocol, or allows receiving ICMP echo replies from our own requests. In other words, we can ping other computers thanks to that rule.</p>
<p>On to the second rule, it looks like it would accept any traffic from anywhere, but the keyword here is <em>lo</em>:</p>
<pre>
  105  6300 ACCEPT     all  --  lo     any     anywhere             anywhere
</pre>
<p>This rule accepts all incoming traffic from interface &#8220;lo&#8221;, which is the loopback interface. This rule allows us to connect to services on our own machine by pointing to 127.0.0.1, or ::1 in IPv6. This rule would allow connecting to the CUPS printing service, for example, if we had a printer connected to our computer. A variant of this rule that can be frequently found on the Internet includes a further check to verify the destination IP is 127.0.0.1, just to be more paranoid and forbid strange traffic. While this can increase security, I don&#8217;t think you <em>need</em> that further check generally. Just to clarify, browsing unsafe web pages with JavaScript and/or Flash is more dangerous than not checking if traffic coming through &#8220;lo&#8221; is really directed to 127.0.0.1, so it&#8217;s not a priority.</p>
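<p>For reference, the stricter variant mentioned above could look like the following. Take it only as a sketch: it would replace the plain loopback rule, and the 127.0.0.0/8 range is my assumption of what such a check usually covers on IPv4. Keep in mind that local connections to the machine&#8217;s own non-loopback address normally travel through &#8220;lo&#8221; too, so this variant would stop accepting them.</p>
<pre>
# Accept loopback traffic only when it is addressed to the loopback range.
iptables -A INPUT -i lo -d 127.0.0.0/8 -j ACCEPT
</pre>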
<p>Then, you can see I allow some specific types of ICMP packets that usually signal network problems. None of those require a reply to be sent, so we accept them and try to interpret what they would mean if they ever come in. I don&#8217;t think it&#8217;s possible to get anything more than a DoS attack with those rules, but comments are welcome. And, of course, you can be DoS&#8217;ed just by someone saturating you with incoming traffic. Again, this is a matter of getting your priorities sorted. If you feel paranoid, well, drop those rules.</p>
<p>Finally, at the end of the chain we have the famous rule that blocks incoming TCP traffic with state &#8220;NEW&#8221; but without the SYN flag set. This rule is quite specific and <a href="http://www.faqs.org/docs/iptables/newnotsyn.html">an explanation for it</a> can be found in many iptables manuals, FAQs and tutorials. I put the rule at the end because none of the previous rules are affected by it: the first rule only matches RELATED and ESTABLISHED traffic, the second one accepts ALL traffic coming from &#8220;lo&#8221; anyway, and the ICMP rules don&#8217;t match TCP packets.</p>
<p>However, we still keep it there, even though that traffic would be dropped anyway by the chain policy, because when we want to create a hole in these rules we do it by adding more rules at the end of the INPUT chain. For example, sometimes I want to allow incoming traffic to a specific port where I have configured a server that is supposed to be reached from other machines, to serve specific content at a specific point in time. For that, I have created a couple of scripts called &#8220;service-open&#8221; and &#8220;service-close&#8221; that take a list of service names or port numbers. For example, when I start a web server to allow someone in my home network to get a file from my computer, I usually run the command &#8220;service-open 8080&#8221; (the server would be listening on that port). Once the file is served, I run &#8220;service-close 8080&#8221; and shut the server down. Those commands add and remove rules at the end of the INPUT chain, and that&#8217;s why I put the last rule there: it comes before any holes I punch through my firewall in those special cases. If you frequently run a P2P application on your computer, you may want to open a hole permanently to some port and save it as part of your usual rules. I don&#8217;t, so I keep everything closed.</p>
<p>The contents of my scripts are:</p>
<pre>
# cat /usr/local/sbin/service-open
#!/bin/sh
# Append ACCEPT rules (TCP and UDP) for each service name or port given.
if test $# -eq 0; then
        echo "usage: $( basename "$0" ) service ..." 1&gt;&amp;2
        exit 1
fi
while test $# -ne 0; do
        /usr/sbin/iptables -A INPUT -p tcp --dport "$1" -j ACCEPT
        /usr/sbin/iptables -A INPUT -p udp --dport "$1" -j ACCEPT
        shift
done
</pre>
<pre>
# cat /usr/local/sbin/service-close
#!/bin/sh
# Delete the ACCEPT rules (TCP and UDP) added by service-open.
if test $# -eq 0; then
        echo "usage: $( basename "$0" ) service ..." 1&gt;&amp;2
        exit 1
fi
while test $# -ne 0; do
        /usr/sbin/iptables -D INPUT -p tcp --dport "$1" -j ACCEPT
        /usr/sbin/iptables -D INPUT -p udp --dport "$1" -j ACCEPT
        shift
done
</pre>
<p>Those scripts play nicely with my set of rules because they are designed with my rules in mind. Also, you can see they are dead simple.</p>
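<p>A typical session with those scripts might look like this. Port 8080 is the example used earlier, and the listing command in the middle is only there to check that the new rules were appended after the rule that drops NEW packets without SYN:</p>
<pre>
# service-open 8080
# iptables -L INPUT -n --line-numbers
# service-close 8080
</pre>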
<p>With the set of rules I have described, you can use your computer normally, you can easily let more traffic through in specific cases and, more importantly, you&#8217;ll be &#8220;invisible&#8221; on the network. Nobody will know if your computer is really there or not unless you send them traffic or they find out by other means. Also, it&#8217;s a very small set of rules that is easy to remember and understand, and easy to modify from scripts.</p>
<p>Edit: The commands needed to create those rules:</p>
<pre>
iptables -P FORWARD DROP
iptables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -p icmp -m icmp --icmp-type 3 -j ACCEPT
iptables -A INPUT -p icmp -m icmp --icmp-type 11 -j ACCEPT
iptables -A INPUT -p icmp -m icmp --icmp-type 4 -j ACCEPT
iptables -A INPUT -p icmp -m icmp --icmp-type 12 -j ACCEPT
iptables -A INPUT -p tcp -m tcp ! --tcp-flags FIN,SYN,RST,ACK SYN -m state --state NEW -j DROP
iptables -P INPUT DROP
</pre>
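<p>To keep the rules across reboots they have to be saved and restored somehow. How that is done depends on the distribution, so take the following only as a sketch. The file name is an arbitrary choice for the example:</p>
<pre>
# Dump the current rules to a file...
iptables-save &gt; /etc/iptables.rules
# ...and load them back later, for example from a boot script.
iptables-restore &lt; /etc/iptables.rules
</pre>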
]]></content:encoded>
			<wfw:commentRss>http://rg03.wordpress.com/2010/04/01/iptables-rules-for-desktop-computers/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">rg03</media:title>
		</media:content>
	</item>
		<item>
		<title>Poor Dillinger</title>
		<link>http://rg03.wordpress.com/2010/03/20/poor-dillinger/</link>
		<comments>http://rg03.wordpress.com/2010/03/20/poor-dillinger/#comments</comments>
		<pubDate>Sat, 20 Mar 2010 19:32:17 +0000</pubDate>
		<dc:creator>rg3</dc:creator>
				<category><![CDATA[Communication]]></category>

		<guid isPermaLink="false">http://rg03.wordpress.com/?p=195</guid>
		<description><![CDATA[I&#8217;ve been enjoying the two Tron Legacy Official Trailers that have been released so far. My first contact with the original Tron movie was not long ago, when an uncle of mine gave me the Collector&#8217;s Edition DVD a few years back as my birthday present. Tron was released before I was born and it&#8217;s [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=rg03.wordpress.com&amp;blog=654644&amp;post=195&amp;subd=rg03&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been enjoying the two <a href="http://www.youtube.com/watch?v=5y-7-Mt6uYk">Tron Legacy</a> <a href="http://www.youtube.com/watch?v=L9szn1QQfas">Official Trailers</a> that have been released so far. My first contact with the original Tron movie was not long ago, when an uncle of mine gave me the Collector&#8217;s Edition DVD a few years back as my birthday present. Tron was released before I was born and it&#8217;s a very uncommon movie, at least in my country, so I didn&#8217;t have many opportunities to watch it until recently, with the 20th anniversary of the film release and, even more recently, the arrival of its sequel.</p>
<p>The first impression I got from the movie was so-so. It&#8217;s fun and, for a computer engineer, the references to mainframes, IO ports, programs and games are enjoyable. After watching the movie I went directly to disc 2 and watched the documentary on how it was made. It was then that I started enjoying the film much more. The documentary helps you appreciate TRON as the piece of art it really is, and all the attention paid to the different details in the movie.</p>
<p>By coincidence, I watched the film last Christmas with a friend of mine and we both share a fun interpretation of its script. TRON is a movie you can enjoy because, as in many other good fantasy and science-fiction films, the bad guys win in the end. Now, before you jump at me and wonder what I&#8217;ve been smoking to say that, just think about it. Do you really think Dillinger and the Master Control Program are the bad guys in the movie? The bad guy is Flynn! And that little program, TRON! It&#8217;s a tragic and realistic story.</p>
<p>A company, ENCOM, and two programmers: Ed Dillinger and Kevin Flynn. Ed is the good guy in the company. Working hard to improve technology in his cubicle until late hours, motivated by the need to create something bigger, better and never seen before. He&#8217;s a shy guy with brilliant ideas and creates a program, called the Master Control Program, based originally on a chess program, with several features that will be a breakthrough in computing history. First, the Master Control Program allows for real multitasking. Programmers don&#8217;t interfere with each other and they no longer have direct access to the computer hardware. The modern operating system is born, also with a built-in firewall to monitor and control connections to and from external systems. Second, this program is powered by an incredibly advanced AI system capable of developing primitive feelings, and also features natural language parsing via audio input and replies in the same language, with a voice synthesizer.</p>
<p>The Master Control Program is amazing and could push ENCOM from being a medium-sized company into a big corporation in every field of technology. However, management are too short-sighted to pay attention to it and the shy guy who created it, and are amused by the extrovert programmer Kevin Flynn. Much younger than Ed Dillinger, as we can see in the film, he enjoys creating video games and breaking into different systems and, with such a personality, the company board is waiting for their golden boy to do something spectacular that will never really arrive, because Flynn uses the company resources to create games that he will keep for himself. He won&#8217;t let the company see the real good games and will be jumping ship as soon as he finds a good deal with a big game publisher.</p>
<p>Dillinger, with no talent for creating popular games, sees envy grow at the core of his heart and one day decides to steal the good games from Flynn and present them to the company board as his own work. He shouldn&#8217;t have done that, but poor Dillinger thought that was the only way to get attention from the board. From then on, they finally pay attention to him, he can push the Master Control Program forward as a way to manage the company&#8217;s computing resources, and he is promoted to the position he really deserves. They even start researching teleportation. Of course, TRON (the program) is rejected by Dillinger and the MCP. After all, TRON is redundant and its tasks are already being performed by the MCP. No good engineer would tolerate such an evident duplication in functionality. Alan Bradley suffers from the NIH syndrome.</p>
<p>All that technology never reaches the market because, in the film, we see the bad guys preying on this good guy for his only mistake until he is defeated, his programs are deleted forever from the hard drive and he is, probably, fired from the company.</p>
<p>I&#8217;ll be watching Tron Legacy to follow the adventures of Flynn and the result of his evil and ego-driven plot to control the world with his videogames, unable to realize he lacks the talent Dillinger had. If you watch the trailers released so far you&#8217;ll see Flynn is really evil, as he has always been.</p>
]]></content:encoded>
			<wfw:commentRss>http://rg03.wordpress.com/2010/03/20/poor-dillinger/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">rg03</media:title>
		</media:content>
	</item>
		<item>
		<title>Managing Linux kernel sources using Git</title>
		<link>http://rg03.wordpress.com/2010/01/20/managing-linux-kernel-sources-using-git/</link>
		<comments>http://rg03.wordpress.com/2010/01/20/managing-linux-kernel-sources-using-git/#comments</comments>
		<pubDate>Wed, 20 Jan 2010 19:53:14 +0000</pubDate>
		<dc:creator>rg3</dc:creator>
				<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://rg03.wordpress.com/?p=183</guid>
		<description><![CDATA[This will be a short and easy tutorial on how to use Git to manage your kernel sources. Before Git, the easiest way to manage your kernel sources was to download the kernel using the provided tarballs from kernel.org and update them downloading the provided patches between releases, which was very important to keep the [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=rg03.wordpress.com&amp;blog=654644&amp;post=183&amp;subd=rg03&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>This will be a short and easy tutorial on how to use Git to manage your kernel sources.</p>
<p>Before Git, the easiest way to manage your kernel sources was to download the tarballs provided at kernel.org and update them by applying the patches published between releases. This kept the download size small compared to downloading complete tarballs each time. Also, by applying patches, you only needed to rebuild what changed between releases instead of the full kernel once more. This is a good method that still works today and will probably never disappear. Simple HTTP and FTP downloads are very convenient in many situations.</p>
<p>However, with the arrival of kernel 2.6, its stable branches (e.g. the 2.6.32.y branch) and Git, there have been some changes. First of all, the process is now a bit more complicated. Stable patches are applied against the base release. If you have the kernel sources for version 2.6.32.1 and want to jump to version 2.6.32.2, you first have to revert the changes of release 2.6.32.1 (<code>patch --reverse</code>) and then apply the 2.6.32.2 patch. This is slightly less convenient and, furthermore, you&#8217;ll touch every file changed by any stable patch up to that moment, which affects the compilation that follows. In other words, if patch 2.6.32.1 meant (hypothetically speaking) a long build because it changed code that affected a lot of subsystems, so will every subsequent release in the 2.6.32.y branch. It was this small glitch that prompted me to manage my kernel sources the way I&#8217;m going to describe. Also, using Git is fun. :)</p>
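<p>As an illustration, moving a 2.6.32.1 tree to 2.6.32.2 with patches would go roughly like this. The directory name and the location of the compressed patch files are assumptions made for the example:</p>
<pre>
cd linux-2.6.32
# Go back to the base 2.6.32 release...
bzcat ../patch-2.6.32.1.bz2 | patch -p1 --reverse
# ...and then apply the newer stable patch on top of it.
bzcat ../patch-2.6.32.2.bz2 | patch -p1
</pre>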
<p>We will try to achieve the following:</p>
<pre>
-------------------------------------------------&gt; Linus Torvalds' master branch
           \                   \
            \                   \
             A stable release    Another stable release
</pre>
<p>We will have a master branch that will follow Torvalds&#8217; master branch and will be updated from time to time, or when he releases a new stable version of the Linux kernel (e.g. 2.6.32).</p>
<p>We will have other local branches that follow the stable releases by Greg K-H (e.g. 2.6.30.y, 2.6.31.y, 2.6.32.y, etc).</p>
<p>Git is very flexible and simple, and allows more than one way to do things. I will try to explain why I do things this way and why they make sense to me, and will try to avoid shortcuts, i.e. I will use one command for each action even if two actions could be compressed into a single command.</p>
<p>First, we will create a directory to hold the kernel sources. Let&#8217;s name it /path/to/kernel. In it we&#8217;ll have a directory named &#8220;src&#8221; that will hold the unmodified kernel sources and a second directory named &#8220;build&#8221; that we&#8217;ll use to build the kernel and keep the sources intact, for clarity. We start by cloning Torvalds&#8217; branch:</p>
<pre>
cd /path/to/kernel
mkdir build
git clone 'git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git' src
</pre>
<p>This will create a directory named &#8220;src&#8221; with the sources. Take into account you&#8217;ll be downloading the full repository with a lot of revision history. It&#8217;s a relatively long download that requires a lot of patience or a good broadband connection. Whatever you have at hand. At the moment I&#8217;m writing this, it&#8217;s several hundred MBs but less than 1 GB, if I recall correctly.</p>
<p>If you issue a &#8220;git branch&#8221; command you&#8217;ll see you only have a local branch named &#8220;master&#8221;. This local branch follows Torvalds&#8217; master branch. While you are on this branch, you can update your kernel sources by issuing a simple &#8220;git pull&#8221; command.</p>
<p>Now, we will add a second local branch to follow the stable 2.6.32.y kernel. In other words, our master branch follows Torvalds&#8217; master branch and our &#8220;branch_2.6.32.y&#8221; (let&#8217;s call it that way) will have to follow the master branch in the stable 2.6.32.y repository.</p>
<p>First, we create a shortcut to the 2.6.32.y repository for convenience:</p>
<pre>
git remote add remote_2.6.32.y \
    'git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.32.y.git'
</pre>
<p>The name &#8220;remote_2.6.32.y&#8221; is arbitrary. At this moment, that name is only like an alias for that long URL and barely anything more. The next step is very important so that the name becomes something more and the following git commands understand what you mean when you use it. It will download data to your repository under that name.</p>
<pre>
git fetch remote_2.6.32.y
</pre>
<p>After you run that, which will take considerably less time than the full repository clone we did previously, remote_2.6.32.y will have a meaning on your hard drive. You can then use the following command:</p>
<pre>
git branch --track branch_2.6.32.y remote_2.6.32.y/master
</pre>
<p>This will create a new branch in your local repository that will be tracking the master branch at the 2.6.32.y repository. If you issue a &#8220;git branch&#8221; command you&#8217;ll now see you have two branches. Being a &#8220;tracking branch&#8221; means several things. You can change between the master branch and the new branch using &#8220;git checkout &lt;branch name&gt;&#8221; and, in each branch, you can perform a simple &#8220;git pull&#8221; to retrieve changes to that branch from the remote repository. From this point you&#8217;re on your own using Git to manage the sources and perform more operations if you need them. The many tutorials available on the web will get you going with the basics of Git, which is all you need if your only purpose is to ease the downloading and building process.</p>
<p>Note that, between release 2.6.32.1 and 2.6.32.2, for example, you will only download the changes between those releases and a painful build for 2.6.32.1 does not have to mean a painful build for 2.6.32.2 if you update your sources this way.</p>
<p>Finally, we previously created a &#8220;build&#8221; directory next to the &#8220;src&#8221; directory in order to keep the sources directory clean. Using it is easy. When we are in the &#8220;src&#8221; directory, any &#8220;make&#8221; command we would use has to be replaced by &#8220;make O=../build&#8221;. To avoid mistakes, I have created a global alias in my system called &#8220;kmake&#8221;, aliased precisely to &#8220;make O=../build&#8221;. It is available to the regular user account that I use to compile the kernel sources and to the root account that I use in the installation step, to perform the &#8220;modules_install&#8221;, &#8220;firmware_install&#8221; and &#8220;install&#8221; operations.</p>
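<p>Updating the stable branch later on could then be as simple as the following sketch. The last command is optional and assumes the release tags (v2.6.32.1 and v2.6.32.2) were fetched together with the branch:</p>
<pre>
cd /path/to/kernel/src
git checkout branch_2.6.32.y
git pull
# See which files changed between the two stable releases.
git diff --stat v2.6.32.1 v2.6.32.2
</pre>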
<p>As a regular user account:</p>
<ul>
<li>kmake menuconfig</li>
<li>kmake</li>
<li>kmake oldconfig</li>
<li>etc</li>
</ul>
<p>As the root account:</p>
<ul>
<li>kmake modules_install</li>
<li>kmake firmware_install</li>
<li>kmake install</li>
</ul>
<p>These aliases could be tuned further to install the kernel image, modules, firmware, etc to a sandbox directory if you intend to create packages with them, for example. The README file in the kernel sources directory has more information about this topic.</p>
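<p>For example, the alias and a sandboxed installation could be sketched like this. INSTALL_MOD_PATH and INSTALL_PATH are variables understood by the kernel build system, while the sandbox directory is a made-up location. Depending on the architecture and distribution, the &#8220;install&#8221; target may call an external installkernel script that behaves differently, so check before relying on it:</p>
<pre>
# Define the alias, for example in a shell startup file.
alias kmake='make O=../build'
# Hypothetical sandbox directory used instead of the real / and /boot.
mkdir -p /tmp/kernel-sandbox/boot
kmake INSTALL_MOD_PATH=/tmp/kernel-sandbox modules_install
kmake INSTALL_PATH=/tmp/kernel-sandbox/boot install
</pre>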
]]></content:encoded>
			<wfw:commentRss>http://rg03.wordpress.com/2010/01/20/managing-linux-kernel-sources-using-git/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">rg03</media:title>
		</media:content>
	</item>
		<item>
		<title>When your hobby becomes a job: reflections on the em28xx driver situation</title>
		<link>http://rg03.wordpress.com/2009/12/15/when-your-hobby-becomes-a-job-reflections-on-the-em28xx-driver-situation/</link>
		<comments>http://rg03.wordpress.com/2009/12/15/when-your-hobby-becomes-a-job-reflections-on-the-em28xx-driver-situation/#comments</comments>
		<pubDate>Tue, 15 Dec 2009 19:59:14 +0000</pubDate>
		<dc:creator>rg3</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://rg03.wordpress.com/?p=179</guid>
		<description><![CDATA[More than one year ago I bought a TV USB stick to be able to watch analog and digital TV in my computer running Linux. It was not an easy task. As you may know, usually it&#8217;s not hard to find hardware that is supported by Linux. Sometimes, however, while there are multiple devices supported [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=rg03.wordpress.com&amp;blog=654644&amp;post=179&amp;subd=rg03&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>More than one year ago I bought a TV USB stick to be able to watch analog and digital TV in my computer running Linux. It was not an easy task. As you may know, usually it&#8217;s not hard to find hardware that is supported by Linux. Sometimes, however, while there are multiple devices supported that would serve your purposes, the trouble will be locating a place or site that will have one of those models available for you to buy. This was my case. I printed a list of supported digital and/or analog TV tuner USB devices and went to most computer stores and malls in my area trying to locate at least one of them and compare prices, and I went back home with hands empty.</p>
<p>I had to change the strategy and get the list of devices I could buy, and then search for them on the Internet, trying to know if any of them were supported by an out-of-tree driver or something similar. After a couple of returns, thanks to some manufacturers changing devices internally while keeping the product name unchanged, I finally arrived home with a working, hybrid, TV USB stick, the Pinnacle PCVT Hybrid Pro Stick, sold in some countries as model 330e. It costed just over 100 euros.</p>
<p>My main target being digital TV, I quickly got it working with an out-of-tree driver by Markus Rechberger. This out-of-tree driver was part of a project that tried to create the possibility of having user-space tuners for TV cards. While I am nobody to judge if that&#8217;s a good or bad idea, it was different enough to not make it into the main kernel tree. The author, then, appeared to change the approach and created a different out-of-tree driver called &#8220;em28xx-new&#8221;, based on the in-kernel &#8220;em28xx&#8221; driver that he had already contributed. This driver used a more traditional approach, and worked like a charm too. Unfortunately, it never made it into the vanilla kernel either, for whatever reasons.</p>
<p>I contacted Markus Rechberger a couple of times, if I recall correctly. I thanked him for his efforts and time put into creating the driver and asked a couple of questions once, and also sent him a patch for the build scripts some time later. I don&#8217;t recall if the patch was applied or not. He was always very nice and polite.</p>
<p>However, one day I had just compiled a new kernel and was about to build the driver for it. Before doing that, I always downloaded the latest copy of the driver source code from its Mercurial repository. This time, when I ran Mercurial it exited with a confusing error message, saying the remote tree was not the same repository I had on my hard drive. I supposed the author would have created a new repository for the driver, so I cloned it to a new directory. It turned out there was only a README file in the repository. I opened it and&#8230; uh oh. It held a note saying the old driver had been pulled from the Internet and giving a URL that led to the web site of a TV card manufacturer offering products that were supposedly supported by Linux. The equivalent USB stick cost just about 100 euros, like the one I had. But, of course, it was too late for me to return the one I had bought. I had been using the device for months.</p>
<p>I searched on the Internet again trying to find the reason that led to the driver being pulled from the web site, and everything I got was <a href="http://www.mathematik.uni-marburg.de/~kosslerj/em28xx-new/">the site of an Arch Linux user</a> that uploaded the latest version he got from the repository and even offers some patches to make the code work with more recent kernels. However, as of the time I&#8217;m writing this, the latest patch is for kernel 2.6.30 and the driver does not compile for the recently released kernel 2.6.32. So the status of this device is that it works, but only if you have a specific kernel version. At the top of that page, you can see a huge banner that reads like this:</p>
<blockquote><p>DISCLAIMER: Don&#8217;t bother me or the original author, Markus Rechberger, with any questions about problems with this driver, because Markus Rechberger deleted it because of these questions and because I just host these files.</p></blockquote>
<p>I thought the driver may have been pulled from the Internet for some kind of legal reasons, but the disclaimer suggests a different reason. I don&#8217;t know if I buy the reason. I&#8217;m not sure it&#8217;s entirely credible but there&#8217;s no point in not believing those words are true. Markus Rechberger, for all we know, got burned out maintaining the driver and decided not to maintain it any longer.</p>
<p><a href="http://lwn.net/Articles/306601/">A story published months ago at lwn.net</a> explains this case with more details and further information. The situation for people owning this device and wanting to use it under a recent kernel is that you are supposed to be using the in-kernel em28xx driver. However, as the <a href="http://www.linuxtv.org/wiki/index.php/Pinnacle_PCTV_Hybrid_Pro_Stick_%28330e%29">linuxtv.org page for the device</a> says, the difficulty in supporting digital TV for it has its source in the Micronas DRX3975D DVB-T chipset it features. This chipset already has an in-kernel driver, which can be located at <em>Device Drivers &gt; Multimedia support &gt; DVB/ATSC adapters &gt; Customize the frontend modules to build &gt; Customize DVB Frontends &gt; Micronas DRX3975D/DRX3977D based</em>. The location may change in the future (2.6.32 as I&#8217;m writing this).</p>
<p>Unfortunately, the driver cannot be used for now. As its help text mentions, this driver needs external firmware which currently cannot be obtained. In a note marked as &#8220;TODO&#8221;, the help text tells you to run &#8220;&lt;kerneldir&gt;/Documentation/dvb/get_dvb_firmware drx397xD&#8221;. But, if you try, you&#8217;ll get an error saying that drx397xD is not a known component.</p>
<p>It&#8217;s an appropriate moment to thank and encourage the developers that are working on this, being the last missing piece. <a href="http://www.kernellabs.com/blog/?p=761">Devin Heitmueller has done a good job trying to keep people up-to-date with information on the progress and the difficulties encountered.</a> The last comment on that blog post is from December 6 and says:</p>
<blockquote><p>Unfortunately, at this point the answer is “not right now”. I’m waiting for the DVB generator to arrive, at which point I should be able to complete the work.</p></blockquote>
<p>Again, thanks for working on this, keep up the good work and we&#8217;re eager to make our 330e USB cards work again with recent kernels, Devin!</p>
<p>While reflecting on the driver situation and putting together the different pieces of this soap opera, it all reminded me of the situation we professional programmers face from time to time while maintaining open source software. Many of us really love programming and we have tried to make it our job, successfully. There&#8217;s a difference, however, when you change from student to professional programmer.</p>
<p>When you are a student, you have a lot of time on your hands. It&#8217;s a wonderful experience going to college and learning new things every day, buying books, reading about different languages and technology, and the amount of spare time to learn and have fun programming is incredible. Later, however, you become a professional and you start working for a company in a full-time job. You leave home before dawn every day and, at least in winter and in my case, you arrive home after sunset. It&#8217;s incredibly depressing if you think about it. You spend the day coding, fixing issues in programs, debugging, testing, etc. This kind of life doesn&#8217;t make it impossible to enjoy programming again, but if you arrive home and find that you have a popular open source program on your hands with users reporting bugs and requesting new features you may feel as if you were still at the office.</p>
<p>My advice here is obvious. Don&#8217;t stop coding in your spare time, but do it for fun. If you don&#8217;t feel like adding a new feature someone requested, don&#8217;t add it. It&#8217;s very important to say &#8220;no&#8221; often so your program will still be your program, the product you wanted. If a user or a group of users are still in the fortunate situation in which they are students and have a lot of spare time, they can always fork your code. That is the beauty of free and open source software.</p>
<p>I couldn&#8217;t care less if some people would like to use all this text above to attack FOSS and say bad things about it: non-working drivers, unresponsive maintainers, lack of documentation, user unfriendliness. Mental health of the people writing the code is more important. Don&#8217;t burn out. Produce something and let others make better things out of it if you don&#8217;t have the time. Start new projects all the time. Hand over maintenance of old projects to new people. Have fun. Enjoy. Code. Help others. Submit patches with bug reports if possible. Appreciate the effort of others and thank them for the work they provide you. Try to be kind and explain to your users the reasons behind your &#8220;noes&#8221;.</p>
]]></content:encoded>
			<wfw:commentRss>http://rg03.wordpress.com/2009/12/15/when-your-hobby-becomes-a-job-reflections-on-the-em28xx-driver-situation/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">rg03</media:title>
		</media:content>
	</item>
		<item>
		<title>New tiny project: lddsafe</title>
		<link>http://rg03.wordpress.com/2009/11/01/new-tiny-project-lddsafe/</link>
		<comments>http://rg03.wordpress.com/2009/11/01/new-tiny-project-lddsafe/#comments</comments>
		<pubDate>Sun, 01 Nov 2009 08:27:30 +0000</pubDate>
		<dc:creator>rg3</dc:creator>
				<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://rg03.wordpress.com/?p=174</guid>
		<description><![CDATA[Some days ago we could all read that &#8220;ldd&#8221;, a tool which prints shared library dependencies, should not be run on untrusted binaries. I read it first on Hacker News and later it hit Slashdot&#8217;s frontpage. In some operating systems, this is stated clearly in the man page for the program, while in others it&#8217;s [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=rg03.wordpress.com&amp;blog=654644&amp;post=174&amp;subd=rg03&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Some days ago we could all read that &#8220;ldd&#8221;, a tool which prints shared library dependencies, <a href="http://www.catonmat.net/blog/ldd-arbitrary-code-execution/">should not be run on untrusted binaries</a>. I read it first on Hacker News and later it hit Slashdot&#8217;s frontpage. In some operating systems, this is stated clearly in the man page for the program, while in others it&#8217;s not mentioned at all. I belonged to the camp that didn&#8217;t know about it and I was a bit surprised. I supposed ldd was doing its job by examining the binary file and not by running it setting some special environment variables.</p>
<p>A Hacker News user, anyway, pointed out something interesting. You can easily get information about the needed shared library dependencies for a program or library using &#8220;objdump&#8221;, so I spent a few hours writing and tweaking a small script called <a href="http://github.com/rg3/lddsafe">lddsafe</a> that prints almost the same information as &#8220;ldd&#8221; using &#8220;objdump&#8221; and avoiding the security problems, as it doesn&#8217;t have to run the program. Two major caveats at this point in time:</p>
<ul>
<li>It requires bash and, more specifically, bash version 4 or later. I needed to use associative arrays to make the program reasonably fast and they are only available in bash 4.</li>
<li>It&#8217;s only been tested under Slackware Linux. However, bug reports and patches are welcome if it doesn&#8217;t run properly in other distributions.</li>
</ul>
<p>Future improvements may include rewriting it in Perl so as not to require bash 4, knowing that Perl is present in most Unix systems.</p>
<p>A picture is worth a thousand words:</p>
<pre>$ lddsafe /usr/bin/xcalc
        libXaw.so.7 =&gt; /usr/lib/libXaw.so.7
        libXmu.so.6 =&gt; /usr/lib/libXmu.so.6
        libXt.so.6 =&gt; /usr/lib/libXt.so.6
        libSM.so.6 =&gt; /usr/lib/libSM.so.6
        libICE.so.6 =&gt; /usr/lib/libICE.so.6
        libc.so.6 =&gt; /lib/libc.so.6
        ld-linux.so.2 =&gt; /lib/ld-linux.so.2
        libuuid.so.1 =&gt; /lib/libuuid.so.1
        libX11.so.6 =&gt; /usr/lib/libX11.so.6
        libxcb.so.1 =&gt; /usr/lib/libxcb.so.1
        libXau.so.6 =&gt; /usr/lib/libXau.so.6
        libXdmcp.so.6 =&gt; /usr/lib/libXdmcp.so.6
        libdl.so.2 =&gt; /lib/libdl.so.2
        libXext.so.6 =&gt; /usr/lib/libXext.so.6
        libXpm.so.4 =&gt; /usr/lib/libXpm.so.4
        libm.so.6 =&gt; /lib/libm.so.6
</pre>
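<p>The objdump idea can be sketched in a few lines. This is not the code of lddsafe, which is a bash script. It is a minimal Python illustration, and the function names <em>parse_needed</em> and <em>direct_dependencies</em> are made up for the example. It assumes a binutils objdump that lists dependencies as NEEDED entries in the output of <em>objdump -p</em>, and it only lists direct dependencies, without resolving paths or recursing as the output above does.</p>
<pre>
import subprocess


def parse_needed(objdump_output):
    """Return the library names listed as NEEDED in `objdump -p` output."""
    needed = []
    for line in objdump_output.splitlines():
        fields = line.split()
        # Dynamic section entries look like: "  NEEDED   libc.so.6"
        if len(fields) == 2 and fields[0] == "NEEDED":
            needed.append(fields[1])
    return needed


def direct_dependencies(path):
    """List the NEEDED entries of a file. objdump reads it, nothing is executed."""
    result = subprocess.run(
        ["objdump", "-p", path],
        capture_output=True, text=True, check=True,
    )
    return parse_needed(result.stdout)
</pre>
<p>Calling <em>direct_dependencies("/usr/bin/xcalc")</em> would return the library names only. Finding each library on disk and walking its own dependencies is the part a real tool still has to do.</p>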
]]></content:encoded>
			<wfw:commentRss>http://rg03.wordpress.com/2009/11/01/new-tiny-project-lddsafe/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">rg03</media:title>
		</media:content>
	</item>
		<item>
		<title>GDB now supports stopping on system calls</title>
		<link>http://rg03.wordpress.com/2009/10/09/gdb-now-supports-stopping-on-system-calls/</link>
		<comments>http://rg03.wordpress.com/2009/10/09/gdb-now-supports-stopping-on-system-calls/#comments</comments>
		<pubDate>Fri, 09 Oct 2009 18:41:08 +0000</pubDate>
		<dc:creator>rg3</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://rg03.wordpress.com/?p=170</guid>
		<description><![CDATA[One of the best moments in my professional career, from a pure personal perspective, came about 6 weeks ago when I was able to find out the cause of a memory leak one of our programs was suffering, and it turned out to be a problem in a standard library function completely unrelated to memory [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=rg03.wordpress.com&amp;blog=654644&amp;post=170&amp;subd=rg03&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>One of the best moments in my professional career, from a purely personal perspective, came about 6 weeks ago when I was able to find out the cause of a memory leak one of our programs was suffering, and it turned out to be a problem in a standard library function completely unrelated to memory management. I am very proud of that moment because it took me a lot of time to find the problem and I had to apply a good amount of knowledge I did not expect to apply as a professional. When I could finally prove the memory leak was in the library function, using a short (20 lines) demonstration program, I felt simply happy. The drama was over.</p>
<p>Our program has soft real-time requirements and is written mostly in Ada, with a few calls to C library functions via pragmas. Due to the nature of the program, it avoids memory allocation as frequently as possible and uses static-sized arrays for its data structures, and O(1) algorithms whenever possible to behave properly in the context it is being used.</p>
<p>I will completely skip the part of the story dealing with analysis of the source code in vain to find where the program was leaking memory, but I can tell you it took a lot of time and did not give any positive results, making us quite angry and desperate. We would have been unable to find the problem this way. As I said before, the leak was in a standard library function from an expensive software development kit for Ada, and had nothing to do with memory allocation functions. To be more precise, the memory leak was in <em>Text_IO.Reset</em>, a procedure that resets the state of a text file, very similar to <em>rewind</em> in C. I will head to the final steps, what I considered interesting.</p>
<p>The program runs on Solaris, so we monitored the process using <em>pmap</em>. This gave us precise information and told us clearly that the memory region that was growing was the heap, where memory allocation happens. I thought that, if our program was barely doing any memory allocation operations, and normally it should be doing none, according to the code, we had a good chance of catching it leaking memory. When a program in Unix needs more memory, it calls either <em>mmap</em>, <em>brk</em> or <em>sbrk</em>. I could not come up with more system calls that allocated memory. Normally, when you program in C or C++ you use <em>malloc</em>, <em>free</em>, <em>new</em> and <em>delete</em>. These language operators or library functions in turn manage memory blocks but request more memory from the operating system with the system calls I mentioned. It is explained in many books and tutorials on the Internet but I would say, and maybe I am wrong, that it is not exactly common knowledge.</p>
<p>My first approach, which did not work, involved creating a shared library that I would load using LD_PRELOAD, which intercepted calls to <em>sbrk</em>, <em>brk</em> and <em>mmap</em>. When intercepted, it would call <em>pstack</em> on the current process (a program that prints the call stack of any process given its pid), save the call stack to a text file and proceed with the normal system call. Hackish and clever, I thought while laughing like a maniac when I was coding that. Well, that did not work, I repeat. While the program was indeed calling <em>sbrk</em> as confirmed by <em>truss</em> (for Linux users, <em>truss</em> is very similar to <em>strace</em>), it was not calling, apparently, any function called <em>sbrk</em> that I could intercept. I created a test program to see if my library worked and it did, but it did not create any stack trace for the program in question.</p>
<p>Still, I had already started using <em>truss</em> to verify the program was allocating memory with <em>sbrk</em>, so I dived into the <em>truss</em> manual to see if I could use it for something else. This way I discovered that <em>truss</em> was able to stop the program execution when it made any system call I specified. My new approach was, then, tracing the program with <em>truss</em>, stopping in <em>sbrk</em>, then calling <em>pstack</em> on the PID and then telling the program to continue running. This almost worked. The printed stack did not have any symbols, probably due to the Ada compiler not populating the executable file with the debugging information as the C compiler did. So close yet so far. Our programs were indeed compiled with debugging information, and a minor change to the strategy was enough. Instead of printing the stack with <em>pstack</em>, I would attach the Ada debugger to the program and print the call stack. This way, I finally witnessed the program leaking memory in what seemed to be a call to <em>Text_IO.Reset</em>.</p>
<p>I thought this could be wrong, so I created a test program that read a file over and over again, calling <em>Text_IO.Reset</em> when reaching EOF. The test program, indeed, leaked memory at an alarming rate. Case closed, smile and surprise in my face. Well, to be honest, we replaced the calls to <em>Text_IO.Reset</em> with something else and tested again, to confirm the program had stopped leaking memory. But I already knew the problem had been found after running the test program.</p>
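<p>The shape of such a reproducer is simple: repeat one operation many times and watch the memory of the process. The original test program was written in Ada and is not shown here, so the following is only a Python analogue under stated assumptions. The names <em>peak_rss_samples</em> and <em>read_and_rewind</em> are hypothetical, the <em>resource</em> module is only available on Unix systems, and <em>ru_maxrss</em> reports the peak resident set size, so it can show growth but never shrinkage.</p>
<pre>
import resource


def peak_rss_samples(operation, rounds, sample_every):
    """Call operation() repeatedly and record the peak RSS at regular intervals."""
    samples = []
    for i in range(1, rounds + 1):
        operation()
        if i % sample_every == 0:
            # ru_maxrss is the peak resident set size (kilobytes on Linux).
            samples.append(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
    return samples


def read_and_rewind(path):
    """Build an operation that reads a file to EOF and then rewinds it."""
    handle = open(path, "rb")

    def operation():
        handle.read()
        handle.seek(0)

    return operation
</pre>
<p>A series that keeps climbing over, say, <em>peak_rss_samples(read_and_rewind("/etc/hosts"), 100000, 10000)</em> would point at a leak in the repeated operation, while a flat series after the first samples suggests there is none.</p>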
<p>When I came home I wondered if I could have done something similar in my Linux system. I read the man page for <em>strace</em> to see if it could stop programs when a specific system call was made, but I found no way of doing so. Apparently, the solution and the strategy I had employed were Solaris (or maybe UNIX) specific. Several Google searches did not give me any clue about doing the same in Linux.</p>
<p>Yesterday, however, GDB 7.0 was released. I took a look at the new features and I found this little sentence at the end of it:</p>
<blockquote><p>  * New command to stop execution when a system call is made</p></blockquote>
<p>According to the documentation, you only need to <a href="http://sourceware.org/gdb/current/onlinedocs/gdb_6.html#SEC36">set a catchpoint for the program</a> like <em>catch syscall sbrk</em> to achieve the same. Two months ago, I would have read the features and forgotten them five minutes later. But, yesterday, I again smiled, then laughed like a maniac and shouted &#8220;BEGONE, MEMORY LEAKS!!!&#8221;.</p>
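<p>As a rough sketch of how the truss and pstack routine might translate to GDB, the following Python helpers build a GDB command script and the command line to run it against a live process. Both function names are invented for the example. It assumes GDB 7.0 or later and the standard <em>-batch</em>, <em>-p</em> and <em>-x</em> options. Note that on Linux the system call behind <em>sbrk</em> is <em>brk</em>, so that is probably the name to pass there. A syscall catchpoint stops on both entry and return, so each call is likely to print two backtraces.</p>
<pre>
def gdb_syscall_trace_script(syscalls):
    """Build GDB commands that print a backtrace whenever a listed syscall is made."""
    lines = [
        "catch syscall " + " ".join(syscalls),
        "commands",      # attaches the commands below to that catchpoint
        "  backtrace",
        "  continue",
        "end",
        "continue",      # resume the attached process
    ]
    return "\n".join(lines) + "\n"


def gdb_attach_command(pid, script_path):
    """Command line that attaches GDB to a running process and runs the script."""
    return ["gdb", "-batch", "-p", str(pid), "-x", script_path]
</pre>
<p>Writing the result of <em>gdb_syscall_trace_script(["brk", "mmap"])</em> to a file and running the command from <em>gdb_attach_command</em> should log where the heap grows, provided the program was built with debugging information.</p>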
]]></content:encoded>
			<wfw:commentRss>http://rg03.wordpress.com/2009/10/09/gdb-now-supports-stopping-on-system-calls/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">rg03</media:title>
		</media:content>
	</item>
		<item>
		<title>GNU grep is slow on UTF-8</title>
		<link>http://rg03.wordpress.com/2009/09/09/gnu-grep-is-slow-on-utf-8/</link>
		<comments>http://rg03.wordpress.com/2009/09/09/gnu-grep-is-slow-on-utf-8/#comments</comments>
		<pubDate>Wed, 09 Sep 2009 20:23:55 +0000</pubDate>
		<dc:creator>rg3</dc:creator>
				<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://rg03.wordpress.com/?p=164</guid>
		<description><![CDATA[Update on 2010/10/28: GNU grep is no longer slow on UTF-8. The problem was fixed with the release of GNU grep 2.7. The rest of the article can now be considered obsolete. Thanks to someone on the ##slackware FreeNode IRC channel that mentioned the problem some weeks ago, I discovered that GNU grep is very [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=rg03.wordpress.com&amp;blog=654644&amp;post=164&amp;subd=rg03&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p><strong>Update on 2010/10/28: GNU grep is no longer slow on UTF-8. The problem was fixed with the release of GNU grep 2.7. The rest of the article can now be considered obsolete.</strong></p>
<p>Thanks to someone on the ##slackware FreeNode IRC channel that mentioned the problem some weeks ago, I discovered that GNU grep is very slow when working on UTF-8 files, and possibly other Unicode encodings. This, apparently, is <a href="http://savannah.gnu.org/bugs/?14472">a long-standing bug</a> that hasn&#8217;t been officially fixed yet. The problem manifests itself when you run <em>grep</em> using locale settings that involve using UTF-8. Let&#8217;s see the following example:</p>
<pre>
$ echo $LANG
en_US.UTF-8
$ time grep '^....' /usr/share/dict/words &gt;/dev/null 

real    2m16.795s
user    2m10.536s
sys     0m0.087s
$ export LANG=C
$ time grep '^....' /usr/share/dict/words &gt;/dev/null 

real    0m0.031s
user    0m0.028s
sys     0m0.003s
</pre>
<p>In the previous text, /usr/share/dict/words is a file that is part of the <em>bsd-games</em> package in my Slackware system. It contains a list of English words and it&#8217;s not too long. It has fewer than 40000 lines, each line having a word, and weighs about 345 KB. Still, as you can see in the previous example, it takes more than 2 minutes on my computer to search for words having at least 4 characters. When I change my locale settings to &#8220;C&#8221; (ASCII), it only takes 31 milliseconds. The difference is amazing. Does <em>grep</em> behave differently in both cases? The answer is yes.</p>
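<p>To repeat this kind of comparison on another machine without retyping the commands, a small harness along these lines may help. It is a sketch, not a benchmark tool: the function names are hypothetical, it measures wall-clock time for a single run, and it sets <em>LC_ALL</em> instead of <em>LANG</em> because <em>LC_ALL</em> takes precedence over the other locale variables.</p>
<pre>
import os
import subprocess
import time


def time_under_locale(command, locale_name):
    """Run command with LC_ALL set to locale_name; return elapsed seconds."""
    env = dict(os.environ)
    # LC_ALL overrides LANG and the individual LC_* variables.
    env["LC_ALL"] = locale_name
    start = time.perf_counter()
    subprocess.run(command, env=env, stdout=subprocess.DEVNULL, check=False)
    return time.perf_counter() - start


def compare_locales(command, locales=("en_US.UTF-8", "C")):
    """Map each locale name to the time the command took under it."""
    return {name: time_under_locale(command, name) for name in locales}
</pre>
<p>For the case above, the call would be <em>compare_locales(["grep", "^....", "/usr/share/dict/words"])</em>, which returns one timing per locale.</p>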
<p>When <em>grep</em> runs in UTF-8 mode, the dot character, for example, represents any multi-byte character, while in ASCII mode the dot represents a single byte. See for example the following, using an accented Spanish character to form a 5-letter word.</p>
<pre>
$ echo ámbar | LANG=C grep '^.....$'
$ echo ámbar | LANG=en_US.UTF-8 grep '^.....$'
ámbar
</pre>
<p>The <em>á</em> character is represented using two bytes in UTF-8. Using the UTF-8 locale, grep correctly identifies it as a single character. Hence, my search for a 5-character word inside the file correctly returns 1 result. With LANG=C, no results are found. This feature is not, however, worth making <em>grep</em> so slow.</p>
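<p>The same distinction between characters and bytes can be reproduced outside <em>grep</em>. The following sketch uses Python&#8217;s <em>re</em> module, which is an assumption of this example and says nothing about how GNU grep is implemented. A pattern applied to text counts characters, while the same pattern applied to the UTF-8 encoding counts bytes.</p>
<pre>
import re

word = "\u00e1mbar"            # "ámbar", with a precomposed a-acute
raw = word.encode("utf-8")     # the same word as UTF-8 bytes

five_chars = re.compile(r"^.....$")
five_bytes = re.compile(rb"^.....$")
six_bytes = re.compile(rb"^......$")

print(len(word), len(raw))              # 5 characters, 6 bytes
print(bool(five_chars.match(word)))     # True: the dot matches whole characters
print(bool(five_bytes.match(raw)))      # False: the word is 6 bytes long
print(bool(six_bytes.match(raw)))       # True: byte-wise it takes six dots
</pre>
<p>The byte-wise case corresponds to what happens with LANG=C, and the character-wise case to the UTF-8 locale.</p>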
<p>If you try to reproduce the problem above, probably you will not succeed, at least in your Linux system. This is because most Linux distributions are well aware of the problem and ship a patched GNU grep, and have been doing so for years. <a href="http://patch-tracking.debian.net/package/grep/2.5.3~dfsg-6">Debian</a> does it (and with it, Ubuntu), <a href="http://repos.archlinux.org/viewvc.cgi/grep/repos/core-i686/">Archlinux</a> does it, <a href="http://cvs.fedoraproject.org/viewvc/rpms/grep/F-9/">Fedora</a> does it, etc. Other distributions like Slackware traditionally ship software as vanilla as possible, and the problem shows, as seen above. Slackware&#8217;s GNU grep is completely vanilla. Most distributions use slightly different versions of the same patch, which replaces the MBS (Multi-Byte Sequence) treatment almost completely.</p>
<p>In my most recent scripts, I avoid GNU grep altogether, and use the fantastic and very efficient PCRE library (Perl Compatible Regular Expressions), used by many open source software projects (e.g. the Apache web server). The <em>pcre</em> package is present in most Linux distributions and BSD ports systems. It will probably ship the <em>pcregrep</em> tool inside. This is an alternative <em>grep</em> which features compatibility option-wise with the most common POSIX and GNU options, like -n, -l, -r, -w, etc. It expects, however, a Perl regular expression. They are, in the most common cases, like every other regular expression syntax out there, but closer to egrep than grep. By default, <em>pcregrep</em> behaves like grep with the LANG=C locale, even if your locale specifies that you are using UTF-8. It&#8217;s this fast:</p>
<pre>
$ time pcregrep '^....' /usr/share/dict/words &gt;/dev/null 

real    0m0.061s
user    0m0.042s
sys     0m0.003s
</pre>
<p>A bit slower than <em>grep</em> with C locale, yes, but not a problem. In addition, you can activate UTF-8 mode to enable compatibility with multi-byte characters by using the -u option, explicitly. In this mode, <em>pcregrep</em> is not much slower:</p>
<pre>
$ time pcregrep -u '^....' /usr/share/dict/words &gt;/dev/null

real    0m0.068s
user    0m0.049s
sys     0m0.002s
</pre>
<p>Of course, it&#8217;s able to behave correctly in the previous UTF-8 test with the -u flag:</p>
<pre>
$ echo ámbar | pcregrep -u '^.....$'
ámbar
</pre>
<p>Moving away from GNU grep to pcregrep is not a bad option. You get consistently fast behavior, regular expression syntax compatible with Perl, and get to choose if you want UTF-8 compatibility or not by providing an explicit option. So long, GNU grep! Welcome, pcregrep!</p>
<p>Final note: GNU awk suffers from this problem too, but its behavior with a UTF-8 locale is more or less equivalent to a patched <em>grep</em>. Still a bit slow, though.</p>
<pre>
$ time awk '/^..../' /usr/share/dict/words &gt;/dev/null

real    0m0.373s
user    0m0.342s
sys     0m0.003s
$ export LANG=C
$ time awk '/^..../' /usr/share/dict/words &gt;/dev/null

real    0m0.075s
user    0m0.055s
sys     0m0.002s
</pre>
]]></content:encoded>
			<wfw:commentRss>http://rg03.wordpress.com/2009/09/09/gnu-grep-is-slow-on-utf-8/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">rg03</media:title>
		</media:content>
	</item>
	</channel>
</rss>
