Quit Bugging Me: Some debugging methodology by example

Code
Laurence Liss
Laurence Liss

I was recently given an interesting assignment. One of our clients had found a bug in Print module. If you're not familiar with it, Print module is a simple tool that gives you links on your content to create a printer friendly page from otherwise color and complex content. It also allows you to email content to a friend and create PDFs of the content.

The bug that had been discovered was that certain paths aliases in Drupal would result in the print module's page returning a nasty MySQL error. It is unlikely that this could be used to exploit Drupal but it seemed better to simply fix the issue.

The paths that were causing the error looked like this:

http://mysite.com/print/1e309

That may seem a strange path name and surely it is an edge case, but it is still a flaw that this path should return an error. Being diligent coders we decided to track down the issue and squash it if possible. In the process I also had to rely on some knowledge of PHP and MySQL that I thought might be helpful to share with others who work with similar systems and have to find code problems.

Here's a picture of the error:

First I had to reproduce the error. Reproducing the error is the first and arguably most important step when it comes to fixing a bug. If you can't reproduce the error you can't start to guess at what conditions cause it to happen.

I downloaded print module and set up a Drupal 6 test site with a bare minimum of modules enabled to ensure I didn't get any interference from other possible code issues. I then created a node with a path that was exactly the same as the one that our client had used to find the issue:

http://mysite.com/node/1e309

Then I visited the page and clicked the “printer friendly version” link at the bottom of the page. No error. In fact the page looked fine. I noticed that the page I was directed to didn't have the right path. It was taking me to

http://mysite.com/print/2

So I realized that Print module had some settings that were in play on the client site and not on my test site. When I visited the configuration page for Print module at admin/settings/print I found a checkbox labeled “Use URL alias instead of node ID.” After enabling this and then revisiting the print page, found myself at http://mysite.com/print/1e309 and I was able to get the error. Step one completed.

So what was causing this issue. I tried a few other paths and a few other nodes. Here were my findings: The error would occur on http://mysite.com/print/1e309 even if http://mysite.com/node/1e309 did not exist, so I knew the error wasn't a problem coming from the node itself.

The error would not occur on other gibberish paths like “123”, “1hello2”, or “1f309” so I knew that the error was somehow related to the specific path or a property of the specific path.

This part is a little tricky. If you're a math nerd (as I myself am) you may notice something interesting about the alias “1e309”. This particular sequence of characters is common scientific notation for the number 1 * 10 ^ 309, that is 10 raised to the 309th power. It's a number, and a really big number at that. To test my suspicion I entered some other paths in scientific notation.

  • 1e10 – no error
  • 1e100 – no error
  • 1e308 – no error
  • 1e310 – error

So my suspicion led me to guess that the string “1e309” was being interpreted as a number and that it was too big to be properly handled by either MySQL or PHP. My first guess was that it was too big for MySQL so I enabled query logging in MySQL and went to see what queries were run when I visited the suspect URL.

Query logging is a very useful tool for debugging. When enabled every query that gets passed to MySQL gets recorded in a log. It's the best way to look for problems you think may originate in the database because it always logs the raw queries and makes it easy to spot errors in them. We work on virtual Ubuntu Linux servers for development. To enable query logging in this environment I go to /etc/mysql and then edit the my.cnf file (sudo vim my.cnf). You need to un-comment the following two lines by removing the # symbol: #

general_log_file = /var/log/mysql/mysql.log # general_log = 1

and then restart MySQL (sudo /etc/init.d/mysql restart). Once this is done, all the queries that Drupal runs as part of its normal operations will be captured. Note that the first line you un-commented designates where the log file is written. So change to that directory and take a look at it. For me it found what I was looking for pretty quickly:

SELECT n.nid, n.type, n.language, n.uid, n.status, n.created, n.changed, n.comment, n.promote, n.moderate, n.sticky, n.tnid, n.translate, r.vid, r.uid AS revision_uid, r.title, r.body, r.teaser, r.log, r.timestamp AS revision_timestamp, r.format, u.name, u.picture, u.data FROM node n INNER JOIN users u ON u.uid = n.uid INNER JOIN node_revisions r ON r.vid = n.vid WHERE

n.nid = INF

What this says to me is that the problem isn't with MySQL because the value that MySQL is receiving from PHP is “INF” rather than a logically formed number. Now that you've narrowed the problem and no longer need query logging you should turn it off as it will generate a huge amount of data that you probably don't want. I left this on for a week once and filled an entire hard drive before I realized what the problem was. So re-comment those two lines and then restart MySQL again.

Knowing that the error was in PHP I did a quick search of the PHP documentation and found that the largest sizes for numbers are variable depending on your system but I also found that when an integer passes it's maximum size it automatically recasts as a float, and when a float passes it's maximum size it is automatically reset as the number INF (infinity).

http://www.php.net/manual/en/language.types.integer.php

http://www.php.net/manual/en/language.types.float.php

I wrote a small PHP script to test this:

$x = 1e309;
if (is_numeric($x)) {
echo "x is a number\n";
}
if (is_infinite($x)) {
echo "x is infinite\n";
}
echo $x . "\n";

The output of this was:

x is a number

x is infinite

INF

Further testing I changed the value of x to 1e308 and got the following result:

1.0E+308 is a number

1.0E+308

So to recap, here's what we know:

  1. Certain paths are causing errors
  2. These paths are formed with a number followed by the letter e followed by more numbers
  3. These paths are being interpreted as numbers
  4. Numbers that are too large in this context are being converted to INF by PHP
  5. When PHP creates a query for a node looking for the nid of INF an error is created.
  6. Other textual paths don't cause errors.

This tells me that at some point there is a test in the Print module that is looking for numeric entries, if it finds a number it assumes that the path refers to a node id rather than a user generated path. And if we add a another test in those places we should be able to stop the error.

I found a few places in the module with tests similar to the following:

if (is_numeric($path)) {

and so I replaced each one with a more restrictive test:

if (is_numeric($path) && is_finite($path)) {

Behold, the problems seems to be fixed. I rolled a patch and posted an issue to the issue queue (http://drupal.org/node/1196588).

My patch worked but as you can see from the thread, the module maintainer had a better solution, replacing is_finite with ctype_digit() which checks that all the characters are digits. That's the beauty of open source software, one answer may pave the way to a better solution.

Understanding the environment you rely on for your work is essential to the debugging process. While I don't expect many of you to have the same problem that I did, I thought this issue did cause me to run through a standard list of tests to track down and resolve errors and hopefully it will give some insight into how to work on problems you may be facing in your own development.

Ready to get started?

Tell us about your project