Stringing me along

Laurence Liss

Comparing strings is one of those programming tasks I tend to dread. There's so much room for error and variation that I never feel like I have a clean solution that covers every problem that can arise. Just today I spent an excessive amount of time trying to figure out why “Alabama” didn't equal “Alabama”. I thought I'd go through some of the steps that got me to a solution.

Just by looking at the output of the strings, it's pretty clear that, from a human standpoint, they are the same. So what to do next? My first thought was to check the length of each string and see whether they were in fact equal. A simple call to strlen() on each one made it pretty clear that something was awry. The output was:

7 8

So what next? I turned to the assumption that there was probably an extra space at the beginning or end of the string, which is a fairly common problem. A nice way to check this without altering the string is to use the substring function, substr(), to grab the first or last character and run a test against it. You can use substr($string, 0, 1) to get the first character in a string and substr($string, -1, 1) to get the last.

That turned out to be a dud. So now I wanted to test whether the problem was actually a space at all. You can use the string position function, strpos(), for that. strpos() returns the numerical position, counting from 0, of the first occurrence of the character you're searching for. So searching for a character that sits at the very start of a string returns 0, and searching for one that first appears as the third character returns 2. In my case, searching for a space returned nothing, which means it came back false (PHP echoes false as an empty string) because it didn't find a space.

With my initial guesses coming back incorrect, I decided to look at each character in the string by position, so I could see where the offending character was. The function to convert a string to an array of characters is str_split(). Printing its result with print_r() nicely returned this:

Array ( [0] => A [1] => l [2] => a [3] => b [4] => a [5] => m [6] => a [7] => )

So I knew the problem was some invisible character at the end of the word.
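The detective work above can be condensed into a short PHP script. This is a minimal sketch rather than the original code: the variables $a and $b and the helper describeBytes() are hypothetical, and it assumes the mystery character is a carriage return, as the post goes on to find. Printing ord() for each character turns an invisible byte into a visible number, which plain str_split() output can't do.

```php
<?php
// Hypothetical helper: map each byte position in a string to its ordinal value,
// so invisible characters (a carriage return is 13) show up as plain numbers.
function describeBytes(string $s): array
{
    $out = [];
    foreach (str_split($s) as $i => $ch) {
        $out[$i] = ord($ch);
    }
    return $out;
}

// Assumption: the second string carries a trailing carriage return.
$a = "Alabama";
$b = "Alabama\r";

echo strlen($a), ' ', strlen($b), "\n";  // prints "7 8"
var_dump(substr($b, 0, 1) === ' ');      // bool(false): first character is not a space
var_dump(substr($b, -1, 1) === ' ');     // bool(false): last character is not a space
var_dump(strpos($b, ' '));               // bool(false): no space anywhere in the string
print_r(describeBytes($b));              // the entry [7] => 13 exposes the "\r"
var_dump(rtrim($b) === $a);              // bool(true) once trailing whitespace is removed
```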
After I'd been messing around with it for a few minutes, Matt threw me a bit of advice in the form of the rtrim() function. rtrim() returns a string after any whitespace characters have been pulled off the end. I tested it out, and running the trimmed string through str_split() returned

Array ( [0] => A [1] => l [2] => a [3] => b [4] => a [5] => m [6] => a )

and I was then able to compare my strings properly once both had been passed through rtrim().

But what was that final character anyway? I couldn't move on without knowing. By default, rtrim() strips a few well-known characters, including the ordinary space, the null byte, the newline (\n), and the carriage return (\r). The full list is on the function's page in the PHP manual. Here it turned out to be a carriage return that got left over from separating a list of states into an array.

This is one of those errors that non-programmers simply hate thinking about, and programmers know all too well. Spending an hour (or more) looking for a single misplaced character is pretty common under normal circumstances, and even more so when working with complex user-entered strings.

One suggestion I have is to decide whether you can safely eliminate some of the string complexity before working with the data. In my case, comparing the names of states, there's no need for special characters at all. Running a regular expression to strip all non-alphanumeric characters from my strings certainly wouldn't pose a problem here, and would probably have saved me the time of troubleshooting a whitespace character. I could even have gone a step further and converted the strings to lowercase, because the capital letters in a state's name don't help or hurt identification at all. Combining those two steps would work well. Even on states such as “New Hampshire” and “New York” we still get unique and easily comparable strings (“newhampshire” and “newyork”) that are human-readable for debugging.

If you identify ways to reduce string complexity and decide what data you do and don't need, you're sure to save yourself some headaches.
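Here is one way the simplification described above might look in PHP. It is a sketch under stated assumptions: normalizeState() is a hypothetical helper name, and the sample list assumes Windows-style "\r\n" line endings, which is one common way a stray carriage return sneaks in when a list is split on "\n" alone. The last few lines show that splitting on the PCRE escape \R, which matches any newline sequence including "\r\n", avoids leaving the "\r" behind in the first place.

```php
<?php
// Hypothetical helper: reduce a state name to lowercase letters and digits only.
// Spaces, punctuation, and stray whitespace such as "\r" are all removed.
function normalizeState(string $name): string
{
    return strtolower(preg_replace('/[^a-zA-Z0-9]/', '', $name));
}

$states = ["New Hampshire\r", "New York", " Alabama "];
foreach ($states as $s) {
    echo normalizeState($s), "\n";
}
// newhampshire
// newyork
// alabama

var_dump(normalizeState("Alabama\r") === normalizeState("Alabama")); // bool(true)

// Assumption: the raw list used Windows-style "\r\n" line endings.
$raw = "Alabama\r\nAlaska\r\nArizona";
$naive = explode("\n", $raw);       // ["Alabama\r", "Alaska\r", "Arizona"]
$clean = preg_split('/\R/', $raw);  // ["Alabama", "Alaska", "Arizona"]
```

Because the normalized form is only used for comparison, the original string can still be kept around for display.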
