Friday, June 9, 2017

Subtle Bug: Python, Mac and Unicode

I was studying Django by doing a project. Every test was passing so apparently everything was fine. But after deploying the application and trying to save a subscription I got a 404.

A little investigation revealed that Django's url template tag was generating "/inscricao/ˆ" instead of the expected "/inscricao/". So I triple-checked my URLs. First my base urls:

And then my app urls:

Everything looked fine. So I thought: "Maybe I found a bug in Django, what a chance to contribute to a huge Python web framework!" So I wrote a test case in my application to expose the bug and started debugging it. The test could be used later to send a PR to Django:

The failing test helped me find the line where "^" is removed from the pattern to build the paths:

In the debugger session my p_pattern was 'ˆ$'. Despite that, the condition was returning False. I thought: WTF! But looking closely, I spotted the issue.

The "ˆ" from my code was a bit smaller than the "^" in Django's code. The issue happens because when you configure the Mac keyboard to international and press SHIFT + ^, you get "ˆ". Most of the time Brazilians type a letter after that, so you end up with a char like "ê" or "â". If you really want only the circumflex accent, you need to hit the space bar on your keyboard.

On other systems, like Linux and Windows, you get no char when pressing this key, so you are visually aware that you need to take another action to make it appear. That is the reason I got the wrong char there.
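To see why the condition on p_pattern returned False, here is a minimal sketch of a prefix check like the one I hit in the debugger. The function name strip_leading_caret is hypothetical and only stands in for that step; it is not Django's actual code.

```python
# Hypothetical stand-in for the "remove the leading ^" step seen in the
# debugger. strip_leading_caret is an illustrative name, not Django's API.
def strip_leading_caret(p_pattern):
    """Drop a leading ASCII '^' (U+005E) from a regex pattern, if present."""
    if p_pattern.startswith("^"):
        return p_pattern[1:]
    return p_pattern


wrong = "\u02c6$"  # the lookalike char typed on the Mac international layout
right = "^$"       # the ASCII circumflex the check is looking for

stripped_wrong = strip_leading_caret(wrong)  # unchanged: the prefix check fails
stripped_right = strip_leading_caret(right)  # only "$" is left

print(repr(stripped_wrong), repr(stripped_right))
```

Because the lookalike char is not "^", it survives the check and ends up in the generated path.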

I could also have spotted it by paying attention to PyCharm's syntax highlighting. In raw strings, which are mostly used for regexes, it colors special chars orange. Check the difference:

Wrong char

Right char

After all of this I also checked the names of the chars using Python:
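A check along these lines can be done with the standard unicodedata module. This is a small sketch for Python 3, not necessarily the exact session I ran; the chars are written as escapes so they cannot be confused on screen.

```python
import unicodedata

wrong_char = "\u02c6"  # what the Mac international layout produced
right_char = "^"       # what a regex anchor needs (U+005E)

# Map each char to its official Unicode name.
names = {char: unicodedata.name(char) for char in (wrong_char, right_char)}

for char, name in names.items():
    print("U+%04X %s" % (ord(char), name))
# U+02C6 MODIFIER LETTER CIRCUMFLEX ACCENT
# U+005E CIRCUMFLEX ACCENT
```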

And of course, fixed it:
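The fix itself is just swapping one char for the other in the URL pattern. As a rough illustration using only the re module (the patterns below are simplified assumptions, not my real urls), the regex engine treats the lookalike as an ordinary literal and not as an anchor:

```python
import re

wrong_pattern = re.compile("\u02c6$")  # lookalike char followed by end anchor
right_pattern = re.compile("^$")       # start anchor followed by end anchor

# An empty path segment only matches when "^" is the real anchor.
wrong_matches_empty = wrong_pattern.search("") is not None
right_matches_empty = right_pattern.search("") is not None

# The wrong pattern demands a literal U+02C6 at the end of the string.
wrong_matches_literal = wrong_pattern.search("\u02c6") is not None

print(wrong_matches_empty, right_matches_empty, wrong_matches_literal)
# False True True
```

That is consistent with the extra "ˆ" showing up at the end of the generated URL: the pattern really did contain that char as literal text.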

So after being a little angry about spending 2 hours on this bug, I decided to write this blog post to take stock of the things I learned during the investigation:
  1. Unicode;
  2. regex;
  3. more knowledge about Django's internal code;
  4. knowing that the "^" char is called circumflex accent in English.

Most of the time, when I find myself struggling with a bug, the cause is a simple mistake of my own. And you? Do you have a story about a bug like this? Share it in the comments!