For Hack Night, Nitin and I set about teaching students how to read and write x86 assembly. I was tasked with teaching students how to write x86. Naturally, being short on time I decided to cheat by writing some C code, compiling it, and taking a peek at the output before walking students through how to manually compile code by hand. When I compiled my demos, there was one particular control flow structure that threw me for a loop (no pun intended.) It was a particular kind of switch; I’d describe it to you but I may as well just share the code.
The code is very simple, it takes the number of a month and tells you how many days are in it disregarding leap years. Now one thing I hear people say over and over again is that switches are implemented as jump tables when they are compiled. While that may be true for some switch statements or some older versions of compilers, it’s not true for this particular switch using gcc 4.6.3-1ubuntu5 with a plain Jane compile (gcc -m32 months.c -o months). Now, I was getting this all done about an hour before Hack Night so I was rather disappointed that the functionality I wanted to teach students, namely jump tables, were not being emitted and instead I saw something like this:
What happens above this little snippet of code is the call to gets and atoi. The return from atoi is put in var_c. You can see that on the middle of the first basic block that the return value from atoi is being used to shift the value 1 to the left. What happens next is the result of the shift is saved then anded with the value 0x15AA. If the result of that and is not zero the left most branch is taken and the number of months is set as 31 then printed to the screen. Now how did the compiler come up with this magic number 0x15AA? Well if you look at the binary representation of the number 0x15AA you might notice something special about it
0x15AA == 1010110101010
And a closer look still
1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0
12 11 10 9 8 7 6 5 4 3 2 1 0
Bits 12, 10, 8, 7, 5, 3 and 1 are set! The placement of the bits corresponds to cases in the switch. Now while learning about this little compiler trick was fun I didn’t think it would have made good learning material for a bunch of reversing novices so I avoided using it. But then I got thinking, this is much less readable than a jump table. IDA will very happily make jump tables look very pretty and often annotates them with the switch and case number as you will have noticed if you have played with the binary bomb from CMU. Now all I had to do was solve this problem of readability and I could go back to doing something a little more important.
Have you ever played with IDApython? It’s free and it works with IDA demo so I highly recommend you install it this instant. What I wanted to do was write a script in IDApython that would make analyzing switch statements that look like this much easier if I had a program with many switches like this.
Now I make no claim to be an IDApython guru or to have solved this problem in the general case but I hope that if you’ve never seen IDApython before this will inspire you to download and play with it .
What the script below does is take the address of the instruction that you’ve highlighted, and start disassembling forward. If it comes to an and instruction it takes the immediate value of that and and makes a list of the numbers of the bit positions that are high from the immediate operand. It then looks for the nearest jnz and calculates the address that the jnz would have jumped to and writes a comment there with the cases of the switch that would have landed you there(calculated from the immediate value.) The script will repeat this action until it reaches a jmp instruction indicating the end of the switch. This is by no means a general solution and will only work on code that looks exactly like this, but If I had a program with many switches like this I would be glad to have a tool like this to make sense of all of them.
If you save that to a .py file and execute it as a script in IDA (after you have installed IDApython) you will see that it had correctly identified the cases of the switch.