Oh Compiler, You so Crazy...

For Hack NightNitin and I set about teaching students how to read and write x86 assembly. I was tasked with teaching students how to write x86. Naturally, being short on time I decided to  cheat by writing some C code, compiling it, and taking a peek at the output before walking students through how to manually compile code by hand. When I compiled my demos, there was one particular control flow structure that threw me for a loop (no pun intended.)  It was a particular kind of switch; I’d describe it to you but I may as well just share the code.

#include <stdio.h>
#include <stdlib.h>

January                 31       1
February                28*      2
March                   31       3
April                   30       4
May                     31       5
June                    30       6
July                    31       7
August                  31       8
September               30       9
October	                31      10
November                30      11
December                31      12
int main(int argc,char** argv){
  char buf[3];
  gets(buf); //lol
  int monthNumber = atoi(buf);
  int days;
  case 1:
  case 3:
  case 5:
  case 7:
  case 8:
  case 10:
  case 12:

  case 4:
  case 6:
  case 9:
  case 11:
  case 2:
    puts("Whut r u doin!? Sthap");
  printf("The number of days in month %d is %d\n", monthNumber,days);
  return 0;

The code is very simple, it takes the number of a month and tells you how many days are in it disregarding leap years. Now one thing I hear people say over and over again is that switches are implemented as jump tables when they are compiled. While that may be true for some switch statements or some older versions of compilers, it’s not true for this particular switch using gcc 4.6.3-1ubuntu5 with a plain Jane compile (gcc -m32 months.c -o months). Now, I was getting this all done about an hour before Hack Night so I was rather disappointed that the functionality I wanted to teach students, namely jump tables, were not being emitted and instead I saw something like this:

Disassembled months program
Disassembled months program

What happens above this little snippet of code is the call to gets and atoi. The return from atoi is put in var_c. You can see that on the middle of the first basic block that the return value from atoi is being used to shift the value 1 to the left. What happens next is the result of the shift is saved then anded with the value 0x15AA. If the result of that and is not zero the left most branch is taken and the number of months is set as 31 then printed to the screen. Now how did the compiler come up with this magic number 0x15AA? Well if you look at the binary representation of the number 0x15AA you might notice something special about it

0x15AA  ==  1010110101010

And a closer look still

1  |  0  |  1  |  0  |  1  |  1  |  0  |  1  |  0  |  1  |  0  |  1  |  0
12    11    10    9     8     7     6     5     4     3     2     1     0

Bits 12, 10, 8, 7, 5, 3 and 1 are set! The placement of the bits corresponds to cases in the switch. Now while learning about this little compiler trick was fun I didn’t think it would have made good learning material for a bunch of reversing novices so I avoided using it. But then I got thinking, this is much less readable than a jump table. IDA will very happily make jump tables look very pretty and often annotates them with the switch and case number as you will have noticed if you have played with the binary bomb from CMU. Now all I had to do was solve this problem of readability and I could go back to doing something a little more important.

Have you ever played with IDApython? It’s free and it works with IDA demo so I highly recommend you install it this instant. What I wanted to do was write a script in IDApython that would make analyzing switch statements that look like this much easier if I had a program with many switches like this.

Now I make no claim to be an IDApython guru or to have solved this problem in the general case but I hope that if you’ve never seen IDApython before this will inspire you to download and play with it .

What the script below does is take the address of the instruction that you’ve highlighted, and start disassembling forward. If it comes to an and instruction it takes the immediate value of that and and makes a list of the numbers of the bit positions that are high from the immediate operand. It then looks for the nearest jnz and calculates the address that the jnz would have jumped to and writes a comment there with the cases of the switch that would have landed you there(calculated from the immediate value.) The script will repeat this action until it reaches a jmp instruction indicating the end of the switch. This is by no means a general solution and will only work on code that looks exactly like this, but If I had a program with many switches like this I would be glad to have a tool like this to make sense of all of them.

from idaapi import *
from idc import *
from idautils import *
def bins(*a):
		for i in range(len(binary)):
			if(binary[i] is '1'):
		return ret
	return map(bins, a)

print "Running on %s" % hex(functionStart)

while ea and 'jmp' not in GetDisasm(ea):
	if('and' in GetDisasm(ea)):
			print "Error, no immediate Vals"
		#get the cases of the switch that the value represents

	if('jnz' in GetDisasm(ea) and foundAnd):
		print GetDisasm(ea)+" GOT IT"
		for xref in XrefsFrom(ea, 0):
			if(xref.type is 21): #Ordinary_Flow
				MakeComm(commentEA, str(cases))
print "END"

If you save that to a .py file and execute it as a script in IDA (after you have installed IDApython) you will see that it had correctly identified the cases of the switch.

Annotated switch disassembly via IDApython
Annotated switch disassembly via IDApython