Saturday, November 26, 2011

gcc does binary search in switch (not nested if blocks)

So I started looking at Write Great Code, and one of the examples he offers is a comparison of a switch(x) versus if(x==1) else if (x==2) else if ... there were 1-4 matching and a default case.

so I threw that together into a dummy source file, asked for an assembly output, and looked. I was a little surprised to see this:
movl -4(%rbp), %eax
cmpl $2, %eax
je .L4
cmpl $2, %eax
jg .L7
cmpl $1, %eax
je .L3
jmp .L2
.L7:
cmpl $3, %eax
je .L5
cmpl $4, %eax
je .L6
jmp .L2

It looks like it peeks into our variable (argc gets loaded into eax for this example) and starts a binary search of the possible literal values it expects. I didn't turn on any extra optimizations, so I assume this is just a practical good idea. I feel slightly less dumb (at the machine level) feeding ploddingly literal switch statements to gcc now, though I still feel like a chump typing them.

The nested if blocks follow a more familiar pattern... is it 1? well, is it 2? Ok, is it 3? Hmm, what about 4? Well, I give up, lets take this else else else else action. I guess its nice to know this. It may be a reason only a literal can be used in a switch, that the language designers were expecting this type of optimization on the back end.

My take-away from this is that if there is a higher probability of some options, nested if/else blocks make sense. If the possibilities are equally likely, switch makes sense. If the purpose is to guard against bad things (dereferencing a null pointer) if/else is a necessity, and it's proper that gcc didn't optimize it away (it would be disastrous if it did!)

1 comment:

Dan said...

As a note, I tried timing this on various optimization settings. In some cases -O2 and higher totally broke the switch (interesting to see the output) and just chomped cpu cycles until killed.

It looks like in the moderate four to five option test I set up, switch gets roughly a 10-20% boost over if. I modified the code to update the value of x in each case, to keep it bouncing from one place to the other inside the statements. After adding this modification, the optimized code started working again. Not sure what happened, but I get wary of using optimization flags without unit tests now.