It’s easy when solving a problem to become irritated and just push out a clumsy solution. In doing this we completely forget about the art of programming! While every now and then, there are problems that can just have a solution pushed out, generally its better to spend some time in how you solve the problem as well as how you implement the solution.

This is the first blog post in a series of blog posts that will cover problem solving as an art. This first post is going to cover the basics of problem solving as well as some of the different steps involved. Later on we will cover the individual steps in greater detail.

Step 1 Defining the problem

The first thing that must be done is defining the problem. We’ll start off with a simple problem that I enjoy as an example problem: How do you decide if a number is even or odd? Of course most programmers know how to solve this problem, which makes it a prime first example.

Our original way of stating the problem seems to be fine (that is: How do you decide if a number is even or odd?) Although this seems like a fine way to put the question, it is not specific enough. We don’t mention any constraints, as well as we don’t mention any specifics. So, lets redefine our question mentioning the language we want to solve this problem in, and a engineering constraint.

How do you decide if a number is even or odd efficiently in C? While this looks quite similar, it is much more clear. We state our final goal (finding out if a number is even or odd), our secondary focus (efficiency), and what language we will be solving this problem in (C).

Step 2 Defining possible solutions

Now that we have a problem, we must create a solution to this problem. Depending on the type of problem, there are several ways to solve it. This problem happens to be one that humans can accomplish. We have the ability to tell if a number is even or odd. This gives us a head start. We should ask our selves: How do I tell if a number is even or odd?

There are several ways we determine if a number is even or odd. If the number ends in 0, 2, 4, 6, or 8 the number is even, if not the number is odd. Another way to tell is if the number is divisible by 2.

Step 3 Writing out possible solutions

Now we need to write out possible solutions for our problem. In a simple problem like this, it will usually entail writing out the actual problems in language . In more complex problems, you might need to write out pseudo solutions first.

// Solution number 1

_Bool IsEven (int i) {

int x = i % 10;

if (x == 2 || x == 4 || x == 6 || x == 8 || x == 0)

return 1;

else

return 0;

}

// Solution number 2

_Bool IsEven(int i) {

if (i % 2 == 0)

return 1;

else

return 0;

}

Step 4 Deciding which solution to use

The first thing that we check is which one is cheaper; which one is more efficient? There are many machine specific ways to determine this, however, in this example it is clear that solution two is cheaper. Although we used the same amount of modulo, we used much less logical operators in that solution.

So, we have decided that checking if a number is divisible by 2 is the best method of determining if a number is even or odd. Now we’re going to make it more efficient.

Step 5 Optimizing your solution

Often times programmers who use compilers will make excuses when writing their code that there is no need for them to optimize their code, as the compiler will do this for them. That is true to a degree; which is the danger of that statement. Seeing as you are likely to use more than one compiler/optimizer, and seeing as you probably didn’t write the optimizer yourself, you probably don’t know enough about the particular optimizer to presume what it will and won’t optimize. Because of this, code that will be frequently ran, or expensive code, should be optimized by you, not just the compiler.

So, what are some ways we can optimize our solution? If we look at our code, we have 2 conditional statements (if and else) and 1 conditional operator (==). None of those are actually required.

What is the result of modulo? Given the arguments x % y, the outcome has a range of 0 to y – 1 presuming y < x. What this means is that the result of our use of modulo (the result of i % 2) will be either 0 or 1. If the result is 0, we can return 1 (remember, in C true =1), and if the result is 1, we can return 0 (false =0). Notice, we are just returning the opposite boolean value that we gain from the modulo. Instead of using if and else to return the opposite value, we can just not (!) the value returned by i % 2, which is several cycles cheaper.

// Improved solution

_Bool IsEven (int i) {

return !(i % 2);

}

We have shortened our algorithm by quite a bit! The optimization isn't done though. On a large amount of machine architects, the way modulo is achieved is by completing either division, or a form of division. Division is an extremely slow operation (about 40-90X slower than addition). So, what can we do? It just so happens that we are programming a computer. We're programming a machine that thinks in binary. It just so happens that if the first digit of a binary number is one, the number is odd, and if not, it is even! With this knowledge we can make our program tremendously faster!

// Improved^2 solution

_Bool IsEven (int i) {

return !(i & 1);

}

By replacing modulo with a logical &, we have increased the speed of our algorithm by a landslide! However, there is still another optimization we can complete. Currently, we have two operations to determine if a number is even or odd. A not, and an and. Both of these operations are extremely fast, however, the not operation isn't needed if we change our program slightly.

// Improved^3 solution

#define IsOdd(i) (i & 1)

This algorithm is now in its most optimized form. By changing the function name from IsEven to IsOdd, we no longer need to not the result of the and, this increases speed by a large amount. Also since this is such a small algorithm, we don't need to have it be a function, so we turn it into a macro and remove the expense of a function call.

Step 6 Documenting the solution

So, now we have our solution! It's smaller, more elegant, and beautiful than our original solution, however, it can still be made more elegant and beautiful, simply by the power of comments.

// ******IsOdd******

// The IsOdd macro does exactly what it sounds like it'd do, it tells if * is odd.
// We test to see if the first bit is one or zero, if the first bit is one, then is clearly odd,
// if not, it’s even.
#define IsOdd(i) (i & 1)*

Any time you’ve optimized a function, or have written a complex function, it’s a good idea to go ahead and write comments explaining what the algorithm does, and how it works.

So, what are the steps to problem solving?

Define the problem

Define potential solutions

Write out potential solutions

Decide on the solution you want to use

Optimize solution

Document solution

It’s easy to spend too little, or even too much time on any one of these tasks, and in doing so, a bit of the art in programming is taken away. Sometimes it’s important to go back to the basics and remember what we love about programming; sometimes it’s important to remember that programming is an art form; a form of expression as much as a science.

Hmm.

Well, the steps you give are laudable, but the specific details in this example are a bit arguable.

Firstly, you list transforming this:

int IsEven(int i)

{

if (i % 2 == 0)

return 1;

else

return 0;

}

..into this:

int IsEven(int i)

{

return !(i % 2);

}

..as an optimization. In fact, today’s optimizing compilers are good enough that these are completely identical. I made a simple test application using Visual C++ 2010:

#include

int IsEven1(int i)

{

if (i % 2 == 0)

return 1;

else

return 0;

}

int IsEven2(int i)

{

return !(i % 2);

}

`int main()`

{

printf("%d\n", IsEven1(5));

printf("%d\n", IsEven2(5));

}

When I inspected the compiler output from this, I couldn’t find the IsEven1 and IsEven2 functions. They didn’t exist at all! The compiler had emitted code that just directly passed the value 0 to printf. So, I changed the driver to this:

int main()

{

int i;

scanf("%d", &i);

`printf("%d\n", IsEven1(i));`

printf("%d\n", IsEven2(i));

}

The functions still didn’t exist, but the logic for them was there. It had been inlined directly in. For both IsEven1 and IsEven2, the identical machine code was emitted:

011C1014 mov eax,dword ptr [i]

011C1017 and eax,80000001h

011C101C jns main+23h (11C1023h)

011C101E dec eax

011C101F or eax,0FFFFFFFEh

011C1022 inc eax

011C1023 ...

011C1029 neg eax

011C102B sbb eax,eax

011C102D inc eax

This code looks pretty far removed from an ‘if’ statement with a % modulus operation (which is, in x86 machine code, returned at the same time as the results of a division). So, what on earth is it actually doing?

It loads the value to be processed into the EAX register, and then it turns off all but the highest and lowest bits. As you pointed out later in the article, the lowest bit is the one that indicates whether a number is odd or even. The highest bit is one that indicates whether a number is positive or negative. Okay, so why does it care whether the number is positive or negative? Well, we asked it to test the result of “i % 2”, and the % function can return negative numbers for negative input. The value of “-5 % 2” is -1, not +1.

The JNS operation is a conditional jump that skips over a few instructions if the number is “Not Signed”, that is to say, not negative, so if it’s positive, it jumps immediately to the NEG instruction. Otherwise, it runs this little sequence:

0010101E dec eax

0010101F or eax,0FFFFFFFEh

00101022 inc eax

What does this do? Well, let’s look at the possible values of EAX. We know that the number is signed, because otherwise this branch would have been skipped. (If it is a positive number, then the AND instruction leaves EAX with simply 0 or 1 in it.) So,

EAX=80000000 for values of i: -2, -4, -6, -8, -10, …

EAX=80000001 for values of i: -1, -3, -5, -7, -9, …

After the DEC operation, we get:

EAX=7FFFFFFF for values of i: -2, -4, -6, -8, -10, …

EAX=80000000 for values of i: -1, -3, -5, -7, -9, …

Then, it uses OR to turn on all but the least significant bit:

EAX=FFFFFFFF for values of i: -2, -4, -6, -8, -10, …

EAX=FFFFFFFE for values of i: -1, -3, -5, -7, -9, …

Finally, it INCrements the value:

EAX=00000000 for values of i: -2, -4, -6, -8, -10, …

EAX=FFFFFFFF for values of i: -1, -3, -5, -7, -9, …

This pattern, FFFFFFFF, with all bits set is the number -1. So, once we reach the NEG instruction, EAX has one of three values:

EAX=00000001 for values of i: 1, 3, 5, 7, 9, …

EAX=00000000 for values of i: -8, -6, -4, -2, 0, 2, 4, 6, 8, …

EAX=FFFFFFFF for values of i: -1, -3, -5, -7, -9, …

That is, -1, 0 or +1. We now have the result of “i % 2”. Now we compare that to zero:

011C1029 neg eax

011C102B sbb eax,eax

011C102D inc eax

There is some more arcane magic going on here. The key is in a register that isn’t visible to us here, the Flags register. One flag in particular, CF, gets set to 0 or 1 automatically when any arithmetic operation is done. The NEG operation simply negates a number, so -1 becomes +1, 73 becomes -73, and 0 becomes 0. The key here is that conceptually, it is modeled as:

eax = 0 – eax

Because there’s actually a subtraction here, if EAX is anything other than 0, it has to “carry” (“borrow”) in order to complete the subtraction (remember grade school? :-). So, the NEG operation sets CF to 0 if EAX is 0, and 1 if EAX is anything other than zero. So, after the NEG, we have:

EAX=00000001 => EAX=FFFFFFFF, CF = 1

EAX=00000000 => EAX=00000000, CF = 0

EAX=FFFFFFFF => EAX=00000001, CF = 1

This ties into the SBB instruction. The SBB instruction is meant to be the second half of a subtraction that is too wide for a single register; it implements that borrowing by looking in CF to see if a previous subtraction needed to borrow. Normally, you subtract two different numbers one from the other, but in this case, the operation is being used to subtract the register from itself. Essentially:

eax = eax – eax – CF

Well, obviously eax – eax is just 0, so what’s *

really* happening here is:eax = -CF

So what we get is that if EAX was 0 for the NEG, and thus CF was zero, the new value is also zero. If EAX was *

not* zero (doesn’t matter whether it was +1 or -1 — see the last little table, CF becomes 1 either way), the new value is now -1. So:if ((i % 2) == 0)

eax = 0

else

eax = -1

Well, this isn’t *

quite* what we wrote, but wait, we’re not quite done yet!00101034 inc eax

So, we add one to it. A value of 0 becomes +1, -1 becomes 0. So, we finally have precisely what we originally coded:

if ((i % 2) == 0)

eax = 1 ("return 1")

else

eax = 0 ("return 0")

So, that sequence of operations is definitely implementing what we wanted, and it isn’t using the division operation we were expecting to see, so we’ve already been unexpectedly outsmarted by the compiler, but the real amazing thing is that the sequence of operations is *

exactly the same* for the alternate implementation! Check it out:011C1036 mov ecx,dword ptr [i]

011C1039 and ecx,80000001h

011C103F jns main+46h (11C1046h)

011C1041 dec ecx

011C1042 or ecx,0FFFFFFFEh

011C1045 inc ecx

011C1046 neg ecx

011C1048 sbb ecx,ecx

011C104A inc ecx

So, changing the implementation to a single line had *

no effect whatsoever* on the runtime! The only reason, then, to choose one over the other, is aesthetic. I personally find the 4-line version a little bit easier to read; I don’t have to remember and apply the semantic of the ! operator (“1 if it’s zero, 0 for any other number”).That leads to the second point. The subsequent optimization you gave using the AND operator is undeniably an improvement:

int IsEven3(int i)

{

return !(i & 1);

}

..compiles to:

011C1053 mov edx,dword ptr [i]

011C1056 not edx

011C1058 and edx,1

Why does it NOT first? Because NOT flips *

every* bit. NOT is the ~ operator, not the ! operator. I don’t think too much explanation is needed; this code should be fairly explanatory. Anyone who understand bits and the C language should see clearly that this:!(i & 1)

..and this:

(~i) & 1

..are numerically equivalent. So far so good. But now you propose transforming this to a macro. Essentially, we’re forcing the compiler to inline it. Well, we’ve already seen that it’s not necessary to do that; the compiler already inlined the IsEven methods without any prompting beyond changing the build configuration to the predefined “Release” mode. However, setting that aside, let’s examine your macro a bit more closely:

#define IsOdd(i) (i & 1)

There is a very serious problem with this macro, and it is illustrated quite handily with this test driver:

int main()

{

int i;

scanf("%d", &i);

`printf("%d\n", IsOdd(i ^ 1));`

printf("%d\n", IsOdd(i ^ 2));

printf("%d\n", IsOdd(i ^ 3));

printf("%d\n", IsOdd(i ^ 4));

}

I’ll leave it as an exercise to the reader to figure out why. 🙂

Two final notes:

– You may have noticed a skipped instruction or two in the disassembly, denoted with an ellipsis (“…”). What’s this all about? The compiler is interleaving statements in the program, rather than doing them strictly in sequence, to better employ the pipelining offered on modern processors.

– If you were particularly astute, you will have noticed that I did not end my ‘main’ functions in a ‘return’ statement. While this is technically compilable under standard C, and while its behaviour is undefined, I am not actually exercising either situation. Rather, I did my tests with C++ which, as a special case, does not require ‘main’ to return. 🙂

Greetings, first of all, thanks for your analysis! Second of all, there are a couple problems with your optimization theory. Most of them stem from you only (or at least presumably) only testing using Visual C++.

If you test this on GCC, you will get a different assembly output than LLVM, or if you use borland you’ll get a different assembly output than you would from a digital mars C compiler. If you compile using cc on an debian system, you will get a different assembly output than if you use cc on minix. The point of all of this is, while in some situations it’s okay to rely on your compiler, its best to write your code so it will be quite efficient on any compiler despite how good or bad the optimization is for that particular compiler.

As for some of your other notes, such as stating that ~ is not and ! isn’t; ~ is a bitwise not, and ! is a logical not (or logical negation). In this case we want ! as ~ will reverse the bits, and if you are going to multiply by the returned value, you are going to want a logical not instead of a bitwise not.

As for your last critique in which you note that:

printf(“%d\n”, IsOdd(i ^ 4));

brings an unwanted response, you forget a couple of things. One of which, is IsOdd wouldn’t be used in real life. In real life, a programmer would just write ((i ^ 4) & 1). Two, this isn’t a problem with the macro design, this is a problem with the macro use. This problem can easily be fixed with IsOdd((i ^ 4));