C++ 64-bit Inline Assembly Primer – Part 2

In this series we examine how C++ relates to the raw assembly code it compiles to. In this post we create our own add function in gcc extended assembly.

In the previous post we wrote a short C++ program that added two numbers. Despite its simplicity in C++, we saw how the assembly version was composed of several dozen individual instructions in a rather cryptic format. Much of this complexity stems from the fact that our sample program called a function to perform the addition. Calling functions means dealing with a stack, base pointers, and the setup and maintenance of that stack. It means dealing with memory offsets, relative positions, and several other factors. The good news is that at this point we can safely ignore these details. In fact, we will do well to ignore them and focus on the core logic of the function body. In other words, we'll let gcc create the function shells, calls, and stack management, while we focus on the addition itself.

The general idea is that we want to write assembly that adds two numbers together. There is a dedicated assembly instruction for doing so, add, and in this post we'll put it to use.

One of the key things to remember about writing assembly is that we’re dealing with very few abstractions. In C++:

  int t = 10;

Has a very specific meaning, but its true meaning to the machine it runs on is hidden. It’s only after gcc compiles the code that int, t, and 10 mean anything to the processor. In assembly we free ourselves from these abstractions and deal with the very direct process of moving bits around.

Thus, our first step in writing assembly to add two numbers is to realize that the add instruction we'll be using expects two operands, and at most one of them can be a memory location. The other must be a register or an immediate value. To keep things simple we'll use registers for both, so our first steps in code must be to load two registers with the values we want to add.

To load a value into a register we use the mov instruction, which for us in AT&T syntax takes the form of:

  mov $0x2, %%rax;

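This moves the immediate value (2) into the %rax register. The doubled %% is needed only inside gcc extended assembly, where a single % introduces an operand placeholder. Please keep in mind that %rax is a 64-bit register; its 32-bit counterpart is %eax.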

We then load our second number:

  mov $0x1, %%rcx;

And finally, we issue the add instruction, passing in our two registers. In AT&T syntax the destination comes last, so the sum lands in %rax:

  add %%rcx, %%rax;

The whole function in extended assembly looks like:

    __asm__(
            "mov $0x2, %%rax;    \n\t"   // rax = 2
            "mov $0x1, %%rcx;    \n\t"   // rcx = 1
            "add %%rcx, %%rax;   \n\t"   // rax = rax + rcx
            :                            // no outputs
            :                            // no inputs
            : "rax", "rcx", "cc");       // registers and flags we overwrite

Should we run this in our IDE's debugger, we would watch the rax and rcx registers end up with 0x3 and 0x1, respectively.
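To confirm those register values without stepping through a debugger, the final contents can be copied out through output operands. The following is only a minimal sketch, assuming gcc or clang on x86-64. The RegPair struct and the add_immediates and demo_add_immediates functions are hypothetical names introduced here for illustration; the "=a" and "=c" constraints ask gcc to read the outputs from rax and rcx.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: holds the final values of rax and rcx.
struct RegPair {
    std::uint64_t rax;
    std::uint64_t rcx;
};

RegPair add_immediates() {
    std::uint64_t a = 0;
    std::uint64_t c = 0;
#if defined(__x86_64__)
    __asm__ volatile(
        "mov $0x2, %%rax\n\t"   // rax = 2
        "mov $0x1, %%rcx\n\t"   // rcx = 1
        "add %%rcx, %%rax\n\t"  // rax = rax + rcx
        : "=a"(a), "=c"(c)      // copy rax into a and rcx into c
        :                       // no inputs
        : "cc");                // add modifies the flags
#else
    // Assumption: plain C++ fallback so the sketch still builds on other targets.
    a = 2 + 1;
    c = 1;
#endif
    return RegPair{a, c};
}

// Usage: check both values after the assembly has run.
void demo_add_immediates() {
    RegPair r = add_immediates();
    assert(r.rax == 0x3);
    assert(r.rcx == 0x1);
}
```

Because rax and rcx are bound to outputs in this sketch, they are left out of the clobber list; gcc rejects a register that is named as both an operand and a clobber.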

Thing is, this is not a very useful construct, as using immediate values means we've hard-coded the result. It's much more realistic to accept parameters.

To do so, we take advantage of gcc extended syntax like so:

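    long long rax = 2;   // 64-bit variables, matching the width of %rax and %rcx
    long long rcx = 1;

    cout << "(rax) before: " << rax << endl;

    __asm__(
            "mov %1, %%rax;      \n\t"   // load the first input from memory
            "mov %2, %%rcx;      \n\t"   // load the second input from memory
            "add %%rcx, %%rax;   \n\t"   // rax = rax + rcx
            "mov %%rax, %0;      \n\t"   // store the sum in the output variable
            : "=m" (rax)
            : "m" (rax), "m" (rcx)
            : "rax", "rcx", "cc");

    cout << "(rax) after: " << rax << endl;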

Will output:

  (rax) before: 2
  (rax) after: 3
  Press [Enter] to close the terminal …

While we now have plenty of control over the implementation, it's still a bit heavy on syntax. We can simplify further by letting gcc take full control of register assignment:

    int rax = 2;
    int rcx = 1;

    cout << "(rax) before: " << rax << endl;

    __asm__(
            "add %2, %0;  \n\t"      // %0 = %0 + %2
            : "=r" (rax)
            : "0" (rax), "r" (rcx)   // "0" places this input in the same register as %0
            : "cc");                 // add modifies the flags

    cout << "(rax) after: " << rax << endl;

Here we let gcc decide which registers to use. This lets the compiler fit the instruction into its own register allocation, which typically avoids extra moves, but we give up exact control over which registers are involved.
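Since the goal of this post is our own add function, the same idea can be wrapped in one. This is a sketch under the assumption of gcc or clang on an x86 target; add_regs and demo_add_regs are hypothetical names. It uses the "+r" constraint, which marks a single operand as both read and written, so gcc is free to pick any general-purpose register for it.

```cpp
#include <cassert>

// Hypothetical wrapper: returns a + b using a single add instruction.
int add_regs(int a, int b) {
#if defined(__x86_64__) || defined(__i386__)
    __asm__(
        "add %1, %0\n\t"   // a = a + b
        : "+r"(a)          // "+r": a is read and then overwritten
        : "r"(b)           // b is only read
        : "cc");           // add modifies the flags
#else
    // Assumption: plain C++ fallback so the sketch still builds on other targets.
    a += b;
#endif
    return a;
}

// Usage: the wrapper behaves like the + operator for these inputs.
void demo_add_regs() {
    assert(add_regs(2, 1) == 3);
    assert(add_regs(-5, 5) == 0);
}
```

Because the operands are int, gcc substitutes 32-bit register names for %0 and %1; declaring them as 64-bit integers would make it pick 64-bit registers instead.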

At this point we have a functioning addition routine. It is not necessarily the most efficient one, though, as we could be using lea. While our hand-written assembly is shorter, add overwrites one of its operands, so we still need to place both values in registers and then perform the addition as a separate step. lea can take two registers, compute their sum, and write it to a third register in a single instruction.
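For comparison, here is a rough sketch of what an lea-based version might look like, assuming gcc or clang on x86-64. The names add_lea and demo_add_lea are hypothetical. lea treats the two inputs as a base and an index, computes their sum as if it were an address, and writes it to a separate destination register without touching the flags.

```cpp
#include <cassert>

// Hypothetical wrapper: returns a + b using a single lea instruction.
long long add_lea(long long a, long long b) {
    long long sum = 0;
#if defined(__x86_64__)
    __asm__(
        "lea (%1,%2), %0\n\t"   // sum = a + b, computed as base + index
        : "=r"(sum)             // result goes to a register chosen by gcc
        : "r"(a), "r"(b));      // inputs are read only; no flags are changed
#else
    // Assumption: plain C++ fallback so the sketch still builds on other targets.
    sum = a + b;
#endif
    return sum;
}

// Usage: same results as the add-based versions.
void demo_add_lea() {
    assert(add_lea(2, 1) == 3);
    assert(add_lea(-7, 10) == 3);
}
```

The operands are declared as long long so that gcc substitutes 64-bit registers, which is what the address form of lea expects in 64-bit code.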
