Lecture 13 - Introduction to Assembly

Goals

  • Learn how to assemble and disassemble data processing operations on ARM32
  • Describe the effect of the add operation on processor state for ARM32 and x86 assembly

Last time we looked at Instruction Set Architectures, and we learned about machine code, the native code of the computer

There was a time when computers were programmed directly in machine code, but it was a relatively short period of time

Assembly language is essentially a human-readable form of the same instructions. An assembler is a program that translates assembly into machine code.

This is not the most difficult task, since there is a very close, if not direct, one-to-one correspondence between an instruction in assembly and an instruction in machine code

An instruction in assembly consists of a single short mnemonic word followed by the arguments for the instruction, which could consist of the source and destination registers, a value or a memory address

In this class, we will look at two different assembly languages: ARM32 and x86-64

ARM32 follows the RISC approach and will be a little easier to wrap your head around since it has fewer instructions and they are a consistent size and structure

On the other hand, x86-64 is the native language for most of your computers (unless you are using a new M1 or M2 Mac)

We will start with ARM32

  • we will look at how ARM32 instructions can be assembled and disassembled
  • you will be expected to understand the mechanics of what the instructions do to the machine state
  • the next project will be to demonstrate that knowledge by building an ARM processor in Logisim

Later, we will look at x86

  • we will only be looking at the mechanics of what the instructions do to the machine state

ARM32 ADD

ADD: Add (register)


 3   2                   1                   0
 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+---------------------------------------------------------------+
|       |0 0|0|0 1 0 0| |  Rn   |  Rd   |               |  Rm   |
+---------------------------------------------------------------+

assembly:   ADD Rd, Rn, Rm

effect:     Rd <- Rn + Rm

example:    e0823003    ADD r3, r2, r3

The diagram at the top shows the meaning of each bit of the machine-code instruction

  • it is a 32-bit instruction, and the numerals in the top two rows identify the place of each bit (tens digit above, ones digit below)
  • the left-most (most significant) bit is bit 31, the right-most (least significant) bit is bit 0

For this instruction

  • bits 0-3 are interpreted as a single value with the name Rm
  • bits 12-15 are interpreted as a single value with the name Rd
  • bits 16-19 are interpreted as a single value with the name Rn
  • bits 26-27 have to be 00 (this says which "category" the instruction is -- in this case "data processing")
  • bit 25 is 0 (we'll talk about why later)
  • bits 21-24 have to have the value 0100 -- this is the opcode for the instruction

There are some gaps in here that are either not used for this instruction, OR that we won't worry about at the moment

Rd, Rn, and Rm specify the source and destination registers. Note that they are four bits each, which implies that we have how many registers?... 2^4 = 16
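The bit positions listed above can be pulled out of a word with shifts and masks. This is a minimal sketch in Python (not part of the course material); the function name `fields` and the dictionary keys are made up for illustration, and only the fields named in the diagram are extracted.

```python
def fields(word):
    """Split a 32-bit ARM32 data-processing word into the fields from the diagram."""
    return {
        "category": (word >> 26) & 0b11,    # bits 26-27, 00 means data processing
        "bit25":    (word >> 25) & 0b1,     # bit 25, 0 for this form of ADD
        "opcode":   (word >> 21) & 0b1111,  # bits 21-24, 0100 for ADD
        "Rn":       (word >> 16) & 0b1111,  # bits 16-19, first operand register
        "Rd":       (word >> 12) & 0b1111,  # bits 12-15, destination register
        "Rm":       word & 0b1111,          # bits 0-3, second operand register
    }

# The example from the notes: e0823003 is ADD r3, r2, r3
example = fields(0xE0823003)
print(example)

# A four-bit register field can name this many registers
print(2 ** 4)  # 16
```

Each register field is masked with 0b1111, which is why the largest register number that can be encoded is 15.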

assembling / disassembling

This diagram tells us how to translate between assembly and machine code for this instruction

If I give you the following value and tell you it is an instruction (as with our numbers, we have to be specific since there is no other way to know what this is)


e0845002

we would break that down into bits


1110 0000 1000 0100 0101 0000 0000 0010

We would look at the 0000 1000 (bits 27 down to 20) and recognize our opcode 0100 in there for ADD (we can ignore the spare 0 at the end for the moment)

Next we would look for our three registers

  • Rn = 0100 or r4
  • Rd = 0101 or r5
  • Rm = 0010 or r2

So our instruction is


ADD r5, r4, r2
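The hand disassembly above can be sketched in Python by working on the bit string directly, the same way we did on paper. This is only an illustration: `disassemble_add` is a hypothetical helper that handles nothing but the register form of ADD shown in the diagram, and it ignores the bits the notes leave as gaps.

```python
def disassemble_add(word):
    """Turn a 32-bit word into 'ADD Rd, Rn, Rm', checking the fixed bits first."""
    bits = format(word, "032b")  # index 0 of the string is bit 31
    # bits 27-25 must be 000 and bits 24-21 must be the ADD opcode 0100
    if bits[4:7] != "000" or bits[7:11] != "0100":
        raise ValueError("not a register-form ADD")
    rn = int(bits[12:16], 2)  # bits 19-16
    rd = int(bits[16:20], 2)  # bits 15-12
    rm = int(bits[28:32], 2)  # bits 3-0
    return f"ADD r{rd}, r{rn}, r{rm}"

word = 0xE0845002
bits = format(word, "032b")
# Group into nibbles, as in the hand-worked example
print(" ".join(bits[i:i + 4] for i in range(0, 32, 4)))
print(disassemble_add(word))  # ADD r5, r4, r2
```

Note that the string slices are taken in machine-code order (Rn, Rd, Rm) but printed in assembly order (Rd, Rn, Rm).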

meaning

So what does this do?

The effect is Rd ← Rn + Rm

So this instruction says "add together the contents of register 4 and register 2 and store the result in register 5"
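To make the effect on machine state concrete, here is a small Python sketch that models the register file as a list of 16 integers. The starting values 7 and 5 are made up for illustration, and the masking to 32 bits is a modelling assumption that reflects the registers being 32 bits wide.

```python
MASK32 = 0xFFFFFFFF  # registers hold 32 bits, so results wrap around

def arm_add(regs, rd, rn, rm):
    """Apply the effect Rd <- Rn + Rm to the register list in place."""
    regs[rd] = (regs[rn] + regs[rm]) & MASK32

regs = [0] * 16   # r0 through r15
regs[4] = 7       # illustrative starting values
regs[2] = 5

arm_add(regs, 5, 4, 2)   # ADD r5, r4, r2
print(regs[5])  # 12
print(regs[4], regs[2])  # 7 5, the source registers are unchanged
```

Only the destination register changes; the two source registers keep their values unless one of them is also the destination.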

It is important to pay attention to the order of the register specifiers. Notice that in the machine code, the order goes first operand, destination, and then second operand. The ordering comes because there are a number of different variants of the instruction, but we won't dive too far into the whys -- just be aware that you need to stay on your toes (and that different assembly languages will make different choices)

x86 ADD

We are going to look at the same instruction in x86 assembly

You should be aware that there are actually two different assembly "dialects" for x86: Intel and AT&T. The actual instructions don't really differ, just how they are written (either one will produce code that will run on an x86 machine)

I mention this primarily in case you are inclined to go looking things up online. We will be sticking to the Intel assembly, partly because it is closer to our ARM assembly, and is a little nicer syntactically

The same instruction in x86 is


01 d0                add eax, edx

effect:     eax <- eax + edx

I've given you the machine code on the left and the assembly on the right

I generally won't share the machine code, but I wanted you to see that it is quite short

recall that CISC ISAs have variable length instructions. In general, more common instructions will have shorter representations

Looking at the instruction itself, you can see we have the same mnemonic, but the arguments look a little bit different

eax and edx are registers

For historical reasons, registers have names that somewhat reflect their purpose (and size). We will return to this later

The other thing to notice is that there are only two arguments

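These are the operands for the operation. The destination isn't specified separately because the first operand does double duty: it is one of the sources and it also receives the result. Here that is eax, which is also the designated "return value register".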

Yes, this requires some planning and data movement to make sure you don't compute something and then overwrite it with the next operation
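The overwrite problem can be seen in a small Python sketch of the register state. The `add` function follows the effect line given for `add eax, edx`; the `mov` helper, the use of ecx as a spare register, and the starting values are assumptions added for illustration and stand in for the "data movement" mentioned above.

```python
MASK32 = 0xFFFFFFFF  # eax, edx, ecx are 32-bit registers

def add(regs, dst, src):
    """dst <- dst + src, as in the effect line for 'add eax, edx'."""
    regs[dst] = (regs[dst] + regs[src]) & MASK32

def mov(regs, dst, src):
    """dst <- src (hypothetical helper modelling a register copy)."""
    regs[dst] = regs[src]

regs = {"eax": 10, "edx": 3, "ecx": 0}  # illustrative starting values

mov(regs, "ecx", "eax")   # save the old eax first, because add will overwrite it
add(regs, "eax", "edx")   # add eax, edx
print(regs)  # {'eax': 13, 'edx': 3, 'ecx': 10}
```

Without the copy into ecx, the original value 10 would be gone after the add. Compare this with the ARM32 ADD, where a separate destination register lets both sources survive.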

Data processing instructions

Let's take a look at the SUB instruction


 3   2                   1                   0
 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+---------------------------------------------------------------+
|       |0 0|0|0 0 1 0| |  Rn   |  Rd   |               |  Rm   |
+---------------------------------------------------------------+

assembly:   SUB Rd, Rn, Rm

effect:     Rd <- Rn - Rm

example:    e0423003    SUB r3, r2, r3

How does this compare to our ADD instruction?

It is virtually the same!

We have a new mnemonic and a new operation, but the only change to the machine code is the opcode

Some other instructions we have include EOR (bitwise xor), AND (bitwise and) and ORR (bitwise or)
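Since only the opcode changes between these instructions, one assembler routine with a lookup table can cover all of them. The sketch below is in Python and makes two assumptions beyond the notes: the top four bits are fixed at 0xE, copied from the examples above, and the opcode values for AND, EOR, and ORR come from the standard ARM data-processing opcode table rather than from this lecture. The name `assemble` is made up for illustration.

```python
# Opcode field (bits 21-24) for each mnemonic
OPCODES = {"AND": 0b0000, "EOR": 0b0001, "SUB": 0b0010, "ADD": 0b0100, "ORR": 0b1100}

def assemble(mnemonic, rd, rn, rm):
    """Build the 32-bit word for '<mnemonic> Rd, Rn, Rm' (register form only)."""
    for reg in (rd, rn, rm):
        if not 0 <= reg <= 15:
            raise ValueError("register numbers must fit in four bits")
    word = 0xE << 28                 # top nibble, assumed fixed as in the examples
    word |= OPCODES[mnemonic] << 21  # bits 21-24
    word |= rn << 16                 # bits 16-19
    word |= rd << 12                 # bits 12-15
    word |= rm                       # bits 0-3
    return word

add_word = assemble("ADD", 3, 2, 3)
sub_word = assemble("SUB", 3, 2, 3)
print(format(add_word, "08x"))  # e0823003
print(format(sub_word, "08x"))  # e0423003

# XOR shows which bits differ: only bits inside the opcode field
print(format(add_word ^ sub_word, "032b"))
```

The two printed words match the ADD and SUB examples given earlier, and the XOR has 1s only in bits 22 and 23, both inside the opcode field.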

Mechanical level

vocabulary

Skills

  • Disassemble machine code into the equivalent assembly instruction
  • Assemble an assembly instruction into the equivalent machine code

Last updated 04/05/2023