Lecture 13 - Introduction to Assembly
Goals
- Learn how to assemble and disassemble data processing operations on ARM32
- Describe the effect of the add operation on processor state for ARM32 and x86 assembly
Last time we looked at Instruction Set Architectures, and we learned about machine code, the native code of the computer
There was a time when computers were programmed directly in machine code, but it was a relatively short period of time
Assembly language is essentially a human-readable form of the same instructions. An assembler is a program that translates from assembly to machine code.
This is not the most difficult task, since there is a very close, if not direct one-to-one, connection between an instruction in assembly and an instruction in machine code
An instruction in assembly consists of a single short mnemonic word followed by the arguments for the instruction, which could consist of the source and destination registers, a value or a memory address
In this class, we will look at two different assembly languages, ARM32 and x86-64
ARM32 follows the RISC approach and will be a little easier to wrap your head around since it has fewer instructions and they are a consistent size and structure
On the other hand, x86-64 is the native language for most of your computers (unless you are using a new M1 or M2 Mac)
We will start with ARM32
- we will look at how ARM32 instructions can be assembled and disassembled
- you will be expected to understand the mechanics of what the instructions do to the machine state
- the next project will be to demonstrate that knowledge by building an ARM processor in Logisim
Later, we will look at x86
- we will only be looking at the mechanics of what the instructions do to the machine state
ARM32 ADD
ADD: Add (register)
 3   2                   1                   0
 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+---------------------------------------------------------------+
|       |0 0|0|0 1 0 0| |  Rn   |  Rd   |               |  Rm   |
+---------------------------------------------------------------+
assembly: ADD Rd, Rn, Rm
effect: Rd <- Rn + Rm
example: e0823003 ADD r3, r2, r3
The diagram at the top shows the meaning of each bit of the machine-code instruction. It is a 32-bit instruction, and the numerals in the top two rows identify the position of each bit: the left-most (most significant) bit is bit 31, the right-most (least significant) bit is bit 0. For this instruction:
- bits 0-3 are interpreted as a single value with the name Rm
- bits 12-15 are interpreted as a single value with the name Rd
- bits 16-19 are interpreted as a single value with the name Rn
- bits 26-27 have to be 00 (this says which "category" the instruction is in, in this case "data processing")
- bit 25 is 0 (we'll talk about why later)
- bits 21-24 have to have the value 0100; this is the opcode for the instruction
There are some gaps in here that are either not used for this instruction, or that we won't worry about at the moment
Rd, Rn, and Rm specify the source and destination registers
Note that they are four bits each, which implies that we have how many registers?... 16 (2^4)
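The field layout described above can be sketched in Python with shifts and masks. This is only an illustration: the helper names `bits` and `add_fields` are made up for this example and are not part of any ARM tool.

```python
def bits(word, hi, lo):
    """Return bits hi..lo (inclusive) of a 32-bit word as an integer."""
    width = hi - lo + 1
    return (word >> lo) & ((1 << width) - 1)

def add_fields(word):
    """Split a 32-bit ADD (register) instruction into the named fields."""
    return {
        "category": bits(word, 27, 26),  # 00 = data processing
        "bit25": bits(word, 25, 25),     # 0 for this form of the instruction
        "opcode": bits(word, 24, 21),    # 0100 = ADD
        "Rn": bits(word, 19, 16),        # first operand register
        "Rd": bits(word, 15, 12),        # destination register
        "Rm": bits(word, 3, 0),          # second operand register
    }

fields = add_fields(0xE0823003)  # the example word from the diagram
print(fields)
# {'category': 0, 'bit25': 0, 'opcode': 4, 'Rn': 2, 'Rd': 3, 'Rm': 3}

print(2 ** 4)  # a four-bit register field can name 16 registers
```

The bits that the diagram leaves blank are simply ignored here.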
assembling / disassembling
This diagram tells us how to translate between assembly and machine code for this instruction
If I give you the following value and tell you it is an instruction (as with our numbers, we have to be specific since there is no other way to know what this is)
e0845002
we would break that down into bits
1110 0000 1000 0100 0101 0000 0000 0010
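We would look at the 0000 1000 (bits 27 down to 20) and recognize our opcode in there for ADD: 00 for the category, 0 for bit 25, then 0100 for the opcode
(we can ignore the spare 0 at the end, bit 20, for the moment)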
Next we would look for our three registers
Rn = 0100 or r4
Rd = 0101 or r5
Rm = 0010 or r2
So our instruction is
ADD r5, r4, r2
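The hand procedure above (write out the bits, check the fixed fields, read off the registers) can be sketched in Python by slicing a bit string, much as you would on paper. The function names are hypothetical, and the sketch only handles the ADD (register) form from the diagram; the unlabeled bits are ignored.

```python
def to_bit_string(hex_word):
    """Expand an 8-digit hex string into a 32-character string of 0s and 1s."""
    return format(int(hex_word, 16), "032b")

def field(bit_string, hi, lo):
    """Slice bits hi..lo out of the bit string; bit 31 sits at index 0."""
    return bit_string[31 - hi : 32 - lo]

def disassemble_add(hex_word):
    """Turn the machine code of an ADD (register) into assembly text."""
    b = to_bit_string(hex_word)
    fixed_ok = (
        field(b, 27, 26) == "00"        # data processing category
        and field(b, 25, 25) == "0"     # register form
        and field(b, 24, 21) == "0100"  # ADD opcode
    )
    if not fixed_ok:
        raise ValueError("not an ADD (register) instruction")
    rn = int(field(b, 19, 16), 2)
    rd = int(field(b, 15, 12), 2)
    rm = int(field(b, 3, 0), 2)
    return f"ADD r{rd}, r{rn}, r{rm}"

b = to_bit_string("e0845002")
print(" ".join(b[i:i + 4] for i in range(0, 32, 4)))
# 1110 0000 1000 0100 0101 0000 0000 0010
print(disassemble_add("e0845002"))  # ADD r5, r4, r2
```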
meaning
So what does this do?
The effect is Rd ← Rn + Rm
So this instruction says "add together the contents of register 4 and register 2 and store the result in register 5"
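As a rough model of that effect on machine state, the register file can be treated as a list of 16 integers. This is a sketch under assumptions: the name `execute_add` and the starting values are invented for illustration, and the result is masked to 32 bits because ARM32 registers are 32 bits wide.

```python
MASK32 = 0xFFFFFFFF  # keep results within a 32-bit register

def execute_add(regs, rd, rn, rm):
    """Apply Rd <- Rn + Rm to a list of 16 register values, in place."""
    regs[rd] = (regs[rn] + regs[rm]) & MASK32

regs = [0] * 16
regs[4] = 7          # illustrative starting values
regs[2] = 5
execute_add(regs, rd=5, rn=4, rm=2)  # ADD r5, r4, r2
print(regs[5])       # 12
print(regs[4], regs[2])  # 7 5 (the source registers are unchanged)
```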
It is important to pay attention to the order of the register specifiers. Notice that in the machine code, the order goes first operand, destination, and then second operand. The ordering comes because there are a number of different variants of the instruction, but we won't dive too far into the whys -- just be aware that you need to stay on your toes (and that different assembly languages will make different choices)
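Going the other direction, assembling means placing each register number in its field, which makes the difference in ordering easy to see. In this sketch the function name is hypothetical, and the top four bits are set to 1110 on the assumption that they should match the leading hex digit `e` in the examples above (the lecture has not yet explained those bits).

```python
COND_BITS = 0b1110   # assumption: copied from the leading 'e' in the examples
ADD_OPCODE = 0b0100  # opcode for ADD, bits 21-24

def assemble_add(rd, rn, rm):
    """Encode ADD Rd, Rn, Rm as an 8-digit hex string."""
    for r in (rd, rn, rm):
        if not 0 <= r <= 15:
            raise ValueError("register numbers must be between 0 and 15")
    word = (
        (COND_BITS << 28)
        | (ADD_OPCODE << 21)
        | (rn << 16)   # machine code order: Rn first ...
        | (rd << 12)   # ... then Rd ...
        | rm           # ... and Rm in the lowest four bits
    )
    return format(word, "08x")

# Assembly order is Rd, Rn, Rm; machine code order is Rn, Rd, Rm.
print(assemble_add(rd=5, rn=4, rm=2))  # e0845002
```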
x86 ADD
We are going to look at the same instruction in x86 assembly
You should be aware that there are actually two different assembly "dialects" for x86: Intel and AT&T. The actual instructions don't really differ, just how they are written (either one will produce code that will run on an x86 machine)
I mention this primarily in case you go looking things up online. We will be sticking to the Intel syntax, partly because it is closer to our ARM assembly, and it is a little nicer syntactically
The same instruction in x86 is
01 d0 add eax, edx
effect: eax <- eax + edx
I've given you the machine code on the left and the assembly on the right
I generally won't share the machine code, but I wanted you to see that it is quite short
Recall that CISC ISAs have variable-length instructions. In general, more common instructions will have shorter representations
Looking at the instruction itself, you can see we have the same mnemonic, but the arguments look a little bit different
eax and edx are registers
For historical reasons, registers have names that somewhat reflect their purpose (and size). We will return to this later
The other thing to notice is that there are only two arguments
These are the operands for the operation. The destination isn't specified separately because it is always the first operand: here the sum overwrites eax, which is also the designated "return value register".
Yes, this requires some planning and data movement to make sure you don't compute something and then overwrite it with the next operation
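A small sketch of that planning, modelling the registers as a Python dictionary. The helper names `add` and `mov`, the use of ecx as a scratch register, and the starting values are all assumptions made for illustration; processor flags are not modelled.

```python
MASK32 = 0xFFFFFFFF  # eax, ecx and edx are 32-bit registers

def add(regs, dest, src):
    """Model `add dest, src`: the first operand also receives the result."""
    regs[dest] = (regs[dest] + regs[src]) & MASK32

def mov(regs, dest, src):
    """Model `mov dest, src`: copy one register into another."""
    regs[dest] = regs[src]

regs = {"eax": 10, "ecx": 0, "edx": 32}
mov(regs, "ecx", "eax")  # save the old eax, since the add will overwrite it
add(regs, "eax", "edx")  # add eax, edx
print(regs)  # {'eax': 42, 'ecx': 10, 'edx': 32}
```

Without the copy into ecx, the original value 10 would no longer be available anywhere after the add.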
Data processing instructions
Let's take a look at the SUB instruction
 3   2                   1                   0
 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+---------------------------------------------------------------+
|       |0 0|0|0 0 1 0| |  Rn   |  Rd   |               |  Rm   |
+---------------------------------------------------------------+
assembly: SUB Rd, Rn, Rm
effect: Rd <- Rn - Rm
example: e0423003 SUB r3, r2, r3
How does this compare to our ADD instruction?
It is virtually the same!
We have a new mnemonic and a new operation, but the only change to the machine code is the opcode
Some other instructions we have include EOR (bitwise xor), AND (bitwise and) and ORR (bitwise or)
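Because only the opcode field changes, one disassembler can cover all of these with a lookup table. This is a minimal sketch: the ADD and SUB opcodes come from the diagrams above, while the values for AND, EOR and ORR are taken from the ARM data-processing opcode table rather than from these notes. The unlabeled bits are ignored, so the sketch is only meant for the plain register form shown here.

```python
# Opcode field (bits 21-24) -> mnemonic
OPCODES = {
    0b0000: "AND",
    0b0001: "EOR",
    0b0010: "SUB",
    0b0100: "ADD",
    0b1100: "ORR",
}

def disassemble(word):
    """Disassemble a register-form data processing instruction."""
    if (word >> 26) & 0b11 != 0 or (word >> 25) & 1 != 0:
        raise ValueError("not a register-form data processing instruction")
    opcode = (word >> 21) & 0b1111
    if opcode not in OPCODES:
        raise ValueError(f"opcode {opcode:04b} is not in this sketch's table")
    rn = (word >> 16) & 0xF
    rd = (word >> 12) & 0xF
    rm = word & 0xF
    return f"{OPCODES[opcode]} r{rd}, r{rn}, r{rm}"

print(disassemble(0xE0423003))  # SUB r3, r2, r3
print(disassemble(0xE0823003))  # ADD r3, r2, r3
```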
Mechanical level
vocabulary
Skills
- Disassemble machine code into the equivalent assembly instruction
- Assemble an assembly instruction into the equivalent machine code
Last updated 04/05/2023