CS 202 - Notes 2018-10-17

x86-64 Assembly

Data Formats

We only sort of have data types in assembly. The suffixes above are added onto instructions to tell the instruction how large a chunk of data it is working on. So, for example, if an instruction has a 'q' on the end, we know it will work on 64-bit data.

Registers

We have 16 64-bit registers.

Each register can be dealt with as containing a 64,32,16,or 8 bit value depending on the name you use (e.g., %rax gives you all 64 bits, while %eax gives you the lower 32).

At one time they all had specific roles, which is somewhat reflected in their names. As we go, we will learn about the remaining roles and rules of use.

Operands

Operands to assembly instructions can be immediate (actual values), register (read from the register file), or memory (read out of memory).

The most important of the memory address schemes to remember is the scaled and indexed with displacement format (imm(Rb, Ri, s)), which is the most general form. All of the others are merely variants of it with one or more of the arguments missing.

Arithmetic

Mostly, these are what you would expect.

The lea instruction is an interesting one. The main purpose of the instruction is to compute memory addresses without the accompanying memory access. You will see it in a lot of generated assembly, because it can be used to quickly add numbers together using the faster address computation hardware.

Example

We looked at a very simple piece of code.

sum.c

int simple(int x, int y){
  int sum = x + y;
  return sum;

}

We compiled it (`gcc -S sum.c) and found this:

	.file	"sum.c"
	.text
	.globl	simple
	.type	simple, @function
simple:
.LFB0:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	movl	%edi, -20(%rbp)
	movl	%esi, -24(%rbp)
	movl	-20(%rbp), %edx
	movl	-24(%rbp), %eax
	addl	%edx, %eax
	movl	%eax, -4(%rbp)
	movl	-4(%rbp), %eax
	popq	%rbp
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE0:
	.size	simple, .-simple
	.ident	"GCC: (GNU) 7.2.1 20170915 (Red Hat 7.2.1-2)"
	.section	.note.GNU-stack,"",@progbits

We talked about the different levels of optimization, which is specified with the -O flag. The default is -O0, or no optimization. A more interesting level is -Og, which performs some optimization, but not so much that it would interfere with debugging.

After gcc -S sum.c -Og


	.file	"sum.c"
	.text
	.globl	simple
	.type	simple, @function
simple:
.LFB0:
	.cfi_startproc
	leal	(%rdi,%rsi), %eax
	ret
	.cfi_endproc
.LFE0:
	.size	simple, .-simple
	.ident	"GCC: (GNU) 7.2.1 20170915 (Red Hat 7.2.1-2)"
	.section	.note.GNU-stack,"",@progbits

We can see that this has optimized the code down to two lines. The first one uses leal to add together %rdi and %rsi and store the result in %eax. If you look at the list of registers, you wll see that %rdi and %rsi and argument 1 and argument 2 of the function (so x and y respectively). %eax is the return register. So, this added together the two arguments, put the result in the return register and then returned.

# CS 202 - Notes 2018-10-17

# x86-64 Assembly

# Data Formats

# Registers

# Operands

# Arithmetic