Lecture 20 - C data structures
Goals
- Learn about the basic types in C
- Look at how arrays are handled in assembly
- Learn about the struct type
Building an x86 executable
For the above, we are using a special tool to produce ARM assembly, and we will continue to use that
However, it is worth taking a moment to talk about building something we can actually run
Let’s start with something closer to the starter program:
#include <stdio.h>
int main(int argc, char* argv[]){
printf("Don't Panic!\n");
printf("%d\n", 42);
}
There are a couple of new things in here.
- in order to print things out we use a special function called printf (the f is for “formatted”)
- in order to use printf, we have to include the standard IO library stdio (we will talk more about including and .h files later)
- note that we need to specify the newline character when we print
- the printf function takes an arbitrary number of arguments, which are substituted in for the substitution codes in the string (these all start with %)
If we are targeting x86, we can just use
$ gcc -S -masm=intel starter.c
$ gcc starter.s -o starter
$ ./starter
- -S: stop after compiling, producing the assembly file starter.s instead of an executable
- -masm=intel: set the assembly “flavor” to Intel instead of the default AT&T
- -o starter: produce a binary file called starter
gcc is also smart enough to do the whole process in one step
$ gcc starter.c -o starter
$ ./starter
Storing data
basic data types
Java
byte - 1 byte
short - 2 bytes
int - 4 bytes
long - 8 bytes
float - 4 bytes
double - 8 bytes
C
char - 1 byte
short - at least two bytes
int - at least two bytes (and guaranteed to be at least the size of a short)
long - at least 4 bytes
long long - at least 8 bytes
float - typically 4 bytes
double - typically 8 bytes
For the integer types, you can also add unsigned in front of them to make them unsigned (otherwise they are interpreted as two’s complement numbers).
Java type sizes are well defined because Java has control of the architecture. The C standard is looser since it has to cover a large number of different architectures. Sizes have largely settled down, but there are no guarantees, and you need to look at what your compiler emits.
You can find the limits in /usr/include/limits.h or by using sizeof()
On basin,
char: 1 bytes
-128 - 127 [0X80 - 0X7F]
unsigned char: 1 bytes
0 - 255 [0 - 0XFF]
short: 2 bytes
-32768 - 32767 [0X8000 - 0X7FFF]
unsigned short: 2 bytes
0 - 65535 [0 - 0XFFFF]
int: 4 bytes
-2147483648 - 2147483647 [0X80000000 - 0X7FFFFFFF]
unsigned int: 4 bytes
0 - 4294967295 [0 - 0XFFFFFFFF]
long: 8 bytes
-9223372036854775808 - 9223372036854775807 [0X8000000000000000 - 0X7FFFFFFFFFFFFFFF]
unsigned long: 8 bytes
0 - 18446744073709551615 [0 - 0XFFFFFFFFFFFFFFFF]
long long: 8 bytes
-9223372036854775808 - 9223372036854775807 [0X8000000000000000 - 0X7FFFFFFFFFFFFFFF]
unsigned long long: 8 bytes
0 - 18446744073709551615 [0 - 0XFFFFFFFFFFFFFFFF]
We can take a look inside of sizes.c to see some more options for formatting in printf
If we want more control, there are fixed-width types like int32_t, uint16_t, etc., that let us pin down the size of our data exactly
These can be found in /usr/include/bits/stdint-intn.h and /usr/include/bits/stdint-uintn.h
Data structures
let’s look at some assembly and see if we can figure out what it does
mov r3, #0
str r3, [fp, #-8]
b .L2
.L3:
ldr r3, [fp, #-8]
lsl r3, r3, #2 ; logical shift left (or just x4)
sub r3, r3, #4
add r3, r3, fp
ldr r2, [fp, #-8]
str r2, [r3, #-24]
ldr r3, [fp, #-8]
add r3, r3, #1
str r3, [fp, #-8]
.L2:
ldr r3, [fp, #-8]
cmp r3, #4
ble .L3
We can tell by the branches that we have a loop
The condition is with respect to r3, so we can start just looking at that
- it starts as 0
- we are continuing while it is less than or equal to 4
- it is stored in fp - 8
- it is being incremented every loop
So… for loop
Let’s call the variable in fp - 8 i
Let’s work backwards for the second part
str r2, [r3, #-24]
- we are storing whatever is in r2 into the address r3 - 24
ldr r2, [fp, #-8]
- just before that, we loaded [fp - 8] into r2; that’s the value of i
So, what is in r3?
ldr r3, [fp, #-8] ; r3 ← i
lsl r3, r3, #2 ; r3 ← r3*4 (i*4)
sub r3, r3, #4 ; r3 ← r3 - 4 (i*4 - 4)
add r3, r3, fp ; r3 ← r3 + fp (fp -4 + i*4)
If we combine that with the -24, we are storing things into \(\text{fp} -4 - 24 + i \times 4\) or \(\text{fp} - 28 + i \times 4\)
To abstract that a little bit, we are talking about \(\text{some memory address} + i \times 4\)
We are storing integers, which we know are 4 bytes, so we are really talking about adjacent memory locations.
What we have is assignments into an array
Mechanical level
vocabulary
Skills