CS 202 - Notes 2018-09-28
Integer sizes
Values in the computer have to be stored in some fixed-size location.
As programmers, our control over this storage comes when we declare variable types.
In Java, the sizes of the various integer types are well defined:
Name | # of bytes |
---|---|
byte | 1 byte |
short | 2 bytes |
int | 4 bytes |
long | 8 bytes |
This is because Java is designed to run on a virtual machine, so the developers of the language have control over the underlying storage.
In Python, we have dynamic typing and we don't declare our variables, so everything is fairly hidden. The most common flavor of Python (CPython) is written in C, and in Python 2 it used a C long to store ordinary integers. Under the hood, however, it has data structures that allow integers to be essentially unbounded once they grow beyond the capacity of a long (Python 3 uses this unbounded representation for all integers).
In C, things are a little less straightforward:
Name | # of bytes |
---|---|
char | 1 byte |
short | at least 2 bytes |
int | at least as big as the short |
long | at least 4 bytes |
long long | at least 8 bytes |
On most current 64-bit Linux and macOS desktops, we will find that the int is 4 bytes and the long is 8 (making long long redundant). 64-bit Windows is an exception: there the long is still 4 bytes.
Here is a short program designed to test the size of each integer datatype in C.
/*
This is a short program that lists the sizes of the basic numeric types.
It illustrates some of the options for using fprintf as well as the constants
provided in limits.h.
To compile: gcc -o sizes sizes.c
C. Andrews
*/
#include <stdio.h>
#include <limits.h>
/*
Our simple main method that prints out the sizes.
fprintf (and printf) are _formatted_ output. We use replacement codes within
the string, which are then replaced by the items in the comma separated list
that follows it. The replacement codes start with a %, and the characters
that immediately follow tell fprintf how to interpret the data. This can be
a collection of modifiers followed by a final character that provides the
type of the output.
You can find the full list of codes online, but here are the ones that we use
below:
Types:
d - signed decimal integer
u - unsigned decimal integer
X - unsigned hexadecimal integer (uppercase)
Size modifiers:
hh - char/ byte
h - short
l - long
ll - long long
Other modifier:
# - for octal or hexadecimal, precede the number with 0, 0x or 0X as appropriate
Examples:
%lu - long unsigned decimal
%#hhX - one byte upper-case hexadecimal number preceded by 0X (e.g., 0XF2)
The other function in use here is sizeof(), which returns the number of bytes
used for storage by the provided type (the input to this function is either
a variable or an actual type like int or some structure that you have
created).
*/
int main(int argc, char * argv[]){
    // sizeof produces a size_t, which printf formats with %zu
    printf("char: %zu bytes\n\t%hhd - %hhd [%#hhX - %#hhX]\n",
        sizeof(char), SCHAR_MIN, SCHAR_MAX, SCHAR_MIN, SCHAR_MAX);
    printf("unsigned char: %zu bytes\n\t%hhu - %hhu [%#hhX - %#hhX]\n",
        sizeof(unsigned char), 0, UCHAR_MAX, 0, UCHAR_MAX);
    printf("short: %zu bytes\n\t%hd - %hd [%#hX - %#hX]\n",
        sizeof(short), SHRT_MIN, SHRT_MAX, SHRT_MIN, SHRT_MAX);
    printf("unsigned short: %zu bytes\n\t%hu - %hu [%#hX - %#hX]\n",
        sizeof(unsigned short), 0, USHRT_MAX, 0, USHRT_MAX);
    printf("int: %zu bytes\n\t%d - %d [%#X - %#X]\n",
        sizeof(int), INT_MIN, INT_MAX, INT_MIN, INT_MAX);
    printf("unsigned int: %zu bytes\n\t%u - %u [%#X - %#X]\n",
        sizeof(unsigned int), 0, UINT_MAX, 0, UINT_MAX);
    printf("long: %zu bytes\n\t%ld - %ld [%#lX - %#lX]\n",
        sizeof(long), LONG_MIN, LONG_MAX, LONG_MIN, LONG_MAX);
    printf("unsigned long: %zu bytes\n\t%lu - %lu [%#lX - %#lX]\n",
        sizeof(unsigned long), 0UL, ULONG_MAX, 0UL, ULONG_MAX);
    printf("long long: %zu bytes\n\t%lld - %lld [%#llX - %#llX]\n",
        sizeof(long long), LLONG_MIN, LLONG_MAX, LLONG_MIN, LLONG_MAX);
    printf("unsigned long long: %zu bytes\n\t%llu - %llu [%#llX - %#llX]\n",
        sizeof(unsigned long long), 0ULL, ULLONG_MAX, 0ULL, ULLONG_MAX);
    return 0;
}
When we run this in the lab, this is what we get:
char: 1 bytes
    -128 - 127 [0X80 - 0X7F]
unsigned char: 1 bytes
    0 - 255 [0 - 0XFF]
short: 2 bytes
    -32768 - 32767 [0X8000 - 0X7FFF]
unsigned short: 2 bytes
    0 - 65535 [0 - 0XFFFF]
int: 4 bytes
    -2147483648 - 2147483647 [0X80000000 - 0X7FFFFFFF]
unsigned int: 4 bytes
    0 - 4294967295 [0 - 0XFFFFFFFF]
long: 8 bytes
    -9223372036854775808 - 9223372036854775807 [0X8000000000000000 - 0X7FFFFFFFFFFFFFFF]
unsigned long: 8 bytes
    0 - 18446744073709551615 [0 - 0XFFFFFFFFFFFFFFFF]
long long: 8 bytes
    -9223372036854775808 - 9223372036854775807 [0X8000000000000000 - 0X7FFFFFFFFFFFFFFF]
unsigned long long: 8 bytes
    0 - 18446744073709551615 [0 - 0XFFFFFFFFFFFFFFFF]
In C, the signed versions of all of those types are stored as two's-complement numbers on essentially every current machine. We can add unsigned to the start of the type name to store unsigned values.
Note that char is an integer type. In C, we can freely switch between characters (a letter in single quotes) and the ASCII values that the system actually stores. We can also use char variables to hold small numbers that will never be interpreted as characters, since we don't have a byte type.
Endianness
Another consideration for data storage is endianness. This has to do with the order in which bytes are laid out in memory. This is mostly hidden from us until we have to work with data produced on a machine with a different endianness.
Big Endian means that integers are laid out as we write them, with the most significant or "left" byte coming first.
Little Endian means that integers are laid out with the least significant byte coming first.
Here is a short program that makes use of the C union to trick the computer into revealing if it stores data as big or little endian.
/*
This is a short program to illustrate endianness. This performs a simple
memory trick that allows us to view the actual order of bytes in memory.
It then prints out what the program perceives as the byte order and the
actual order in memory.
C. Andrews
2013-09-20
*/
#include <stdio.h>
#include <stdint.h>
int main(int argc, char * argv[]){
    // the union stores both of these items in the same location in memory
    // So, if I access this using .i, I can treat this memory as an int, and if
    // I use .c, it is an array of bytes.
    union {
        uint32_t i;
        // unsigned char so that byte values above 0x7F are never sign-extended
        unsigned char c[sizeof(uint32_t)];
    } u;
    // Set the value using the integer
    u.i = 0x12345678;
    // test the first byte in the array to see if it has the "high" byte
    if (u.c[0] == 0x12){
        printf("Big Endian\n");
    }else{
        printf("Little Endian\n");
    }
    // print out what the program sees
    // we are using bit twiddling to mask out and extract the actual bytes
    printf("Apparent storage: ");
    for (int i = 3; i >= 0; i--){
        printf("%hhx ", (u.i >> (i * 8)) & 0xff);
    }
    printf("\n");
    // print out how it is stored in memory
    // this just reads the data out of the array
    printf("Actual storage: ");
    for (int i = 0; i < 4; i++){
        printf("%hhx ", u.c[i]);
    }
    printf("\n");
    return 0;
}
If we run this in the lab, this is what we get:
Little Endian
Apparent storage: 12 34 56 78
Actual storage: 78 56 34 12
Boolean operators
Basic Boolean operations: AND, OR, NOT, (and XOR)
Logical operators
Your exposure to these operations so far has been through the logical operators we use to work with Boolean values. In C (as in Java), these operators are && (AND), || (OR), and ! (NOT).
We use these in conditional statements to combine conditional expressions, for example (x>3) && (x<10).
Note that C traditionally has no Boolean type and no True/False values (C99 added bool, true, and false through stdbool.h, but they are still just integers underneath). In C, 0 is treated as false and anything else is true. If you print out the value of an expression that evaluates to true or false, you will get either 0 or 1.
Bitwise operators
C also provides bitwise operators. These work at the bit level, allowing us to work with the underlying representation of values.
Our operators look much like the logical operators: & (bitwise AND), | (bitwise OR), ~ (bitwise NOT), and ^ (bitwise XOR).
Example: 2^4 = 6
To understand this, we need to think at the binary level: 010 ^ 100 = 110. We are taking each pair of corresponding bits and applying the operation to them individually.
A common use for bitwise operators is masking, when we are interested in just part of a number.
Example: 0x1AB4 & 0xFF
In bits, this is 0001 1010 1011 0100 & 0000 0000 1111 1111
When we AND these together at the bit level, we are left with 0000 0000 1011 0100, or 0xB4. So, in essence, we have "extracted" the least significant byte of the number.
The other important bitwise operators are >> and <<, which are the right and left shift operators. They shift all of the bits of a number by the requested amount.
Example: 6 >> 1 = 3 (0110 >> 1 = 0011)
Note that this is equivalent to dividing the number by 2 and dropping any remainder, at least for non-negative values. Correspondingly, left shifting by 1 multiplies the number by two (just as similar shifts would divide and multiply by 10 in decimal). You can see shifting in action in the endian check code.