Values in the computer have to be stored in some fixed-size location.
As programmers, our control over this storage comes when we declare variable types.
In Java, the sizes of the various integer types are well defined:
Name | # of bytes |
---|---|
byte | 1 byte |
short | 2 bytes |
int | 4 bytes |
long | 8 bytes |
This is because Java is designed to run on a virtual machine, so the developers of the language have control over the underlying storage.
In Python, we have dynamic typing and we don’t declare our variables, so everything is fairly hidden. The most common flavor of Python, CPython, is written in C. Python 2 stored ordinary integers in a C long and switched to a variable-length data structure when a value grew beyond the capacity of the long. Python 3 uses the variable-length structure throughout, so integers are essentially unbounded.
In C, things are a little less straightforward:
Name | # of bytes |
---|---|
char | 1 byte |
short | at least 2 bytes |
int | at least as big as the short |
long | at least 4 bytes |
long long | at least 8 bytes |
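On most current 64-bit Linux and macOS desktops, we will find that the int is 4 bytes and the long is 8 (making long long redundant). On 64-bit Windows, the long is still 4 bytes, so long long is what gives us 8.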
Check out 07-sizes.c for the program to test your particular setup.
In C, all of those types are stored as two’s-complement numbers on essentially every current machine. We can add the keyword unsigned to the start of the type name to store unsigned values.
Note that char is an integer type. In C, we can freely switch between characters (a letter in single quotes) and the ASCII values that the system actually stores. We can also use char variables to hold small numbers that will never be interpreted as characters, since we don’t have a byte type.
Another consideration for data storage is endianness. This has to do with the order in which bytes are laid out in memory. This is mostly hidden from us until we have to work with data produced on a machine with a different endianness.
Big Endian means that integers are laid out as we write them, with the most significant or “left” byte coming first.
Little Endian means that integers are laid out with the least significant byte coming first.
The example in 08-endian.c shows how to use a union to trick the computer into revealing the byte order in memory.
Basic Boolean operations: AND, OR, NOT, (and XOR)
Your exposure to these so far has been through the logical operators we use to work with Boolean values. In C (as in Java), these operators are && (AND), || (OR), and ! (NOT).
We use these in conditional statements to combine conditional expressions, for example (x>3) && (x<10).
Note that C originally had no Boolean type and no True/False values. (C99 added _Bool, along with bool, true, and false in stdbool.h, but these are still just integers underneath.) In C, 0 is treated as false and anything else is true. If you print out the value of an expression that evaluates to true or false, you will get either 1 or 0.
C also provides bitwise operators. These work at the bit level, allowing us to work with the underlying representation of values.
Our operators look much like the logical operators: & (bitwise AND), | (bitwise OR), ~ (bitwise NOT), and ^ (bitwise XOR).
Example: 2^4 = 6
To understand this, we need to think at the binary level: 010 ^ 100 = 110. We take each pair of corresponding bits and apply the operation to them individually.
A common use for this is masking, when we are interested in just part of the number.
Example: 0x1AB4 & 0xFF
In bits, this is 0001 1010 1011 0100 & 0000 0000 1111 1111.
When we AND these together at the bit level, we are left with 0000 0000 1011 0100, or 0xB4. So, in essence, we have “extracted” the least significant byte of the number.