CS 202 - Notes 2018-09-28

Integer sizes

Values in the computer have to be stored in some fixed-size location.

As programmers, we control this storage when we declare variable types.

In Java, the sizes of the integer types are well defined:

Name	# of bytes
byte	1 byte
short	2 bytes
int	4 bytes
long	8 bytes

This is because Java is designed to run on a virtual machine, so the developers of the language have control over the underlying storage.

In Python, we have dynamic typing and we don't declare our variables, so the storage is mostly hidden. The most common implementation of Python (CPython) is written in C. In Python 2, integers were stored in a C long, and values that grew beyond the capacity of the long were automatically switched to an arbitrary-precision data structure. In Python 3, every integer uses that arbitrary-precision representation. Either way, Python integers are essentially unbounded (limited only by memory).

In C, things are less straightforward. The standard only guarantees minimum sizes, so the actual sizes depend on the platform and compiler:

Name	# of bytes
char	1 byte
short	at least 2 bytes
int	at least 2 bytes, and at least as big as short
long	at least 4 bytes, and at least as big as int
long long	at least 8 bytes, and at least as big as long

On most current 64-bit Linux and macOS systems, int is 4 bytes and long is 8 (making long long redundant there). 64-bit Windows is a notable exception: there, long is still 4 bytes.

Here is a short program that reports the size and range of each integer data type in C.

/*
This is a short program that lists the sizes of the basic numeric types.

It illustrates some of the options for using printf as well as the constants
provided in limits.h.

To compile: gcc -o sizes sizes.c

C. Andrews
*/

#include <stdio.h>
#include <limits.h>

/*
  Our simple main function that prints out the sizes.

  fprintf (and printf) are _formatted_ output. We use replacement codes within
  the string, which are then replaced by the items in the comma-separated list
  that follows it. The replacement codes start with a %, and the characters
  that immediately follow tell printf how to interpret the data. This can be
  a collection of modifiers followed by a final character that provides the
  type of the output.

  You can find the full list of codes online, but here are the ones that we use
  below:

  Types:
  d - signed decimal integer
  u - unsigned decimal integer
  X - unsigned hexadecimal integer (uppercase)

  Size modifiers:
  hh - char / byte
  h - short
  l - long
  ll - long long
  z - size_t (the type of the value that sizeof produces)

  Other modifier:
  # - for octal, force a leading 0; for nonzero hexadecimal values, add a
      0x or 0X prefix

  Examples:
  %lu - long unsigned decimal
  %zu - size_t unsigned decimal
  %#hhX - one-byte uppercase hexadecimal number preceded by 0X (e.g., 0XF2)

  The other tool in use here is sizeof, which gives the number of bytes used
  for storage by the provided type. Although it looks like a function, it is
  an operator. Its operand is either a variable or an actual type like int or
  some structure that you have created.
*/

int main(int argc, char * argv[]){

  // %zu is the printf code for size_t, the type that sizeof produces
  printf("char: %zu bytes\n\t%hhd - %hhd [%#hhX - %#hhX]\n",
      sizeof(char), SCHAR_MIN, SCHAR_MAX, SCHAR_MIN, SCHAR_MAX);
  printf("unsigned char: %zu bytes\n\t%hhu - %hhu [%#hhX - %#hhX]\n",
      sizeof(unsigned char), 0, UCHAR_MAX, 0, UCHAR_MAX);
  printf("short: %zu bytes\n\t%hd - %hd [%#hX - %#hX]\n",
      sizeof(short), SHRT_MIN, SHRT_MAX, SHRT_MIN, SHRT_MAX);
  printf("unsigned short: %zu bytes\n\t%hu - %hu [%#hX - %#hX]\n",
      sizeof(unsigned short), 0, USHRT_MAX, 0, USHRT_MAX);
  printf("int: %zu bytes\n\t%d - %d  [%#X - %#X]\n",
      sizeof(int), INT_MIN, INT_MAX, INT_MIN, INT_MAX);
  printf("unsigned int: %zu bytes\n\t%u - %u  [%#X - %#X]\n",
      sizeof(unsigned int), 0, UINT_MAX, 0, UINT_MAX);
  printf("long: %zu bytes\n\t%ld - %ld  [%#lX - %#lX]\n",
      sizeof(long), LONG_MIN, LONG_MAX, LONG_MIN, LONG_MAX);
  printf("unsigned long: %zu bytes\n\t%lu - %lu  [%#lX - %#lX]\n",
      sizeof(unsigned long), 0UL, ULONG_MAX, 0UL, ULONG_MAX);
  printf("long long: %zu bytes\n\t%lld - %lld  [%#llX - %#llX]\n",
      sizeof(long long), LLONG_MIN, LLONG_MAX, LLONG_MIN, LLONG_MAX);
  printf("unsigned long long: %zu bytes\n\t%llu - %llu  [%#llX - %#llX]\n",
      sizeof(unsigned long long), 0ULL, ULLONG_MAX, 0ULL, ULLONG_MAX);

  return 0;
}

When we run this in the lab, this is what we get:

char: 1 bytes
	-128 - 127 [0X80 - 0X7F]
unsigned char: 1 bytes
	0 - 255 [0 - 0XFF]
short: 2 bytes
	-32768 - 32767 [0X8000 - 0X7FFF]
unsigned short: 2 bytes
	0 - 65535 [0 - 0XFFFF]
int: 4 bytes
	-2147483648 - 2147483647  [0X80000000 - 0X7FFFFFFF]
unsigned int: 4 bytes
	0 - 4294967295  [0 - 0XFFFFFFFF]
long: 8 bytes
	-9223372036854775808 - 9223372036854775807  [0X8000000000000000 - 0X7FFFFFFFFFFFFFFF]
unsigned long: 8 bytes
	0 - 18446744073709551615  [0 - 0XFFFFFFFFFFFFFFFF]
long long: 8 bytes
	-9223372036854775808 - 9223372036854775807  [0X8000000000000000 - 0X7FFFFFFFFFFFFFFF]
unsigned long long: 8 bytes
	0 - 18446744073709551615  [0 - 0XFFFFFFFFFFFFFFFF]

In C, the signed integer types are stored as two's-complement numbers on essentially every modern machine (C23 makes this a requirement). We can add unsigned to the start of the type name to store unsigned values instead.

Note that char is an integer type. In C, we can freely switch between characters (a letter in single quotes) and the ASCII values that the system actually stores. Since C has no separate byte type, we can also use char variables to hold small numbers that will never be interpreted as characters. Whether a plain char is signed or unsigned depends on the platform, so write signed char or unsigned char when the range matters.

Endianness

Another consideration for data storage is endianness. This has to do with the order in which bytes are laid out in memory. This is mostly hidden from us until we have to work with data produced on a machine with a different endianness.

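Big Endian means that integers are laid out as we write them, with the most significant ("left") byte stored first, at the lowest memory address.

Little Endian means that integers are laid out with the least significant byte stored first, at the lowest memory address. For example, 0x12345678 is stored as 12 34 56 78 in big endian and as 78 56 34 12 in little endian.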

Here is a short program that uses a C union to trick the computer into revealing whether it stores data as big or little endian.


/*
This is a short program to illustrate endianness. It performs a simple
memory trick that allows us to view the actual order of bytes in memory.

It then prints out what the program perceives as the byte order and the
actual order in memory.

C. Andrews

2013-09-20
*/

#include <stdio.h>
#include <stdint.h>


int main(int argc, char * argv[]){

  // the union stores both of these items in the same location in memory.
  // So, if I access it using .i, I can treat this memory as an int, and if
  // I use .c, it is an array of characters.
  union {
    uint32_t i;
    char c[sizeof(uint32_t)];
  } u;

  // Set the value using the integer
  u.i = 0x12345678;

  // test the first byte in the array to see if it has the "high" byte
  if (u.c[0] == 0x12){
    printf("Big Endian\n");
  }else{
    printf("Little Endian\n");
  }

  // print out what the program sees
  // we are using bit twiddling (shift, then mask) to extract the actual bytes
  printf("Apparent storage: ");
  for (int i = 3; i >= 0; i--){
    printf("%hhx ", (u.i >> (i * 8)) & 0xff);
  }
  printf("\n");

  // print out how it is stored in memory
  // this just reads the data out of the array
  printf("Actual storage: ");
  for (int i = 0; i < 4; i++){
    printf("%hhx ", u.c[i]);
  }
  printf("\n");

  return 0;
}

When we run this in the lab, this is what we get:

Little Endian
Apparent storage: 12 34 56 78 
Actual storage: 78 56 34 12

Boolean operators

The basic Boolean operations are AND, OR, and NOT (along with XOR).

Logical operators

You have already used these through the logical operators that work with Boolean values. In C (as in Java), these operators are && (AND), || (OR), and ! (NOT).

We use these in conditional statements to combine conditional expressions, for example (x > 3) && (x < 10).

Note that C originally had no Boolean type and no true/false values (C99 added _Bool, and stdbool.h provides bool, true, and false, but these are still just the integers 1 and 0). In C, 0 is treated as false and anything else is true. If you print the value of a relational or logical expression, you will get either 0 or 1.

Bitwise operators

C also provides bitwise operators. These work at the bit level, allowing us to work with the underlying representation of values.

Our operators look much like the logical operators: & (bitwise AND), | (bitwise OR), ~ (bitwise NOT), and ^ (bitwise XOR).

Example: 2^4 = 6

To understand this, we need to think at the binary level: 010 ^ 100 = 110. We take each pair of corresponding bits and apply the operation to them individually. (Note that in C, ^ means XOR, not exponentiation.)

A common use for this is masking: using & to keep only the part of the number we are interested in and clear the rest.

Example: 0x1AB4 & 0xFF

In bits, this is 0001 1010 1011 0100 & 0000 0000 1111 1111

When we AND these together at the bit level, we are left with 0000 0000 1011 0100, or 0xB4. So, in essence, we have "extracted" the least significant byte of the number.

The other important bitwise operators are >> and <<, which are the right and left shift operators. They shift all of the bits of a number by the requested amount.

Example: 6 >> 1 = 3 (0110 >> 1 = 0011)

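Note that for unsigned (or non-negative) values, this is equivalent to integer division by 2, with any remainder discarded (7 >> 1 = 3). Correspondingly, left shifting multiplies the number by 2 (just as similar shifts would divide and multiply by 10 in decimal). Be careful with signed values: right-shifting a negative number is implementation-defined in C, and left-shifting a negative number or shifting past the type's range is undefined behavior. You can see shifting in action in the endian check code, where it moves each byte to the lowest position before masking it out.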
