Lecture 07 - Real numbers
Goals
- Learn how to represent real numbers
- Learn the process of converting to and from IEEE 754
- Learn how we represent a collection of other types of binary data
real numbers
All of our discussion to date has been about integers. What about real numbers?
If we are writing out numbers, then we can use the same ideas that we use in decimal
we will talk about the radix point instead of the decimal point, but the concept is the same
in decimal, the positional values continue to be powers of 10 – they are just negative powers
- \(0.1 = 10^{-1} = \frac{1}{10}\)
- \(0.01 = 10^{-2} = \frac{1}{100}\)
- etc…
In binary
- \(0.1 = 2^{-1} = \frac{1}{2}\)
- \(0.01 = 2^{-2} = \frac{1}{4}\)
- etc…
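As a minimal sketch of the positional idea above, the following evaluates a binary numeral that contains a radix point. The function name `binary_to_value` is made up for this illustration and is not part of any library.

```python
def binary_to_value(bits: str) -> float:
    """Evaluate a binary numeral such as '0.11' using positional weights."""
    whole, _, frac = bits.partition(".")
    value = 0.0
    # Digits left of the radix point have weights 2^0, 2^1, 2^2, ...
    for power, digit in enumerate(reversed(whole)):
        value += int(digit) * 2.0 ** power
    # Digits right of the radix point have weights 2^-1, 2^-2, ...
    for power, digit in enumerate(frac, start=1):
        value += int(digit) * 2.0 ** -power
    return value


print(binary_to_value("0.1"))   # 0.5
print(binary_to_value("0.01"))  # 0.25
print(binary_to_value("0.11"))  # 0.75
```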
but we have the same problem we had with the negative sign – how do we express the radix point?
floating point
there is a collection of different approaches we could use, but we are going to use something called floating point (this is why the data type for real numbers is called float)
The basic concept is that we are going to represent our number in a form that resembles scientific notation
Let’s take an example: \(-5 \frac{3}{16}\)
we start with the \(5\)
- 101
Now we need to do the \(\frac{3}{16}\)
- 16 is \(2^{4}\) so \(\frac{1}{16} = 2^{-4}\)
- \(\frac{1}{16} = 0.0001\)
- we have three of those, so \(0.0011\)
Putting that together, we have \(-101.0011\)
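The conversion can also be sketched in code. Note that this sketch uses a different (but equivalent) technique from the hand calculation above: it repeatedly doubles the fractional part and peels off the integer bit each time. The helper name `to_binary` and the `max_frac_bits` cutoff are assumptions made for this illustration.

```python
from fractions import Fraction


def to_binary(value: Fraction, max_frac_bits: int = 16) -> str:
    """Write an exact fraction as a binary numeral with a radix point."""
    sign = "-" if value < 0 else ""
    value = abs(value)
    whole = int(value)      # integer part, e.g. 5
    frac = value - whole    # fractional part, e.g. 3/16
    frac_bits = ""
    # Doubling shifts the next fractional bit to the left of the radix point.
    while frac and len(frac_bits) < max_frac_bits:
        frac *= 2
        bit = int(frac)
        frac_bits += str(bit)
        frac -= bit
    return f"{sign}{whole:b}.{frac_bits or '0'}"


print(to_binary(Fraction(-83, 16)))  # -5 3/16 -> -101.0011
```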
Now we normalize the number by shifting the radix point to be immediately behind the most significant digit
\(-1.010011 \times 2^{2}\)
Since we shifted the radix point two places to the left, we need to multiply the number by \(2^{2}\) to retain the value
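A rough sketch of the normalization step, working on the magnitude only (the sign is tracked separately). The function `normalize` is a hypothetical helper written for this note; it finds the first 1 digit and counts how far the radix point has to move to sit just after it.

```python
def normalize(bits: str):
    """Return (significand, exponent) with bits == significand x 2^exponent."""
    whole, _, frac = bits.partition(".")
    digits = whole + frac
    first_one = digits.find("1")
    if first_one == -1:
        raise ValueError("zero has no normalized form")
    # Positive exponent: radix point moved left; negative: it moved right.
    exponent = len(whole) - 1 - first_one
    tail = digits[first_one + 1:].rstrip("0")
    return "1." + (tail or "0"), exponent


print(normalize("101.0011"))  # ('1.010011', 2)
print(normalize("0.0011"))    # ('1.1', -3)
```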
Now we need to figure out how to store this number
- need to keep track of the sign, the significand and the exponent
IEEE 754
IEEE 754 is the standard that provides the details of how we will do this
the sign will be stored in the high order bit of the number (like signed magnitude)
the exponent will be stored in the next block of bits
we need to be able to express both positive and negative exponents. Instead of using two’s complement, we are going to use something called excess notation. Given \(n\) bits, we will pick a bias of either \(2^{n-1}\) or \(2^{n-1} - 1\); the choice is just based on whether we want an extra negative or an extra positive value
the idea is that we add the bias to the exponent to get the representation (basically we shift the most negative number up to 0)
to figure out which number is being represented, we reverse the process by subtracting the bias
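A small sketch of excess notation, assuming the bias \(2^{n-1} - 1\) (one of the two choices mentioned above). The names `to_excess` and `from_excess` are invented for this example.

```python
def to_excess(exponent: int, n_bits: int) -> str:
    """Encode an exponent by adding the bias 2^(n-1) - 1."""
    bias = 2 ** (n_bits - 1) - 1
    stored = exponent + bias
    if not 0 <= stored < 2 ** n_bits:
        raise ValueError("exponent does not fit in the given number of bits")
    return format(stored, f"0{n_bits}b")


def from_excess(bits: str) -> int:
    """Decode by subtracting the same bias."""
    bias = 2 ** (len(bits) - 1) - 1
    return int(bits, 2) - bias


print(to_excess(2, 8))          # 10000001
print(from_excess("10000001"))  # 2
print(to_excess(-126, 8))       # 00000001
```

Because the bias is added before storing, the stored patterns sort in the same order as the exponents they represent when compared as unsigned integers.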
the significand will take up the remaining available bits
we make one small tweak, however: since the digit to the left of the radix point is always 1, we don’t bother putting it in our representation
IEEE 754 provides the standard for two data types: the float and the double
float
- 32-bit number
- 1 bit sign
- 8 bit exponent using excess 127
- 23 bit significand
double
- 64 bit number
- 1 bit sign
- 11 bit exponent using excess 1023
- 52 bit significand
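The two layouts can be inspected with Python's standard `struct` module, which packs a value into its IEEE 754 bytes. This is only an illustrative sketch; the helper names `float_fields` and `double_fields` are assumptions, and each returns the raw (still biased) exponent field.

```python
import struct


def float_fields(x: float):
    """Split a 32-bit float into (sign, stored exponent, stored significand)."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    sign = raw >> 31                # 1 bit
    exponent = (raw >> 23) & 0xFF   # 8 bits, excess 127
    fraction = raw & 0x7FFFFF       # 23 bits
    return sign, exponent, fraction


def double_fields(x: float):
    """Split a 64-bit double into (sign, stored exponent, stored significand)."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = raw >> 63                    # 1 bit
    exponent = (raw >> 52) & 0x7FF      # 11 bits, excess 1023
    fraction = raw & ((1 << 52) - 1)    # 52 bits
    return sign, exponent, fraction


print(float_fields(1.0))    # (0, 127, 0)
print(double_fields(1.0))   # (0, 1023, 0)
print(float_fields(-2.0))   # (1, 128, 0)
```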
there are also some special patterns
| single exponent | single significand | double exponent | double significand | meaning |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | +/- 0 |
| 0 | non-zero | 0 | non-zero | +/- denormalized number |
| 1-254 | anything | 1-2046 | anything | +/- normalized number |
| 255 | 0 | 2047 | 0 | +/- infinity |
| 255 | non-zero | 2047 | non-zero | NaN |
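The single-precision columns of the table can be turned into a small classifier. This is a sketch under the assumption that the value fits in a 32-bit float; `classify` is a hypothetical name, and the bit patterns come from the standard `struct` module.

```python
import struct


def classify(x: float) -> str:
    """Classify a value by its single-precision exponent and significand fields."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    exponent = (raw >> 23) & 0xFF
    fraction = raw & 0x7FFFFF
    if exponent == 0:
        return "zero" if fraction == 0 else "denormalized"
    if exponent == 255:
        return "infinity" if fraction == 0 else "NaN"
    return "normalized"  # stored exponent in 1-254


print(classify(0.0))            # zero
print(classify(1e-45))          # denormalized
print(classify(1.5))            # normalized
print(classify(float("inf")))   # infinity
print(classify(float("nan")))   # NaN
```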
Returning to our example: \(-1.010011 \times 2^{2}\)
the exponent will be \(2 +127 = 129 = 10000001_{2}\)
so our representation will be
1 10000001 01001100000000000000000
| |        |
| |        significand with leading 1 removed
| exponent in excess 127
sign bit
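One way to check the worked example is to let Python produce the single-precision bit pattern for \(-5 \frac{3}{16} = -5.1875\) and then decode it by hand. The `decode_single` helper is written just for this note and only handles normalized numbers.

```python
import struct

# Pack -5.1875 as a big-endian 32-bit float and show its bits.
raw = struct.pack(">f", -5.1875)
bits = "".join(f"{byte:08b}" for byte in raw)
print(bits[0], bits[1:9], bits[9:])
# 1 10000001 01001100000000000000000


def decode_single(bits: str) -> float:
    """Decode a normalized single-precision bit string (no special cases)."""
    sign = -1 if bits[0] == "1" else 1
    exponent = int(bits[1:9], 2) - 127           # remove the excess-127 bias
    significand = 1 + int(bits[9:], 2) / 2 ** 23  # restore the hidden leading 1
    return sign * significand * 2.0 ** exponent


print(decode_single(bits))  # -5.1875
```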
Note that this representation can only represent a limited number of values. It will struggle with values like 0.3, which requires an infinite number of bits to represent in binary (0.01001100110011…). This is a bit of a problem if you really need that precision (say, you are a bank), so there are other representations for when we really need full precision
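A short demonstration of the problem, using only the Python standard library. The `decimal` module is shown as one example of an alternative representation that stores decimal digits exactly; it is an illustration, not the only option.

```python
import struct
from decimal import Decimal
from fractions import Fraction

# Round 0.3 to single precision, then show the exact value that was stored.
single = struct.unpack(">f", struct.pack(">f", 0.3))[0]
print(Decimal(single))  # 0.300000011920928955078125

# The stored double is not exactly 3/10 either.
print(Fraction(0.3) == Fraction(3, 10))  # False

# A familiar consequence of the rounding:
print(0.1 + 0.2 == 0.3)  # False

# A decimal representation keeps these values exact.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```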
Mechanical level
vocabulary
Skills
- convert between binary and IEEE 754