CS 202 - Notes 2016-05-06

Superscalar

The superscalar processor can issue and execute multiple instructions in a single cycle

The changed fetch execute cycle

Fetch several instructions from the instruction cache at once

Decode the instructions to break them down into single operations / microinstructions

The microinstructions are sent to the Execution Unit, which routes each operation to an appropriate Functional Unit. The operation waits in a queue until the functional unit can execute it.

As results are generated, they are sent to the Retirement Unit. The instructions are held there in a FIFO queue until one of two things happens: the instruction is flushed because we made a bad branch prediction, or it is retired because all of its operations are complete and the branches before it have been resolved as correctly predicted. Once an instruction is retired, the value associated with it is considered valid.

While an instruction is still waiting in the queue, if it updates a register, we don’t actually write the value into the register file; instead we add a tag to the register's entry in the file. If later instructions need the value, they see the tag and look up the instruction’s result.
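
The queue-and-tag idea can be sketched as a toy model. This is only an illustration, not how any particular processor implements it: the class name RetirementUnit, its method names, and the register names are all made up for the example, and flush() simply throws away everything in flight rather than tracking which branch was mispredicted.

```python
from collections import deque


class RetirementUnit:
    """Toy model: a FIFO of in-flight instructions plus a tagged register file."""

    def __init__(self, registers):
        self.regs = dict(registers)  # committed (valid) register values
        self.tags = {}               # register -> id of newest in-flight writer
        self.queue = deque()         # in-flight instructions, oldest first
        self.next_id = 0

    def issue(self, dest):
        """Queue an instruction that will write `dest` and tag that register."""
        entry = {"id": self.next_id, "dest": dest, "result": None, "done": False}
        self.next_id += 1
        self.queue.append(entry)
        self.tags[dest] = entry["id"]
        return entry["id"]

    def complete(self, instr_id, result):
        """A functional unit has produced the result for this instruction."""
        for entry in self.queue:
            if entry["id"] == instr_id:
                entry["result"], entry["done"] = result, True

    def read(self, reg):
        """Follow the tag if there is one, else use the committed value."""
        if reg in self.tags:
            for entry in self.queue:
                if entry["id"] == self.tags[reg]:
                    return entry["result"]  # None if not computed yet
        return self.regs[reg]

    def retire(self):
        """Retire finished instructions, strictly in order from the head."""
        while self.queue and self.queue[0]["done"]:
            entry = self.queue.popleft()
            self.regs[entry["dest"]] = entry["result"]
            if self.tags.get(entry["dest"]) == entry["id"]:
                del self.tags[entry["dest"]]

    def flush(self):
        """Bad branch prediction (simplified): discard all in-flight work."""
        self.queue.clear()
        self.tags.clear()


ru = RetirementUnit({"rax": 1})
first = ru.issue("rax")
ru.complete(first, 10)
print(ru.read("rax"), ru.regs["rax"])  # tagged result vs committed value
ru.retire()
print(ru.regs["rax"])                  # committed value after retirement
```

The point of the sketch is the separation between regs (only changed at retirement) and tags (changed as soon as an instruction enters the queue), which is what lets a flush discard speculative work without corrupting the register file.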

Each functional unit is pipelined. We can talk about its latency and its issue time. The latency is the number of clock cycles it takes a value to pass through the functional unit. The issue time is how many cycles we have to wait before we can add a new value into the pipeline. If you look at the examples on the slide, you can see that the integer multiply has a latency of 3 clock cycles, but it is fully pipelined, so we can add a new set of values one cycle later. The division operations, on the other hand, are not pipelined, so their latency and issue time are the same.
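
To see why the difference between latency and issue time matters, here is a minimal sketch that counts the cycles needed to push a batch of operations through one functional unit. It assumes the operations are independent of each other and that the unit is otherwise idle. The function name cycles_to_finish is made up for the example, and the divide numbers are placeholders, not values from the slide.

```python
def cycles_to_finish(n_ops, latency, issue_time):
    """Cycles until the last of n independent operations leaves the unit."""
    if n_ops == 0:
        return 0
    # Operation k (counting from 0) enters at cycle k * issue_time,
    # and its result emerges `latency` cycles after it enters.
    return (n_ops - 1) * issue_time + latency


# Integer multiply from the notes: latency 3, issue time 1 (fully pipelined).
print(cycles_to_finish(10, latency=3, issue_time=1))    # 12

# Hypothetical non-pipelined divide: latency = issue time = 20 (assumed number).
print(cycles_to_finish(10, latency=20, issue_time=20))  # 200
```

With full pipelining the cost of each extra operation is one cycle; without it, every extra operation costs the full latency.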

Memory

We looked at a collection of different types of memory.

SRAM – This is used for registers and small caches. It is low density, so it is expensive (i.e., it takes up a lot of room). On the other hand it is very fast.

DRAM – This is what we use for main memory (i.e., RAM). The mechanics are a little more complex, but it is still fairly fast and we can make it relatively cheaply.

magnetic disk – This is what we make hard drives out of. Lots of room, very cheap, and very, very, slow.

solid state memory – We are starting to see this as the main drive in laptops. Much faster than hard drives, though still relatively expensive (though the price is dropping).

Accessing data on a hard drive

Hard drives are made up of a collection of platters stacked together on a central spindle. The data on each platter is broken up into tracks, and each track is broken up into sectors, which will be our basic unit of reading and writing.

There are a number of factors that go into access time on a disk:

seek time – When we need to read a sector on a different track, the read head has to be moved forward or backward until it is over the appropriate track. Typical time: 3-9ms.

rotational latency – Once the head is positioned over the track, we need to wait for the sector to rotate under the head. On average we will have to wait half a rotation: 1/2 * (1/RPM) * (60s / 1 min)

transfer time – Once the head is over the start of the sector, we need to wait for the whole sector to pass under the head: (1/RPM) * 1/(average # sectors per track) * (60s / 1 min)

Typical speeds range from 5400 (laptops) to 15000 RPM, with 7200 RPM being a typical speed for a desktop drive.

Example:

Consider a drive that spins at 7200 RPM, with an average seek time of 9ms, and an average of 400 sectors per track.

The time for a full rotation is 1/RPM * 60s/1 min => 8.3ms (we will say about 8ms for convenience)

Rotational latency will then be, on average, 4ms

Transfer time will be 8ms * 1/400 => approximately 0.02ms

Total access time = 9ms + 4ms + 0.02ms = 13.02ms
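
The same calculation can be sketched in a few lines of code. This assumes a 7200 RPM drive, which is what gives a rotation time of about 8.3ms. The function names are made up for the example. Because the code does not round 8.3ms down to 8ms, it comes out slightly higher than the hand calculation.

```python
def rotation_ms(rpm):
    """Time for one full rotation, in milliseconds."""
    return 60_000.0 / rpm


def access_time_ms(rpm, seek_ms, sectors_per_track):
    """Average time to read one sector: seek + rotational latency + transfer."""
    full_rotation = rotation_ms(rpm)
    rotational_latency = full_rotation / 2           # half a rotation on average
    transfer = full_rotation / sectors_per_track     # one sector passes the head
    return seek_ms + rotational_latency + transfer


total = access_time_ms(rpm=7200, seek_ms=9, sectors_per_track=400)
print(f"{total:.2f} ms")  # 13.19 ms
```

Notice that the transfer time barely registers. Almost all of the cost is in moving the head and waiting for the platter to come around.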

From computer time to human time

You are working in the lab and you find there is something you need to know how to do. There are several different ways you could get the answer.

You look down and the answer is in your notes right in front of you. We will use this as our baseline: it is the equivalent of using a value stored in a register. It takes you about 10 seconds to look down and read your notes, and it takes the computer about 2.5ns to read a value out of a register. So our conversion factor is that 1ns in computer time is the equivalent of 4s in human time.

What would be the equivalent of going to main memory? Assuming an access time of 70ns, that is 70 * 4 = 280 seconds, which is about 5 minutes. So you could walk down the hall and ask me (if you are quick).

How about the hard drive? 5,000,000ns * 4 = 20,000,000s = about 5,556 hours = about 231 days = a little over 7 months. So you have time to take most of a year of computer science classes and answer your own question…
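
Conversions across this many orders of magnitude are easy to get wrong by a factor of ten, so here is a minimal sketch for checking them. It uses the conversion factor from the baseline (10 seconds of human time for 2.5ns of computer time) and the access times used in these notes. The function names are made up for the example.

```python
SECONDS_PER_NS = 10 / 2.5  # 10 s to read your notes vs. 2.5 ns to read a register


def human_seconds(computer_ns):
    """Scale a computer-time duration (ns) to human time (s)."""
    return computer_ns * SECONDS_PER_NS


def describe(seconds):
    """Show one duration in several units so the scale is easy to judge."""
    minutes = seconds / 60
    hours = seconds / 3600
    days = seconds / 86_400
    return f"{seconds:,.0f} s = {minutes:,.1f} min = {hours:,.1f} h = {days:,.1f} days"


for name, ns in [("register", 2.5), ("main memory", 70), ("hard drive", 5_000_000)]:
    print(f"{name:12s} {describe(human_seconds(ns))}")
```

Changing the hard drive entry to a different access time (for example, the 13ms from the worked example) shows how sensitive the human-scale answer is to the assumed disk latency.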