CS 202 - Homework 9 - 11/12/07
Due: Monday, 11/19/07, in class (or by 2pm the latest)
This homework has two parts and covers circuits and optimization in C
and assembly. You may work in groups of two, or by yourself if you
prefer. I only need one submission per group (both electronic and
paper).
On your paper submission, please write down how much time it took
you.
Part 1: 8x8 RAM
Implement an 8x8 RAM in LogicWorks. We will use this circuit for the
implementation of the full Sniac circuit
in the next few weeks. Your RAM should have 6 inputs: the 3 address
lines A2, A1, A0, as well as CS (chip select), READ, and WRITE. It
should also have 8 data lines D7, ..., D0 that serve both as inputs
(if CS=1 and WRITE=1) and outputs (if CS=1 and READ=1). To make a
bidirectional pin in a subcircuit, use "Port Bidir". The data lines
should be in "Z" mode if CS=0 or both READ=0 and WRITE=0. You can
assume that READ and WRITE are never both 1 at the same time.
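The rules for the data lines can be summarized as a small truth-table function. The sketch below is a hypothetical C model for checking the expected behavior of your circuit on paper. It is not LogicWorks code, and the names bus_mode and data_line_mode are made up for illustration.

```c
/* Hypothetical C model of the data-line rules for D7..D0.
   It restates the spec above; it is not part of LogicWorks. */
typedef enum { HIGH_Z, DATA_IN, DATA_OUT } bus_mode;

/* cs, read, write are 0 or 1. READ=1 together with WRITE=1 is assumed
   never to happen, so that case gets no special handling. */
bus_mode data_line_mode(int cs, int read, int write) {
    if (!cs)
        return HIGH_Z;   /* chip not selected: lines float ("Z") */
    if (write)
        return DATA_IN;  /* outside circuit drives D7..D0 into the RAM */
    if (read)
        return DATA_OUT; /* RAM drives the stored byte onto D7..D0 */
    return HIGH_Z;       /* selected, but neither reading nor writing */
}
```

For example, data_line_mode(1, 1, 0) is DATA_OUT, while any call with cs = 0 gives HIGH_Z.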
The internal organization of the RAM is up to you. It makes sense to
build the 8x8 RAM from 8 subcircuits: either 8x1 RAMs or 1x8 RAMs.
Each of these subcircuits in turn needs to be built from 8 D-latches,
and should probably have the same control inputs as the 8x8 circuit.
Use LogicWorks' "D Latch wo/SQ/" as your basic 1-bit storage unit (set
the inverted R input permanently to 1). Note that the RAM has no clock
input and is thus not edge-triggered.
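The difference between a level-sensitive latch and an edge-triggered flip-flop, and the role of address decoding, can be sketched in C. This assumes the 1x8 organization (eight byte-wide subcircuits, one per address) and assumes the latch's inverted reset overrides its enable input. The names d_latch, d_latch_update, and decode_select are hypothetical, and the code does not model the exact pins of the LogicWorks part.

```c
#include <stdint.h>

/* Behavioral sketch of a level-sensitive D latch with an active-low
   reset, standing in for "D Latch wo/SQ/". Reset-over-enable priority
   is an assumption. */
typedef struct { int q; } d_latch;

/* Call whenever any input changes. r_n is the inverted reset,
   which the RAM ties permanently to 1. */
void d_latch_update(d_latch *l, int d, int enable, int r_n) {
    if (!r_n)
        l->q = 0;        /* inverted reset active: clear the latch */
    else if (enable)
        l->q = d;        /* transparent: Q follows D while enable = 1 */
    /* enable = 0: Q holds its value; no clock edge is involved */
}

/* 3-to-8 decoder for the 1x8 organization: bit i of the result is the
   select line for byte-wide subcircuit i, or all 0 when CS = 0. */
uint8_t decode_select(int cs, int a2, int a1, int a0) {
    int addr = (a2 << 2) | (a1 << 1) | a0;   /* address 0..7 */
    return cs ? (uint8_t)(1u << addr) : 0;   /* one-hot select */
}
```

Note that while enable stays at 1, a change on D passes straight through to Q. This is why WRITE (gated by the decoder output) must be the latch enable rather than a clock.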
Once your circuit is completed, test
it using an arrangement like this (which includes an 8-bit tri-state buffer,
"Buffer-8 T. S."):
With WRITE=1, change the addresses and data inputs to write values
into all 8 data cells. Then, set the switch so that WRITE=0 and
READ=1. You should be able to go through the different addresses and
recall the values that you wrote earlier.
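The write-then-read test can also be stated as a short program, which may help you decide on a data pattern before toggling switches. This is a hypothetical software model: the RAM is an array of 8 bytes, CS is assumed to be 1 throughout, and the names ram8x8, ram_access, and ram_write_then_read are made up.

```c
#include <stdint.h>

/* Hypothetical model of the 8x8 RAM as 8 one-byte cells. */
typedef struct { uint8_t cell[8]; } ram8x8;

/* One access with CS = 1. Returns the byte placed on D7..D0 for a read,
   or -1 when the RAM does not drive the bus (write or idle). */
int ram_access(ram8x8 *r, int addr, int read, int write, uint8_t data) {
    if (write) {
        r->cell[addr & 7] = data;   /* store the byte at this address */
        return -1;
    }
    if (read)
        return r->cell[addr & 7];   /* RAM drives the stored byte */
    return -1;
}

/* WRITE=1 phase: store a distinct pattern at every address.
   READ=1 phase: read every address back and count mismatches.
   Returns 0 if all eight values were recalled correctly. */
int ram_write_then_read(ram8x8 *r) {
    int errors = 0;
    for (int a = 0; a < 8; a++)
        ram_access(r, a, 0, 1, (uint8_t)(0x11 * a));
    for (int a = 0; a < 8; a++)
        if (ram_access(r, a, 1, 0, 0) != 0x11 * a)
            errors++;
    return errors;
}
```

Using a different value at each address (here 0x00, 0x11, ..., 0x77) makes it easy to spot address-decoding mistakes, since two addresses mapped to the same cell would read back the same value.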
Hand in printouts of your top-level test circuit, of your 8x8 RAM
implementation, of your 1x8 or 8x1 RAM circuit, and of any other
subcircuits you define.
Part 2: Optimization
To get started, copy all files from ~schar/cs202/hw9. Type
"make" to compile. Your job is to optimize the C functions
in sumC.c and maxC.c, and write and optimize
assembly functions in sumA.s and maxA.s. Use the
program hw9.c to test and time your programs. Your submission should
include not only the fastest version of your code but also a 1-2
page discussion of your experiments and results.
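The contents of sumC.c and maxC.c are not reproduced in this handout, so the following is only a guess at what an unoptimized starting point might look like. The names sum_baseline and max_baseline and their signatures are hypothetical, and the real files may use different types. Keeping a plain version like this around gives you a reference result to check every optimized version against.

```c
/* Hypothetical unoptimized sum: one accumulator, one element per
   iteration. The actual signature in sumC.c may differ. */
long sum_baseline(const int *a, int n) {
    long s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hypothetical unoptimized maximum; assumes n >= 1. */
int max_baseline(const int *a, int n) {
    int m = a[0];
    for (int i = 1; i < n; i++)
        if (a[i] > m)
            m = a[i];
    return m;
}
```

CPE (cycles per element) is the measured cycle count divided by the number of elements n, so an optimization shows up in your log as a lower CPE at the same n.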
Here are some tips and some questions that you should address in your
discussion:
- Try out the different optimization techniques we discussed in class.
Keep a log of which optimization technique yields what speedup.
- Try out different array sizes. Which sizes yield the fastest running
times? Why? What conclusions can you draw about the specs of the machine
you are using?
- When you try loop unrolling and loop splitting, you don't
have to worry about the boundary cases since the number of elements gets
rounded up to the next higher multiple of 60 (which is divisible by
2, 3, 4, 5, and 6, among others).
- If possible, try out different machines with different types of
(Pentium / compatible) processors. Do you get different CPE (cycles per element) numbers?
- Are your best assembly programs faster than your best C programs?
If yes, can you explain why your assembly optimization cannot be translated
into C? If no, do you think that it is always possible to write C code
that matches the performance of assembly code?
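As an illustration of the boundary-case tip, here is a sketch of 4-way loop unrolling combined with four separate accumulators, one common way to split a single dependency chain into independent ones. Treating this as "loop splitting" is an interpretation, and the function names are hypothetical. The code relies on n being a multiple of 4, which holds because the element count is rounded up to a multiple of 60. round_up_60 only shows the rounding arithmetic; it is not the code hw9.c uses.

```c
/* 4-way unrolled sum with four independent accumulators.
   Assumes n % 4 == 0 (true for any multiple of 60). */
long sum_unrolled4(const int *a, int n) {
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    for (int i = 0; i < n; i += 4) {
        s0 += a[i];          /* the four additions do not depend */
        s1 += a[i + 1];      /* on each other, so the processor can */
        s2 += a[i + 2];      /* overlap them */
        s3 += a[i + 3];
    }
    return (s0 + s1) + (s2 + s3);
}

/* Same idea for the maximum; assumes n >= 4 and n % 4 == 0. */
int max_unrolled4(const int *a, int n) {
    int m0 = a[0], m1 = a[1], m2 = a[2], m3 = a[3];
    for (int i = 4; i < n; i += 4) {
        if (a[i] > m0) m0 = a[i];
        if (a[i + 1] > m1) m1 = a[i + 1];
        if (a[i + 2] > m2) m2 = a[i + 2];
        if (a[i + 3] > m3) m3 = a[i + 3];
    }
    int ma = m0 > m1 ? m0 : m1;   /* combine the partial maxima */
    int mb = m2 > m3 ? m2 : m3;
    return ma > mb ? ma : mb;
}

/* Hypothetical helper: round n up to a multiple of 60. */
int round_up_60(int n) {
    return (n + 59) / 60 * 60;
}
```

Since 60 is divisible by 2, 3, 4, 5, and 6, the same rounding covers unrolling factors of 2 through 6. Which factor and how many accumulators work best depends on the processor, so it is worth logging several combinations.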
Submission details
Please submit your (fastest) programs for part 2 electronically by typing
~schar/bin/submit202
in the directory containing your files. Hand in a printout of your
circuit for part 1 and of the written discussion for part 2.
Don't forget to note the time it took you!