Wednesday, November 20, 2013

Ex. 3.28, 3.34 and 3.35 Solution : Modern Processor Design by John Paul Shen and Mikko H. Lipasti : Solution manual

Q.3.28: Assume a synchronous front-side processor-memory bus that operates at 100 MHz and has an 8-byte data bus. Arbitration for the bus takes one bus cycle (10 ns), issuing a cache line read command for 64 bytes of data takes one cycle, memory controller latency (including DRAM access) is 60 ns, after which data double words are returned in back-to-back cycles. Further assume the bus is blocking or circuit-switched. Compute the latency to fill a single 64-byte cache line. Then compute the peak read bandwidth for this processor-memory bus, assuming the processor arbitrates for the bus for a new read in the bus cycle following completion of the last read.

Sol:  Arbitration: 1 cycle = 10 ns
      Issuing the read command: 1 cycle = 10 ns
      Memory controller latency (including DRAM access): 60 ns
      Data transfer: 64 bytes / 8 bytes per cycle = 8 cycles = 80 ns
      Total latency to fill a single 64-byte cache line: 10 + 10 + 60 + 80 = 160 ns
      Peak read bandwidth: 64 bytes / 160 ns = 0.4 bytes/ns = 400 MB/s
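To double-check the arithmetic, here is a minimal Python sketch of the same calculation. The variable names (bus_cycle_ns, line_bytes, etc.) are my own, not from the text:

```python
# Q.3.28: latency and peak bandwidth of a blocking 100 MHz, 8-byte bus.
bus_cycle_ns = 10            # 100 MHz bus -> 10 ns per bus cycle
bus_width_bytes = 8          # 8-byte data bus
line_bytes = 64              # cache line size
controller_latency_ns = 60   # memory controller + DRAM access

arbitration_ns = 1 * bus_cycle_ns
issue_ns = 1 * bus_cycle_ns
transfer_cycles = line_bytes // bus_width_bytes        # 8 back-to-back cycles
transfer_ns = transfer_cycles * bus_cycle_ns           # 80 ns

latency_ns = arbitration_ns + issue_ns + controller_latency_ns + transfer_ns
bandwidth_mb_s = line_bytes / latency_ns * 1e9 / 1e6   # bytes/ns -> MB/s

print(latency_ns)         # 160 (ns)
print(bandwidth_mb_s)     # 400.0 (MB/s)
```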


Q.3.34: Assume a single-platter disk drive with an average seek time of 4.5 ms, rotation speed of 7200 rpm, data transfer rate of 10 Mbytes/s per head, and controller overhead and queueing of 1 ms. What is the average access latency for a 4096-byte read?

Sol: Assume the block size is 512 bytes,
     so the transfer time for one block is 512 bytes / 10 MB/s = 51.2 us.
     Transfer time for the 4096-byte read: 4096 bytes / 10 MB/s = 0.4096 ms
     Average rotational latency: (60 s / 7200) / 2 = (1/120)/2 s = 4.17 ms
     Average access latency for a 4096-byte read:
         4.17 ms + 4.5 ms + 0.4096 ms + 1 ms = 10.0796 ms
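The same result can be reproduced with a short Python sketch (variable names are assumptions made for illustration):

```python
# Q.3.34: average access latency = seek + rotational + transfer + overhead.
seek_ms = 4.5                # average seek time
rpm = 7200                   # rotation speed
transfer_rate_b_s = 10e6     # 10 MB/s per head
overhead_ms = 1.0            # controller overhead + queueing
read_bytes = 4096

rotational_ms = 60.0 / rpm / 2 * 1000            # half a revolution, in ms
transfer_ms = read_bytes / transfer_rate_b_s * 1000

latency_ms = seek_ms + rotational_ms + transfer_ms + overhead_ms
print(round(latency_ms, 2))   # ~10.08 ms
```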


Q.3.35: Recompute the average access latency for Problem 34 assuming a rotation speed of 15 K rpm, two platters, and an average seek time of 4.0 ms.

Sol: Assume the block size is 512 bytes,
     so the transfer time for one block is 512 bytes / 10 MB/s = 51.2 us.
     Transfer time for the 4096-byte read: 4096 bytes / 10 MB/s = 0.4096 ms
     Average rotational latency: (60 s / 15000) / 2 = (1/250)/2 s = 2 ms
     Average access latency for a 4096-byte read:
         2 ms + 4 ms + 0.4096 ms + 1 ms = 7.4096 ms

Note: The read heads on multiple platters read data in serial, not in parallel.
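The calculation below applies the same formula with the new parameters; because the heads transfer data serially, the second platter does not change the result (variable names are assumptions):

```python
# Q.3.35: same disk-latency formula as Q.3.34, with 15K rpm and a 4.0 ms seek.
seek_ms = 4.0
rpm = 15000
transfer_rate_b_s = 10e6
overhead_ms = 1.0
read_bytes = 4096

rotational_ms = 60.0 / rpm / 2 * 1000            # 2 ms
transfer_ms = read_bytes / transfer_rate_b_s * 1000   # 0.4096 ms

print(seek_ms + rotational_ms + transfer_ms + overhead_ms)   # 7.4096 ms
```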







