Friday, November 15, 2013

Ex. 3.1 & 3.3 Solution: Modern Processor Design by John Paul Shen and Mikko H. Lipasti : Solution Manual

Q.3.1: Given the following benchmark code, and assuming a virtually-addressed fully-associative cache with infinite capacity and 64 byte blocks, compute the overall miss rate (number of misses divided by number of references). Assume that all variables except array locations reside in registers, and that arrays A, B, and C are placed consecutively in memory. 
    double A[1024], B[1024], C[1024];
    for (int i = 0; i < 1000; i += 2) {
        A[i] = 35.0 * B[i] + C[i+1];
    }

Sol: A double is 8 bytes, so each 64-byte block holds 8 array elements. Each array is 8192 bytes, a multiple of the block size, so (assuming the first array is block-aligned) all three arrays start on block boundaries. The first access to each block misses and every later access to the same block hits: whenever the array index crosses a multiple of 8, a new block is touched and the access misses. For example, A[0] is a miss; A[2], A[4], and A[6] are then hits, since they lie in the same block as A[0], and A[8] is a miss again.


There are 1500 memory references in total: the loop runs 500 iterations, each making one reference to B, one to C, and one to A (500 per array). For each array, the accessed elements span ceil(1000 * 8 / 64) = 125 blocks. Note that C gains no extra block from the +1 offset: C is accessed at indices 1 through 999, which still span only 125 blocks, because C[999] lies in the same block as C[992]. Hence there are 125 misses to each array, leading to 3 x 125 = 375 total misses. The overall miss rate is 375/1500 = 25% misses per reference.






Q.3.3: Given the example code in Problem 1, and assuming a virtually-addressed two-way set associative cache of capacity 8KB and 64 byte blocks, compute the overall miss rate (number of misses divided by number of references). Assume that all variables except array locations reside in registers, and that arrays A, B, and C are placed consecutively in memory.

Sol: The cache holds 8 KB / 64 B = 128 blocks, organized as 64 sets of two ways. Each array is 1024 x 8 B = 8 KB, exactly the cache capacity, and the arrays are consecutive in memory, so for any i the blocks containing A[i], B[i], and C[i] are 8 KB apart and map to the same set. Furthermore, because the loop index advances by 2, C[i+1] always lies in the same 64-byte block as C[i]: i is even, so i and i+1 never straddle a block boundary. The +1 offset therefore never shifts C's stream relative to A's and B's, and all three streams cross block boundaries together, once every four iterations. Each iteration thus directs three distinct blocks at a single two-way set, and under LRU they thrash: in the first iteration B[0], C[1], and A[0] all miss, and the fill for A[0] evicts B's block; in the next iteration the access to B[2] misses and evicts C's block, the access to C[3] misses and evicts A's block, and so on, with each fill evicting exactly the block the next reference needs. Hence every one of the 3 x 500 = 1500 references misses, for an overall miss rate of 1500/1500 = 100% misses per reference. (With a stride of 1, the +1 offset would desynchronize C from A and B once every eight iterations and permit a few hits; with a stride of 2 it never does.)







Virtual Memory System:

In modern high-performance computer systems, the lowest level of the memory hierarchy is implemented with magnetic disks. These disks serve as the paging device, or backing store, for physical memory in a virtual memory system. Blocks of memory are displaced from main memory to the backing store for capacity reasons, just as blocks are displaced from a cache and placed either in the next level of the cache hierarchy or in main memory.
A virtual memory system allows each concurrent program to allocate and occupy as much memory as the system's backing store and virtual address space allow, which can be up to 4 GB on a machine with 32-bit virtual addresses. Such a system is responsible for providing the illusion that all virtually addressable memory is resident in physical memory and can be transparently accessed by the program.
Finally, a virtual memory system must provide an architected means for translating a virtual address to a physical address, and a structure for storing these mappings.




Next Topic
Q.3.4: Consider a cache with 256 bytes. Word size is 4 bytes and block size is 16 bytes. Show the values in the cache and tag bits after each of the following memory access operations for the two cache organizations direct mapped and 2-way associative. Also indicate whether the access was a hit or a miss. Justify. The addresses are in hexadecimal representation. Use LRU (least recently used) replacement algorithm wherever needed.
1. Read 0010
2. Read 001C
3. Read 0018
4. Write 0010
5. Read 0484
6. Read 051C
7. Read 001C
8. Read 0210
9. Read 051C

Previous Topic 
Q.2.16:  In a TYP-based pipeline design with a data cache, load instructions check the tag array for a cache hit in parallel with accessing the data array to read the corresponding memory location. Pipelining stores to such a cache is more difficult, since the processor must check the tag first, before it overwrites the data array. Otherwise, in the case of a cache miss, the wrong memory location may be overwritten by the store. Design a solution to this problem that does not require sending the store down the pipe twice, or stalling the pipe for every store instruction. Referring to Figure 2-15, are there any new RAW, WAR, and/or WAW memory hazards? 
Q.2.17: The MIPS pipeline shown in Table 2-7 employs a two-phase clocking scheme that makes efficient use of a shared TLB, since instruction fetch accesses the TLB in phase one and data fetch accesses in phase two. However, when resolving a conditional branch, both the branch target address and the branch fall-through address need to be translated during phase one--in parallel with the branch condition check in phase one of the ALU stage--to enable instruction fetch from either the target or the fall-through during phase two. This seems to imply a dual-ported TLB. Suggest an architected solution to this problem that avoids dual-porting the TLB.
