Sunday, November 17, 2013

Ex. 3.13 Solution : Modern Processor Design by John Paul Shen and Mikko H. Lipasti : Solution Manual

Q.3.13: Consider a processor with 32-bit virtual addresses, 4KB pages and 36-bit physical addresses. Assume memory is byte-addressable (i.e. the 32-bit VA specifies a byte in memory).
L1 instruction cache: 64 Kbytes, 128 byte blocks, 4-way set associative, indexed and tagged with virtual address.
L1 data cache: 32 Kbytes, 64 byte blocks, 2-way set associative, indexed and tagged with physical address, write-back.
4-way set associative TLB with 128 entries in all. Assume the TLB keeps a dirty bit, a reference bit, and 3 permission bits (read, write, execute) for each entry.

Specify the number of offset, index, and tag bits for each of these structures in the table below. Also, compute the total size in number of bit cells for each of the tag and data arrays.

Sol: The offset selects the right byte within a block (or page), so the number of offset bits is log2(block size in bytes), or log2(page size in bytes) for the TLB.

The index selects the set in the cache or TLB that may hold the block or page, so the number of index bits is log2(number of sets), where number of sets = number of entries / associativity.

The tag differentiates the different blocks/pages that may map to the same set in the cache/TLB, so the number of tag bits is (bits of address - bits of index - bits of offset).

Size of the tag array will be the number of entries times the number of tag bits in each entry.
Size of the data array will be the number of entries times the number of data bits in each entry.
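
The rules above can be checked with a short Python sketch. This is an illustration rather than the book's official answer: the helper names (log2_exact, field_bits) are made up for this example, the tag array counts tag bits only (no valid or dirty bits, following the rule stated above), and each TLB data entry is assumed to hold the 24-bit physical page number plus the 5 status bits listed in the question.

```python
def log2_exact(n):
    """Return log2(n) for a power of two; raise otherwise."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1

def field_bits(addr_bits, entries, ways, unit_bytes):
    """Return (offset, index, tag) bit counts for a set-associative structure."""
    sets = entries // ways
    offset = log2_exact(unit_bytes)      # byte within block/page
    index = log2_exact(sets)             # which set
    tag = addr_bits - index - offset     # whatever is left of the address
    return offset, index, tag

KB = 1024
VA_BITS, PA_BITS, PAGE_BYTES = 32, 36, 4 * KB

# name: (address bits, entries, ways, bytes per block/page, data bits per entry)
structures = {
    "L1I": (VA_BITS, 64 * KB // 128, 4, 128, 128 * 8),
    "L1D": (PA_BITS, 32 * KB // 64, 2, 64, 64 * 8),
    # Assumption: TLB data = physical page number + dirty + reference + 3 permission bits
    "TLB": (VA_BITS, 128, 4, PAGE_BYTES, (PA_BITS - log2_exact(PAGE_BYTES)) + 5),
}

results = {}
for name, (addr_bits, entries, ways, unit, data_bits) in structures.items():
    offset, index, tag = field_bits(addr_bits, entries, ways, unit)
    results[name] = (offset, index, tag, entries * tag, entries * data_bits)
    print(name, "offset/index/tag =", (offset, index, tag),
          "tag array bits =", entries * tag,
          "data array bits =", entries * data_bits)
```

If the tag array is also expected to hold a valid bit (and a dirty bit for the write-back data cache), add those per-entry bits to the tag count before multiplying.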

A translation lookaside buffer (TLB)

A translation lookaside buffer (TLB) is a cache that memory management hardware uses to improve virtual address translation speed. All current desktop, laptop, and server processors include one or more TLBs in the memory management hardware, and a TLB is nearly always present in any hardware that utilizes paged virtual memory.
A TLB has a fixed number of slots that contain page table entries, which map virtual addresses to physical addresses. Virtual memory is the address space as seen by a process. This space is divided into pages of a fixed size. The page table (generally stored in memory) keeps track of where the virtual pages are stored in the physical memory. The TLB is a cache of the page table; that is, only a subset of the page table contents is held in the TLB.
The TLB references physical memory addresses in its table. It may reside between the CPU and the CPU cache, between the CPU cache and primary storage memory, or between levels of a multi-level cache. The placement determines whether the cache uses physical or virtual addressing. If the cache is virtually addressed, requests are sent directly from the CPU to the cache, and the TLB is accessed only on a cache miss. If the cache is physically addressed, the CPU does a TLB lookup on every memory operation and the resulting physical address is sent to the cache. There are pros and cons to both implementations. Virtually addressed caches use part of the virtual address as their key, optionally extended with an "address space identifier" (ASID). Caches without ASIDs must be flushed on every context switch in a multiprocessing environment.
In a Harvard architecture or hybrid thereof, a separate virtual address space or memory access hardware may exist for instructions and data. This can lead to distinct TLBs for each access type, an Instruction Translation Lookaside Buffer (ITLB) and a Data Translation Lookaside Buffer (DTLB).


Hit in the TLB

• The TLB contains a translation for the virtual address, and the physical address of the reference can be used to complete the memory reference in hardware without software involvement. When a page is evicted from main memory, translations for the page are evicted from the TLB as well, so a TLB hit means that the physical page containing the address is mapped in memory.

TLB miss and page mapping

• The system accesses the page table to find the translation for the virtual address, copies that translation into the TLB, and the memory reference proceeds. TLB misses generally take a relatively short time to resolve, because the system just has to access the page table. Assuming no page faults occur while accessing the page table, TLB misses can usually be resolved in a few hundred cycles. The user program just waits until the TLB miss has been resolved. If instead the system accesses the page table and determines that the address is not mapped, a page fault occurs.
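
The hit, miss, and page-fault cases can be sketched as a toy model in Python. Everything here is hypothetical and only for illustration: the TinyTLB class, the PageFault exception, the flat dictionary standing in for the page table, and the LRU replacement are assumptions and do not describe any real processor or the TLB of Q.3.13.

```python
from collections import OrderedDict

PAGE_BITS = 12  # 4KB pages, as in Q.3.13

class PageFault(Exception):
    """Raised when the page table has no mapping for the virtual page."""

class TinyTLB:
    def __init__(self, capacity, page_table):
        self.capacity = capacity
        self.page_table = page_table   # dict: virtual page number -> physical page number
        self.entries = OrderedDict()   # cached translations, least recently used first

    def translate(self, vaddr):
        """Return (physical address, "hit" or "miss"); raise PageFault if unmapped."""
        vpn = vaddr >> PAGE_BITS
        offset = vaddr & ((1 << PAGE_BITS) - 1)
        if vpn in self.entries:
            self.entries.move_to_end(vpn)         # refresh LRU position
            outcome = "hit"
        else:
            if vpn not in self.page_table:        # address is not mapped
                raise PageFault(hex(vaddr))
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # evict least recently used entry
            self.entries[vpn] = self.page_table[vpn]  # copy translation into the TLB
            outcome = "miss"
        return (self.entries[vpn] << PAGE_BITS) | offset, outcome

tlb = TinyTLB(capacity=2, page_table={0x1: 0xA, 0x2: 0xB})
first = tlb.translate(0x1234)    # first touch of page 0x1: walks the page table
second = tlb.translate(0x1234)   # same page again: served from the TLB
print(hex(first[0]), first[1], second[1])
```

A real TLB is set associative and a real page table is usually multi-level, so this sketch only mirrors the control flow described above, not the cost of each path.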



Next Topic:
Q.3.16: Assume a two-level cache hierarchy with a private level one instruction cache (L1I), a private level one data cache (L1D), and a shared level two data cache (L2). Given local miss rates of 4% for L1I, 7.5% for L1D, and 35% for L2, compute the global miss rate for the L2 cache.
Q.3.17: Assuming 1 L1I access per instruction and 0.4 data accesses per instruction, compute the misses per instruction for the L1I, L1D, and L2 caches of Problem 16.
Q.3.18: Given the miss rates of Problem 16, and assuming that accesses to the L1I and L1D caches take one cycle, accesses to the L2 take 12 cycles, accesses to main memory take 75 cycles, and a clock rate of 1 GHz, compute the average memory reference latency for this cache hierarchy.
Q.3.19: Assuming a perfect cache CPI (cycles per instruction) for a pipelined processor equal to 1.15 CPI, compute the MCPI and overall CPI for a pipelined processor with the memory hierarchy described in Problem 18 and the miss rates and access rates specified in Problem 16 and Problem 17.

Previous Topic:
Q.3.4: Consider a cache with 256 bytes. Word size is 4 bytes and block size is 16 bytes. Show the values in the cache and tag bits after each of the following memory access operations for the two cache organizations direct mapped and 2-way associative. Also indicate whether the access was a hit or a miss. Justify. The addresses are in hexadecimal representation. Use LRU (least recently used) replacement algorithm wherever needed.
1. Read 0010
2. Read 001C
3. Read 0018
4. Write 0010
5. Read 0484
6. Read 051C
7. Read 001C
8. Read 0210
9. Read 051C
SOLUTION
