target IES: Ex 5.14 Solution : Modern Processor Design by John Paul Shen and Mikko H. Lipasti : Solution Manual

Q.5.14: Below is the control flow graph of a simple program. The CFG is annotated with three different execution trace paths. For each execution trace circle which branch predictor (bimodal, local, or Gselect) will best predict the branching behavior of the given trace. More than one predictor may perform equally well on a particular trace. However, you are to use each of the three predictors exactly once in choosing the best predictors for the three traces. Circle your choice for each of the three traces and add. (Assume each trace is executed many times and every node in the CFG is a conditional branch. The branch history register for the local, global, and Gselect predictors is limited to 4 bits.)

Sol: Gselect

Identical global history at b13 and b15, so the PC is need to differentiate them.

Sol: Local

Identical global history at b1, so global history doesn’t work. The local history of b1 shows it alternates taken and not taken.

Sol: Bimodal

All the branches in this trace have a constant behavior, so bimodal predicts well.

Instruction Flow Techniques:

In any processor, instructions and its flow techniques deals with early stages which includes fetching and decoding stages. Hence, instruction flow techniques are always considered first. The throughput of all subsequent stages is always imposed by throughput of the early pipeline stages. For complementary pipeline processor, the traditional partitioning of a processor into control path and the data path is no longer clear or effective. The early pipeline stages along with the branch execution unit can be viewed as corresponding to the traditional control path whose primary function is to enforce the control flow semantic of a program. The primary goal for all instruction flow techniques is to maximize the supply of instructions of the superscalar pipeline subject to the requirements of the control flow semantics of a program.

Successive high-performance processor generations have thus far relied on higher frequencies and more transistors to improve performance. Unfortunately, using more transistors clocked at higher frequencies requires more power. As a result, power dissipation in modern processors is now quickly approaching alarming levels jeopardizing further advances in performance. Finding ways to control further increases in power is imperative for future generation processors with billion transistors operating at multi-GHz frequencies.

The control flow semantic of a program are specified in the form of the control flow graph. In the control flow graph, the nodes represent basic blocks and the edges represent the transfer of control flow between basic blocks. The directed edges represent control flow between basic blocks. These edges are induced by conditional branch instructions. The run time execution of a program entails the dynamic traversal of the nodes and edges of its control flow graph. The actual path of traversal is dictated by the branch instructions and their branch conditions which can be dependent on run time data.

Next Topic:

Q.5.19 and 5.20: Register Renaming Given the DAXPY kernel shown in Figure 5.31 and the IBM RS/6000 (RIOS-I) floating-point load renaming scheme also discussed in class (both are shown in the following figure), simulate the execution of two iterations of the DAXPY loop and show the state of the floating-point map table, the pending target return queue, and the free list.

• Assume the initial state shown in the table for Problem 5.19.

• Note the table only contains columns for the registers that are referenced in the DAXPY loop.

• As in the RS/6000 implementation discussed, assume only a single load instruction is renamed per cycle and that only a single floating-point instruction can complete per cycle.

• Only floating-point load, multiply, and add instructions are shown in the table, since only these are relevant to the renaming scheme.

• Remember that only load destination registers are renamed.

• The first load from the loop prologue is filled in for you.

Q.5.19: Fill in the remaining rows in the following table with the map table state and pending target return queue state after the instruction is renamed, and the free list state after the instruction completes.

SOLUTION

Previous Topic:

Q.5.7 through Problem 5.13: Consider the following code segment within a loop body for problems 5:

if (x is even) then (branch b1)

increment a (b1 taken)

if (x is a multiple of 10) then (branch b2)

increment b (b2 taken)

Assume that the following list of 9 values of x is to be processed by 9 iterations of

this loop. 8, 9, 10, 11, 12, 20, 29, 30, 31

Note: assume that predictor entries are updated by each dynamic branch before

the next dynamic branch accesses the predictor (i.e., there is no update delay).

Q.5.7: Assume that an one-bit (history bit) state machine (see above) is used as the prediction algorithm for predicting the execution of the two branches in this loop. Indicate the predicted and actual branch directions of the b1 and b2 branch instructions for each iteration of this loop. Assume initial state of 0, i.e., NT, for the predictor.

Q.5.8: What are the prediction accuracies for b1 and b2?

Q.5.9: What is the overall prediction accuracy?

Q.5.10: Assume a two-level branch prediction scheme is used. In addition to the one-bit predictor, a one bit global register (g) is used. Register g stores the direction of the last branch executed (which may not be the same branch as the branch currently being predicted) and is used to index into two separate one-bit branch history tables (BHTs) as shown below. Depending on the value of g, one of the two BHTs is selected and used to do the normal one-bit prediction. Again, fill in the predicted and actual branch directions of b1 and b2 for nine iterations of the loop. Assume the initial value of g = 0, i.e., NT. For each prediction, depending on the current value of g, only one of the two BHTs is accessed and updated. Hence, some of the entries below should be empty.

Note: assume that predictor entries are updated by each dynamic branch before the next dynamic branch accesses the predictor (i.e. there is no update delay).

Q.5.11: What are the prediction accuracies for b1 and b2?

Q.5.12: What is the overall prediction accuracy?

Q.5.13: What is the prediction accuracy of b2 when g = 0? Explain why.

SOLUTION

target IES

Sunday, November 24, 2013

Ex 5.14 Solution : Modern Processor Design by John Paul Shen and Mikko H. Lipasti : Solution Manual

Instruction Flow Techniques:

No comments:

Post a Comment