CS2100-Tutorial 10 Pipelining Solved

30.00 $

Category:

Description

Rate this product

D1. Suppose the four stages in some 4-stage pipeline take the following timing: 2ns, 3ns, 4ns, and 2ns. Given 1000 instructions, what is the speedup (in two decimal places) of the pipelined processor compared to the non-pipelined single-cycle processor?

 

D2. Let’s try to understand pipeline processor by doing a detailed trace. Suppose the pipeline registers (also known as pipeline latches) store the following information:

IF / ID  

 

 

 

 

 

 

 

 

     

 

ID / EX  

 

 

 

 

 

 

 

 

     

 

EX / MEM  

 

 

 

 

 

 

 

 

     

 

MEM / WB
MemToReg   MToR   MToR  
RegWr   RegWr  
MemRd   MemRd  
MemWr  
Branch   MemWr   RegWr  
RegDst  
ALUsrc   Branch  
ALUop  
PC+4   PC+4   BrcTgt   MemRes  
OpCode   isZero?  
Rs   ALUOpr1   ALURes   ALURes  
Rt   ALUOpr2   ALUOpr2  
Rd   Rt   DstRNum  
Funct   Rd   DstRNum  
Imm(16)   Imm(32)  

 

Show the progress of the following instructions through the pipeline stages by filling in the content of pipeline registers. Note that these are the same instructions from Tutorial #5 Question 1 so that you can reuse some of the answers here.

  1.             0x8df80000 # lw $24, 0($15) #Inst.Addr = 0x100  ii. 0x1023000C # beq $1, $3, 12 #Inst.Addr = 0x100       iii.  0x0285c822 # sub $25, $20, $5 #Inst.Addr = 0x100

Assume that registers 1 to 31 have been initialized to a value that is equal to 101 + its register number. i.e. [$1] = 102, [$31] = 132 etc. You can put “X” in fields that are irrelevant for that instruction. Do note that in reality, these fields are actually generated but not utilized.

Part (i) has been worked out for you.

  1.                0x8df80000 # lw $24, 0($15)   #Inst.Addr = 0x100
IF / ID  

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

ID / EX  

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

EX / MEM  

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

MEM / WB
MToR 1 MToR 1 MToR 1
RegWr 1 RegWr 1
MemRd 1 MemRd 1
MemWr 0
Branch 0 MemWr 0 RegWr 1
RegDst 0
ALUsrc 1 Branch 0
ALUop 00
PC+4 0x104 PC+4 0x104 BrcTgt X MemRes Mem(116)
OpCode 0x23 isZero? X
Rs $15 ALUOpr1 116 ALURes 116 ALURes X
Rt $24 ALUOpr2 X ALUOpr2 X
Rd X Rt $24 DstRNum $24
Funct X Rd X DstRNum $24
Imm(16) 0 Imm(32) 0

 

Tutorial Questions

  1. Given the following three formulas (See Lecture #20, Section 5 Performance):

𝑁

𝐶𝑇𝑠𝑒𝑞 = ∑         𝑇𝑘

𝑘=1

 

𝐶𝑇𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒 = max(𝑇𝑘) + 𝑇𝑑

 

𝐶𝑇𝑠𝑒𝑞 × 𝐼𝑛𝑠𝑡𝑁𝑢𝑚

𝑆𝑝𝑒𝑒𝑑𝑢𝑝𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒 =

𝐶𝑇𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒 × (𝑁 + 𝐼𝑛𝑠𝑡𝑁𝑢𝑚 − 1)

 

      For each of the following processor parameters, calculate CTseq, CTpipeline and

Speeduppipeline (to two decimal places) for 10 instructions and for 10 million instructions.

  Stages Timing (for 5 stages, in ps) Latency of pipeline register (in ps)
a. 300, 100, 200, 300, 100  (slow memory) 0
b. 200, 200, 200, 200, 200 40
c. 200, 200, 200, 200, 200  (ideal) 0

 

  1. [AY2014/5 Semester 2 Exam]

Refer to the following MIPS program:

# register $s0 contains a 32-bit value

# register $s1 contains a non-zero 8-bit value      #       at the right most (least significant) byte      add  $t0, $s0, $zero     #inst A      add  $s2, $zero, $zero   #inst B  lp:  bne  $s2, $zero, done    #inst C       beq  $t0, $zero, done    #inst D      andi $t1, $t0, 0xFF      #inst E       bne  $s1, $t1, nt        #inst F       addi $s2, $s2, 1         #inst G  nt:  srl  $t0, $t0, 8         #inst H        j    lp                  #inst J  done:

 

We assume that the register $s0 contains 0xAFAFFAFA and $s1 contains 0xFF.

Given a 5-stage MIPS pipeline processor, for each of the parts below, give the total number of cycles needed for the first iteration of the execution from instructions A to H (i.e. excluding the “j lp” instruction). Remember to include the cycles needed for instruction H to finish the WB stage. Note that the questions are independent from each other.

  1. With only data forwarding mechanisms and no control hazard mechanism.

 

  1. With data forwarding and “assume not taken” branch prediction. Note that there is no early branching.

[Recall that early branching means branch decision is made at stage 2 (Decode stage); no early branch means branch decision is made at stage 4 (Memory stage).]

 

  1. By swapping two instructions (from Instructions A to H), we can improve the performance of early branching (with all additional forwarding paths). Give the two instructions that can be swapped. You only need to indicate the instruction letters in your answer.

 

Give the total number of cycles needed for the execution of the whole code in the worst case for each of the following assumptions. You may assume that the jump instruction (j) computes the address of the instruction to jump to in the MEM stage.

  1. With only data forwarding mechanisms and no control hazard mechanism.
  2. With data forwarding and “assume not taken” branch prediction. Note that there is no early branching.

 

  1. [AY2017/8 Semester 2 Exam]

Refer to the MIPS code below. A and B are integer arrays whose base addresses are in $s0 and $s1 respectively. The arrays are of the same size n (number of elements). $s2 contains the value n. For this question, we will focus on the code from Instruction 1 onwards.

 .data 

A: .word 11, 9, 31, 2, 9, 1, 6, 10  B: .word 3, 7, 2, 12, 11, 41, 19, 35 n: .word 8 .text main: la   $s0, A     # $s0 is the base address of array A       la   $s1, B     # $s1 is the base address of array B       la   $t0, n     # $t0 is the addr of n (size of array)

                      # $s2 is the content of n        beq  $s2, $zero, End   # Inst1       addi $t8, $s2, -1      # Inst2       sll  $t8, $t8, 2       # Inst3 Loop: add  $t0, $s0, $t8     # Inst4       add  $t1, $s1, $t8     # Inst5       lw   $t2, 0($t0)       # Inst6       lw   $t3, 0($t1)       # Inst7       andi $t4, $t3, 3       # Inst8       addi $t4, $t4, -3      # Inst9       beq  $t4, $zero, A1    # Inst10       add  $t2, $t2, $t3     # Inst11       j    A2                # Inst12 A1:   addi $t2, $t2, 1       # Inst13 A2:   sw   $t2, 0($t0)       # Inst14       addi $t8, $t8, -8      # Inst15       slt  $t7, $t8, $zero   # Inst16       beq  $t7, $zero, Loop  # Inst17 End:

 

Assuming a 5-stage MIPS pipeline system with forwarding and early branching, that is, the branch decision is made at the ID stage. No branch prediction is made and no delayed branching is used. For the jump (j) instruction, the computation of the target address to jump to is done at the ID stage as well.

Assume also that the first beq instruction begins at cycle 1.

        

 

  1. Suppose arrays A and B now each contains 200 positive integers. What is the minimum number and maximum number of instructions executed? (Consider only the above code segment from Inst1 to Inst17.)
  2. List out the instructions where some stall cycle(s) are inserted in executing that instruction in the pipeline. These include delay caused by data dependency and control hazard. You may write the instruction number InstX instead of writing out the instruction in full.
  3. How many cycles does one iteration of the loop (from Inst1 to Inst17) take if the beq instruction at Inst10 branches to A1? You have to count until the WB stage of Inst17.
  4. How many cycles does one iteration of the loop (from Inst1 to Inst17) take if the beq instruction at Inst10 does not branch to A1? You have to count until the WB stage of Inst17.
  • 10-35qmfr.zip