Description

Rate this product

D1. Suppose the four stages in some 4-stage pipeline take the following timing: 2ns, 3ns, 4ns, and 2ns. Given 1000 instructions, what is the speedup (in two decimal places) of the pipelined processor compared to the non-pipelined single-cycle processor?

D2. Let’s try to understand pipeline processor by doing a detailed trace. Suppose the pipeline registers (also known as pipeline latches) store the following information:

IF / ID	ID / EX	EX / MEM	MEM / WB
	MemToReg	MToR	MToR
	RegWr	RegWr
	MemRd	MemRd
	MemWr
	Branch	MemWr	RegWr
	RegDst
	ALUsrc	Branch
	ALUop
PC+4	PC+4	BrcTgt	MemRes
OpCode		isZero?
Rs	ALUOpr1	ALURes	ALURes
Rt	ALUOpr2	ALUOpr2
Rd	Rt		DstRNum
Funct	Rd	DstRNum
Imm(16)	Imm(32)

Show the progress of the following instructions through the pipeline stages by filling in the content of pipeline registers. Note that these are the same instructions from Tutorial #5 Question 1 so that you can reuse some of the answers here.

0x8df80000 # lw $24, 0($15) #Inst.Addr = 0x100 ii. 0x1023000C # beq $1, $3, 12 #Inst.Addr = 0x100 iii. 0x0285c822 # sub $25, $20, $5 #Inst.Addr = 0x100

Assume that registers 1 to 31 have been initialized to a value that is equal to 101 + its register number. i.e. [$1] = 102, [$31] = 132 etc. You can put “X” in fields that are irrelevant for that instruction. Do note that in reality, these fields are actually generated but not utilized.

Part (i) has been worked out for you.

0x8df80000 # lw $24, 0($15) #Inst.Addr = 0x100

IF / ID		ID / EX		EX / MEM		MEM / WB
		MToR	1	MToR	1	MToR	1
		RegWr	1	RegWr	1
		MemRd	1	MemRd	1
		MemWr	0
		Branch	0	MemWr	0	RegWr	1
		RegDst	0
		ALUsrc	1	Branch	0
		ALUop	00
PC+4	0x104	PC+4	0x104	BrcTgt	X	MemRes	Mem(116)
OpCode	0x23			isZero?	X
Rs	$15	ALUOpr1	116	ALURes	116	ALURes	X
Rt	$24	ALUOpr2	X	ALUOpr2	X
Rd	X	Rt	$24			DstRNum	$24
Funct	X	Rd	X	DstRNum	$24
Imm(16)	0	Imm(32)	0

Tutorial Questions

Given the following three formulas (See Lecture #20, Section 5 Performance):

𝐶𝑇𝑠𝑒𝑞 = ∑ 𝑇𝑘

𝐶𝑇𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒 = max(𝑇𝑘) + 𝑇𝑑

𝐶𝑇_𝑠𝑒𝑞× 𝐼𝑛𝑠𝑡𝑁𝑢𝑚

𝑆𝑝𝑒𝑒𝑑𝑢𝑝_{𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒}=

𝐶𝑇_{𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒}× (𝑁 + 𝐼𝑛𝑠𝑡𝑁𝑢𝑚 − 1)

For each of the following processor parameters, calculate CT_seq, CT_pipeline and

Speedup_pipeline (to two decimal places) for 10 instructions and for 10 million instructions.

	Stages Timing (for 5 stages, in ps)	Latency of pipeline register (in ps)
a.	300, 100, 200, 300, 100 (slow memory)	0
b.	200, 200, 200, 200, 200	40
c.	200, 200, 200, 200, 200 (ideal)	0

[AY2014/5 Semester 2 Exam]

Refer to the following MIPS program:

# register $s0 contains a 32-bit value

# register $s1 contains a non-zero 8-bit value # at the right most (least significant) byte add $t0, $s0, $zero #inst A add $s2, $zero, $zero #inst B lp: bne $s2, $zero, done #inst C beq $t0, $zero, done #inst D andi $t1, $t0, 0xFF #inst E bne $s1, $t1, nt #inst F addi $s2, $s2, 1 #inst G nt: srl $t0, $t0, 8 #inst H j lp #inst J done:

We assume that the register $s0 contains 0xAFAFFAFA and $s1 contains 0xFF.

Given a 5-stage MIPS pipeline processor, for each of the parts below, give the total number of cycles needed for the first iteration of the execution from instructions A to H (i.e. excluding the “j lp” instruction). Remember to include the cycles needed for instruction H to finish the WB stage. Note that the questions are independent from each other.

With only data forwarding mechanisms and no control hazard mechanism.

With data forwarding and “assume not taken” branch prediction. Note that there is no early branching.

[Recall that early branching means branch decision is made at stage 2 (Decode stage); no early branch means branch decision is made at stage 4 (Memory stage).]

By swapping two instructions (from Instructions A to H), we can improve the performance of early branching (with all additional forwarding paths). Give the two instructions that can be swapped. You only need to indicate the instruction letters in your answer.

Give the total number of cycles needed for the execution of the whole code in the worst case for each of the following assumptions. You may assume that the jump instruction (j) computes the address of the instruction to jump to in the MEM stage.

With only data forwarding mechanisms and no control hazard mechanism.
With data forwarding and “assume not taken” branch prediction. Note that there is no early branching.

[AY2017/8 Semester 2 Exam]

Refer to the MIPS code below. A and B are integer arrays whose base addresses are in $s0 and $s1 respectively. The arrays are of the same size n (number of elements). $s2 contains the value n. For this question, we will focus on the code from Instruction 1 onwards.

.data

A: .word 11, 9, 31, 2, 9, 1, 6, 10 B: .word 3, 7, 2, 12, 11, 41, 19, 35 n: .word 8 .text main: la $s0, A # $s0 is the base address of array A la $s1, B # $s1 is the base address of array B la $t0, n # $t0 is the addr of n (size of array)

# $s2 is the content of n beq $s2, $zero, End # Inst1 addi $t8, $s2, -1 # Inst2 sll $t8, $t8, 2 # Inst3 Loop: add $t0, $s0, $t8 # Inst4 add $t1, $s1, $t8 # Inst5 lw $t2, 0($t0) # Inst6 lw $t3, 0($t1) # Inst7 andi $t4, $t3, 3 # Inst8 addi $t4, $t4, -3 # Inst9 beq $t4, $zero, A1 # Inst10 add $t2, $t2, $t3 # Inst11 j A2 # Inst12 A1: addi $t2, $t2, 1 # Inst13 A2: sw $t2, 0($t0) # Inst14 addi $t8, $t8, -8 # Inst15 slt $t7, $t8, $zero # Inst16 beq $t7, $zero, Loop # Inst17 End:

Assuming a 5-stage MIPS pipeline system with forwarding and early branching, that is, the branch decision is made at the ID stage. No branch prediction is made and no delayed branching is used. For the jump (j) instruction, the computation of the target address to jump to is done at the ID stage as well.

Assume also that the first beq instruction begins at cycle 1.

Suppose arrays A and B now each contains 200 positive integers. What is the minimum number and maximum number of instructions executed? (Consider only the above code segment from Inst1 to Inst17.)
List out the instructions where some stall cycle(s) are inserted in executing that instruction in the pipeline. These include delay caused by data dependency and control hazard. You may write the instruction number InstX instead of writing out the instruction in full.
How many cycles does one iteration of the loop (from Inst1 to Inst17) take if the beq instruction at Inst10 branches to A1? You have to count until the WB stage of Inst17.
How many cycles does one iteration of the loop (from Inst1 to Inst17) take if the beq instruction at Inst10 does not branch to A1? You have to count until the WB stage of Inst17.

10-35qmfr.zip

CS2100-Tutorial 10 Pipelining Solved

If Helpful Share:

Description

Tutorial Questions

Related products

CS2100 Computer Organisation Solved

CS2100-Assignment 2 Solved

CS2100-Tutorial 9 Sequential Circuits Solved