10. Pipelining (3)

Pipeline Performance

-> 100ps for register read or write (ID, WB)

-> 200ps for other stages (IF, EX, MEM)

Instruction	IF	ID	EX	MEM	WB	Total time
lw	200ps	100ps	200ps	200ps	100ps	800ps
sw	200ps	100ps	200ps	200ps		700ps
R-format	200ps	100ps	200ps		100ps	600ps
beq	200ps	100ps	200ps			500ps
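
The totals in the table above are just the sum of the stages each instruction class uses. A minimal sketch of that arithmetic, where the names `STAGE_PS`, `USES`, and `total_time_ps` are made up for illustration:

```python
# Stage latencies in picoseconds, taken from the notes above.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Stages each instruction class actually uses (per the table).
USES = {
    "lw": ["IF", "ID", "EX", "MEM", "WB"],
    "sw": ["IF", "ID", "EX", "MEM"],
    "R-format": ["IF", "ID", "EX", "WB"],
    "beq": ["IF", "ID", "EX"],
}


def total_time_ps(instr):
    """Sum the latency of every stage the instruction class uses."""
    return sum(STAGE_PS[stage] for stage in USES[instr])


totals = {name: total_time_ps(name) for name in USES}
print(totals)
```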

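Pipeline Speedup

If all stages take the same amount of time

-> Time between instructions (pipelined) = Time between instructions (non-pipelined) / Number of stages

If the stages are not balanced (as in MIPS)

-> The clock period is set by the slowest stage (200ps here). In the non-pipelined design the critical path is lw (800ps).

The speedup comes from increased throughput; the latency of each individual instruction does not decrease.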
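
To make the throughput-versus-latency point concrete, here is a rough sketch using the 800ps and 200ps figures from the table. It assumes an ideal 5-stage pipeline with no hazards or stalls; the function names are hypothetical.

```python
SINGLE_CYCLE_PS = 800  # non-pipelined clock period, set by lw
PIPELINED_PS = 200     # pipelined clock period, set by the slowest stage
NUM_STAGES = 5


def single_cycle_time(n):
    """Total time for n instructions without pipelining."""
    return n * SINGLE_CYCLE_PS


def pipelined_time(n):
    """Total time for n instructions in an ideal pipeline (no stalls).

    The first instruction needs NUM_STAGES cycles; each later one
    completes one cycle after its predecessor.
    """
    return (NUM_STAGES + n - 1) * PIPELINED_PS


def speedup(n):
    return single_cycle_time(n) / pipelined_time(n)


print(single_cycle_time(3), pipelined_time(3))
print(speedup(1_000_000))
```

For a single instruction the pipelined version is actually slower (5 x 200ps versus 800ps), and for long instruction streams the speedup approaches 800 / 200 = 4 rather than 5, because the stages are not balanced.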

Why MIPS is well suited to pipelining

1) All instructions are 32 bits

-> Easy to fetch and decode in one cycle

2) Few, regular instruction formats

-> Decoding and reading registers can happen in one step

3) Load/store addressing

-> Calculate the address in the 3rd stage, access memory in the 4th stage

4) Alignment of memory operands

-> A memory access takes only one cycle

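Branch Prediction

- The longer the pipeline, the later the branch outcome is known, so stalling on every branch becomes too costly

- Predict the outcome and stall only when the prediction is wrong

=> Simple solution: always assume the branch is not taken and fetch from PC + 4. If the prediction is wrong (the branch is taken), flush the instructions already in the pipeline and load the PC with the branch target.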

More realistic solutions

1) Static branch prediction

-> Based on typical branch behavior

-> Example: loop and if-statement branches

-> Predict backward branches taken

-> Predict forward branches not taken

2) Dynamic branch prediction

-> Hardware measures actual branch behavior (records the recent history of each branch)

-> Assumes future behavior will follow the trend (if wrong, stall while re-fetching and update the history)
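
The static rule above (backward taken, forward not taken) needs only the branch address and its target. A minimal sketch, with a hypothetical function name and made-up addresses:

```python
def predict_taken_static(branch_pc, target_pc):
    """Static prediction: a backward branch (target below the branch,
    as at the bottom of a loop) is predicted taken; a forward branch
    (as in an if statement) is predicted not taken."""
    return target_pc < branch_pc


# Loop back-edge: the branch at 0x40 jumps back to 0x20.
print(predict_taken_static(0x40, 0x20))
# Forward skip: the branch at 0x40 jumps ahead to 0x60.
print(predict_taken_static(0x40, 0x60))
```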

default (No forwarding) : ForwardA = 00

EX hazard

=> EX/MEM.RegWrite and EX/MEM.RegisterRd != 0 and (EX/MEM.RegisterRd = ID/EX.RegisterRs) : ForwardA = 10

The first ALU operand is forwarded from the prior ALU result

MEM hazard

=> MEM/WB.RegWrite and MEM/WB.RegisterRd != 0 and (MEM/WB.RegisterRd = ID/EX.RegisterRs) : ForwardA = 01

The first ALU operand is forwarded from data memory or an earlier ALU result

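Double Data hazard

-> Can occur when both the EX hazard and the MEM hazard conditions match the same register; the most recent result (EX/MEM) must be used

Revised Register Forwarding conditions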

EX hazard

=> EX/MEM.RegWrite and EX/MEM.RegisterRd !=0 and (EX/MEM.RegisterRd = ID/EX.RegisterRs) : ForwardA = 10

MEM hazard

=> MEM/WB.RegWrite and MEM/WB.RegisterRd != 0 and not (EX hazard condition) and (MEM/WB.RegisterRd = ID/EX.RegisterRs) : ForwardA = 01
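
The revised conditions can be read as a priority check: the EX hazard is tested first, and the MEM hazard is considered only when the EX hazard does not apply. A sketch for the ForwardA signal only; the function and parameter names are illustrative stand-ins for the pipeline-register fields:

```python
def forward_a(ex_mem_regwrite, ex_mem_rd, mem_wb_regwrite, mem_wb_rd, id_ex_rs):
    """Return the 2-bit ForwardA select for the first ALU operand."""
    ex_hazard = ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == id_ex_rs
    if ex_hazard:
        return 0b10  # forward the prior ALU result from EX/MEM

    mem_hazard = mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == id_ex_rs
    if mem_hazard:
        return 0b01  # forward from MEM/WB (data memory or earlier ALU result)

    return 0b00  # no forwarding: use the register file value


# Double data hazard: both stages write $1, so the newer EX/MEM value wins.
print(format(forward_a(True, 1, True, 1, 1), "02b"))
```

Returning early on the EX hazard is what implements the "not (EX hazard condition)" term of the MEM hazard rule.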

Data Hazard for Branches

1) If a compared register is the destination of the immediately preceding ALU instruction or of a load two instructions earlier

1 stall cycle before beq

2) If a compared register is the destination of the immediately preceding load instruction

2 stall cycles before beq
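
The two cases above can be summarized as a small lookup. This sketch assumes the usual textbook design in which the branch comparison is done in the ID stage and forwarding is available; `beq_stall_cycles` and its arguments are hypothetical names.

```python
def beq_stall_cycles(producer_kind, distance):
    """Stall cycles needed before a beq that reads a register.

    producer_kind: "alu" or "load", the instruction writing the register.
    distance: 1 = immediately preceding instruction, 2 = the one before.
    Assumption: the branch is resolved in ID and forwarding exists.
    """
    if producer_kind == "alu" and distance == 1:
        return 1  # ALU result is ready only at the end of EX
    if producer_kind == "load" and distance == 2:
        return 1  # loaded data is ready only at the end of MEM
    if producer_kind == "load" and distance == 1:
        return 2  # load is still in EX when beq reaches ID
    return 0      # older results can be forwarded without stalling


print(beq_stall_cycles("alu", 1), beq_stall_cycles("load", 1))
```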

Add Bubble

1) EX, MEM, and WB do nothing for the bubble: all control signals = 0

2) Repeat the ID stage for 1 more cycle: IF/ID Write signal = 0 (IF/ID pipeline register is not updated)

3) Repeat the IF stage for 1 more cycle: PCWrite signal = 0 (PC is not updated)
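
The three actions above are driven by one stall decision. The sketch below uses the standard load-use check (a load in EX whose destination matches a source register of the instruction in ID) as the trigger; that condition is not stated in these notes, so treat it, and all names here, as assumptions.

```python
def hazard_detection(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt):
    """Return the control outputs that insert one bubble on a load-use hazard.

    Assumption: the stall trigger is the usual load-use condition,
    ID/EX.MemRead and ID/EX.RegisterRt matching IF/ID.RegisterRs or Rt.
    """
    stall = bool(id_ex_memread and id_ex_rt in (if_id_rs, if_id_rt))
    return {
        "PCWrite": 0 if stall else 1,      # 0: PC keeps its value
        "IFIDWrite": 0 if stall else 1,    # 0: IF/ID register keeps its value
        "ZeroControl": 1 if stall else 0,  # 1: control signals forced to 0
    }


# lw $2, ... followed immediately by an instruction that reads $2.
print(hazard_detection(True, 2, 2, 5))
```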

1-bit branch predictor : NT (not taken), T (taken)

2-bit branch predictor : NT(00) : strongly predict not taken, NT(01) : weakly predict not taken, T(10) : weakly predict taken, T(11) : strongly predict taken
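
One common way to realize the four states above is a 2-bit saturating counter, sketched below. This is an assumed variant chosen because it matches the 00 to 11 encoding in these notes; other 2-bit state machines use slightly different transitions. The class name is made up.

```python
class TwoBitPredictor:
    """2-bit saturating counter: 00/01 predict not taken, 10/11 predict taken."""

    def __init__(self, state=0b00):
        self.state = state

    def predict(self):
        return self.state >= 0b10

    def update(self, taken):
        # Move one step toward the actual outcome, saturating at 00 and 11.
        if taken:
            self.state = min(self.state + 1, 0b11)
        else:
            self.state = max(self.state - 1, 0b00)


p = TwoBitPredictor(0b11)  # strongly taken, e.g. inside a hot loop
p.update(False)            # one misprediction at loop exit
print(p.predict())         # still predicts taken (state 10)
```

With this scheme the prediction flips only after two consecutive mispredictions, whereas a 1-bit predictor flips after every one.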

Branch prediction buffer

=> sizeof(instruction memory) >> sizeof(prediction buffer)

=> Multiple instructions map to a single entry

=> Aliasing 발생
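
Aliasing follows directly from indexing a small table with the low bits of the branch address. A sketch assuming word-aligned 32-bit instructions and a hypothetical 1024-entry buffer:

```python
def buffer_index(pc, num_entries=1024):
    """Index into the prediction buffer using low-order PC bits.

    The two lowest bits are dropped because instructions are word aligned.
    The 1024-entry size is an arbitrary choice for illustration.
    """
    return (pc >> 2) % num_entries


a = 0x0040
b = 0x0040 + 4 * 1024  # a different branch, 1024 instructions later
print(buffer_index(a), buffer_index(b))  # same entry, so the two branches alias
```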

Branch Target 계산

=> Even when the predictor predicts the branch, the target address still has to be calculated

=> 1-cycle penalty for a taken branch

Branch Target buffer

=> A cache of target addresses

=> Indexed by the PC when the instruction is fetched
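
A branch target buffer can be pictured as a small direct-mapped cache keyed by the fetch PC. The sketch below stores the full PC as the tag to keep things simple; the class, its 16-entry size, and the method names are illustrative assumptions.

```python
class BranchTargetBuffer:
    """Direct-mapped cache from branch PC to branch target address."""

    def __init__(self, num_entries=16):
        self.num_entries = num_entries
        self.entries = [None] * num_entries  # each entry: (tag_pc, target_pc)

    def _index(self, pc):
        return (pc >> 2) % self.num_entries

    def lookup(self, pc):
        """Return the cached target on a hit, or None on a miss."""
        entry = self.entries[self._index(pc)]
        if entry is not None and entry[0] == pc:
            return entry[1]
        return None

    def update(self, pc, target):
        """Record the target once the branch has been resolved."""
        self.entries[self._index(pc)] = (pc, target)


btb = BranchTargetBuffer()
btb.update(0x100, 0x80)
print(hex(btb.lookup(0x100)))
```

On a hit, the target is available during fetch, so a correctly predicted taken branch does not have to wait for the target calculation.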

ILP (Instruction-Level Parallelism)

Pipelining: executes multiple instructions in parallel

To increase ILP

1) Deeper pipeline: less work per stage => shorter clock cycle

2) Multiple issue

=> Multiple instructions start per clock cycle

=> Replicate the pipeline stages

=> CPI < 1, so Instructions Per Cycle (IPC) is used instead
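
CPI and IPC are reciprocals, which is why IPC is the more natural number once more than one instruction can complete per cycle. A small sketch with made-up counts for an ideal 4-wide machine:

```python
def cpi(cycles, instructions):
    """Cycles per instruction."""
    return cycles / instructions


def ipc(cycles, instructions):
    """Instructions per cycle (the reciprocal of CPI)."""
    return instructions / cycles


# Hypothetical run: 1000 instructions finish in 250 cycles.
print(cpi(250, 1000), ipc(250, 1000))
```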

Static Dual Issue MIPS

Two-issue packets

1) One ALU/branch instruction

2) One load/store instruction

64-bit aligned

=> ALU/branch first, then load/store

=> Pad an unused slot with nop
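
The packet layout above can be illustrated with a very simplified packer. It keeps program order, never reorders, and ignores data dependences entirely, which a real compiler must not do; it only shows the slot rule (ALU/branch first, load/store second, nop for an empty slot). All names are hypothetical.

```python
NOP = "nop"
LOAD_STORE = {"lw", "sw"}


def slot_kind(instr):
    """Classify an instruction by its opcode: 'mem' for lw/sw, else 'alu'."""
    return "mem" if instr.split()[0] in LOAD_STORE else "alu"


def pack_dual_issue(instrs):
    """Pack instructions into (ALU/branch, load/store) packets in order.

    Simplification: data hazards between the two slots are NOT checked.
    """
    packets = []
    i = 0
    while i < len(instrs):
        first = instrs[i]
        if slot_kind(first) == "alu":
            if i + 1 < len(instrs) and slot_kind(instrs[i + 1]) == "mem":
                packets.append((first, instrs[i + 1]))
                i += 2
            else:
                packets.append((first, NOP))
                i += 1
        else:
            packets.append((NOP, first))
            i += 1
    return packets


program = ["add $t0, $t1, $t2", "lw $s0, 0($s1)", "sw $s2, 4($s1)", "beq $t3, $zero, L"]
for packet in pack_dual_issue(program):
    print(packet)
```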

Multiple Issue

1) Static multiple issue

- The compiler groups instructions to be issued together

- Each group is packaged into issue slots

- The compiler detects and avoids hazards

2) Dynamic multiple issue

- The CPU examines the instruction stream and chooses the instructions to issue each cycle

- The compiler can help by reordering instructions

- The CPU resolves hazards at run time using advanced techniques

CPU classification

1) Number of instructions issued per cycle

- Single issue

- Multiple issue: static, or dynamic (superscalar)

2) Instruction execution scheduling

- Static scheduling (in-order)

- Dynamic scheduling (out-of-order)
