chsgcxy
O3CPU Features Overview
O3CPU Block Diagram
O3CPU Code Base Mechanism
O3CPU Fetch Stage
O3CPU Decode Stage
O3CPU Rename Stage
O3CPU IEW Stage
O3CPU Commit Stage
An out-of-order CPU model loosely based on the Alpha 21264
Configurable BranchPredictor Type
BiModeBP
LTAGE
LocalBP
MultiperspectivePerceptron64KB
MultiperspectivePerceptron8KB
MultiperspectivePerceptronTAGE64KB
MultiperspectivePerceptronTAGE8KB
TAGE
TAGE_SC_L_64KB
TAGE_SC_L_8KB
TournamentBP
Configurable Buffer and Queue sizes
Multithreading support
Configurable Width
FetchWidth
DecodeWidth
RenameWidth
DispatchWidth
IssueWidth
WriteBackWidth
CommitWidth
Configurable Hardware Prefetcher
AMPMPrefetcher
BOPPrefetcher
DCPTPrefetcher
IndirectMemoryPrefetcher
IrregularStreamBufferPrefetcher
MultiPrefetcher
PIFPrefetcher
SBOOEPrefetcher
STeMSPrefetcher
SignaturePathPrefetcher
SignaturePathPrefetcherV2
SlimAMPMPrefetcher
StridePrefetcher
TaggedPrefetcher
Configurable Pipeline Stage Delay
Configurable FUs
Latency
Count
Pipelined or not
[Figure: O3CPU block diagram. Labels only.]
Stages: Instruction Fetch, Decode, Rename, WriteBack, Commit
Widths: Fetch Width, Decode Width, Rename Width, Dispatch Width, Issue Width, Wb Width, Commit Width
Front end: I-Cache, I-TLB, Branch Prediction, Fetch Buffer, Fetch Queue
Back end: PhysRegFile, InstQueue, DependGraph, WakeUp, MemDepUnit, LoadQueue, StoreQueue, ROB, D-TLB, D-Cache
FUs: IntALU x6, IntMultDiv x2, FpALU x4, FpMultDiv x2, SimdUnit x4, RdWrPort x4
Legend: Configurable Width, Configurable FUs, Configurable Pipeline Latency
// src/cpu/o3/cpu.cc
void CPU::tick()
{
    fetch.tick(); decode.tick(); rename.tick(); iew.tick(); commit.tick();
    timeBuffer.advance();
    fetchQueue.advance(); decodeQueue.advance(); renameQueue.advance(); iewQueue.advance();
}
TimeBuffer example: decodeToRenameDelay = 2
[Figure: TimeBuffer between decode and rename with decodeToRenameDelay = 2. decode uses Wire(0), rename uses Wire(-2). Each slot holds { Size; Insts=[ ] }. Base steps through 0, 1, 2 on each Advance(), and a slot is cleared on each step.]
Heavy use of std containers
TimeBuffer is responsible for key pipeline control
Each stage uses a state machine for pipeline control
Some extra processing ensures the pipeline stays correct
int vector_index = idx + base;
if (vector_index >= (int)size) {
    vector_index -= size;
} else if (vector_index < 0) {
    vector_index += size;
}
Cycle += 2
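The index wrap above is what lets a circular buffer model a fixed stage-to-stage delay. The following is a minimal sketch of that idea, not gem5's TimeBuffer class: MiniTimeBuffer, at() and cyclesUntilVisible() are hypothetical names introduced for illustration, and the sketch assumes that advance() clears the slot that becomes the newest one.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for a time buffer (not the gem5 class).
template <typename T>
class MiniTimeBuffer
{
  public:
    // past: how many cycles back a reader may look.
    // future: how many cycles ahead a writer may write.
    MiniTimeBuffer(int past, int future)
        : past(past), future(future), size(past + future + 1),
          base(0), data(size) {}

    // Access the slot idx cycles relative to "now" (0 = now, -2 = two cycles ago).
    T &at(int idx)
    {
        assert(idx >= -past && idx <= future);
        int vector_index = idx + base;
        if (vector_index >= size) {
            vector_index -= size;
        } else if (vector_index < 0) {
            vector_index += size;
        }
        return data[vector_index];
    }

    // Move "now" forward one cycle and clear the slot that becomes the newest.
    void advance()
    {
        if (++base >= size) base = 0;
        at(future) = T();
    }

  private:
    int past, future, size, base;
    std::vector<T> data;
};

// Writes a value at offset 0 and counts the advance() calls needed before a
// reader at offset -delay sees it.
int cyclesUntilVisible(int delay)
{
    MiniTimeBuffer<int> buf(delay, 0);
    buf.at(0) = 42;
    int cycles = 0;
    while (buf.at(-delay) != 42) {
        buf.advance();
        ++cycles;
    }
    return cycles;
}
```

With delay = 2 the writer behaves like Wire(0) and the reader like Wire(-2): the value becomes visible to the reader after two advance() calls, matching the "Cycle += 2" label.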
[Figure: Fetch stage flow. Labels only.]
Blocks: PC, BranchPredictor, Align, MMU (PA), ICache, FetchBuffer (FetchBufferPC, FetchOffset), macroOp, FetchQueue
Checks: Squash? (Yes / No); BlockPC != FetchBufferPC; If (taken) stop; invalid
Actions: Update, Clear req, PA update, Stall, request, Fetch Addr
Limits: fetchWidth, one cache line max
Squash sources: Decode Stage, Commit Stage
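The fetch labels above (Align, BlockPC, FetchBufferPC, FetchOffset, invalid) suggest a block-aligned fetch buffer. The sketch below shows one way such a check could look. The buffer size and the function names are assumptions made for illustration, not values or APIs taken from the slides or from gem5.

```cpp
#include <cstdint>

// Assumed fetch buffer size in bytes (illustrative; must be a power of two).
constexpr std::uint64_t kFetchBufferSize = 64;

// Align a PC down to the start of the fetch buffer block that contains it.
constexpr std::uint64_t fetchBufferAlign(std::uint64_t pc)
{
    return pc & ~(kFetchBufferSize - 1);
}

// Byte offset of the PC inside its fetch buffer block.
constexpr std::uint64_t fetchOffset(std::uint64_t pc)
{
    return pc & (kFetchBufferSize - 1);
}

// A new I-cache request is needed when the buffer is invalid or when the
// PC's block differs from the block currently held in the buffer.
constexpr bool needsFetch(bool bufferValid, std::uint64_t fetchBufferPC,
                          std::uint64_t pc)
{
    return !bufferValid || fetchBufferAlign(pc) != fetchBufferPC;
}
```

For example, with a 64-byte buffer holding block 0x1040, a PC of 0x107c is served from the buffer, while a PC of 0x1080 triggers a new request.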
[Figure: Decode stage. Labels only.]
Data path: Fetch Stage, Insts, SkidBuffer, Decode Logic, Insts, Rename Stage
Width: DecodeWidth
Conditions: decodeStatus[tid] == Unblocking; IsDirectControl && IsUncondControl && mispred
Signals: squash (Commit Stage), block, stall, clear
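The decode labels above (SkidBuffer, block, clear, Unblocking, DecodeWidth) describe a stage that parks instructions while the next stage is stalled and drains them afterwards. Here is a minimal sketch of that behaviour under simplifying assumptions: MiniDecode is a hypothetical type, instructions are plain integers, and every instruction passes through the skid buffer, which real code need not do.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical, simplified decode stage with a skid buffer.
struct MiniDecode
{
    enum Status { Running, Blocked, Unblocking };

    std::size_t decodeWidth;     // max instructions sent downstream per cycle
    Status status = Running;
    std::deque<int> skidBuffer;  // instructions waiting to be decoded, oldest first

    explicit MiniDecode(std::size_t width) : decodeWidth(width) {}

    // One cycle: returns the instructions handed to the next stage.
    std::vector<int> tick(const std::vector<int> &fromFetch,
                          bool renameStalled, bool squash)
    {
        std::vector<int> toRename;
        if (squash) {
            // A squash discards everything buffered in this stage.
            skidBuffer.clear();
            status = Running;
            return toRename;
        }
        // New instructions queue up behind older ones.
        skidBuffer.insert(skidBuffer.end(), fromFetch.begin(), fromFetch.end());
        if (renameStalled) {
            status = Blocked;
            return toRename;
        }
        // Send at most decodeWidth instructions, oldest first.
        while (!skidBuffer.empty() && toRename.size() < decodeWidth) {
            toRename.push_back(skidBuffer.front());
            skidBuffer.pop_front();
        }
        // Leftovers mean the stage is still draining its skid buffer.
        status = skidBuffer.empty() ? Running : Unblocking;
        return toRename;
    }
};
```

With decodeWidth = 2 and three incoming instructions, two go downstream and the stage reports Unblocking until the third has drained.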
[Figure: Rename stage. Labels only.]
Data path: Decode Stage, Insts, SkidBuffer, Rename Logic, Insts, IEW Stage
Width: RenameWidth
RenameMap (RAT): Arch Reg Index to Phy Reg ID, per register class (IntReg / FloatReg / VecReg / ...)
Freelist: queue of Phy Reg IDs, per register class (IntReg / FloatReg / VecReg / ...)
HistoryBuffer (RAT checkpoint): list of entries { Seq, ArchRegId, CurPhyId, PrePhyId }
ScoreBoard: one valid bit per physical register (all phy regs); "Dest mark", "Src ready?"
Conditions: RenameStatus == Unblocking; IsDirectControl && IsUncondControl && mispred; NumFreeRegs < NumDestReg; serializeBefore
Signals: squash (Commit Stage), block, clear, Queue info
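The rename labels above (RenameMap, Freelist, HistoryBuffer with Seq / ArchRegId / CurPhyId / PrePhyId, NumFreeRegs < NumDestReg) can be tied together in a small model. This is a sketch under assumptions, not the gem5 implementation: MiniRename and its methods are hypothetical, there is a single register class, and the scoreboard is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// One history buffer entry: enough to undo or retire a single rename.
struct HistoryEntry
{
    std::uint64_t seq;  // sequence number of the renaming instruction
    int archReg;        // architectural destination register
    int curPhys;        // physical register allocated by this rename
    int prevPhys;       // physical register the arch register mapped to before
};

// Hypothetical, single-register-class rename model.
class MiniRename
{
  public:
    MiniRename(int numArch, int numPhys) : map(numArch)
    {
        assert(numPhys >= numArch);
        for (int a = 0; a < numArch; ++a) map[a] = a;  // identity mapping at reset
        for (int p = numArch; p < numPhys; ++p) freeList.push_back(p);
    }

    // Rename must block when NumFreeRegs < NumDestReg.
    bool canRename(std::size_t numDest) const { return freeList.size() >= numDest; }
    std::size_t numFree() const { return freeList.size(); }
    int lookup(int archReg) const { return map[archReg]; }

    // Allocate a new physical register for archReg and record the old mapping.
    int renameDest(std::uint64_t seq, int archReg)
    {
        assert(!freeList.empty());
        int phys = freeList.front();
        freeList.pop_front();
        history.push_front({seq, archReg, phys, map[archReg]});
        map[archReg] = phys;
        return phys;
    }

    // Squash: undo every rename younger than squashSeq, youngest first.
    void squash(std::uint64_t squashSeq)
    {
        while (!history.empty() && history.front().seq > squashSeq) {
            const HistoryEntry &h = history.front();
            map[h.archReg] = h.prevPhys;
            freeList.push_back(h.curPhys);
            history.pop_front();
        }
    }

    // Commit: the previous mapping of each retired rename can be reused.
    void commit(std::uint64_t commitSeq)
    {
        while (!history.empty() && history.back().seq <= commitSeq) {
            freeList.push_back(history.back().prevPhys);
            history.pop_back();
        }
    }

  private:
    std::vector<int> map;              // RenameMap (RAT)
    std::deque<int> freeList;          // free physical registers
    std::deque<HistoryEntry> history;  // front = youngest, back = oldest
};
```

The design point worth noting is the asymmetry: a squash frees CurPhyId and restores PrePhyId, while a commit frees PrePhyId and keeps CurPhyId as the architectural mapping.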
[Figure: IEW stage. Labels only.]
Data path: Rename Stage, Insts, SkidBuffer, InstructionQueue, InstsToExecute, FUPool, writeback (WB), IEW Queue, Commit Stage
Memory side: LoadQueue, StoreQueue, MemDependUnit, MemDependencePred, DTLB, L1D
Other blocks: ScoreBoard, PhyRegFiles
FUPool: per Opclass, Latency and isPipelined
Widths: Issue width, Commit Width
Signals: squash, block, clear
ReadyInst lists, one per Opclass, age priority:
    Opclass=IntAlu: Age=1, Age=3, Age=4
    Opclass=FloatDiv: Age=2, Age=5
    Opclass=IntMul: Age=6
    Opclass=IntDiv: Age=8, Age=11
listOrder (age first):
    Opclass=IntAlu oldest=1
    Opclass=FloatDiv oldest=2
    Opclass=IntMul oldest=6
    Opclass=IntDiv oldest=8
DependGraph: indexed by PhyRegIndex; entries chain dependent instructions (inst5, inst21, inst7, inst11, inst2, inst3, inst4, ...)
Wakeup releases a chain
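The ready lists above are ordered by age, and issue is bounded by both the issue width and the free function units of each Opclass. The sketch below models that selection with a single age-ordered heap rather than one list per Opclass plus a listOrder, which is a simplification. ReadyInst, Older and selectToIssue are hypothetical names, and the FU counts in the usage note are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// A ready instruction, identified here only by its age and op class.
struct ReadyInst
{
    std::uint64_t age;    // smaller = older
    std::string opClass;  // e.g. "IntAlu", "FloatDiv"
};

// Comparator that turns std::priority_queue into a min-heap on age.
struct Older
{
    bool operator()(const ReadyInst &a, const ReadyInst &b) const
    {
        return a.age > b.age;
    }
};

// Pick up to issueWidth instructions, oldest first, skipping any instruction
// whose op class has no free function unit this cycle.
// Returns the ages of the issued instructions in issue order.
std::vector<std::uint64_t> selectToIssue(std::vector<ReadyInst> ready,
                                         std::map<std::string, int> freeFUs,
                                         std::size_t issueWidth)
{
    std::priority_queue<ReadyInst, std::vector<ReadyInst>, Older>
        byAge(Older{}, std::move(ready));
    std::vector<std::uint64_t> issued;
    while (!byAge.empty() && issued.size() < issueWidth) {
        ReadyInst inst = byAge.top();
        byAge.pop();
        int &units = freeFUs[inst.opClass];  // a missing op class counts as 0 units
        if (units > 0) {
            --units;
            issued.push_back(inst.age);
        }
        // Otherwise the instruction would stay ready for a later cycle;
        // this sketch simply does not issue it.
    }
    return issued;
}
```

Using the ages from the figure with an assumed issue width of 4 and assumed free units IntAlu=2, FloatDiv=1, IntMul=1, IntDiv=0, the selection is ages 1, 2, 3 and 6: age 4 loses to the two older IntAlu instructions, and age 5 finds the FloatDiv unit already taken.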
Todo…
Thanks