CPUSim64 Cycle Timing Model

Each instruction on a modern RISC processor takes multiple cycles to pass through the CPU's execution stages; this is the instruction's latency. Because the stages are pipelined, with several instructions in flight at different stages at the same time, the processor can often complete roughly one instruction per cycle; this is its throughput.
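The difference between latency and throughput can be sketched with a small calculation. The example below assumes an idealized pipeline with no stalls; the function names and the 5-stage depth are illustrative assumptions, since this document does not state CPUSim64's stage count.

```python
def pipelined_cycles(n_instructions: int, stages: int) -> int:
    """Cycles to finish n instructions in an ideal pipeline with no stalls."""
    if n_instructions <= 0:
        return 0
    # The first instruction needs `stages` cycles (its latency); every later
    # instruction completes one cycle after the one before it.
    return stages + (n_instructions - 1)


def unpipelined_cycles(n_instructions: int, stages: int) -> int:
    """Cycles if each instruction must finish before the next one starts."""
    return max(n_instructions, 0) * stages


# Hypothetical 5-stage pipeline running 100 instructions.
print(pipelined_cycles(100, 5))    # 104
print(unpipelined_cycles(100, 5))  # 500
```

With 100 instructions the pipelined total works out to about 1.04 cycles per instruction, which is why the table charges most simple operations a single cycle.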

Some instructions break this ideal. Operations such as division are typically not fully pipelined and take many cycles, while events like interrupts incur additional overhead from flushing the pipeline and saving processor state. The table below documents the effective number of cycles CPUSim64 attributes to each instruction.

Instruction	Form	Cycles
Simple ALU
NOP	—	1
CLEAR	—	1
MOVE	reg-reg	1
COMPL	—	1
AND, OR, XOR	—	1
TEST, CMP	—	1
LSHIFT, RSHIFT, ARSHIFT	—	1
LROTATE, RROTATE	—	1
PACK, PACK64, UNPACK, UNPACK64	—	1
ENDIAN	—	1
READONLY	—	1
Arithmetic
NEGATE	integer	1
NEGATE	FP	3
ADD, SUBTRACT	integer	1
ADD, SUBTRACT	FP	3
MULTIPLY	integer	3
MULTIPLY	FP	3
DIVIDE	integer	12
DIVIDE, RECIP	FP	10
Memory
LOAD	—	2
STORE	—	2
PUSH	—	2
POP	—	2
SAVE	N registers	1 + N
RESTORE	N registers	1 + N
CAS	—	3
Control Flow
JUMP	unconditional	1
JUMP	conditional, taken	2
JUMP	conditional, not taken	1
CALL	unconditional	3
CALL	conditional, taken	4
CALL	conditional, not taken	1
RETURN	—	3
STOP	—	1
I/O & System
IN	—	1
OUT	—	1
INTERRUPT	—	11
INTERRUPT	conditional, not taken	1
DEBUG	—	1

Notes

All cycle counts include the 1-cycle instruction fetch.
SAVE/RESTORE: N = number of registers in the inclusive range (e.g., SAVE R0, R28 → N = 29, for 30 cycles).
INTERRUPT: The 11 cycles consist of the 1-cycle fetch plus a 10-cycle overhead that represents the trap/return cost. Actual handler execution time is tracked separately as wall-clock “System Time.”
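Conditional CALL: Inherits the JUMP penalty (+1 if taken) on top of the 3-cycle unconditional CALL cost (the 1-cycle fetch plus the 2-write CALL cost), so a taken conditional CALL = 4 cycles. A conditional CALL that is not taken costs only the 1-cycle fetch.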
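As a worked illustration of how these rules combine, the sketch below totals the cycles for a short instruction trace. The cycle values are copied from the table above, but the dictionary layout, the function names, and the trace format are assumptions made for this example and are not part of CPUSim64 itself. Only a subset of the table is included, and an empty form string stands in for rows with no form.

```python
# Fixed costs copied from the timing table, keyed by (mnemonic, form).
# The key layout is an assumption made for this sketch.
FIXED_COSTS = {
    ("NOP", ""): 1,
    ("MOVE", "reg-reg"): 1,
    ("ADD", "integer"): 1,
    ("ADD", "FP"): 3,
    ("MULTIPLY", "integer"): 3,
    ("DIVIDE", "integer"): 12,
    ("DIVIDE", "FP"): 10,
    ("LOAD", ""): 2,
    ("STORE", ""): 2,
    ("JUMP", "unconditional"): 1,
    ("JUMP", "conditional, taken"): 2,
    ("JUMP", "conditional, not taken"): 1,
    ("CALL", "unconditional"): 3,
    ("CALL", "conditional, taken"): 4,
    ("CALL", "conditional, not taken"): 1,
    ("RETURN", ""): 3,
    ("INTERRUPT", ""): 11,
}


def instruction_cycles(mnemonic: str, form: str = "", n_registers: int = 0) -> int:
    """Return the cycles charged for one instruction, per the table."""
    if mnemonic in ("SAVE", "RESTORE"):
        if n_registers < 1:
            raise ValueError("SAVE/RESTORE need at least one register")
        # 1-cycle fetch plus one cycle per register in the range.
        return 1 + n_registers
    return FIXED_COSTS[(mnemonic, form)]


def trace_cycles(trace) -> int:
    """Sum the cycles for a list of (mnemonic, form, n_registers) tuples."""
    return sum(instruction_cycles(*entry) for entry in trace)


# SAVE R0, R28 covers 29 registers, so it costs 1 + 29 = 30 cycles.
trace = [
    ("SAVE", "", 29),                   # 30
    ("LOAD", "", 0),                    # 2
    ("ADD", "integer", 0),              # 1
    ("DIVIDE", "integer", 0),           # 12
    ("CALL", "conditional, taken", 0),  # 4
    ("RESTORE", "", 29),                # 30
]
print(trace_cycles(trace))  # 79
```

This sums per-instruction costs only; handler time after an INTERRUPT is tracked separately as "System Time" and is not modeled here.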