On a modern RISC processor, each instruction takes multiple cycles to pass through the CPU's execution stages; this is the instruction's latency. Because the stages are pipelined, with several instructions in flight at different stages at once, the processor can often sustain a throughput of roughly one instruction per cycle.
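
As a rough illustration of the difference between latency and throughput, the sketch below counts cycles for an idealized pipeline with no stalls. The five-stage depth and the name `pipeline_cycles` are assumptions made for the example, not properties of the processor documented here.

```python
def pipeline_cycles(n_instructions, stages=5):
    """Cycles to finish n instructions on an ideal, stall-free pipeline.

    The first instruction needs `stages` cycles to travel through the pipe
    (its latency); each later instruction completes one cycle after the
    previous one. The default depth of 5 is an assumed value.
    """
    if n_instructions <= 0:
        return 0
    return stages + (n_instructions - 1)


if __name__ == "__main__":
    for n in (1, 10, 1000):
        total = pipeline_cycles(n)
        print(f"{n} instructions: {total} cycles, {total / n:.3f} cycles/instruction")
```

For a single instruction the cost equals the full pipeline depth, but as the instruction count grows the average approaches one cycle per instruction, which is the throughput figure quoted above.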
Some instructions break this ideal. Operations such as division are typically not fully pipelined and take many cycles, while events like interrupts incur additional overhead from flushing the pipeline and saving processor state. The table below lists the effective number of cycles attributed to each instruction.
| Instruction | Form | Cycles |
|---|---|---|
| Simple ALU | ||
| NOP | — | 1 |
| CLEAR | — | 1 |
| MOVE | reg-reg | 1 |
| COMPL | — | 1 |
| AND, OR, XOR | — | 1 |
| TEST, CMP | — | 1 |
| LSHIFT, RSHIFT, ARSHIFT | — | 1 |
| LROTATE, RROTATE | — | 1 |
| PACK, PACK64, UNPACK, UNPACK64 | — | 1 |
| ENDIAN | — | 1 |
| READONLY | — | 1 |
| Arithmetic | ||
| NEGATE | integer | 1 |
| NEGATE | FP | 3 |
| ADD, SUBTRACT | integer | 1 |
| ADD, SUBTRACT | FP | 3 |
| MULTIPLY | integer | 3 |
| MULTIPLY | FP | 3 |
| DIVIDE | integer | 12 |
| DIVIDE, RECIP | FP | 10 |
| Memory | ||
| LOAD | — | 2 |
| STORE | — | 2 |
| PUSH | — | 2 |
| POP | — | 2 |
| SAVE | N registers | 1 + N |
| RESTORE | N registers | 1 + N |
| CAS | — | 3 |
| Control Flow | ||
| JUMP | unconditional | 1 |
| JUMP | conditional, taken | 2 |
| JUMP | conditional, not taken | 1 |
| CALL | unconditional | 3 |
| CALL | conditional, taken | 4 |
| CALL | conditional, not taken | 1 |
| RETURN | — | 3 |
| STOP | — | 1 |
| I/O & System | ||
| IN | — | 1 |
| OUT | — | 1 |
| INTERRUPT | — | 11 |
| INTERRUPT | conditional, not taken | 1 |
| DEBUG | — | 1 |

For SAVE and RESTORE, N is the number of registers transferred (e.g. SAVE R0, R28 → N = 29).
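
One way to use these figures is to add them up over an instruction trace. The sketch below is a minimal estimator under the assumption that the per-instruction counts in the table are simply additive; it ignores any effects the table does not describe. The names `CYCLES`, `block_transfer_cycles`, and `estimate_cycles` are hypothetical, and only a subset of the table is encoded.

```python
# Cycle counts copied from a subset of the table; keys are (instruction, form).
# A form of None corresponds to a dash in the "Form" column.
CYCLES = {
    ("MOVE", "reg-reg"): 1,
    ("ADD", "integer"): 1,
    ("MULTIPLY", "integer"): 3,
    ("DIVIDE", "integer"): 12,
    ("LOAD", None): 2,
    ("STORE", None): 2,
    ("JUMP", "conditional, taken"): 2,
    ("JUMP", "conditional, not taken"): 1,
    ("CALL", "unconditional"): 3,
    ("RETURN", None): 3,
}


def block_transfer_cycles(n_registers):
    """SAVE and RESTORE cost 1 + N cycles for N registers."""
    return 1 + n_registers


def estimate_cycles(trace):
    """Sum table costs over a trace of (instruction, form) pairs.

    For SAVE and RESTORE the second element is the register count N.
    Assumes costs are additive, which is a simplification.
    """
    total = 0
    for name, form in trace:
        if name in ("SAVE", "RESTORE"):
            total += block_transfer_cycles(form)
        else:
            total += CYCLES[(name, form)]
    return total


if __name__ == "__main__":
    # A call that saves R0 through R28 (N = 29), does some work, and returns.
    trace = [
        ("CALL", "unconditional"),
        ("SAVE", 29),
        ("LOAD", None),
        ("MULTIPLY", "integer"),
        ("STORE", None),
        ("RESTORE", 29),
        ("RETURN", None),
    ]
    print(estimate_cycles(trace))  # 3 + 30 + 2 + 3 + 2 + 30 + 3 = 73
```

In this trace, saving and restoring 29 registers accounts for 60 of the 73 cycles, which shows how quickly the 1 + N cost dominates a short routine.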