Toe is an instruction set for a 64-bit little-endian CPU. In many respects, it is similar to the 64-bit subset of ARMv8 or AMD64. Compared to those, it is simpler and more regular, and has a higher code density. Its instructions are smaller, but it requires more instructions to express the same code.
I named it "Toe" because one of my previous 64-bit instruction sets was called "Leg", which was a play on "ARM", and because ARM designed an instruction set called "Thumb" which, like "Toe", has 16-bit instructions.
Toe is missing a number of features, including long arithmetic, floating-point arithmetic, vector instructions, and synchronization primitives. These could be added, possibly at the expense or removing something else.
While it would be possible to make an in-order CPU that executes the Toe instruction set, it would not be fast compared to (say) an ARM Cortex A53. There is some scope for a compiler to statically reorder Toe instructions such that neighbouring instructions are independent, but much of the freedom has been removed in order to make the code denser. I will therefore say that Toe is designed on the assumption that the CPU will reorder the instructions to find parallelism and hide latency.
The state of the machine is defined after every 16-bit instruction, so it would be possible to make a machine that literally executes the individual instructions. It would be better, however, to merge some of the simpler instructions into the following instructions so that they don't have to be executed at all. This applies to:
MOV
instructions, which can be handled using register renaming.There are 24 general-purpose registers R0
to R23
, and eight short-term registers S0
to S7
. General purpose registers can be written in any order. In contrast, the short-term registers hold the results of the most recent instructions, i.e. they are used as FIFO storage.
There is a program counter PC
, which holds the address of the current instruction. It is always even.
There are four condition flags: N
, Z
, C
and V
. These are set by AND
, ADD
and SUB
instructions, and are preserved by all other instructions. They are tested by conditional branch instructions.
There is a constant continuation flag Q
, which is cleared by exery instruction except #n...
, which sets it.
Instructions are always 16 bits long, and I will name the bits with hexadecimal digits.
To describe the instructions, I will use some special notation:
x:y
is bits x
to y
(inclusive) of the instruction.z[x:y]
is bits x
to y
(inclusive) of z
.[addr]
is the contents of memory starting at addr
.sxt(x)
is x
sign-extended.uxt(x)
is x
zero-extended.even(x)
is x
or its bitwise inverse, whichever is even.Ss
is a short-term register encoded in 5:7
.Rr
is a general-purpose register encoded in 0:4
.Sr
is a short-term register encoded in 0:2
.RSr
is either Rr
or Sr
; in the latter case 3:4
is 11
.Fr
is a unary operation encoded in 0:2
.w
is a width: b
, h
, s
or d
, meaning 1, 2, 4 or 8 bytes.v
is a restricted width: s
or d
only.The following should be prepended to the meaning of every instruction:
HERE <- PC
PC <- PC + 2
The following should be appended to the meaning of every instruction (RESULT
is the 64-bit value computed by the instruction):
S7 <- S6
S6 <- S5
S5 <- S4
S4 <- S3
S3 <- S2
S2 <- S1
S1 <- S0
S0 <- RESULT
The following should be appended to every instruction except #n...
:
Q <- 0
f
is 0
, the instruction is a dataflow instruction.e:f
is 10
are reserved.d:f
is 110
, the instruction is a constant continuation.c:f
is 1110
, the instruction is a direct jump of some sort.b:f
is 11110
, the instruction is an immediate constant.b:f
is 11111
and 8:a
is not 111
, the instruction is a conditional branch.7:f
is 111111110
, the instruction is an indirect jump of some sort.7:f
is 111111111
, the instruction is a privileged instruction.Dataflow instructions are used to read and write registers, to perform arithmetic, and to access memory. They are distinguished by f
being 0
. The general encoding is as follows:
Bits Name Meaning
0:4 r A general-purpose register number. Used as a source and maybe
as a destination.
5:7 s A short-term register number. Used as a source only.
8:d op Determines which operation to perform.
e:e mode Determines the sources and destinations.
If the bits 3:4
are 11
, then r
does not encode a general-purpose register number, and the instruction changes its meaning. Bits 0:2 are then available to encode a short-term register number, or something else.
There are four modes, distinguished by bit e
and by whether r
encodes a general-purpose register number. All modes perform the same operation; they differ only in their operands and their destination register(s).
e 3:4 Written Meaning
0 not 11 op Ss, Rr RESULT <- Rr; Rr <- op(Ss, Rr)
0 11 op Ss, Sr RESULT <- op(Ss, Sr)
1 not 11 op Rr, Ss RESULT <- op(Rr, Ss)
1 11 op Fr, Ss RESULT <- op(Fr(S0), Ss)
The available op
options are:
8:d Written Meaning of op(x, y) Notes
000000 MOV x
000001 AND x & y
000010 OR x | y
000011 XOR x ^ y
000100 LSR uxt(x >> y)
000101 SEL S0 != 0 ? x : y
000110 RLSR uxt(y >> x)
000111 RSEL S0 != 0 ? y : x
00100v ASR.v sxt(x >> y)
00101v RASR.v sxt(y >> x)
00110v SL.v x << y
00111v RSL.v y << x
01000v MUL.v x * y
01001v ADD.v x + y
01010v SUB.v x - y
01011v RSUB.v y - x
011xxx
1000ww LD.w uxt([x + w * y])
1001ww LS.w sxt([x + w * y])
1010ww AD.w x + w * y
1011ww ST.w x + w side-effect [x] <- y
11xxxx
Instructions ending .v
have a 32-bit and a 64-bit version, written as .s
and .d
resepctively. The 32-bit version treats bit 31 as the sign bit, and clears the upper 32 bits of the result. Some operations, including MOV
, AND
, OR
and XOR
, do not have separate versions.
Instructions ending .w
have 8-bit, 16-bit, 32-bit and 64-bit versions, written as .b
, .h
, .s
and .d
respectively. In their description, w
means 1
, 2
, 4
, or 8
respectively, and [address]
means the bytes from address
to address + w - 1
inclusive. The address calculation is done using 64-bit arithmetic, and RESULT
is 64 bits wide, regardless of w
.
Most asymmetric operations have a counterpart that performs the same operation but with the operands swapped. The two opcodes differ in bit 9
, and in most cases their mnemonic differs by the prefix R
. The ability to swap the operands is useless if they are both short-term registers, but it is useful otherwise.
Some operations, including SEL
, use S0
as an additional operand.
TODO: Define that MUL
sets flags. Similar to ADD
.
ADD.w x y
sets the condition flags as follows:
N
is set to the sign bit of RESULT
.Z
is set if RESULT == 0
C
is set if the addition carries, i.e. if the unsigned value of RESULT
differs from the sum of the unsigned value of x
and the unsigned value of y
.V
is set if the addition overflows, i.e. if the signed value of RESULT
differs from the sum of the signed value of x
and the signed value of y
.SUB.w x y
sets the condition flags as follows:
N
is set to the sign bit of RESULT
.Z
is set if RESULT == 0
C
is clear if the subtraction borrows, i.e. if the unsigned value of RESULT
differs from the difference of the unsigned value of x
and the unsigned value of y
.V
is set if the subtraction overflows, i.e. if the signed value of RESULT
differs from the difference of the signed value of x
and the signed value of y
.AND x y
serves as both a 64-bit and a 32-bit operation. It sets the condition flags as follows:
N
is set to the bit 3f
of RESULT
.Z
is set if RESULT == 0
.C
is set to the bit 1f
of RESULT
.V
is set if RESULT[0:1f] == 0
.All other operations preserve the condition flags. AD.b
might be useful as a version of ADD.d
that does not affect the flags. RSUB.w
similarly does not affect the flags.
When the first operand is S0
(the result of the previous instruction), there is an opportunity to apply a unary function to it before using it. This is indicated by Fr
in the decoding table above. This mechanism is also used to provide immediate constants and PC-relative addressing, and some simple arithmetic operations.
If Q
is clear, then the previous instruction was not a constant continuation, and the available Fr
options are:
0:2 Written Meaning of Fr(S0)
000 #0 0
001 #1 1
010 #2 2
011 #3 3
100 #4 4
101 BIT S0 & 1
110 INC S0 + 1
111 NOT ~S0
If Q
is set, then the value of S0
is from a constant continuation, and the available Fr
options are:
0:2 Written Meaning of Fr(S0)
000 #n logic_immediate(S0)
001 #n S0 << 8
010 #n S0 << 16
011 #n S0 << 24
100 label HERE + even(S0)
101 [label] [HERE + even(S0)]
110 PR_n Performance counter `n`
111 #n ~S0
The logic_immediate()
unary function decodes the bottom 13 bits of S0
, which must come from a #n...
instruction, and constructs a useful bitmask.
The idea is similar to ARMv8's encoding of immediate constants for logic instructions, but differs in the details. Toe can encode a few bitmasks that ARMv8 can't.
S0[0:c] Test Meaning
nyyyyyyxxxxxx y > x (n ? -1 : 0) ^ 0x0000000000000001 * ((1 << y) - (1 << x))
n0yyyyy1xxxxx y > x (n ? -1 : 0) ^ 0x0000000100000001 * ((1 << y) - (1 << x))
n00yyyy11xxxx y > x (n ? -1 : 0) ^ 0x0001000100010001 * ((1 << y) - (1 << x))
000yyyy01xxxx 0x0101010101010101 * ((y << 4) + x)
Note that only 5776 of the 8192 possible 13-bit patterns encode a bitmask, and that all these bitmasks are distinct. Other 12-bit patterns should not be used. 0
and -1
cannot be encoded as a logic immediate.
The available performance counters are as follows:
Name Meaning
PR_0 Clock cycles since reset
PR_1 Instructions retired since reset
PR_2 Nanoseconds since reset
PR_3 A 64-bit random number
Small immediate constants are encoded as part of the instruction that uses them. Larger immediate constants require one or more constant continuation instructions before the instruction that uses them. A constant continuation instruction is distinguished by d:f
being 110
.
It is possible to assemble a constant continuation instruction explicitly using the assembly language syntax #n...
(in which n
is the 13-bit constant). However, it is more usual to allow the assembler to construct the additional instructions. If you write a single assembly language instruction that includes an immediate constant, the assembler will prepend as many #n...
instructions as are necessary to express the constant's value.
You can force the assembler to assemble a specified number of instructions using an assembly language prefix. For example, 3.CALL label
is guaranteed to assemble two #n...
instructions followed by a CALL
(or to report an error). This feature is useful in code that will be relocated and linked, in which the final value of label
might not be known. In order to support this feature, longer #n...
prefixes can encode all the values encodeable by shorter sequences.
d:f Written Meaning
110 #n... RESULT <- uxt(0:c) | (Q ? (S0 << 13) : 0)
Q <- 1
This is the only instruction which sets Q
to 1
. All other instructions set Q
to 0
. Therefore, the Q
flag indicates whether the previous instruction was #n...
. If so, the value it computed is used to supply 13 additional bits to the following instruction. Notably, it can be used to supply the high bits of an immediate constant in the immediately following instruction, but it has other uses too.
A direct jump instruction jumps to a constant program address. It is distinguished by c:f
being 1110
.
b:f Written Meaning
11100 JUMP label RESULT <- PC
PC <- HERE + even(uxt(0:a) | (Q ? (S0 << 11) : 0))
11101 CALL label RESULT <- PC
PC <- HERE + even(uxt(0:a) | (Q ? (S0 << 11) : 0))
The only difference between JUMP
and CALL
is only in the behaviour of the branch predictor. CALL
pushes the return address onto the predictor's return stack, whereas JUMP
doesn't. Both compute a return address, though it is usually useful after a CALL
.
Valid code addresses are all even numbers, so bit 0
is available to indicate a negative jump offset. The even()
function (defined earlier) maps the odd values to even negative values.
When not preceded by a constant continuation, direct jump instructions can jump up to 2K in either direction. Each preceding constant continuation instruction multiplies the range by 8192. With one, the range is 16M, which is large enough to reach anywhere in a static binary. With two, the range is 128G, which is large enough to reach a dynamic library loaded at a random address.
An immediate constant instruction computes an unsigned constant result. It is distinguished by b:f
being 11110
.
b:f Written Meaning
11110 #n RESULT <- uxt(0:a) | (Q ? (S0 << 11) : 0)
There are a few alternative ways of expressing immediate constants for use in dataflow instructions which might be shorter or otherwise preferable (see "Unary operations"):
0
and 1
are encodeable as part of arithmetic instructions.When not preceded by a constant continuation, an immediate constant instruction can express an 11-bit unsigned number. Each preceding constant continuation instruction adds 13 more bits. Using more than two constant continuation instructions, though possible, is in most situations inferior to using a PC-relative load.
Conditional branch instructions are used to implement "if". They test a condition code: a boolean function of the N
, Z
, C
and V
flags. If the test passes, the instruction does a direct jump.
7:f Written Meaning
11111ccc0 B.cond label if cond(N, Z, C, V) {
RESULT <- -1
PC <- HERE + even(uxt(0:6) | (Q ? (S0 << 7) : 0))
} else {
RESULT <- 0
}
11111ccc1 B.cond label if !cond(N, Z, C, V) {
RESULT <- -1
PC <- HERE + even(uxt(0:6) | (Q ? (S0 << 7) : 0))
} else {
RESULT <- 0
}
There are only seven condition codes. If 8:a
is 111
then the instruction is not a conditional branch instruction.
The negated form of B.cond
is distinguished by setting bit 7
. In assembly language, the negated form is indicated by changing the name of the condition code.
Note that a conditional branch to the immediately following instruction can be useful for computing a boolean value.
When not preceded by a constant continuation, conditional branch instructions can jump up to 128 bytes in either direction: a factor of 16 less than a direct jump. Each preceding constant continuation instruction multiplies the range by 8192. With one, the range is 1M, and with two, the range is 8G.
The available condition codes are similar to the ones that ARM uses. Each has two spellings in assembly language, one of which indicates that the boolean should be inverted:
8:a Written Inverted Meaning Interpretation after SUB x y
000 ZS or EQ ZC or NE Z x == y
001 CS or HS CC or LO C x >= y (unsigned)
010 NS or MI NC or PL N (x - y) < 0
011 VS VC V x - y overflows (signed)
100 HI LS C & !Z x > y (unsigned)
101 GE LT N == V x >= y (signed)
110 GT LE N == V & !Z x > y (signed)
111 Not a condition code
The formal meaning of the condition codes is in the "meaning" column. It is defined in terms of the condition flags. The section "Setting the condition flags" below explains how the condition flags are set by various arithmetic instructions. Combining the two, you can implement a variety of useful tests. The "interpretation" column gives some examples: the tests that can be achieved by preceding each condition code with a SUB
instruction.
An indirect jump instruction jumps to a computed code address. It is distinguished by 7:f
being 111111110
.
5:f 0:4 Written Meaning
11111111000 RSr JUMP RSr RESULT <- PC; PC <- RSr
11111111001 RSr CALL RSr RESULT <- PC; PC <- RSr
11111111010 RSr RET RSr RESULT <- PC; PC <- RSr
11111111011 n SWI #n Enter supervisor mode.
RESULT <- PC; PC <- [0]
The difference between RET
, JUMP
and CALL
is only in the behaviour of the branch predictor. RET
pops the return address from the predictor's stack, CALL
pushes the return address onto the predictor's return stack, and JUMP
doesn't use the stack.
SWI
generates an interrupt. This can be used for a number of purposes, but the most common is to perform a system call.
5:f 0:4 Written Meaning
11111111100 RSr MMAP Map physical page RSr at virtual page 0
TODO.
SEL
with S.cond
?b
(?) to swap operands, not bit 9
.