Toe is an instruction set for a 64-bit little-endian CPU. In many respects, it is similar to the 64-bit subset of ARMv8 or AMD64. Compared to those, it is simpler and more regular, and has a higher code density. Its instructions are smaller, but it requires more instructions to express the same code.

I named it "Toe" because one of my previous 64-bit instruction sets was called "Leg", which was a play on "ARM", and because ARM designed an instruction set called "Thumb" which, like "Toe", has 16-bit instructions.

Toe is missing a number of features, including long arithmetic, floating-point arithmetic, vector instructions, and synchronization primitives. These could be added, possibly at the expense or removing something else.

While it would be possible to make an in-order CPU that executes the Toe instruction set, it would not be fast compared to (say) an ARM Cortex A53. There is some scope for a compiler to statically reorder Toe instructions such that neighbouring instructions are independent, but much of the freedom has been removed in order to make the code denser. I will therefore say that Toe is designed on the assumption that the CPU will reorder the instructions to find parallelism and hide latency.

The state of the machine is defined after every 16-bit instruction, so it would be possible to make a machine that literally executes the individual instructions. It would be better, however, to merge some of the simpler instructions into the following instructions so that they don't have to be executed at all. This applies to:

- Immediate constants, which can be inserted into the subsequent instruction that uses them.
`MOV`

instructions, which can be handled using register renaming.- Conditional and unconditional jumps to the immediately following instruction.

There are 24 general-purpose registers `R0`

to `R23`

, and eight short-term registers `S0`

to `S7`

. General purpose registers can be written in any order. In contrast, the short-term registers hold the results of the most recent instructions, i.e. they are used as FIFO storage.

There is a program counter `PC`

, which holds the address of the current instruction. It is always even.

There are four condition flags: `N`

, `Z`

, `C`

and `V`

. These are set by `AND`

, `ADD`

and `SUB`

instructions, and are preserved by all other instructions. They are tested by conditional branch instructions.

There is a constant continuation flag `Q`

, which is cleared by exery instruction except `#n...`

, which sets it.

Instructions are always 16 bits long, and I will name the bits with hexadecimal digits.

To describe the instructions, I will use some special notation:

`x:y`

is bits`x`

to`y`

(inclusive) of the instruction.`z[x:y]`

is bits`x`

to`y`

(inclusive) of`z`

.`[addr]`

is the contents of memory starting at`addr`

.`sxt(x)`

is`x`

sign-extended.`uxt(x)`

is`x`

zero-extended.`even(x)`

is`x`

or its bitwise inverse, whichever is even.`Ss`

is a short-term register encoded in`5:7`

.`Rr`

is a general-purpose register encoded in`0:4`

.`Sr`

is a short-term register encoded in`0:2`

.`RSr`

is either`Rr`

or`Sr`

; in the latter case`3:4`

is`11`

.`Fr`

is a unary operation encoded in`0:2`

.`w`

is a width:`b`

,`h`

,`s`

or`d`

, meaning 1, 2, 4 or 8 bytes.`v`

is a restricted width:`s`

or`d`

only.

The following should be prepended to the meaning of every instruction:

```
HERE <- PC
PC <- PC + 2
```

The following should be appended to the meaning of every instruction (`RESULT`

is the 64-bit value computed by the instruction):

```
S7 <- S6
S6 <- S5
S5 <- S4
S4 <- S3
S3 <- S2
S2 <- S1
S1 <- S0
S0 <- RESULT
```

The following should be appended to every instruction except `#n...`

:

```
Q <- 0
```

- If
`f`

is`0`

, the instruction is a dataflow instruction. - Instructions where
`e:f`

is`10`

are reserved. - If
`d:f`

is`110`

, the instruction is a constant continuation. - If
`c:f`

is`1110`

, the instruction is a direct jump of some sort. - If
`b:f`

is`11110`

, the instruction is an immediate constant. - If
`b:f`

is`11111`

and`8:a`

is not`111`

, the instruction is a conditional branch. - If
`7:f`

is`111111110`

, the instruction is an indirect jump of some sort. - If
`7:f`

is`111111111`

, the instruction is a privileged instruction.

Dataflow instructions are used to read and write registers, to perform arithmetic, and to access memory. They are distinguished by `f`

being `0`

. The general encoding is as follows:

```
Bits Name Meaning
0:4 r A general-purpose register number. Used as a source and maybe
as a destination.
5:7 s A short-term register number. Used as a source only.
8:d op Determines which operation to perform.
e:e mode Determines the sources and destinations.
```

If the bits `3:4`

are `11`

, then `r`

does not encode a general-purpose register number, and the instruction changes its meaning. Bits 0:2 are then available to encode a short-term register number, or something else.

There are four modes, distinguished by bit `e`

and by whether `r`

encodes a general-purpose register number. All modes perform the same operation; they differ only in their operands and their destination register(s).

```
e 3:4 Written Meaning
0 not 11 op Ss, Rr RESULT <- Rr; Rr <- op(Ss, Rr)
0 11 op Ss, Sr RESULT <- op(Ss, Sr)
1 not 11 op Rr, Ss RESULT <- op(Rr, Ss)
1 11 op Fr, Ss RESULT <- op(Fr(S0), Ss)
```

The available `op`

options are:

```
8:d Written Meaning of op(x, y) Notes
000000 MOV x
000001 AND x & y
000010 OR x | y
000011 XOR x ^ y
000100 LSR uxt(x >> y)
000101 SEL S0 != 0 ? x : y
000110 RLSR uxt(y >> x)
000111 RSEL S0 != 0 ? y : x
00100v ASR.v sxt(x >> y)
00101v RASR.v sxt(y >> x)
00110v SL.v x << y
00111v RSL.v y << x
01000v MUL.v x * y
01001v ADD.v x + y
01010v SUB.v x - y
01011v RSUB.v y - x
011xxx
1000ww LD.w uxt([x + w * y])
1001ww LS.w sxt([x + w * y])
1010ww AD.w x + w * y
1011ww ST.w x + w side-effect [x] <- y
11xxxx
```

Instructions ending `.v`

have a 32-bit and a 64-bit version, written as `.s`

and `.d`

resepctively. The 32-bit version treats bit 31 as the sign bit, and clears the upper 32 bits of the result. Some operations, including `MOV`

, `AND`

, `OR`

and `XOR`

, do not have separate versions.

Instructions ending `.w`

have 8-bit, 16-bit, 32-bit and 64-bit versions, written as `.b`

, `.h`

, `.s`

and `.d`

respectively. In their description, `w`

means `1`

, `2`

, `4`

, or `8`

respectively, and `[address]`

means the bytes from `address`

to `address + w - 1`

inclusive. The address calculation is done using 64-bit arithmetic, and `RESULT`

is 64 bits wide, regardless of `w`

.

Most asymmetric operations have a counterpart that performs the same operation but with the operands swapped. The two opcodes differ in bit `9`

, and in most cases their mnemonic differs by the prefix `R`

. The ability to swap the operands is useless if they are both short-term registers, but it is useful otherwise.

Some operations, including `SEL`

, use `S0`

as an additional operand.

TODO: Define that `MUL`

sets flags. Similar to `ADD`

.

`ADD.w x y`

sets the condition flags as follows:

`N`

is set to the sign bit of`RESULT`

.`Z`

is set if`RESULT == 0`

`C`

is set if the addition carries, i.e. if the unsigned value of`RESULT`

differs from the sum of the unsigned value of`x`

and the unsigned value of`y`

.`V`

is set if the addition overflows, i.e. if the signed value of`RESULT`

differs from the sum of the signed value of`x`

and the signed value of`y`

.

`SUB.w x y`

sets the condition flags as follows:

`N`

is set to the sign bit of`RESULT`

.`Z`

is set if`RESULT == 0`

`C`

is*clear*if the subtraction borrows, i.e. if the unsigned value of`RESULT`

differs from the difference of the unsigned value of`x`

and the unsigned value of`y`

.`V`

is set if the subtraction overflows, i.e. if the signed value of`RESULT`

differs from the difference of the signed value of`x`

and the signed value of`y`

.

`AND x y`

serves as both a 64-bit and a 32-bit operation. It sets the condition flags as follows:

`N`

is set to the bit`3f`

of`RESULT`

.`Z`

is set if`RESULT == 0`

.`C`

is set to the bit`1f`

of`RESULT`

.`V`

is set if`RESULT[0:1f] == 0`

.

All other operations preserve the condition flags. `AD.b`

might be useful as a version of `ADD.d`

that does not affect the flags. `RSUB.w`

similarly does not affect the flags.

When the first operand is `S0`

(the result of the previous instruction), there is an opportunity to apply a unary function to it before using it. This is indicated by `Fr`

in the decoding table above. This mechanism is also used to provide immediate constants and PC-relative addressing, and some simple arithmetic operations.

If `Q`

is clear, then the previous instruction was not a constant continuation, and the available `Fr`

options are:

```
0:2 Written Meaning of Fr(S0)
000 #0 0
001 #1 1
010 #2 2
011 #3 3
100 #4 4
101 BIT S0 & 1
110 INC S0 + 1
111 NOT ~S0
```

If `Q`

is set, then the value of `S0`

is from a constant continuation, and the available `Fr`

options are:

```
0:2 Written Meaning of Fr(S0)
000 #n logic_immediate(S0)
001 #n S0 << 8
010 #n S0 << 16
011 #n S0 << 24
100 label HERE + even(S0)
101 [label] [HERE + even(S0)]
110 PR_n Performance counter `n`
111 #n ~S0
```

The `logic_immediate()`

unary function decodes the bottom 13 bits of `S0`

, which must come from a `#n...`

instruction, and constructs a useful bitmask.

The idea is similar to ARMv8's encoding of immediate constants for logic instructions, but differs in the details. Toe can encode a few bitmasks that ARMv8 can't.

```
S0[0:c] Test Meaning
nyyyyyyxxxxxx y > x (n ? -1 : 0) ^ 0x0000000000000001 * ((1 << y) - (1 << x))
n0yyyyy1xxxxx y > x (n ? -1 : 0) ^ 0x0000000100000001 * ((1 << y) - (1 << x))
n00yyyy11xxxx y > x (n ? -1 : 0) ^ 0x0001000100010001 * ((1 << y) - (1 << x))
000yyyy01xxxx 0x0101010101010101 * ((y << 4) + x)
```

Note that only 5776 of the 8192 possible 13-bit patterns encode a bitmask, and that all these bitmasks are distinct. Other 12-bit patterns should not be used. `0`

and `-1`

cannot be encoded as a logic immediate.

The available performance counters are as follows:

```
Name Meaning
PR_0 Clock cycles since reset
PR_1 Instructions retired since reset
PR_2 Nanoseconds since reset
PR_3 A 64-bit random number
```

Small immediate constants are encoded as part of the instruction that uses them. Larger immediate constants require one or more constant continuation instructions before the instruction that uses them. A constant continuation instruction is distinguished by `d:f`

being `110`

.

It is possible to assemble a constant continuation instruction explicitly using the assembly language syntax `#n...`

(in which `n`

is the 13-bit constant). However, it is more usual to allow the assembler to construct the additional instructions. If you write a single assembly language instruction that includes an immediate constant, the assembler will prepend as many `#n...`

instructions as are necessary to express the constant's value.

You can force the assembler to assemble a specified number of instructions using an assembly language prefix. For example, `3.CALL label`

is guaranteed to assemble two `#n...`

instructions followed by a `CALL`

(or to report an error). This feature is useful in code that will be relocated and linked, in which the final value of `label`

might not be known. In order to support this feature, longer `#n...`

prefixes can encode all the values encodeable by shorter sequences.

```
d:f Written Meaning
110 #n... RESULT <- uxt(0:c) | (Q ? (S0 << 13) : 0)
Q <- 1
```

This is the only instruction which sets `Q`

to `1`

. All other instructions set `Q`

to `0`

. Therefore, the `Q`

flag indicates whether the previous instruction was `#n...`

. If so, the value it computed is used to supply 13 additional bits to the following instruction. Notably, it can be used to supply the high bits of an immediate constant in the immediately following instruction, but it has other uses too.

A direct jump instruction jumps to a constant program address. It is distinguished by `c:f`

being `1110`

.

```
b:f Written Meaning
11100 JUMP label RESULT <- PC
PC <- HERE + even(uxt(0:a) | (Q ? (S0 << 11) : 0))
11101 CALL label RESULT <- PC
PC <- HERE + even(uxt(0:a) | (Q ? (S0 << 11) : 0))
```

The only difference between `JUMP`

and `CALL`

is only in the behaviour of the branch predictor. `CALL`

pushes the return address onto the predictor's return stack, whereas `JUMP`

doesn't. Both compute a return address, though it is usually useful after a `CALL`

.

Valid code addresses are all even numbers, so bit `0`

is available to indicate a negative jump offset. The `even()`

function (defined earlier) maps the odd values to even negative values.

When not preceded by a constant continuation, direct jump instructions can jump up to 2K in either direction. Each preceding constant continuation instruction multiplies the range by 8192. With one, the range is 16M, which is large enough to reach anywhere in a static binary. With two, the range is 128G, which is large enough to reach a dynamic library loaded at a random address.

An immediate constant instruction computes an unsigned constant result. It is distinguished by `b:f`

being `11110`

.

```
b:f Written Meaning
11110 #n RESULT <- uxt(0:a) | (Q ? (S0 << 11) : 0)
```

There are a few alternative ways of expressing immediate constants for use in dataflow instructions which might be shorter or otherwise preferable (see "Unary operations"):

- Code addresses are encodeable in a PC-relative way.
`0`

and`1`

are encodeable as part of arithmetic instructions.- Negative constants are encodeable as the bitwise inverse of positive constants.
- Patterned bitmasks are encodeable as logic immediates.
- If the same constant is needed twice in quick succession, one immediate constant instruction suffices.
- Constants larger than 26 bits are best loaded from a PC-relative address.

When not preceded by a constant continuation, an immediate constant instruction can express an 11-bit unsigned number. Each preceding constant continuation instruction adds 13 more bits. Using more than two constant continuation instructions, though possible, is in most situations inferior to using a PC-relative load.

Conditional branch instructions are used to implement "if". They test a condition code: a boolean function of the `N`

, `Z`

, `C`

and `V`

flags. If the test passes, the instruction does a direct jump.

```
7:f Written Meaning
11111ccc0 B.cond label if cond(N, Z, C, V) {
RESULT <- -1
PC <- HERE + even(uxt(0:6) | (Q ? (S0 << 7) : 0))
} else {
RESULT <- 0
}
11111ccc1 B.cond label if !cond(N, Z, C, V) {
RESULT <- -1
PC <- HERE + even(uxt(0:6) | (Q ? (S0 << 7) : 0))
} else {
RESULT <- 0
}
```

There are only seven condition codes. If `8:a`

is `111`

then the instruction is not a conditional branch instruction.

The negated form of `B.cond`

is distinguished by setting bit `7`

. In assembly language, the negated form is indicated by changing the name of the condition code.

Note that a conditional branch to the immediately following instruction can be useful for computing a boolean value.

When not preceded by a constant continuation, conditional branch instructions can jump up to 128 bytes in either direction: a factor of 16 less than a direct jump. Each preceding constant continuation instruction multiplies the range by 8192. With one, the range is 1M, and with two, the range is 8G.

The available condition codes are similar to the ones that ARM uses. Each has two spellings in assembly language, one of which indicates that the boolean should be inverted:

```
8:a Written Inverted Meaning Interpretation after SUB x y
000 ZS or EQ ZC or NE Z x == y
001 CS or HS CC or LO C x >= y (unsigned)
010 NS or MI NC or PL N (x - y) < 0
011 VS VC V x - y overflows (signed)
100 HI LS C & !Z x > y (unsigned)
101 GE LT N == V x >= y (signed)
110 GT LE N == V & !Z x > y (signed)
111 Not a condition code
```

The formal meaning of the condition codes is in the "meaning" column. It is defined in terms of the condition flags. The section "Setting the condition flags" below explains how the condition flags are set by various arithmetic instructions. Combining the two, you can implement a variety of useful tests. The "interpretation" column gives some examples: the tests that can be achieved by preceding each condition code with a `SUB`

instruction.

An indirect jump instruction jumps to a computed code address. It is distinguished by `7:f`

being `111111110`

.

```
5:f 0:4 Written Meaning
11111111000 RSr JUMP RSr RESULT <- PC; PC <- RSr
11111111001 RSr CALL RSr RESULT <- PC; PC <- RSr
11111111010 RSr RET RSr RESULT <- PC; PC <- RSr
11111111011 n SWI #n Enter supervisor mode.
RESULT <- PC; PC <- [0]
```

The difference between `RET`

, `JUMP`

and `CALL`

is only in the behaviour of the branch predictor. `RET`

pops the return address from the predictor's stack, `CALL`

pushes the return address onto the predictor's return stack, and `JUMP`

doesn't use the stack.

`SWI`

generates an interrupt. This can be used for a number of purposes, but the most common is to perform a system call.

```
5:f 0:4 Written Meaning
11111111100 RSr MMAP Map physical page RSr at virtual page 0
```

TODO.

- Exceptions
- MMAP
- Make Fr depend more on Q.
- Replace
`SEL`

with`S.cond`

? - In arithmetic instructions, use bit
`b`

(?) to swap operands, not bit`9`

.