Halfive Virtual Machine Specification

SPDX identifier: COIL-1.0

Copyright Nomagno 2021, 2022

It is recommended to use the “.h5asm” extension for H5VM assembly documents (where applicable). H5VM assembly is identified by the MIME type text/h5asm.

It is recommended to use the “.h5bin” extension for H5VM binary executables (where applicable). H5VM binary executables are identified by the MIME type application/h5bin.

It is recommended to use the “.h5drive” extension for H5VM read-only drives (where applicable). H5VM read-only drives are identified by the MIME type application/h5drive.

The Halfive virtual machine


The Halfive virtual machine, henceforth referred to as H5VM, is a standardized, but not fully defined, execution engine for code. This code may be interpreted directly from the assembly format, executed directly from the binary format, or executed in any other equivalent way.

The H5VM Instruction notation

The instruction set for the virtual machine is composed of precisely 16 instructions, whose functioning is documented with the following notation: inst ARG1 ARG2, where ARGx may be noted as either Vx, Rx, or ID. There are three types of arguments: literals, pointers, and addresses. Literals are prefixed by an equals sign (=XXXX), pointers by an ampersand (&XXXX), and addresses carry no prefix (XXXX). All are always written in hexadecimal outside of the binary format. Arguments of type Vx take literals, pointers and addresses, those of type Rx take only addresses and pointers, and those denoted ID take only literals. POINTERS AND LITERALS SHALL NEVER BE COMBINED IN AN INSTRUCTION’S ARGUMENTS.
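
The argument notation above can be sketched as a small classifier. This is only an illustrative sketch: the function names parse_argument and check_arguments are hypothetical and not part of H5VM, and the slot names ("R1", "V2", "ID") are simply the ones used in this document.

```python
def parse_argument(token):
    """Classify one assembly argument and return (kind, value).

    '=' marks a literal, '&' marks a pointer, no prefix means an address.
    The digits are always read as hexadecimal.
    """
    if token.startswith("="):
        kind, digits = "literal", token[1:]
    elif token.startswith("&"):
        kind, digits = "pointer", token[1:]
    else:
        kind, digits = "address", token
    return kind, int(digits, 16)


def check_arguments(slots, tokens):
    """Validate tokens against slot types such as ("R1", "V2") or ("ID",)."""
    allowed = {
        "V": {"literal", "pointer", "address"},  # Vx slots
        "R": {"address", "pointer"},             # Rx slots
        "I": {"literal"},                        # ID slots
    }
    parsed = [parse_argument(token) for token in tokens]
    if len(parsed) != len(slots):
        raise ValueError("wrong number of arguments")
    for slot, (kind, _) in zip(slots, parsed):
        if kind not in allowed[slot[0]]:
            raise ValueError(f"{slot} does not accept a {kind}")
    kinds = {kind for kind, _ in parsed}
    if {"literal", "pointer"} <= kinds:
        raise ValueError("pointers and literals cannot be combined")
    return parsed
```

For instance, check_arguments(("R1", "V2"), ["1", "=1F"]) accepts an address followed by a literal, while ["&2", "=1"] is rejected because it mixes a pointer with a literal.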


H5VM Memory layout (Code memory and data memory)

IMPORTANT NOTE: ALL MEMORY IN CODE AND DATA MEMBERS HAS TO BE INITIALIZED TO 0 (ZERO)!

Following the legacy Harvard architecture, H5VM loads code and data separately into different ‘chips’, memory sections, or however it may be implemented. The executable section will be henceforth referred to as ‘code member’, and the data section will be henceforth referred to as ‘data member’.

H5VM has two major data structures: the byte and the integer. Both are raw memory, and bytes must be able to hold the values 0x00 to 0xFF, so on a typical computer they need to be unsigned integer cells of at least 8 bits in size. The machine is also required to handle the code member in cells that can hold values from 0x0000 to 0xFFFF, i.e. at least 16 bits in size.

It is encouraged to denote two basic memory units through constants, variables, or identifiers: H5VM_MEMSIZE and H5VM_MEMSMALL. H5VM_MEMSIZE denotes the basic number of bytes/integers that may be allocated, and is 0x1000. H5VM_MEMSMALL denotes a fraction of H5VM_MEMSIZE meant for more precise assignments, and is 0x400.
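
As a minimal sketch of the constants and cell ranges described above: H5VM_MEMSIZE and H5VM_MEMSMALL are the names suggested by this document, while BYTE_MAX, CODE_CELL_MAX, make_memory and store are hypothetical helpers introduced only for illustration.

```python
H5VM_MEMSIZE = 0x1000   # basic allocation unit, as stated above
H5VM_MEMSMALL = 0x400   # smaller unit for more precise assignments

BYTE_MAX = 0xFF         # largest value a data byte must hold
CODE_CELL_MAX = 0xFFFF  # largest value a code member cell must hold


def make_memory(cells):
    """Return a memory region with every cell initialized to zero."""
    return [0] * cells


def store(memory, index, value, max_value=BYTE_MAX):
    """Write value into memory[index], rejecting values outside the cell range."""
    if not 0 <= value <= max_value:
        raise ValueError(f"value {value:#x} does not fit in the cell")
    memory[index] = value
```

A region made with make_memory(H5VM_MEMSMALL) starts out fully zeroed, which matches the initialization requirement stated at the top of this section.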

Note: Although bytes can only hold 8 bits of information, and so literals may only range from 0x00 to 0xFF, the address space of the VM is 16 bits, and the machine must hence be able to read addresses ranging from 0x0000 to 0xFFFF. Pointers are memory addresses read as the value of two contiguous bytes from memory.
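
Reading a pointer from two contiguous bytes might look like the sketch below. The byte order is not stated in this text, so the big_endian parameter (and its default) is purely an assumption, and read_pointer is a hypothetical name.

```python
def read_pointer(memory, address, big_endian=True):
    """Combine the bytes at address and address + 1 into a 16-bit address.

    ASSUMPTION: the byte order is not given by the specification text;
    big_endian=True treats the first byte as the high half.
    """
    first = memory[address]
    second = memory[address + 1]
    if big_endian:
        return (first << 8) | second
    return (second << 8) | first
```

With memory[2] = 0x12 and memory[3] = 0x34, read_pointer(memory, 2) gives 0x1234 under the big-endian assumption and 0x3412 otherwise. Either way the result always fits the 16-bit address space.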

There are two structures inside the code member:

The instruction storage can hold 0x4000 bytes/ints, or four times the value of H5VM_MEMSIZE.

There are three structures inside the data member:

Special registers (0xFFF0-0xFFFF):

Extra note: the program counter is incremented every time an instruction is executed. However, the jmp/skpz/skmz instructions can forcibly modify it without incrementing it, and hence are the only instructions capable of taking 16-bit literals. All instructions in a program SHALL be LINEARLY NUMBERED from ZERO (0), from the first to the last, and the program counter SHALL indicate the instruction being executed at any given time.
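
The program counter rule can be sketched as follows, using the jmp/skpz/skmz behaviour given in the instruction listing of this document. Two points are assumptions rather than statements of the specification: a skpz/skmz whose condition fails is treated like any other instruction (the counter advances by one), and the counter is wrapped to 16 bits. The name next_pc and the flag_cell parameter (the value held at 0xFFFF) are hypothetical.

```python
def next_pc(pc, mnemonic, operand=0, flag_cell=1):
    """Return the program counter after executing one instruction.

    flag_cell is the current value stored at 0xFFFF.
    ASSUMPTIONS: a failed skpz/skmz advances by one like any other
    instruction, and the counter wraps around at 16 bits.
    """
    if mnemonic == "jmp":
        new_pc = operand                 # jump straight to the target
    elif mnemonic == "skpz" and flag_cell == 0:
        new_pc = pc + (operand + 1)      # skip forward
    elif mnemonic == "skmz" and flag_cell == 0:
        new_pc = pc - (operand - 1)      # skip backward
    else:
        new_pc = pc + 1                  # ordinary increment
    return new_pc & 0xFFFF
```

Under these assumptions, next_pc(5, "skpz", 2, flag_cell=0) lands on instruction 8, and next_pc(5, "skmz", 3, flag_cell=0) lands on instruction 3.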


The instruction set

There are currently SIXTEEN (16) instructions, each numbered with the decimal number in PARENTHESES () for later reference.

halt (0) - TAKES NO ARGUMENTS, STOPS PROGRAM EXECUTION
nop (1) - TAKES NO ARGUMENTS, DOES NOTHING (*RESERVED FOR FUTURE USAGE*).
jmp (2) V1 - JUMP (MOVE THE PROGRAM COUNTER, HAND EXECUTION) *TO* VALUE V1. SPECIAL EXCEPTION: TAKES 16-BIT LITERALS;
    ADDRESSES ARE TREATED AS 16-BIT LITERALS, AND POINTERS ARE TREATED AS ADDRESSES WITH A 16-BIT VALUE.
skpz (3) LITERAL - ADD (LITERAL+1) TO THE PROGRAM COUNTER *IF* 0xFFFF IS ZERO (0), WHERE LITERAL IS A LITERAL.
skmz (4) LITERAL - SUBTRACT (LITERAL-1) FROM THE PROGRAM COUNTER *IF* 0xFFFF IS ZERO (0), WHERE LITERAL IS A LITERAL.

set (5) R1 V2 - SETS ADDRESS R1 *TO* VALUE V2
add (6) R1 V2 - ADD R1 AND V2, WRITE THE RESULT TO R1. SETS CARRY/ZERO FLAGS APPROPRIATELY
sub (7) R1 V2 - SUBTRACT V2 *FROM* R1, WRITE THE RESULT TO R1. SETS CARRY/ZERO FLAGS APPROPRIATELY
and (8) R1 V2 - PERFORM A BINARY 'and' ON R1 AND V2, WRITE THE RESULT TO R1. SETS ZERO FLAG APPROPRIATELY
or (9) R1 V2 - PERFORM A BINARY 'or' ON R1 AND V2, WRITE THE RESULT TO R1. SETS ZERO FLAG APPROPRIATELY
xor (10) R1 V2 - PERFORM A BINARY 'xor' ON R1 AND V2, WRITE THE RESULT TO R1. SETS ZERO FLAG APPROPRIATELY
shift (11) R1 V2 - IF V2 IS 0 THROUGH 7, BITSHIFT R1 LEFT BY V2, WRITE THE RESULT TO R1.
    ELSE IF V2 IS 8 THROUGH F, BITSHIFT R1 RIGHT BY (V2 - 8), WRITE THE RESULT TO R1. ELSE DO NOTHING
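
The shift rule can be sketched as a pure function. Truncating the left shift to 8 bits is an assumption based on bytes holding 0x00 to 0xFF, and the right shift is treated as a plain unsigned shift. The function name shift_byte is hypothetical.

```python
def shift_byte(value, amount):
    """Apply the shift instruction's rule to one byte and return the result.

    ASSUMPTION: the result of a left shift is truncated to 8 bits.
    """
    if 0x0 <= amount <= 0x7:
        return (value << amount) & 0xFF   # left shift by amount
    if 0x8 <= amount <= 0xF:
        return value >> (amount - 8)      # right shift by (amount - 8)
    return value                          # any other amount: do nothing
```

Note that an amount of 8 shifts right by zero, so it leaves the value unchanged just like an amount of 0 does.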

cmp (12) V1 V2 - SUBTRACT V2 *FROM* V1, BUT *WITHOUT* SAVING THE RESULT. SETS CARRY/ZERO FLAGS APPROPRIATELY
func (13) ID - SEE SECTION BELOW
ret (14) ID - SEE SECTION BELOW
call (15) ID - SEE SECTION BELOW
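
For reference, the numbering above can be captured in a lookup table. The names OPCODES, MNEMONICS and ARITY are hypothetical; the numbers and argument counts are taken directly from the listing.

```python
# Mnemonic -> instruction number, as listed above.
OPCODES = {
    "halt": 0, "nop": 1, "jmp": 2, "skpz": 3,
    "skmz": 4, "set": 5, "add": 6, "sub": 7,
    "and": 8, "or": 9, "xor": 10, "shift": 11,
    "cmp": 12, "func": 13, "ret": 14, "call": 15,
}

# Reverse lookup: instruction number -> mnemonic.
MNEMONICS = {number: name for name, number in OPCODES.items()}

# Number of arguments each instruction takes in the listing.
ARITY = {
    "halt": 0, "nop": 0,
    "jmp": 1, "skpz": 1, "skmz": 1,
    "set": 2, "add": 2, "sub": 2, "and": 2,
    "or": 2, "xor": 2, "shift": 2, "cmp": 2,
    "func": 1, "ret": 1, "call": 1,
}
```

Since the numbers run from 0 to 15, every instruction number fits in 4 bits.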

Subroutines


Assembly language

The assembly language is the exact representation of instructions, literals, pointers and addresses that has been discussed so far.


Binary format

A specification of the format follows:

BINARY FORMAT:
Each instruction is coded as 4 flag bits, of which the least significant 2 indicate
which arguments are literals and which are addresses, and the highest indicates whether
the literals are pointers or not (0001 - from right to left: first is literal,
second is address, third bit is padding, high bit indicates they are literals and NOT
pointers), then 4 bits for the instruction itself (check the listed values),
then two 16-bit 'arguments' (0-2),
and repeat.

EXAMPLE:
BINARY: 0001 0110 0000000000000001 0000000000011111 0000 0000 0000000000000000 0000000000000000
DECIMAL:   1    6                1               31    0    0                0                0
ASSEMBLY:     add                1              =1F      halt
ENGLISH: add the contents of address ONE and the number 1F, put the result back into address ONE, halt
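
The layout described above can be sketched as a small encoder that prints each instruction as a bit string. Several points are assumptions, not statements of the specification: the lowest flag bit is taken to describe the second argument and the next bit the first argument (inferred from the flags 0001 used for an address followed by a literal), single-argument instructions are assumed to use the first argument's bit, and unused arguments are written as zero. The names encode and OPCODES are hypothetical.

```python
OPCODES = {
    "halt": 0, "nop": 1, "jmp": 2, "skpz": 3,
    "skmz": 4, "set": 5, "add": 6, "sub": 7,
    "and": 8, "or": 9, "xor": 10, "shift": 11,
    "cmp": 12, "func": 13, "ret": 14, "call": 15,
}


def encode(mnemonic, args=()):
    """Encode one instruction as 'FLAGS OPCODE ARG1 ARG2' in binary digits.

    args is a sequence of (kind, value) pairs, where kind is one of
    "address", "literal" or "pointer".
    ASSUMPTION: flag bit 0 describes ARG2 and flag bit 1 describes ARG1.
    """
    kinds = [kind for kind, _ in args]
    if "literal" in kinds and "pointer" in kinds:
        raise ValueError("pointers and literals cannot be combined")
    values = [value for _, value in args] + [0, 0]  # pad unused arguments
    flags = 0
    if len(args) >= 1 and kinds[0] != "address":
        flags |= 0b0010   # first argument is a literal or pointer
    if len(args) >= 2 and kinds[1] != "address":
        flags |= 0b0001   # second argument is a literal or pointer
    if "pointer" in kinds:
        flags |= 0b1000   # the non-address arguments are pointers
    opcode = OPCODES[mnemonic]
    return f"{flags:04b} {opcode:04b} {values[0]:016b} {values[1]:016b}"
```

Under these assumptions, encode("set", [("address", 2), ("literal", 0x0A)]) yields "0001 0101 0000000000000010 0000000000001010", and encode("halt") yields a string of zeros in all four fields.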

SUMMARY OF DATA STORAGE TYPES

You can store code as either .h5asm ASSEMBLY LANGUAGE files, or .h5bin BINARY FILES, and can store data as .h5drive READ-ONLY MEMORY files.