The CPUSim64 code consists of an assembler and an emulator. Assembly language files are plain text files that you can create with any programming editor. Do not use a word processor like Word or Pages, as these do not save their files as plain text. You use the assembler to compile your assembly language source files into CPUSim64 machine code object files. These object files can then be run with the CPUSim64 emulator. Both of these programs are written in Java, so they themselves run on the Java VM (virtual machine).
CPUSim64 is composed of Java classes in jar files located in the lib folder of the distribution and script files in the main directory that you can use to run the assembler and emulator in different modes. Download the CPUSim64.zip file and expand it to a location of your choice. For this guide we will assume that you expand the ZIP archive in your Documents directory.
You are required to install Java JDK 17 on your system and make sure that the Java JDK bin directory is in your system's PATH environment variable. Type `java -version` in your terminal window to see if Java is installed correctly.
If you don't see the Java version printed, check your install and PATH variable. It can also be handy to put the location of your CPUSim64 directory in the PATH variable. Then you can execute the CPUSim64 scripts from any directory.
Once you have Java and CPUSim64 installed, you can test it out using the very small program listed below. Use a text editor to enter the text of the program, then save it as example001.asm.
Our simple program only has three instructions. The `NOP` instruction is called a No-Op because it doesn't do anything but take up one CPU cycle. The `STOP` instruction tells the CPU to stop executing your program and return to the command line. At the very end of all your programs you must place two `STOP` instructions, as this tells the assembler to stop compiling instructions.
Once the program source file is saved you can run the assembler to compile your assembly language program into CPUSim64 machine code. This is done with the compile.sh (or compile.bat for Windows) script. Using the terminal window, navigate to the directory where your source file is saved. Then run the compile script by typing the script filename followed by the name of your source file. Do not include the ".asm" extension as that is assumed by the script. You should see output similar to the output below.
The program will compile your assembly language source file and create a machine language object file with an ".obj.gz" extension. It will print how many words were compiled from your source when it is complete. If there are errors, they will be displayed. The symbols and labels used in your source will be listed.
To run the program, use the run.sh (or run.bat for Windows) script name followed by the base name of your program source file. Like the compile script, the run script will first compile your source into an object file. It does this in a quiet mode, so you don't get the same verbose output as the compile script. Then the run script will execute your object file on the CPUSim64 emulator. It will print statistics related to your program before it runs, such as code size, heap size and maximum stack size. Then your program will run. After it runs, the message "System Halted!" will be printed along with statistics related to the run, such as the number of user CPU cycles used, the wall clock time it took to execute and the return code from your program.
If your program doesn't work as expected you can run it in debug mode using the debug.sh (or debug.bat for Windows) script name followed by the base name of your program source file. Like the run script, your program will be compiled and run, but this time the assembler and emulator will be run in debug mode. The assembler will print out your source program as it understands it, complete with addresses and symbolic addresses. You can use this as a reference when you are debugging and can also use it to make sure that the assembler generates the instructions that you expect. When the CPUSim64 emulator runs your program in debug mode it will print out the entire state of the emulated CPU every time it encounters a `NOP` instruction. It will also print the final state of the CPU when your program ends.
For more information about how your program runs, you can debug your program in trace mode using the trace.sh (or trace.bat for Windows) script name followed by the base name of your program source file. In this mode, each instruction will be printed as it is executed.
Because assembly language can be hard to read, it is important to put plenty of documentation in your source code in the form of comments. A comment line starts with two slashes (`//`) and causes the line to be ignored by the assembler. You can also put comments after double slashes at the end of instruction lines, which causes everything from the double slashes to the end of the line to be ignored. It is good practice to put a documentation block at the beginning of the program, functions or other important units of code to explain what the code is supposed to do when it works properly. This way someone (perhaps yourself six months from now) can debug errant code because the documentation will tell them what the correct operation should be.
One of the most basic operations available is to move a constant into either an integer register or a floating point register. This is done with the `MOVE` operation. It can take two arguments: the first is the destination register and the second is the constant to move into the register. Constants can be 16-bit Unicode character constants, integer constants (in decimal or hexadecimal), or floating point constants. Character constants are formed using a single character in single quotes. It is also possible to use escape sequences for special characters and for Unicode characters whose codepoint is known. The special characters are as follows:
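Escape | Character |
---|---|
'\0' | null |
'\b' | backspace |
'\t' | horizontal tab |
'\n' | newline (line feed) |
'\f' | form feed |
'\r' | carriage return |
'\"' | double quote |
'\'' | single quote |
'\\' | backslash |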
Given the Unicode codepoint, you can also specify a character using an escape sequence of the form `\uxxxx`, where 'xxxx' is the four hexadecimal digit value for the codepoint.
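The character constant rules above can be modelled in a few lines of Python. This is only an illustrative sketch, not the assembler's own parser: the function name `char_constant_value` is hypothetical, and the codepoints are the standard Unicode values for each escape.

```python
import string

# Standard codepoints for the escapes listed above.
ESCAPES = {
    "0": 0x00, "b": 0x08, "t": 0x09, "n": 0x0A, "f": 0x0C,
    "r": 0x0D, '"': 0x22, "'": 0x27, "\\": 0x5C,
}


def char_constant_value(literal: str) -> int:
    """Return the codepoint of a quoted character constant (hypothetical helper)."""
    if len(literal) < 3 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError(f"not a character constant: {literal!r}")
    body = literal[1:-1]
    if len(body) == 1 and body != "\\":
        return ord(body)                      # plain character such as 'A'
    if len(body) == 2 and body[0] == "\\" and body[1] in ESCAPES:
        return ESCAPES[body[1]]               # special character such as '\n'
    if (len(body) == 6 and body.startswith("\\u")
            and all(c in string.hexdigits for c in body[2:])):
        return int(body[2:], 16)              # \uxxxx form
    raise ValueError(f"unknown character constant: {literal!r}")


print(char_constant_value("'A'"))
print(char_constant_value("'\\n'"))
print(hex(char_constant_value("'\\u00e9'")))
```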
Integer constants can be positive or negative. They can be in decimal format or hexadecimal format. Hexadecimal constants are always preceded by `0x`.
Floating point constants can be positive or negative. They are written using floating point notation (a decimal point is required) of up to 16 decimal significant digits. They can also be written using scientific notation such as 1.23e10 or 3.456e-20.
When a constant does not fit inside the `MOVE` instruction, a `LOAD` instruction is substituted for your `MOVE` instruction by the assembler to load it into the register. This same process applies to all floating point `MOVE` instructions that specify constants, since a 64-bit floating point constant cannot fit inside the `MOVE` instruction.

You can also use the `MOVE` instruction to move data from one general purpose register to another or from one floating point register to another. You can also use it to move data between general purpose registers and floating point registers.
When run using the debug script as in the above example, the entire CPU state is printed when the program completes. That way you can confirm the results of your program. But if you are only interested in viewing a few registers at a time during the execution of your program, you can use the `DEBUG` instruction to display 1-4 registers at a time. The nice thing about the `DEBUG` instruction is that it only gets compiled into your code when you use the `--debug` option on the assembler, as is the case in the debug script. Likewise, if your code was compiled with debug instructions turned on, they are only acted upon when the CPUSim64 emulator is run with the `--debug` option, as is also the case with the debug script. If you run your program without the `--debug` option on the emulator, debug is off and the `DEBUG` instructions are treated as `NOP` instructions.
If you want to display the entire CPU state at some time during the execution of your program, in addition to at the end, use the instruction `int iPrintCPUState`. Unfortunately this uses a system interrupt and is not automatically disabled by the debug settings. You will have to remove it on your own when you no longer need it. You will also need to include the system definition file `<system/system.def>` for this to work.
Arithmetic operations are available to perform addition, subtraction, multiplication and division on integer or floating point registers. They accept a variety of operand combinations. There are two forms of two-operand operations. The first form takes two register operands: the first operand is the destination, the second is the value applied to it by the arithmetic operation, and the result is stored in the first operand. The second form takes a register operand and an integer literal. Like the first form, it applies the literal second operand to the first and stores the result in the first.
The arithmetic operators also have a three operand form. There are four forms for three operands:
op GP_reg, GP_reg, GP_reg
op FP_reg, FP_reg, FP_reg
op GP_reg, GP_reg, int_literal
op FP_reg, FP_reg, int_literal
What all these forms have in common is that they apply the operation to the second and third operands and store the result in the first operand. For example:
In addition to the `divide` operations we have seen so far, there is an additional form that takes four general purpose registers. This version divides the third operand by the fourth, then places the integer quotient in the first operand and the remainder in the second.
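The effect of the four-register form can be sketched in Python. The rounding behaviour for negative operands is not stated in this guide; the sketch assumes truncation toward zero, as Java's `/` and `%` operators do, and the function name `divide4` is hypothetical.

```python
def divide4(dividend: int, divisor: int) -> tuple[int, int]:
    """Return (quotient, remainder) for dividend / divisor.

    Assumption: the quotient truncates toward zero and the remainder takes
    the sign of the dividend, mirroring Java integer arithmetic.
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    remainder = dividend - quotient * divisor
    return quotient, remainder


# Operands 3 and 4 are the inputs; operands 1 and 2 receive the results.
quotient, remainder = divide4(17, 5)
print(quotient, remainder)
```

Whatever the rounding rule, the identity `quotient * divisor + remainder == dividend` holds in this sketch, which is a handy check when debugging a division.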
Finally, there are two arithmetic operations that take just a single operand: negation and reciprocal. The `negate` operation takes a general purpose or floating point register, negates it, then stores it back into the register. The `recip` operation takes a floating point register, computes its reciprocal, then stores it back into the register.
We can make control structures such as loops using the `JUMP` instruction. The `JUMP` instruction can branch unconditionally or based on the condition of one of the bits in the status register (SR). We can make the equivalent of DO WHILE/WEND and DO/WHILE loops as illustrated below.
With the `JUMP` instruction we must always supply an address as the last operand. It can either be a general purpose register with an address in it or an address literal. Address literals are symbols that may begin with an '@' character. If the address literal is prefixed with the '@' character it means that the label is within the function currently being defined. Address literals without the '@' character refer to global labels outside the function being defined. The literal must match a label somewhere else in the code. Labels are symbols that end with a colon (':').

Two instructions that can be helpful when writing loops are `COMPARE` and `TEST`.
`COMPARE` takes two operands and subtracts them, setting the status register bits according to the computed difference. If the two operands are equal, the Z (zero) status bit will be true. Likewise it will be false (not zero) if they are not equal. When the status register is printed by the `DEBUG` instruction, the zero bit will be a capital Z if it is set (true) and a lowercase z if it is not set (false). The condition codes we use in the `JUMP` instruction are 'z' if set and 'nz' if not set, corresponding to 'is zero' and 'is not zero' respectively. You may also use the condition 'eq' for 'z' or 'ne' for 'nz'.
The `TEST` instruction simply tests the single operand supplied, setting the status register bits based on the attributes of the operand. It is essentially equivalent to comparing the operand to zero.
The table below describes some of the condition codes that can be used with the `JUMP` instruction after a `MOVE`, `COMPARE` or `TEST`.
Condition Code | SR Bit Checked | Relational Equivalent |
---|---|---|
u | unconditional | |
z or eq | zero | op0 == op1 |
nz or ne | not zero | op0 != op1 |
n or lt | negative | op0 < op1 |
p or gt | positive | op0 > op1 |
nn or ge | not negative | op0 >= op1 |
np or le | not positive | op0 <= op1 |
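The table can be read as a lookup from condition code to a test on the status bits. The Python sketch below models only a zero bit and a negative bit, which is an assumption made for illustration; the real status register may hold more bits, and the names `compare`, `CONDITIONS` and `should_jump` are hypothetical.

```python
def compare(op0: int, op1: int) -> dict:
    """Model COMPARE: subtract the operands and record zero/negative bits."""
    difference = op0 - op1
    return {"Z": difference == 0, "N": difference < 0}


# Each condition code maps to a predicate over the modelled status bits.
CONDITIONS = {
    "u": lambda sr: True,
    "z": lambda sr: sr["Z"],
    "nz": lambda sr: not sr["Z"],
    "n": lambda sr: sr["N"],
    "p": lambda sr: not sr["Z"] and not sr["N"],
    "nn": lambda sr: not sr["N"],
    "np": lambda sr: sr["Z"] or sr["N"],
}
# The relational names are aliases for the same tests.
for alias, code in (("eq", "z"), ("ne", "nz"), ("lt", "n"),
                    ("gt", "p"), ("ge", "nn"), ("le", "np")):
    CONDITIONS[alias] = CONDITIONS[code]


def should_jump(condition: str, sr: dict) -> bool:
    """Return True when a JUMP with this condition code would branch."""
    return CONDITIONS[condition](sr)


sr = compare(3, 7)
print(should_jump("lt", sr), should_jump("ge", sr))
```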
We have a number of bitwise logical operators that can be used for Boolean arithmetic. These operators can also be used for simple logical testing if we restrict ourselves to -1 and 0 for operand values. This allows us to represent TRUE as -1 (all bits set) and FALSE as 0 (no bits set). The binary logical operators are `AND`, `OR` and `XOR`. There is one operator that takes a single argument, the `COMPLIMENT` operator (also known as logical `NOT`). The following example illustrates this use of the logical operators to print out truth tables.
This last example makes use of console output functions to print the truth tables and format the values. We will talk about console output later in this document.
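The reason -1 and 0 work as TRUE and FALSE can be checked outside the emulator. The Python sketch below is not the CPUSim64 example itself; it only reproduces the same truth tables using Python's bitwise operators, which behave like two's complement operators on these values.

```python
import operator

TRUE, FALSE = -1, 0  # all bits set, no bits set


def truth_table(op):
    """Return (a, b, result) rows for a binary bitwise operator."""
    return [(a, b, op(a, b)) for a in (FALSE, TRUE) for b in (FALSE, TRUE)]


def complement(a: int) -> int:
    """Flip every bit, so TRUE becomes FALSE and FALSE becomes TRUE."""
    return ~a


for name, op in (("AND", operator.and_), ("OR", operator.or_),
                 ("XOR", operator.xor)):
    print(name)
    for a, b, result in truth_table(op):
        print(f"  {a:2d} {b:2d} -> {result:2d}")

print("NOT")
for a in (FALSE, TRUE):
    print(f"  {a:2d} -> {complement(a):2d}")
```

If 1 were used for TRUE instead, the complement would be -2 rather than 0, which is why the all-bits-set value is the convenient choice.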
Using symbolic names for literal values helps with the readability of our programs. It also helps eliminate mistakes caused by repeated typing of the same literal value. With symbolic constants we get the added benefit that if we misspell the symbolic constant, we should get a compile error. By contrast, a mistyped literal value is often just a different, wrong but still legal literal value.
One way to create symbolic constants is to use the preprocessor directive `#define`. Preprocessor directives define simple text substitutions that happen on our code before it is compiled. We have been using one such directive, `#include`, to add in code from another file. The `#define` directive establishes a simple substitution between a symbolic name and a numeric literal (integer or floating point). Whenever the symbolic name is used in our code, the corresponding literal value is substituted just as if we had typed it into the code ourselves. When we use `#define` we often use all upper case symbols to help remind us where text substitutions are occurring in our code.
The other mechanism for creating named constants is to declare a constant in memory at the end of our code. This is done with the `DCI` or `DCF` compiler directives to store an integer or floating point value. There are also `DCC` and `DCS` directives for storing a character or string respectively.
Using named constants we can refactor the truth tables program to be more readable and avoid redundancies, thus reducing the chances of typing errors. Symbols for `TRUE` and `FALSE` are defined in `<system/system.def>`. We use a DCS constant for the formatting strings passed to `fprintf()`.
To access elements of an array, all you need is the base address of the array and an offset to the element. When you create an array with `DCA` you should give it a label, which will be the base address of the array. Depending on whether the array has integer elements or floating point elements, you can use one of the load instructions of the form:
load r0, BASE_ADDR[offset]
load f0, BASE_ADDR[offset]
The offset can be a literal integer or the value in an integer register. Valid offsets are zero through the size of the array minus one. You can also use the special offset -1 to get the size of the array.
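The offset rules can be pictured with a small Python model. The layout is an assumption made for illustration: the sketch stores the array size in the word just before the first element, so that offset -1 naturally reads it. The class and method names are hypothetical and are not part of CPUSim64.

```python
class Memory:
    """Toy word-addressed memory holding arrays with a leading size word."""

    def __init__(self):
        self.words = []

    def declare_array(self, values) -> int:
        """Store an array and return its base address (the label's value)."""
        self.words.append(len(values))   # assumed: size word at base - 1
        base = len(self.words)
        self.words.extend(values)
        return base

    def load(self, base: int, offset: int):
        """Model load rN, BASE_ADDR[offset] with bounds checking."""
        size = self.words[base - 1]
        if not -1 <= offset < size:
            raise IndexError(f"offset {offset} outside -1..{size - 1}")
        return self.words[base + offset]


memory = Memory()
base = memory.declare_array([10, 20, 30])
print(memory.load(base, 0), memory.load(base, 2), memory.load(base, -1))
```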
If you include the system header file `<system/io.asm>` you will gain access to a number of helpful functions for performing output to the console.
Function | Description |
---|---|
puts(str) | Prints a string |
putc(value) | Prints a character |
put_int(value, base) | Prints an integer using supplied base |
put_dec(value) | Prints an integer in base 10 |
put_hex(value) | Prints an integer in base 16 |
put_fp(value) | Prints a floating point value |
put_nl() | Prints a new line |
fprintf(STDOUT, fmt, values...) | Uses a format string to print a variable number of values |
Because these functions pass arguments on the stack, you use the `#call` directive when you wish to call them. All arguments should be either integer or floating point registers as appropriate to the call. The arguments will then be pushed onto the stack and the function call made.
When writing command line programs it is often necessary to pass in arguments to the program on the command line. There are two system level interrupts that we can invoke. One will give us the count of items on the command line and the other can be used to get each item on the command line. Items on the command line are strings and are separated by spaces. For example:
> run.sh example015 326 Hello 3.1415
Interrupts are operating system level functions that are executed via the software interrupt mechanism of the CPU, invoked with the `INT` instruction. Interrupt instructions take a single integer operand to identify the system level function to execute. Interrupts use a register passing convention. If the interrupt requires an input argument, it is expected in `r0` or `f0`. Likewise a value can be returned from the interrupt in `r0` or `f0`. Symbols for the various interrupt codes available are defined in the system definition files. You must include the appropriate file to gain access to the definitions. For the command line argument interrupts we will need to include `<system/system.def>`.
The first command line argument in element zero of ARGS is always the name of the program file that is running.
Often we will want to take command line arguments (which are strings) and convert them to integers or floating point numbers so that we can do calculations with them. In the `<system/string.def>` definition file we have some interrupt codes defined to help us with that.
Function | Description |
---|---|
iPARSE_INT | Converts string at r0 to integer in r0. Accepts both decimal and hexadecimal with the '0x' prefix |
iPARSE_DEC | Converts decimal string at r0 to integer in r0 |
iPARSE_HEX | Converts hexadecimal string at r0 to integer in r0 |
iPARSE_FLOAT | Converts FP string at r0 to floating point in f0 |
If these parsing functions cannot make any sense of the string passed in `r0`, they will return zero.
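The return-zero behaviour is worth seeing in action, because it means a failed parse looks the same as a valid "0". The Python sketch below imitates the documented behaviour of iPARSE_INT and iPARSE_FLOAT only. The exact grammar the interrupts accept (signs, whitespace, exponent forms) is not given here, so the patterns are assumptions, and `parse_float` simply borrows Python's own float grammar as a stand-in.

```python
import re

_DECIMAL = re.compile(r"[+-]?[0-9]+")
_HEXADECIMAL = re.compile(r"[+-]?0[xX][0-9a-fA-F]+")


def parse_int(text: str) -> int:
    """Imitate iPARSE_INT: decimal, or hexadecimal with a 0x prefix; 0 on failure."""
    if _DECIMAL.fullmatch(text):
        return int(text, 10)
    if _HEXADECIMAL.fullmatch(text):
        return int(text, 16)
    return 0


def parse_float(text: str) -> float:
    """Imitate iPARSE_FLOAT: 0.0 when the string is not a number."""
    try:
        return float(text)
    except ValueError:
        return 0.0


# The arguments from the command line example: 326 Hello 3.1415
for argument in ("326", "Hello", "3.1415", "0x1F"):
    print(argument, parse_int(argument), parse_float(argument))
```

If a program needs to tell "0" apart from unparseable input, it has to inspect the string itself before or after parsing.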
Conditional execution of code is accomplished using the `JUMP` instruction. We can compose an IF/THEN construct using one `JUMP` or an IF/THEN/ELSE construct using two `JUMP` instructions. For example, if we look at the number of command line arguments passed to the program, we can branch based on whether there are any or not. If we compare the result of interrupt iARGC to 2, we will know that there aren't any command line arguments if the result of the comparison is less than. In pseudocode we would have:
Because we want to use a `JUMP` instruction to branch around the code in the THEN part of the IF, we actually jump on the opposite test. Since greater or equal is the opposite of less than, we shall jump around if argc ≥ 2. See the example code for how this is implemented.
If we want to implement an IF/THEN/ELSE we need two `JUMP` instructions. Again comparing to the result of interrupt iARGC, we can print one message if there are no arguments and a different message if there are arguments. The pseudocode for this is as follows:
If argc < 2 we need to jump around the THEN statements. The THEN statements likewise need to jump around the ELSE statements, so the THEN section must end with a `JUMP` to the end of the IF/THEN/ELSE.
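The two-jump pattern can be traced with a tiny program-counter simulation in Python. It assumes the IF tests argc ≥ 2, so the THEN part handles the case where arguments are present. The label names and messages are hypothetical, and this is a model of the control flow only, not CPUSim64 code.

```python
def run_if_then_else(argc: int) -> list:
    """Trace an IF/THEN/ELSE built from one conditional and one unconditional jump."""
    output = []
    pc = "if"
    while pc != "endif":
        if pc == "if":
            # Compare argc with 2, then jump on the opposite test (argc < 2)
            # to skip the THEN part; otherwise fall through into it.
            pc = "else" if argc < 2 else "then"
        elif pc == "then":
            output.append("arguments supplied")
            pc = "endif"      # unconditional jump around the ELSE part
        elif pc == "else":
            output.append("no arguments")
            pc = "endif"      # falls through to the end
    return output


print(run_if_then_else(1))
print(run_if_then_else(4))
```

Exactly one of the two messages is produced for any value of argc, which is the property the second jump exists to guarantee.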
Often we need to repeat code or similar code multiple times. If the code follows a pattern with just a few elements that differ we can write a macro substitution that can be used to implement the code more easily and correctly.
Macro substitutions are set up with the `#def_macro` preprocessor directive. Using this directive we specify the name of the macro and its arguments in parentheses. Following that, we provide the statements we want to be substituted when the macro is used in our code. In the substitution code we can use the macro variables with the special syntax `${varname}`. Each time such a special variable symbol is used, it is replaced by the text supplied for that variable when the macro is used.
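Python's standard `string.Template` class happens to use the same `${varname}` placeholder syntax, so it makes a convenient way to picture what the preprocessor does. The sketch below is an analogy only: `define_macro` is a hypothetical helper, and the three-line swap body is illustrative text rather than a macro taken from the CPUSim64 distribution.

```python
from string import Template


def define_macro(parameters, body: str):
    """Return a function that expands the macro body for given argument text."""
    template = Template(body)

    def expand(*arguments: str) -> str:
        if len(arguments) != len(parameters):
            raise TypeError(f"expected {len(parameters)} arguments")
        return template.substitute(dict(zip(parameters, arguments)))

    return expand


# Illustrative macro body: swap two registers through a temporary.
swap = define_macro(
    ("a", "b", "tmp"),
    "move ${tmp}, ${a}\nmove ${a}, ${b}\nmove ${b}, ${tmp}",
)

print(swap("r1", "r2", "r3"))
```

Because the substitution is purely textual, each use of the macro produces its own copy of the instructions, which is the redundancy that functions later avoid.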
This next example uses macros with three arguments to compute the minimum and the maximum of two integers inline.
You can make these two macros even more compact (two instructions each) by using a special form of the `MOVE` instruction that takes an SR condition just like the `JUMP` instruction does. If the condition is true, it moves the third operand into the second. If the condition is false, it moves the fourth operand into the second. This is called a conditional move.
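The two-instruction version amounts to a compare followed by a select. The Python sketch below models that idea with a dictionary of registers; the register names, the helper `conditional_move` and the use of a single "less than" flag are assumptions made for illustration.

```python
def conditional_move(condition: bool, registers: dict,
                     dest: str, if_true: str, if_false: str) -> None:
    """Model the conditional MOVE: operand 2 receives operand 3 or operand 4."""
    registers[dest] = registers[if_true] if condition else registers[if_false]


def minimum(a: int, b: int) -> int:
    registers = {"r0": 0, "r1": a, "r2": b}
    less_than = registers["r1"] - registers["r2"] < 0   # instruction 1: compare
    conditional_move(less_than, registers, "r0", "r1", "r2")  # instruction 2
    return registers["r0"]


def maximum(a: int, b: int) -> int:
    registers = {"r0": 0, "r1": a, "r2": b}
    less_than = registers["r1"] - registers["r2"] < 0
    conditional_move(less_than, registers, "r0", "r2", "r1")  # operands swapped
    return registers["r0"]


print(minimum(3, 7), maximum(3, 7))
```

No jump or label is needed, which is what makes the conditional move shorter than the branch-based macros.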
We can better organize our code and make it easier to understand by breaking it up into separate functions that do one thing well. We can then call our functions from many places in our code eliminating the redundancy inherent in using macros.
The basic form of function in assembly language uses register-based calling conventions. This means that the inputs to a function are expected to be in specific registers, typically `r0`, `r1`, etc. for integer arguments or addresses and `f0`, `f1`, etc. for floating point. By convention, functions are allowed to destroy the contents of `r0` and/or `f0`. In fact these are the two registers in which a function might return a value. But if the function uses any other registers, it is expected that it will `PUSH` the values of the registers it uses onto the stack to save them and restore them with `POP` when it ends.
Register-based functions are started with a unique label which gives the function its name. The function must use a `RETURN` statement at the end of its code to return to the calling code.
The calling code sets up the required registers, then issues a `CALL` instruction with the address of the function. The `CALL` instruction can also take an SR condition like the `JUMP` instruction does to make the `CALL` conditional.
Functions can be defined before or after the code that calls them, but functions generally are not to be declared inside other functions.
In the following example we take our minimum and maximum macros and turn them into register-based functions.
Stack-based calling conventions give us some advantages over register calling conventions. First, they manage all the `PUSH` and `POP` instructions necessary to use registers other than `r0` or `f0`. They also allow us to give symbolic names to arguments and other registers, making our code easier to read.
Stack-based functions are defined using the preprocessor declaration `#DEF_FUNC` and are called with the preprocessor declaration `#CALL`.
At the beginning of your function you can use one `#SVAR` directive to declare additional named stack variables. You will have to use `LOAD` and `STORE` to access these variables. That can be followed by one `#VAR` directive to declare all your named integer or address register variables. These register variables can be used directly by instructions and can be changed with `MOVE`. That in turn can be followed by one `#FVAR` directive to declare all your named floating point register variables.
To return a value, simply put it in `r0` or `f0`. You can also use the `#RETURN` or `#FRETURN` directives, which take a register and move it into `r0` or `f0` respectively.
Below is our min/max example using stack-based functions. There is also a sum function for summing a floating point array.
We can even turn our main code into a function and simply call it from the start of our program then exit with a return value returned by main.
Our programs have access to a whole region of memory called the heap. Our programs can dynamically allocate blocks from the heap as they are running. When we are done with an allocated block we must release or "free" it. In this way we can manage the memory in the heap.
As an example, say we want to allocate enough memory to hold 100 integers. We can do this by issuing the software interrupt `iALLOC` with the size of the block we want to allocate in `r0`. It will return the address of the allocated block in `r0`, or `0` if there is an error.
For small allocations, `iALLOC` may allocate a slightly larger block so that there are consistently sized blocks in the heap, which helps when blocks are freed and can be reused. The allocated size is stored in the word right before the address returned by `iALLOC`.
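The size-word convention can be illustrated with a toy allocator in Python. This is a deliberately simplified sketch and not how the CPUSim64 heap is implemented: it never frees or reuses blocks, it rounds every request (not just small ones) up to a multiple of 8 words, and both the granule of 8 and the class name `ToyHeap` are assumptions. What it does share with the description above is that the allocated size sits in the word before the returned address and that 0 signals an error.

```python
class ToyHeap:
    """Bump allocator that stores each block's size in the preceding word."""

    def __init__(self, size: int, granule: int = 8):
        self.words = [0] * size
        self.granule = granule     # assumed rounding unit, in words
        self.next_free = 1         # address 0 is reserved so 0 can mean "error"

    def alloc(self, requested: int) -> int:
        """Return the address of a new block, or 0 if it cannot be allocated."""
        if requested <= 0:
            return 0
        size = -(-requested // self.granule) * self.granule  # round up
        header = self.next_free
        if header + 1 + size > len(self.words):
            return 0
        self.words[header] = size          # size word right before the block
        self.next_free = header + 1 + size
        return header + 1

    def allocated_size(self, address: int) -> int:
        """Read the size word stored just before the block."""
        return self.words[address - 1]


heap = ToyHeap(256)
block = heap.alloc(100)                    # room for 100 integers
print(block, heap.allocated_size(block))   # the size may exceed the request
print(heap.alloc(1000))                    # too large for this heap
```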
There are a number of functions for operating on strings. Strings are simply an array of Unicode codepoints terminated with the null character (0 or '\0'). Strings are either statically allocated using a `DCS` assembler directive and stored in the code segment, or they are dynamically allocated in the heap using interrupt `iALLOC`. It is important to free strings when you don't need them any longer to free up memory in the heap.
Math functions are available for most of the standard math functionality on floating point values. There are also some functions for generating random numbers, which are helpful in simulations and games.