christianermann.dev-hugo

The Hugo source for my website
git clone git://git.christianermann.dev/christianermann.dev-hugo

commit c9f76038080e2337b0fa632544ed76b741b7c9ea
parent 6db462fd21624cce1b3069d31f66ed0a08b24dc4
Author: Christian Ermann <christianermann@gmail.com>
Date:   Thu, 14 Nov 2024 10:33:39 -0800

Add 'The Infamous 'link' Macro' and 'Forth RV32I Assembler

Diffstat:
A content/posts/forth-rv32i-assembler.md   | 172 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A content/posts/the-infamous-link-macro.md |  70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 242 insertions(+), 0 deletions(-)

diff --git a/content/posts/forth-rv32i-assembler.md b/content/posts/forth-rv32i-assembler.md
@@ -0,0 +1,172 @@
+---
+title: "Forth RV32I Assembler"
+date: 2024-11-13T17:33:47-08:00
+tags: [Forth, Assembly, RISC-V]
+draft: false
+---
+
+Here's an assembler for RV32I that I wrote in Forth. I find the definitions of
+the instructions and instruction types especially elegant, and the whole thing
+is a great demonstration of how concise and powerful Forth can be.
+
+All code in this post is licensed under the
+[AGPLv3](https://www.gnu.org/licenses/agpl-3.0.en.html#license-text).
+
+```forth
+\ encode stack values into proper instruction locations. all
+\ encoding sequences must begin with 'opcode'.
+: opcode hex 7F and ;
+: funct3 swap hex 7 and decimal 12 lshift + ;
+: funct7 swap hex 7F and decimal 25 lshift + ;
+: i-immed swap hex FFF and decimal 20 lshift + ;
+: i-immed-shamt swap hex 1F and decimal 20 lshift + ;
+: u-immed swap hex FFFFF and decimal 12 lshift + ;
+: s-immed over hex FE0 and decimal 20 lshift +     \ imm[11:5] -> bits 31:25
+          swap hex 1F and decimal 7 lshift + ;     \ imm[4:0] -> bits 11:7
+: j-immed over hex 100000 and decimal 11 lshift +  \ imm[20] -> bit 31
+          over hex FF000 and +                     \ imm[19:12] stay in bits 19:12
+          over hex 800 and decimal 9 lshift +      \ imm[11] -> bit 20
+          swap hex 7FE and decimal 20 lshift + ;   \ imm[10:1] -> bits 30:21
+: b-immed over hex 1000 and decimal 19 lshift +    \ imm[12] -> bit 31
+          over hex 800 and decimal 4 rshift +      \ imm[11] -> bit 7
+          over hex 7E0 and decimal 20 lshift +     \ imm[10:5] -> bits 30:25
+          swap hex 1E and decimal 7 lshift + ;     \ imm[4:1] -> bits 11:8
+: rd swap hex 1F and decimal 7 lshift + ;
+: rs1 swap hex 1F and decimal 15 lshift + ;
+: rs2 swap hex 1F and decimal 20 lshift + ;
+
+\ instruction types. all instruction values should be pushed on the
+\ stack with the opcode last before calling.
+: r-type opcode funct3 funct7 rs2 rs1 rd , ;
+: i-type opcode funct3 i-immed rs1 rd , ;
+: i-type-shamt opcode funct3 funct7 i-immed-shamt rs1 rd , ;
+: s-type opcode funct3 s-immed rs2 rs1 , ;
+: b-type opcode funct3 b-immed rs2 rs1 , ;
+: u-type opcode u-immed rd , ;
+: j-type opcode j-immed rd , ;
+
+\ instructions. these are just simple encodings, no assembler
+\ niceties yet.
+\             funct7 funct3 opcode encoding
+: addi,   hex        0      13     i-type ;
+: andi,   hex        7      13     i-type ;
+: ori,    hex        6      13     i-type ;
+: xori,   hex        4      13     i-type ;
+: slli,   hex 00     1      13     i-type-shamt ;
+: srli,   hex 00     5      13     i-type-shamt ;
+: srai,   hex 20     5      13     i-type-shamt ;
+: slti,   hex        2      13     i-type ;
+: sltiu,  hex        3      13     i-type ;
+: lui,    hex               37     u-type ;
+: auipc,  hex               17     u-type ;
+: add,    hex 00     0      33     r-type ;
+: sub,    hex 20     0      33     r-type ;
+: and,    hex 00     7      33     r-type ;
+: or,     hex 00     6      33     r-type ;
+: xor,    hex 00     4      33     r-type ;
+: sll,    hex 00     1      33     r-type ;
+: srl,    hex 00     5      33     r-type ;
+: sra,    hex 20     5      33     r-type ;
+: slt,    hex 00     2      33     r-type ;
+: sltu,   hex 00     3      33     r-type ;
+: jal,    hex               6F     j-type ;
+: jalr,   hex        0      67     i-type ;
+: beq,    hex        0      63     b-type ;
+: bne,    hex        1      63     b-type ;
+: blt,    hex        4      63     b-type ;
+: bltu,   hex        6      63     b-type ;
+: bge,    hex        5      63     b-type ;
+: bgeu,   hex        7      63     b-type ;
+: lw,     hex        2      03     i-type ;
+: lh,     hex        1      03     i-type ;
+: lhu,    hex        5      03     i-type ;
+: lb,     hex        0      03     i-type ;
+: sw,     hex        2      23     s-type ;
+: sh,     hex        1      23     s-type ;
+: sb,     hex        0      23     s-type ;
+: fence,  hex        0      0F     i-type ;
+: ecall,  hex        0      73     i-type ;
+: ebreak, hex        0      73     i-type ;
+
+\ some instructions, now with nicer usage.
+: sw,     >r swap r> sw, ;     \ usage: rs2 rs1 imm sw,
+: sh,     >r swap r> sh, ;     \ usage: rs2 rs1 imm sh,
+: sb,     >r swap r> sb, ;     \ usage: rs2 rs1 imm sb,
+: ecall,  0 0 0 ecall, ;       \ usage: ecall,
+: ebreak, 0 0 1 ebreak, ;      \ usage: ebreak,
+: fence,  >r 0 0 r> fence, ;   \ usage: imm fence,
+
+\ registers
+decimal
+ 0 constant x0   1 constant x1   2 constant x2   3 constant x3
+ 4 constant x4   5 constant x5   6 constant x6   7 constant x7
+ 8 constant x8   9 constant x9  10 constant x10 11 constant x11
+12 constant x12 13 constant x13 14 constant x14 15 constant x15
+16 constant x16 17 constant x17 18 constant x18 19 constant x19
+20 constant x20 21 constant x21 22 constant x22 23 constant x23
+24 constant x24 25 constant x25 26 constant x26 27 constant x27
+28 constant x28 29 constant x29 30 constant x30 31 constant x31
+
+\ registers (calling convention)
+x0 constant zero \ hard-wired zero
+x1 constant ra   \ return address
+x2 constant sp   \ stack pointer
+x3 constant gp   \ global pointer
+x4 constant tp   \ thread pointer
+x8 constant fp   \ frame pointer
+\ function arguments / return values (a0, a1)
+x10 constant a0 x11 constant a1 x12 constant a2 x13 constant a3
+x14 constant a4 x15 constant a5 x16 constant a6 x17 constant a7
+\ saved registers
+x8 constant s0 x9 constant s1 x18 constant s2 x19 constant s3
+x20 constant s4 x21 constant s5 x22 constant s6 x23 constant s7
+x24 constant s8 x25 constant s9 x26 constant s10 x27 constant s11
+\ temporaries
+x5 constant t0 x6 constant t1 x7 constant t2 x28 constant t3
+x29 constant t4 x30 constant t5 x31 constant t6
+```
+
+And here are some tests to verify that the instruction encodings are generated
+correctly:
+```forth
+: undo, -1 cells allot here @ ; \ undoes the last ',' and fetches its value
+t{ a0 a1 hex FF addi, undo, -> hex 0FF58513 }t
+t{ a0 a1 hex FF andi, undo, -> hex 0FF5F513 }t
+t{ a0 a1 hex FF ori, undo, -> hex 0FF5E513 }t
+t{ a0 a1 hex FF xori, undo, -> hex 0FF5C513 }t
+t{ a0 a1 hex F slli, undo, -> hex 00F59513 }t
+t{ a0 a1 hex F srli, undo, -> hex 00F5D513 }t
+t{ a0 a1 hex F srai, undo, -> hex 40F5D513 }t
+t{ t0 hex FFFF lui, undo, -> hex 0FFFF2B7 }t
+t{ t0 hex FFFF auipc, undo, -> hex 0FFFF297 }t
+t{ a0 a1 hex FF slti, undo, -> hex 0FF5A513 }t
+t{ a0 a1 hex FF sltiu, undo, -> hex 0FF5B513 }t
+t{ a0 a1 a2 add, undo, -> hex 00C58533 }t
+t{ a0 a1 a2 sub, undo, -> hex 40C58533 }t
+t{ a0 a1 a2 and, undo, -> hex 00C5F533 }t
+t{ a0 a1 a2 or, undo, -> hex 00C5E533 }t
+t{ a0 a1 a2 xor, undo, -> hex 00C5C533 }t
+t{ a0 a1 a2 sll, undo, -> hex 00C59533 }t
+t{ a0 a1 a2 srl, undo, -> hex 00C5D533 }t
+t{ a0 a1 a2 sra, undo, -> hex 40C5D533 }t
+t{ a0 a1 a2 slt, undo, -> hex 00C5A533 }t
+t{ a0 a1 a2 sltu, undo, -> hex 00C5B533 }t
+t{ ra hex FFFF jal, undo, -> hex 7FF0F0EF }t
+t{ ra a0 hex FF jalr, undo, -> hex 0FF500E7 }t
+t{ a0 a1 hex F beq, undo, -> hex 00B50763 }t
+t{ a0 a1 hex F bne, undo, -> hex 00B51763 }t
+t{ a0 a1 hex F blt, undo, -> hex 00B54763 }t
+t{ a0 a1 hex F bltu, undo, -> hex 00B56763 }t
+t{ a0 a1 hex F bge, undo, -> hex 00B55763 }t
+t{ a0 a1 hex F bgeu, undo, -> hex 00B57763 }t
+t{ a0 a1 hex F lw, undo, -> hex 00F5A503 }t
+t{ a0 a1 hex F lh, undo, -> hex 00F59503 }t
+t{ a0 a1 hex F lhu, undo, -> hex 00F5D503 }t
+t{ a0 a1 hex F lb, undo, -> hex 00F58503 }t
+t{ a0 a1 hex F sw, undo, -> hex 00A5A7A3 }t
+t{ a0 a1 hex F sh, undo, -> hex 00A597A3 }t
+t{ a0 a1 hex F sb, undo, -> hex 00A587A3 }t
+t{ hex 0FF fence, undo, -> hex 0FF0000F }t
+t{ ecall, undo, -> hex 00000073 }t
+t{ ebreak, undo, -> hex 00100073 }t
+```
diff --git a/content/posts/the-infamous-link-macro.md b/content/posts/the-infamous-link-macro.md
@@ -0,0 +1,70 @@
+---
+title: "The Infamous 'link' Macro"
+date: 2024-11-12T13:01:09-08:00
+tags: [Forth, Assembly, RISC-V, ARM]
+draft: false
+---
+
+Each word (or function, in other languages) in Forth is stored as an entry in
+a linked list known as the dictionary. When bootstrapping a Forth from
+assembly, it is your responsibility to create and maintain this linked list
+structure. This is a tedious process and is the source of many errors when
+re-arranging words or defining new words; it's incredibly easy to turn your
+list into a graph by mistake.
+
+In
+[jonesforth](https://rwmj.wordpress.com/2010/08/07/jonesforth-git-repository/),
+an x86 implementation of Forth, the assembler supports re-assigning new values
+to assembler variables. This means an assembler variable can hold the memory
+address of the previous entry, and we can read or update it whenever we need
+to. I used this exact strategy in my
+[incomplete x86 Forth](https://git.christianermann.dev/forth/log.html), although
+I used [FASM](https://flatassembler.net/) instead of
+[GAS](https://wiki.osdev.org/GAS). Unfortunately, GAS doesn't support this
+feature for ARM or RISC-V targets. It may be possible to pull this off on ARM
+with the right set of relocation and/or relaxation parameters, but I had no
+luck with RISC-V.
+
+I did come up with an alternative set of macros that achieves the same goal
+and should be more portable than the `jonesforth` solution:
+```GAS
+.macro this_link
+  .globl link_\+
+link_\+:
+.endm
+
+.macro prev_link
+  .int link_\+
+.endm
+
+.macro link
+  this_link
+  prev_link
+.endm
+
+this_link
+```
+
+These macros depend on the special value
+{{<highlight GAS "hl_inline=true">}}\+{{</highlight>}}.
+This value is replaced by the invocation count of the current macro during
+assembly. Since we want to
+{{<highlight GAS "hl_inline=true">}}link{{</highlight>}}
+back to the previous word in the dictionary, we need
+{{<highlight GAS "hl_inline=true">}}prev_link{{</highlight>}}
+to trail one count behind
+{{<highlight GAS "hl_inline=true">}}this_link{{</highlight>}},
+which is why we call
+{{<highlight GAS "hl_inline=true">}}this_link{{</highlight>}}
+once before defining any words.
+
+You'll also need to add this to your linker script to null-terminate your
+dictionary; otherwise, the start of your dictionary will be marked by the
+address of the
+{{<highlight GAS "hl_inline=true">}}link_0{{</highlight>}}
+label created by
+{{<highlight GAS "hl_inline=true">}}this_link{{</highlight>}}:
+```GAS
+link_0 = 0;
+```
+
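The bit layouts implemented by the Forth encoders in `forth-rv32i-assembler.md` can be checked without a Forth system. The Python sketch below is not part of the post. The helpers `bits`, `r_type`, `i_type`, `s_type`, `b_type`, `u_type`, `j_type` and the `VECTORS` table are hypothetical names. Field positions follow the RV32I base instruction formats, and every expected value is copied from the post's `t{ ... }t` tests.

```python
# Cross-check of RV32I encodings against expected values from the post's
# t{ ... }t tests. Hypothetical helpers written for this page, not the post's
# Forth code; field positions follow the RV32I base instruction formats.

A0, A1, A2, RA, T0 = 10, 11, 12, 1, 5  # ABI register names -> x-register numbers


def bits(value: int, hi: int, lo: int) -> int:
    """Return value[hi:lo] as an unsigned integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def r_type(funct7: int, funct3: int, opcode: int, rd: int, rs1: int, rs2: int) -> int:
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def i_type(funct3: int, opcode: int, rd: int, rs1: int, imm: int) -> int:
    # imm[11:0] -> bits 31:20
    return (bits(imm, 11, 0) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def s_type(funct3: int, opcode: int, rs1: int, rs2: int, imm: int) -> int:
    # imm[11:5] -> bits 31:25, imm[4:0] -> bits 11:7
    return ((bits(imm, 11, 5) << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (bits(imm, 4, 0) << 7) | opcode)


def b_type(funct3: int, opcode: int, rs1: int, rs2: int, imm: int) -> int:
    # imm[12] -> 31, imm[10:5] -> 30:25, imm[4:1] -> 11:8, imm[11] -> 7
    return ((bits(imm, 12, 12) << 31) | (bits(imm, 10, 5) << 25) | (rs2 << 20)
            | (rs1 << 15) | (funct3 << 12) | (bits(imm, 4, 1) << 8)
            | (bits(imm, 11, 11) << 7) | opcode)


def u_type(opcode: int, rd: int, imm20: int) -> int:
    # As with the post's u-immed, the argument is the upper 20 bits themselves.
    return (bits(imm20, 19, 0) << 12) | (rd << 7) | opcode


def j_type(opcode: int, rd: int, imm: int) -> int:
    # imm[20] -> 31, imm[10:1] -> 30:21, imm[11] -> 20, imm[19:12] -> 19:12
    return ((bits(imm, 20, 20) << 31) | (bits(imm, 10, 1) << 21)
            | (bits(imm, 11, 11) << 20) | (bits(imm, 19, 12) << 12)
            | (rd << 7) | opcode)


# (instruction, encoding from the helpers above, expected value from the post)
VECTORS = [
    ("addi a0, a1, 0xFF", i_type(0x0, 0x13, A0, A1, 0xFF), 0x0FF58513),
    ("srai a0, a1, 0xF", (0x20 << 25) | i_type(0x5, 0x13, A0, A1, 0xF), 0x40F5D513),
    ("lui t0, 0xFFFF", u_type(0x37, T0, 0xFFFF), 0x0FFFF2B7),
    ("sub a0, a1, a2", r_type(0x20, 0x0, 0x33, A0, A1, A2), 0x40C58533),
    ("jal ra, 0xFFFF", j_type(0x6F, RA, 0xFFFF), 0x7FF0F0EF),
    ("jalr ra, a0, 0xFF", i_type(0x0, 0x67, RA, A0, 0xFF), 0x0FF500E7),
    ("beq a0, a1, 0xF", b_type(0x0, 0x63, A0, A1, 0xF), 0x00B50763),
    ("sw a0, 0xF(a1)", s_type(0x2, 0x23, A1, A0, 0xF), 0x00A5A7A3),
    ("ebreak", i_type(0x0, 0x73, 0, 0, 0x1), 0x00100073),
]


def mismatches() -> list:
    """Return (instruction, got, expected) for every vector that disagrees."""
    return [(name, hex(got), hex(want)) for name, got, want in VECTORS if got != want]


if __name__ == "__main__":
    print(mismatches() or "all encodings match")
```

Running it prints `all encodings match`. The same helpers also cover RV32I's `lbu` (funct3 4, opcode 03), which the post does not define. For example, `i_type(0x4, 0x03, A0, A1, 0xF)` gives `0x00F5C503` for `lbu a0, 15(a1)`.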
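The counter trick in `the-infamous-link-macro.md` can be modeled outside the assembler to show why the dictionary stays a simple list. The Python toy below is hypothetical: `LinkMacros`, `expand` and `chain` are invented for illustration. It assumes each macro keeps its own `\+` count, starting at 0 and incrementing on every expansion, which is consistent with the post's use of `link_0`. The model expands one up-front `this_link` plus one `link` per word. It then follows each `.int` target back to `link_0`, which the linker script sets to 0.

```python
# Toy model of the this_link / prev_link / link macros from the post.
# Assumption: each GAS macro has its own \+ counter that starts at 0 and
# increments on every expansion, matching the post's use of link_0.
from collections import defaultdict


class LinkMacros:
    def __init__(self) -> None:
        self.counts = defaultdict(int)  # macro name -> expansions so far
        self.lines = []                 # emitted assembly, one directive per line

    def _plus(self, macro: str) -> int:
        # Value that \+ would take inside `macro` for this expansion.
        n = self.counts[macro]
        self.counts[macro] += 1
        return n

    def this_link(self) -> None:
        n = self._plus("this_link")
        self.lines += [f".globl link_{n}", f"link_{n}:"]

    def prev_link(self) -> None:
        n = self._plus("prev_link")
        self.lines.append(f".int link_{n}")

    def link(self) -> None:
        self.this_link()
        self.prev_link()


def expand(n_words: int) -> list:
    """One up-front this_link, then one link per dictionary entry."""
    m = LinkMacros()
    m.this_link()
    for _ in range(n_words):
        m.link()
    return m.lines


def chain(lines: list, start: str) -> list:
    """Follow each label's .int target back to link_0 (0 via the linker script)."""
    target, label = {}, None
    for line in lines:
        if line.endswith(":"):
            label = line[:-1]
        elif line.startswith(".int ") and label is not None:
            target[label] = line.split()[1]
    walk = [start]
    while walk[-1] != "link_0":
        walk.append(target[walk[-1]])
    return walk


if __name__ == "__main__":
    asm = expand(3)
    print("\n".join(asm))
    print(" -> ".join(chain(asm, "link_3")))
```

With three words, the last line printed is `link_3 -> link_2 -> link_1 -> link_0`. Because `prev_link` is always expanded one fewer time than `this_link`, each entry points at the entry assembled just before it, however the words are ordered in the source.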