forth.asm (28675B)
1 ; An in-progress Forth compiler and tutorial for UEFI x86_64 systems. 2 ; 3 ; Written By Christian Ermann 4 ; 5 6 format pe64 dll efi 7 entry main 8 9 section '.text' code executable readable 10 11 include 'efi.asm' 12 13 ;------------------------------------------------------------------------------ 14 ; 15 ; Foreword 16 ; 17 ; I am writing this as an alternative to the "jonesforth" implementation of 18 ; the Forth language for learning how to write a Forth. I'm attempting to 19 ; design my Forth in such a way that a user can interact with the system as 20 ; soon as possible, and further progress is broken into incremental steps. My 21 ; hope is that this also makes it easier to port to different architectures by 22 ; providing natural checkpoints to verify functionality as more features are 23 ; added. 24 ; 25 ;------------------------------------------------------------------------------ 26 ; 27 ; Table of Contents 28 ; 1. The Basic Structure of Forth 29 ; 2. Executing this Program 30 ; 3. Hello, World! 31 ; 4. Reading Words 32 ; 5. Printing Numbers 33 ; 34 ;------------------------------------------------------------------------------ 35 ; 36 ; Section 1: Basic Structure 37 ; 38 ; In this section, we'll learn how to represent words in the Forth language in 39 ; memory, and how to execute those words to do meaningful work. 40 ; 41 ; The Dictionary 42 ; 43 ; All words in Forth are stored in a dictionary, just like in English or in 44 ; French. Each entry in a Forth dictionary stores a reference to the previous 45 ; word (known as the link pointer), some flags to denote special properties of 46 ; certain words, the length of the word's name, the name of the word, and 47 ; finally the definition of the word. For memory alignment, there may be some 48 ; padding placed inbetween 49 ; the name of the word and its definition. The memory layout of an entry for 50 ; soupforth can be seen below: 51 ; 52 ; Dictionary Entry 53 ; +-------------------------------+ < 54 ; | link pointer | 8 bytes | | Header 55 ; +-------------------------------+ | 56 ; | flags | 1 byte | | 57 ; +-------------------------------+ | 58 ; | length | 1 byte | | 59 ; +-------------------------------+ | 60 ; | name | upto 255 bytes | | 61 ; +-------------------------------+ | 62 ; | padding | upto 7 bytes | | 63 ; +-------------------------------+ < 64 ; | definition | | 65 ; +-------------------------------+ 66 ; 67 ; All the words we define can be split into two main categories: primitive 68 ; words and Forth words. The terminology can be a bit confusing, as all the 69 ; words we define are part of our Forth, but what these terms actually compare 70 ; is the implementation of the words. Primitive words are implemented directly 71 ; in assembly, while Forth words are implemented in Forth itself. 72 ; 73 ; This distinction is mostly hidden from users of the language, however, the 74 ; choices you make affect the speed and portability of the implementation. As 75 ; the number of primitive words increases, so does the speed of the Forth 76 ; implementation. However, the fewer primitive words there are, the easier the 77 ; Forth is to port to new architectures. 78 ; 79 ; As implementers, we do need to be aware of the differences. For example, the 80 ; "definition" field, left blank in the diagram above, differs slightly between 81 ; primitive words and Forth words. Definitions always start with a codeword and 82 ; end with some sort of "return" statement, but the details of these codewords 83 ; and "return" tatements differ between the two types of words. 84 ; 85 ; For primitive words, the codeword is the address of the assembly code that 86 ; implements the word. The return statement is called `NEXT` and takes care of 87 ; loading and jumping to the address of the **next** word. 88 ; 89 ; As we execute words, we'll use the RAX register to store the address of the 90 ; word that we're currently executing and we'll use the RSI register to store 91 ; the address of the next word we need to execute. 92 ; 93 ; This leads us to a fairly straightforward definition of a macro for NEXT: 94 95 macro NEXT 96 { 97 lodsq ; 1. Load value at address RSI into RAX 98 jmp qword [rax] ; 2. Jump to address in RAX 99 } 100 101 ; For Forth words, the codeword is called DOCOL, short for "DO COLON", as Forth 102 ; definitions are started with a ":". DOCOL takes care of storing the address 103 ; of the next word we need to execute, and starting execution of the current 104 ; word. The return statement is called EXIT and loads the address of the next 105 ; word that DOCOL stored earlier and calls NEXT. 106 ; 107 ; In order to store and load addresses of words, Forth uses what's known as the 108 ; return stack. It's called the return stack as it stores the addresses we 109 ; **return** to at the end of a definition. We'll use the RBP register as our 110 ; return stack pointer. 111 112 macro push_rs reg 113 ; Push the value of `reg` onto the return stack. 114 ; 115 { 116 lea rbp, [rbp - 8] ; 1. Move the return stack pointer down 1 address. 117 mov [rbp], reg ; 2. Push the word address onto the stack. 118 } 119 120 macro pop_rs reg 121 ; Pop the top value of the return stack into `reg`. 122 ; 123 { 124 mov reg, [rbp] ; 1. Pop a word address off of the stack. 125 lea rbp, [rbp + 8] ; 2. Move the return stack pointer up 1 address. 126 } 127 128 DOCOL: 129 ; Start execution of a Forth word. 130 ; 131 push_rs rsi ; 1. Save the next word's address onto the return stack. 132 lea rsi, [rax + 8] ; 2. Load the address of the first data word into RSI. 133 NEXT ; 3. Execute the word pointed to by RSI. 134 135 ; As EXIT is included in the definition of every Forth word, it has to have an 136 ; entry in the dictionary. EXIT will be the first primitive word we define. 137 ; 138 ; Let's revisit what the memory layout of our entry will look like before we 139 ; start defining anything: 140 ; 141 ; Memory layout of EXIT 142 ; 143 ; section '.rodata' 144 ; +-------------------------------+ <- name_EXIT 145 ; | link pointer | 8 bytes | 146 ; +-------------------------------+ 147 ; | flags | 1 byte | 148 ; +-------------------------------+ 149 ; | length | 1 byte | 150 ; +-------------------------------+ 151 ; | name | upto 255 bytes | 152 ; +-------------------------------+ 153 ; | padding | upto 7 bytes | 154 ; +-------------------------------+ <- EXIT 155 ; | code_EXIT | 8 bytes | 156 ; +-------------------------------+ 157 ; 158 ; section '.text' 159 ; +-------------+ <- code_EXIT 160 ; | pop_rs rsi | 161 ; | NEXT | 162 ; +-------------+ 163 ; 164 ; Some notes about the labels: 165 ; 1. name_EXIT is the label for the start of the dictionary entry. 166 ; 2. EXIT is the label for the location of the codeword. 167 ; 3. code_EXIT is the label of the assembly code implementing EXIT. 168 ; 169 ; We'll need to re-create this memory structure for every primitive word we 170 ; define so it's worthwhile to write a `defcode` macro that does it for you: 171 172 macro defcode name, name_length, flags, label_ 173 ; Define a primitive word. 174 ; 175 { 176 section '.rodata' data readable 177 align 8 178 name_#label_: 179 dq link ; 1. Set the link pointer. 180 link equ name_#label_ ; 2. Update `link`. 181 db flags ; 3. Set the flags. 182 db name_length ; 4. Set the name length. 183 db name ; 5. Set the name. 184 185 align 8 ; 6. Add any padding we may need. 186 label_: 187 dq code_#label_ ; 7. Set the codeword. 188 section '.text' code readable executable 189 align 1 190 code_#label_: ; 8. This is where our assembly code will go. 191 } 192 193 macro defword name, name_length, flags, label_ 194 ; Define a forth word. 195 ; 196 { 197 section '.rodata' data readable 198 align 8 199 name_#label_: 200 dq link 201 link equ name_#label_ 202 db flags 203 db name_length 204 db name 205 206 align 8 207 label_: 208 dq DOCOL 209 } 210 211 ; You'll have noticed that the macro expects the `link` variable to hold the 212 ; address of the previous word. As EXIT is our first word, we'll need to 213 ; initialize `link` to 0. The `defcode` macro will take care of updating `link` 214 ; for us as we define new words. 215 ; 216 ; After we set `link`, we can finally define our first primitive word: 217 218 link dq 0 219 220 defcode "EXIT", 4, 0, EXIT 221 ; Return from a Forth word. 222 ; 223 pop_rs rsi ; 1. Load the next word's address back into RSI. 224 NEXT ; 2. Execute word pointed to by RSI. 225 226 ; It will be awhile before we get a chance to actually use EXIT, but it gives 227 ; us a taste of where we're heading. 228 ; 229 ;------------------------------------------------------------------------------ 230 ; 231 ; Section 2 - Executing this Program 232 ; 233 ; In this section, we'll build some infrastructure to enable us to run small 234 ; demos at the end of each section without having to re-write any code. 235 ; 236 ; The Entry Point 237 ; 238 ; At the beginning of this file, we defined the entry point to be `main`. That 239 ; means we need to define what `main` is before we can run anything. I decided 240 ; that `main` should do 4 things: 241 ; 242 ; 1. Initialize the UEFI interface. 243 ; 2. Clear the screen, so we have a blank canvas. 244 ; 3. Store the version string for this Forth onto the stack. 245 ; 4. Start the execution of Forth code. 246 ; 247 ; These first two steps are handled by function calls provided by the UEFI 248 ; interface. 249 ; 250 ; The version string, and its length, arae defined in the `data` section at the 251 ; bottom of this file. 252 ; 253 ; If you remember from the previous section, we can use the macro NEXT to start 254 ; the execution of a primitive word. The only thing NEXT expects is for the RSI 255 ; register to contain the address of a primitive word. 256 ; 257 ; If we pass an address corresponding to a sequence of words, 258 ; 259 ; We'll use the variable `program` to store the address of the w 260 ; 261 262 postpone { 263 section '.text' code executable readable 264 main: 265 cld 266 call sys_initialize 267 call sys_clear_screen 268 initialize_stack 269 mov rbp, return_stack_top 270 mov rsi, program 271 NEXT 272 } 273 274 macro set_program init_stack_macro, program_label { 275 macro initialize_stack \{ init_stack_macro \} 276 program equ program_label 277 } 278 279 ;------------------------------------------------------------------------------ 280 ; 281 ; Section 3 - Hello, World! 282 ; 283 ; In this section, we'll define a few more primitive words and print a message 284 ; to screen using them. 285 ; 286 ; Basic Output 287 ; 288 ; There are two main words for printing to the screen in Forth: EMIT and TYPE. 289 ; EMIT **emits** a single character to the screen, while TYPE outputs an entire 290 ; string to the screen. Unlike many other languages at the time, Forth does not 291 ; use null-terminated strings. Instead, the address and length of a string are 292 ; expected to travel as a pair. 293 ; 294 ; The implementations of these words are fairly simple, as I've defined the 295 ; `sys_print_string` function as part of the EFI abstraction and it does most 296 ; of the work for us. 297 ; 298 299 defcode "EMIT", 4, 0, EMIT 300 ; Print a character. 301 ; 302 ; The character is temporarily stored in the `.char_buffer` local variable. 303 ; `.char_buffer` is defined as a 1-byte variable in the data section at the 304 ; bottom of this file. 305 ; 306 pop rax 307 mov [.char_buffer], al 308 mov rcx, .char_buffer 309 mov rdx, 1 310 call sys_print_string 311 NEXT 312 313 defcode "TYPE", 4, 0, TYPE 314 ; Print an ASCII string. 315 ; 316 pop rdx 317 pop rcx 318 call sys_print_string 319 NEXT 320 321 section '.rodata' data readable 322 323 macro initialize_stack_s3 { 324 push version_string 325 push version_string.length 326 } 327 328 program_s3: 329 dq TYPE 330 331 set_program initialize_stack_s3, program_s3 332 333 ;------------------------------------------------------------------------------ 334 ; 335 ; Section 4 - Reading Words 336 ; 337 ; In this section, we'll define words for processing user input and even build 338 ; a simple interpreter. At this point, the interpreter will only be able to 339 ; execute words that we've already defined. We'll need to do a bit more work 340 ; before we can start defining new words in the interpreter. 341 ; 342 343 defcode "KEY", 3, 0, KEY 344 ; Read a character. 345 ; 346 call _KEY 347 push rax 348 NEXT 349 350 _KEY: 351 cmp [input_buffer], 0 352 jne .buffer 353 354 .stdin: 355 call sys_read_char 356 ret 357 358 .buffer: 359 cmp [input_buffer.len], 0 360 je .buffer_empty 361 mov rax, [input_buffer] 362 movzx rax, byte [rax] 363 inc [input_buffer] 364 dec [input_buffer.len] 365 ret 366 367 .buffer_empty: 368 mov [input_buffer], 0 369 jmp .stdin 370 371 372 defcode "WORD", 4, 0, WORD_ 373 ; Read a word. 374 ; 375 ; A word is considered to be a group of ASCII characters surrounded by white 376 ; space. 377 ; 378 ; WORD is implemented using a pattern that we'll use a lot: We'll define 379 ; a label '_WORD' that holds most of the implementation, aside from stack 380 ; manipulation. This lets us easily re-use the implementation in other 381 ; primitive words, we just have to make sure we're not overwriting important 382 ; registers when we do this. 383 ; 384 call _WORD 385 push rdi ; word name address 386 push rcx ; word name length 387 NEXT 388 389 _WORD: 390 ; Read a word. 391 ; 392 ; Returns: 393 ; RDI: word name address 394 ; RCX: word name length 395 ; 396 397 .skip_whitespace: 398 ; Read characters until we reach one that isn't white space. 399 ; 400 ; After a non-whitespace character is found, RDI is used to hold the address 401 ; of the next character to be stored. 402 ; 403 call _KEY 404 cmp al, ' ' 405 je .skip_whitespace 406 cmp al, 0xA 407 je .skip_whitespace 408 cmp al, 0xD 409 je .skip_whitespace 410 mov rdi, .buffer 411 412 .store_char: 413 ; Store a character into the buffer. 414 ; 415 ; There are some assembly instructions such as 'stos', 'lods', etc. that 416 ; perform string operations for us. These instructions require us to use 417 ; registers in specific ways, so it's always a good idea to double check what 418 ; an instruction does. 419 ; 420 ; In this instance, 'stosb' stores the byte in AL at address RDI, then 421 ; increments RDI to point at the next byte in the buffer. 422 ; 423 stosb 424 425 .next_char: 426 ; Keep processing characters until we find the end of the word. 427 ; 428 call _KEY 429 cmp al, ' ' 430 je .end 431 cmp al, 0xA 432 je .end 433 cmp al, 0xD 434 jne .store_char 435 436 .end: 437 ; Return buffer address in RDI and length in RCX. 438 ; 439 mov rcx, .buffer 440 sub rdi, rcx 441 xchg rdi, rcx 442 ret 443 444 445 446 defcode "FIND", 4, 0, FIND 447 ; Find a word in the dictionary. 448 ; 449 ; We search the dictionary starting from the end (the newest words) and move 450 ; towards the beginning (the oldest words). 451 ; 452 pop rcx ; word name length 453 pop rdi ; word name address 454 call _FIND 455 push rdx ; word address 456 NEXT 457 458 _FIND: 459 ; Find a word in the dictionary. 460 ; 461 ; Args: 462 ; RDI: word name address 463 ; RCX: word name length 464 ; 465 ; Returns: 466 ; RDX: word address 467 ; 468 mov rdx, [latest] 469 jmp .check_out_of_words 470 471 .next_word: 472 ; Move to the next word in the dictionary. 473 ; 474 mov rdx, [rdx] 475 476 .check_out_of_words: 477 ; Are we at the beginning of the dictionary? 478 ; 479 test rdx, rdx 480 je .end 481 482 .check_hidden: 483 ; Is the word we're looking at hidden? 484 ; 485 movzx rax, byte [rdx + 8] ; al = word flags 486 test al, FLAG_HIDDEN 487 jne .next_word 488 489 .check_length: 490 ; Is the word we're looking at the right length? 491 ; 492 movzx rax, byte [rdx + 9] ; al = word name length 493 cmp cl, al 494 jne .next_word 495 496 .check_names: 497 ; Do the words actually match? 498 ; 499 ; We use the 'cmpsb' instruction to compare the names of the words which is 500 ; another of the special string instructions mentioned earlier. 'cmpsb' 501 ; compares the strings referenced by RDI and RSI for up to RCX characters. 502 ; 503 ; The 'repe' prefix causes 'cmpsb' to be called repeatedly until RCX is 0 or a 504 ; difference between the strings is found. 505 ; 506 ; This means that RSI, RDI, and RCX will all be modified. Since we need the 507 ; original values of these registers after the loop, we have to make sure to 508 ; save them to the stack, and then restore them afterwards. 509 ; 510 push rsi 511 push rdi 512 push rcx 513 514 lea rsi, [rdx + 10] ; rsi = word name address of current entry 515 repe cmpsb 516 517 pop rcx 518 pop rdi 519 pop rsi 520 521 jne .next_word 522 523 .end: 524 ret 525 526 527 defcode ">CFA", 4, 0, TO_CFA 528 ; Convert the address of a word into it's code field address. 529 ; 530 ; The code field address, or CFA, is the memory address of the code that 531 ; actually implements this word. 532 ; 533 pop rdx 534 call _TO_CFA 535 push rdx 536 NEXT 537 538 _TO_CFA: 539 ; Convert the address of a word into it's code field address. 540 ; 541 ; The implementation of this word uses a common method for aligning an address 542 ; to the nearest 8-byte boundary. First, we add 8-1=7 to our address. This 543 ; ensures we're within the 8-byte region we want to end up in. Next we flip all 544 ; the bits for 7, then do a bit-by-bit AND with the current address to drop 545 ; down to the actual 8-byte boundary. It's weird the first time you see it, so 546 ; it can be useful to work it out on paper. 547 ; 548 ; Args: 549 ; RDX: word address 550 ; 551 ; Returns: 552 ; RDX: word CFA 553 ; 554 movzx rax, byte [rdx + 9] ; al = name length 555 lea rdx, [rdx + 10 + rax + 7] ; rdx > link, flags, length, and name 556 and rdx, 0xFFFFFFFFFFFFFFF8 ; rdx = code field address 557 ret 558 559 560 defcode ">NUMBER", 7, 0, TO_NUMBER 561 ; Convert a string into a number. 562 ; 563 ; The digits permitted in the string depend on the value of the `base` variable 564 ; defined at the bottom of this file. 565 ; 566 pop rcx ; string length 567 pop rdi ; string address 568 call _TO_NUMBER 569 push rbx ; parsed number 570 push rcx ; # of unparsed characters (0 => no error) 571 NEXT 572 573 _TO_NUMBER: 574 ; Convert a string into a number. 575 ; 576 ; The number is initialized as 0 and is parsed incrementally. As the string is 577 ; parsed from left to right, the existing number must be multipled by the 578 ; value of `base` before adding each new digit. We use the R8 register as a 579 ; sign flag, where 0 means positive and 1 means negative. 580 ; 581 ; Args: 582 ; RDI: string address 583 ; RCX: string length 584 ; 585 ; Returns: 586 ; RBX: parsed number 587 ; RCX: # of unparsed characters (0 => no error) 588 ; 589 xor rax, rax 590 xor rbx, rbx 591 xor r8, r8 592 push rsi 593 mov rsi, rdi 594 595 .check_empty_string: 596 ; Is the string empty? 597 ; 598 ; If so, we can't do any parsing and should give up. Otherwise, we initialize 599 ; RDX to hold the value of `base` and we start reading characters into RAX. 600 ; 601 test rcx, rcx 602 jz .handle_empty_string 603 604 movzx rdx, byte [BASE] 605 lodsb 606 607 .check_sign: 608 ; Does the string begin with '-'? 609 ; 610 cmp al, '-' 611 jnz .to_numeric_value 612 inc r8 613 dec rcx 614 lodsb 615 616 .check_negative_empty_string: 617 ; Is '-' the only character in the string? 618 ; 619 ; If so, we return the number 0 with 1 unparsed character. 620 ; 621 test rcx, rcx 622 jnz .to_numeric_value 623 624 mov rcx, 1 625 pop rsi 626 ret 627 628 .next_char: 629 ; Start parsing the next character. 630 ; 631 ; We also have to multiply the number by 'base' to make room for the next 632 ; digit. 633 ; 634 imul rbx, rdx 635 lodsb 636 637 .to_numeric_value: 638 ; Convert an ASCII character to its numeric value. 639 ; 640 ; The characters '0'-'9' are represented by the ASCII codes 48-57, so we can 641 ; convert them to the right value by subtracting 48. If the value is below 0 642 ; at that point, an invalid character was passed, and we should stop parsing. 643 ; If it's below 10, we're good to start comparing to the value of 'base'. For 644 ; values greater than 10, the only valid characters are 'A'-'F' which are 645 ; represented by the ASCII codes 65-70. Since we've already substracted 48, 646 ; we need to subtract 7 to line each character up with the value they 647 ; represent (A=10, ..., F=15). If the value is below 10, then we should stop 648 ; parsing as an invalid character was passed. Otherwise, we're good to start 649 ; comparing to the value of 'base'. 650 ; 651 sub rax, 48 652 jb .handle_sign 653 cmp rax, 10 654 jb .check_base 655 sub rax, 7 656 cmp rax, 10 657 jb .handle_sign 658 659 .check_base: 660 ; Check if the value is less than the value of 'base'. 661 ; 662 ; If the value is greater than 'base', we should stop parsing. Otherwise, we 663 ; can add the value to our number and start parsing the next character. 664 ; 665 cmp rax, rdx 666 jge .handle_sign 667 668 add rbx, rax 669 loop .next_char 670 671 .handle_sign: 672 ; Check if the sign flag was set, then negate the number. 673 ; 674 test r8, r8 675 jz .end 676 neg rbx 677 678 .end: 679 pop rsi 680 ret 681 682 .handle_empty_string: 683 ; If an empty string was detected, return -1 as an error code. 684 ; 685 mov rcx, -1 686 pop rsi 687 ret 688 689 690 defcode "INTERPRET", 9, 0, INTERPRET 691 call _WORD 692 call _FIND 693 694 test rdx, rdx 695 jz .try_number 696 697 mov rbx, rdx ; rbx = word address 698 call _TO_CFA ; rdx = word CFA 699 700 .check_state: 701 mov rax, [state] 702 test rax, rax 703 jz .execute_word 704 705 .check_immediate: 706 movzx rax, byte [rbx + 8] 707 test al, FLAG_IMMEDIATE 708 jnz .execute_word 709 mov rax, rdx 710 call _COMMA 711 NEXT 712 713 .execute_word: 714 mov rax, rdx 715 jmp qword [rax] 716 717 .try_number: 718 call _TO_NUMBER 719 test rcx, rcx 720 jnz .not_found 721 722 .check_state_number: 723 mov rax, [state] 724 test rax, rax 725 jz .execute_number 726 727 .compile_number: 728 mov rax, LIT 729 call _COMMA 730 mov rax, rbx 731 call _COMMA 732 NEXT 733 734 .execute_number: 735 push rbx 736 NEXT 737 738 .not_found: 739 mov rcx, unrecognized_msg 740 mov rdx, unrecognized_msg.length 741 call sys_print_string 742 NEXT 743 744 defcode "BRANCH", 6, 0, BRANCH 745 add rsi, [rsi] 746 NEXT 747 748 defword "QUIT", 4, 0, QUIT 749 dq INTERPRET 750 dq BRANCH, -16 751 752 macro initialize_stack_s4 { 753 push version_string 754 push version_string.length 755 } 756 757 program_s4: 758 dq TYPE 759 dq QUIT 760 761 set_program initialize_stack_s4, program_s4 762 763 ;------------------------------------------------------------------------------ 764 ; 765 ; Section 5 - Printing Numbers 766 ; 767 768 defcode ".", 1, 0, DOT 769 ; Print a signed integer. 770 ; 771 pop rax 772 movzx rcx, [BASE] 773 xor rbx, rbx 774 xor r8, r8 775 776 .check_negative: 777 ; if positive, start dividing to get digits 778 test rax, rax 779 jns .divide 780 781 ; if negative, negate and then start dividing to get digits 782 neg rax 783 inc r8 784 785 .divide: 786 xor rdx, rdx 787 div rcx 788 push rdx 789 inc rbx 790 test rax, rax 791 jnz .divide 792 793 .reverse_digits: 794 mov rcx, rbx 795 mov rdx, .buffer 796 797 .handle_negative: 798 test r8, r8 799 jz .next_digit 800 mov byte [rdx], 0x2D 801 inc rdx 802 inc rbx 803 804 .next_digit: 805 pop rax 806 add al, 48 807 cmp al, 58 808 jl .store_in_buffer 809 add al, 7 810 811 .store_in_buffer: 812 mov [rdx], al 813 inc rdx 814 dec rcx 815 jnz .next_digit 816 817 .end: 818 mov byte [rdx], 0xA 819 inc rbx 820 mov rcx, .buffer 821 mov rdx, rbx 822 call sys_print_string 823 NEXT 824 825 826 defcode "U.", 2, 0, U_DOT 827 ; Print an unsigned integer. 828 ; 829 pop rax 830 movzx rcx, [BASE] 831 xor rbx, rbx 832 833 .divide: 834 xor rdx, rdx 835 div rcx 836 push rdx 837 inc rbx 838 test rax, rax 839 jnz .divide 840 841 .reverse_digits: 842 mov rcx, rbx 843 mov rdx, .buffer 844 845 .next_digit: 846 pop rax 847 add al, 48 848 cmp al, 58 849 jl .store_in_buffer 850 add al, 7 851 852 .store_in_buffer: 853 mov [rdx], al 854 inc rdx 855 dec rcx 856 jnz .next_digit 857 858 .end: 859 mov byte [rdx], 0xA 860 inc rbx 861 mov rcx, .buffer 862 mov rdx, rbx 863 call sys_print_string 864 NEXT 865 866 macro initialize_stack_s5 { 867 push 0xBADC0DE 868 push -0xBAD 869 push version_string 870 push version_string.length 871 } 872 873 program_s5: 874 dq TYPE 875 dq DOT 876 dq U_DOT 877 878 ;set_program initialize_stack_s5, program_s5 879 880 ;------------------------------------------------------------------------------ 881 ; 882 ; Section 6 - Memory 883 ; 884 885 defcode "!", 1, 0, STORE_ 886 pop rbx 887 pop rax 888 mov [rbx], rax 889 NEXT 890 891 defcode "@", 1, 0, FETCH 892 pop rbx 893 mov rax, [rbx] 894 push rax 895 NEXT 896 897 defcode "C@", 2, 0, FETCH_CHAR 898 pop rbx 899 movzx rax, byte [rbx] 900 push rax 901 NEXT 902 903 ;------------------------------------------------------------------------------ 904 ; 905 ; Section 7 - Defining Words 906 ; 907 908 defcode "LATEST", 6, 0, LATEST 909 push latest 910 NEXT 911 912 defcode "HERE", 4, 0, HERE 913 push here 914 NEXT 915 916 defcode "CREATE", 6, 0, CREATE 917 pop rcx ; string length 918 pop rbx ; string address 919 call _CREATE 920 NEXT 921 922 _CREATE: 923 mov rdi, [here] ; address of new header 924 mov rax, [latest] ; address of last entry 925 stosq ; add link to last entry 926 mov [latest], rdi ; Update LATEST to point to this word 927 928 ; Store flags byte 929 xor rax, rax 930 stosb 931 932 ; Store length byte 933 mov rax, rcx 934 stosb 935 936 ; Store name 937 push rsi 938 mov rsi, rbx 939 rep movsb 940 pop rsi 941 942 ; Align to nearest 8-byte boundary 943 add rdi, 7 944 and rdi, 0xFFFFFFFFFFFFFFF8 945 946 ; Update LATEST and HERE 947 mov rax, [here] 948 mov [latest], rax 949 mov [here], rdi 950 ret 951 952 defcode ",", 1, 0, COMMA 953 pop rax 954 call _COMMA 955 NEXT 956 957 _COMMA: 958 mov rdi, [here] 959 stosq 960 mov [here], rdi 961 ret 962 963 defcode "C,", 2, 0, CHAR_COMMA 964 pop rax 965 mov rdi, [here] 966 stosb 967 mov [here], rdi 968 NEXT 969 970 defcode "LIT", 3, 0, LIT 971 lodsq 972 push rax 973 NEXT 974 975 defcode "[", 1, FLAG_IMMEDIATE, LEFT_BRACKET 976 mov [state], 0 977 NEXT 978 979 defcode "]", 1, 0, RIGHT_BRACKET 980 mov [state], 1 981 NEXT 982 983 defcode "STATE", 5, 0, STATE 984 push [state] 985 NEXT 986 987 defcode "HIDDEN", 6, 0, HIDDEN 988 pop rdi 989 add rdi, 8 990 mov rcx, FLAG_HIDDEN 991 xor [rdi], rcx 992 NEXT 993 994 defcode "IMMEDIATE", 9, FLAG_IMMEDIATE, IMMEDIATE 995 mov rdi, [latest] 996 add rdi, 8 997 xor byte [rdi], FLAG_IMMEDIATE 998 NEXT 999 1000 defcode "+", 1, 0, PLUS 1001 pop rbx 1002 pop rcx 1003 add rbx, rcx 1004 push rbx 1005 NEXT 1006 1007 defword ":", 1, 0, COLON 1008 dq WORD_ 1009 dq CREATE ; Create header 1010 dq LIT, DOCOL ; Put DOCOL on stack... 1011 dq COMMA ; ...then append to definition 1012 dq LATEST, FETCH, HIDDEN ; Set HIDDEN 1013 dq RIGHT_BRACKET ; Switch to COMPILE mode 1014 dq EXIT 1015 1016 defword ";", 1, FLAG_IMMEDIATE, SEMICOLON 1017 dq LIT, EXIT ; Put EXIT on stack... 1018 dq COMMA ; ...then append EXIT 1019 dq LATEST, FETCH, HIDDEN ; Unset HIDDEN 1020 dq LEFT_BRACKET ; Switch to IMMEDIATE mode 1021 dq EXIT 1022 1023 defcode "ALIGN", 5, 0, ALIGN_ 1024 mov rax, [here] 1025 add rax, 7 1026 and rax, 0xFFFFFFFFFFFFFFF8 1027 mov [here], rax 1028 NEXT 1029 1030 defcode "DUP", 3, 0, DUP_ 1031 pop rcx 1032 push rcx 1033 push rcx 1034 NEXT 1035 1036 defcode "DROP", 4, 0, DROP 1037 pop rcx 1038 NEXT 1039 1040 defword "CONSTANT", 8, 0, CONSTANT 1041 dq WORD_ 1042 dq CREATE 1043 dq LIT, DOCOL 1044 dq COMMA 1045 dq LIT, LIT 1046 dq COMMA 1047 dq COMMA 1048 dq LIT, EXIT 1049 dq COMMA 1050 dq EXIT 1051 1052 defword "FIELD", 5, 0, FIELD 1053 dq DUP_ 1054 dq CONSTANT 1055 dq EXIT 1056 1057 ;------------------------------------------------------------------------------ 1058 ; 1059 ; Section 8 - String operations 1060 ; 1061 1062 defcode 'S"', 2, 0, S_QUOTE 1063 mov rdx, .buffer 1064 mov [.length], 0 1065 1066 .next_char: 1067 call sys_read_char 1068 cmp al, '"' 1069 je .end 1070 1071 .store_char: 1072 mov byte [rdx], al 1073 inc rdx 1074 inc [.length] 1075 jmp .next_char 1076 1077 .end: 1078 push .buffer 1079 push [.length] 1080 NEXT 1081 1082 defcode 'S\"', 3, 0, S_BACKSLASH_QUOTE 1083 mov rdx, .buffer 1084 mov [.length], 0 1085 1086 .next_char: 1087 call sys_read_char 1088 cmp al, '"' 1089 je .end 1090 cmp al, '\' 1091 jne .store_char 1092 1093 .handle_escape: 1094 call sys_read_char 1095 cmp al, 'n' 1096 jne .next_char 1097 1098 mov al, 0xA 1099 1100 .store_char: 1101 mov byte [rdx], al 1102 inc rdx 1103 inc [.length] 1104 jmp .next_char 1105 1106 .end: 1107 push .buffer 1108 push [.length] 1109 NEXT 1110 1111 ;------------------------------------------------------------------------------ 1112 ; 1113 ; Section 9 - Misc. 1114 ; 1115 1116 defcode "WORDS", 5, 0, WORDS 1117 ; List all words in the dictionary. 1118 ; 1119 ; We search the dictionary starting from the end (the newest words) and move 1120 ; towards the beginning (the oldest words). 1121 ; 1122 mov rbx, [latest] 1123 jmp .print_word 1124 1125 .next_word: 1126 mov rbx, [rbx] 1127 1128 .check_out_of_words: 1129 test rbx, rbx 1130 je .end 1131 1132 .next_line: 1133 mov rdx, 1 1134 mov rcx, .newline 1135 call sys_print_string 1136 1137 .print_word: 1138 movzx rdx, byte [rbx + 9] ; al = word name length 1139 lea rcx, [rbx + 10] ; rcx = word name address of current entry 1140 call sys_print_string 1141 jmp .next_word 1142 1143 .end: 1144 NEXT 1145 1146 defcode "\", 1, 0, SLASH 1147 .skip_comment: 1148 call _KEY 1149 cmp al, 0xA 1150 je .end 1151 cmp al, 0xD 1152 je .end 1153 jmp .skip_comment 1154 .end: 1155 NEXT 1156 1157 defcode "(", 1, 0, LEFT_PAREN 1158 .skip_comment: 1159 call _KEY 1160 cmp al, 0xA 1161 je .end 1162 cmp al, 0xD 1163 je .end 1164 cmp al, ')' 1165 je .end 1166 jmp .skip_comment 1167 .end: 1168 NEXT 1169 1170 defcode "LOAD_BASE", 9, 0, LOAD_BASE 1171 mov [input_buffer], forth_base 1172 mov [input_buffer.len], forth_base.len 1173 NEXT 1174 1175 defcode "LOAD_EFI", 8, 0, LOAD_EFI 1176 mov [input_buffer], forth_efi 1177 mov [input_buffer.len], forth_efi.len 1178 NEXT 1179 1180 section '.data' data readable writable 1181 1182 version_string db 'soup forth v0.1', 0xA 1183 .length = $ - version_string 1184 1185 unrecognized_msg db 'unrecognized word', 0xA 1186 .length = $ - unrecognized_msg 1187 1188 input_buffer dq 0 1189 input_buffer.len dq 0 1190 1191 code_EMIT.char_buffer db ? 1192 1193 code_DOT.buffer rb 64 1194 code_U_DOT.buffer rb 64 1195 1196 _WORD.buffer rb 64 1197 1198 code_S_QUOTE.buffer rb 64 1199 code_S_QUOTE.length dq ? 1200 1201 code_S_BACKSLASH_QUOTE.buffer rb 64 1202 code_S_BACKSLASH_QUOTE.length dq ? 1203 1204 code_WORDS.newline db 0xA 1205 1206 BASE db 16 1207 1208 latest dq link 1209 1210 here dq here_top 1211 here_top rq 0x4000 1212 1213 state dq 0 1214 1215 FLAG_IMMEDIATE = 0x40 1216 FLAG_HIDDEN = 0x20 1217 1218 forth_base: 1219 file './base.forth' 1220 forth_base.len = $ - forth_base 1221 1222 forth_efi: 1223 file './efi.forth' 1224 forth_efi.len = $ - forth_efi 1225 1226 align 4096 1227 return_stack rb 8192 1228 return_stack_top: 1229