Cool. I love that J1 processor.
For those who don't know about J1, imagine a subroutine call that takes 1 clock cycle and a return that takes 0 clocks.
One of my thoughts, although I don't have any knowledge of Verilog, is that Forth CPUs would benefit from having a workspace register to assist multitasking. The Forth stacks are typically in very fast on-chip RAM, but if you want to change tasks it can be awkward swapping the stacks (i.e. registers) in and out of conventional memory. So... if there were larger memory spaces available for a number of tasks, plus a workspace register, the chip could have fast context switching, albeit for a finite number of tasks, which is typically OK for an embedded application.
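To make the workspace register idea a bit more concrete, here is a rough Python model of it. This is only a sketch of the concept, not anything taken from the J1 or the 9900: the class name `StackBank`, the register name `wp` and the per-task stack pointers are all made up for illustration.

```python
class StackBank:
    """Hypothetical model: on-chip stack RAM split into one region per task."""

    def __init__(self, num_tasks, depth):
        self.depth = depth
        # Every task's data stack stays resident in fast on-chip memory.
        self.mem = [[0] * depth for _ in range(num_tasks)]
        self.sp = [0] * num_tasks  # assumed: one stack pointer per task
        self.wp = 0                # workspace register: selects the active task

    def push(self, value):
        if self.sp[self.wp] >= self.depth:
            raise OverflowError("stack region for this task is full")
        self.mem[self.wp][self.sp[self.wp]] = value
        self.sp[self.wp] += 1

    def pop(self):
        if self.sp[self.wp] == 0:
            raise IndexError("stack region for this task is empty")
        self.sp[self.wp] -= 1
        return self.mem[self.wp][self.sp[self.wp]]

    def switch(self, task):
        # The whole context switch is one register write; nothing is copied.
        if not 0 <= task < len(self.mem):
            raise ValueError("no such task")
        self.wp = task


bank = StackBank(num_tasks=4, depth=16)
bank.push(1)
bank.push(2)
bank.switch(1)    # task 1 gets its own untouched stack
bank.push(99)
bank.switch(0)    # back to task 0, its stack is exactly as it was left
print(bank.pop())
```

The number of tasks is fixed by how much on-chip memory you set aside, which is the "finite number of tasks" trade-off mentioned above.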
You are probably one of the few people in the world who know about FPGA 9900 and J1. :-)
The J1 implements its stacks as two huge shift registers, where each shift operation is a shift by one word length, typically 16 or 32 bits. The stacks are not deep: in the 16-bit version the defaults are 15 deep for the data stack and 17 for the return stack. So these stacks are implemented in the FPGA logic fabric, not in block memory. This also means that there are no stack pointers, at least in the J1A version, so you don't know how deep you are in the stacks... The source code for the J1A is about 130 lines of Verilog. It is tiny. To my understanding it is inspired by the Novix NC4016. The J1 is an awesome project, and it comes with SwapForth already implemented. The basic J1 system for the BlackIce takes 1072 logic cells, so about one eighth of the total capacity.
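For anyone who has not seen a stack built this way, here is a small Python model of a shift-register stack with no stack pointer. It is a behavioural sketch only, not a translation of the J1 Verilog: the class name `ShiftStack` is made up, and filling the bottom with 0 on a pop is my assumption (the real core may shift in something else).

```python
class ShiftStack:
    """Behavioural model of a fixed-depth stack made of one long shift register."""

    def __init__(self, depth, width=16):
        self.mask = (1 << width) - 1
        self.cells = [0] * depth  # cells[0] is the top of the stack

    def push(self, value):
        # Everything shifts down by one word; the bottom word is silently lost.
        self.cells = [value & self.mask] + self.cells[:-1]

    def pop(self):
        top = self.cells[0]
        # Everything shifts up by one word; the bottom is refilled
        # (with 0 here, which is an assumption of this sketch).
        self.cells = self.cells[1:] + [0]
        return top


s = ShiftStack(depth=15)
for v in range(20):   # push 5 more values than the stack can hold
    s.push(v)
# The 15 most recent values come back; the first 5 fell off the bottom
# and nothing signalled the overflow, because there is no depth counter.
print([s.pop() for _ in range(15)])
```

Since there is no pointer, over- and underflow do not raise anything; the data simply shifts out or the fill value shifts in.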
It is not only that subroutine calls and pretty much every other instruction take 1 clock cycle; you can also combine certain operations, such as the subroutine return, with another instruction. Oh, and it runs at 48 MHz on the BlackIce-II. I did not try to optimize it.
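Here is a toy Python model of why folding the return into another instruction makes the return "free". It is purely conceptual: the `(op, arg, ret)` tuples and the `run` function are invented for this sketch and do not reflect the real J1 instruction encoding.

```python
def run(program, entry, dstack):
    """Run until a return with an empty return stack; give back the cycle count.

    Each instruction is a hypothetical (op, arg, ret) tuple, where ret=True
    means "also return" as part of the same instruction.
    """
    pc, rstack, cycles = entry, [], 0
    while True:
        op, arg, ret = program[pc]
        cycles += 1                  # every instruction costs one cycle here
        next_pc = pc + 1
        if op == "call":
            rstack.append(pc + 1)    # remember where to come back to
            next_pc = arg
        elif op == "lit":
            dstack.append(arg)
        elif op == "add":
            b, a = dstack.pop(), dstack.pop()
            dstack.append(a + b)
        elif op != "nop":
            raise ValueError("unknown op: " + op)
        if ret:                      # the return rides along with the op above
            if not rstack:
                return cycles
            next_pc = rstack.pop()
        pc = next_pc


# The word  : add3  3 + ;  written two ways.
separate_return = [("lit", 3, False), ("add", None, False), ("nop", None, True)]
folded_return = [("lit", 3, False), ("add", None, True)]

print(run(separate_return, 0, [4]))  # return as its own instruction
print(run(folded_return, 0, [4]))    # return folded into the add
```

In this model the folded version takes one cycle fewer for the same result, which is the sense in which the return costs 0 clocks.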
I think I also ported it over to the Pepino board, as a 32-bit version, along the lines of what James had done in his version for the Xilinx Spartan 6.
Is there a repository of your code for the J1->BlackIce project?
No, but I guess I could set it up. I was playing with the Icestorm tools and used the J1 as the core to play with. I did not do much; my work amounted to merging the top-level block from the BlackIce examples with the J1. I tested it with both place-and-route tools: arachne-pnr and the newer nextpnr. For the latter I had to study things a little to get the PLL done properly (the input clock is 100 MHz, which the PLL takes to 48 MHz).
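For the PLL part, this is roughly the kind of search involved in picking divider values, sketched in Python. It is not the setup I actually used. The formula (simple feedback mode, f_out = f_in * (DIVF + 1) / ((DIVR + 1) * 2^DIVQ)), the divider ranges, and the frequency limits for the phase detector and VCO are written from my recollection of the iCE40 PLL documentation, so treat them as assumptions and check them against the datasheet. The function name is made up.

```python
from fractions import Fraction


def find_pll_settings(f_in_mhz, f_out_mhz):
    """Brute-force search for (DIVR, DIVF, DIVQ) closest to the target frequency.

    Assumed limits (verify against the datasheet):
      phase detector input 10..133 MHz, VCO 533..1066 MHz,
      DIVR 0..15, DIVF 0..127, DIVQ 1..6.
    """
    target = Fraction(f_out_mhz)
    best = None
    for divr in range(16):
        f_pfd = Fraction(f_in_mhz) / (divr + 1)
        if not 10 <= f_pfd <= 133:
            continue
        for divf in range(128):
            f_vco = f_pfd * (divf + 1)
            if not 533 <= f_vco <= 1066:
                continue
            for divq in range(1, 7):
                f_out = f_vco / (1 << divq)
                err = abs(f_out - target)
                if best is None or err < best[0]:
                    best = (err, divr, divf, divq, f_out)
    if best is None:
        raise ValueError("no divider combination satisfies the assumed limits")
    _, divr, divf, divq, f_out = best
    return divr, divf, divq, float(f_out)


divr, divf, divq, achieved = find_pll_settings(100, 48)
print(divr, divf, divq, achieved)
```

Under these limits the search may not land on the target exactly, only on the closest reachable frequency, so compare the achieved value with what the timing analysis expects.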