
Erik's ET-PEB


speccery


fbForth 2.0 took 1 m 44 s and, with BOUNDS defined in ALC (assembly language code), went down to 1 m 2 s. But the point of benchmarks is to use code defined in exactly the same way on all platforms, so, yes, our high-level Forths are a little slow on the TMS9900. Even TurboForth 1.2 took 1 m 39 s, which actually surprised me.

 

...lee



Very true. 

Forth is an odd duck in the world of compilers where you and I are the optimizers. :-)

When I saw over + swap it was an obvious substitution. 

But I drew the bronze medal on this one. ;) 
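
For anyone following along, here is a minimal sketch of that substitution in standard Forth. It assumes the FIB2 body is the same as the FIB3 body posted later in the thread minus INLINE[ (which matches the "see fib2" dump), and it defines BOUNDS in high-level Forth rather than ALC, so it shows the idea and not the speed-up. FIB2B is a hypothetical name used only here.

```forth
\ High-level BOUNDS: the same stack effect as the phrase OVER + SWAP.
: BOUNDS ( a b -- a+b a )  OVER + SWAP ;

\ FIB2 with the loop body written out (assumed from FIB3 later in the thread).
: FIB2 ( n -- fib )  0 1 ROT 0 DO  OVER + SWAP  LOOP DROP ;

\ The same word with the substitution; FIB2B is a hypothetical name.
: FIB2B ( n -- fib )  0 1 ROT 0 DO  BOUNDS  LOOP DROP ;

10 FIB2 .  10 FIB2B .   \ both print 55
```

One caution when trying this on a desktop Forth: `0 FIB2` executes `0 0 DO`, which runs through the entire cell range, so on a 64-bit system it effectively never returns.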

 

 



I'm also fascinated by the differences between the three implementations. fbForth benefits from simpler DO LOOP computations and spins fast (I think).

TF benefits from all those primitives in 16-bit RAM, but I believe Mark is stacking 3 items on the return stack for DO loops, so that's a little overhead.

Brad's standard Camel Forth keeps the loop values in registers, which I didn't do in order to have the registers available for multi-tasking. Camel Forth uses a leave stack in the DO loops that is high-level Forth, so that's a slowdown as well. Something I could optimize, perhaps. It's always remarkable how every decision has pluses and minuses, and on a processor like the 9900 they really make big differences.

 

 


I found the work I did two years ago, when I ported the J1A to the BlackIce-II board. This is a cheap FPGA board sporting the low-end Lattice iCE40HX4K FPGA. I used the IceStorm toolchain for synthesis, with the nextpnr place-and-route tool. I am running the J1 at 48 MHz. According to the timing analysis, the J1 could run at 73 MHz on this low-end FPGA. In case you're interested, the memory space contains just 8K of on-chip FPGA block RAM (half of the RAM on this small FPGA). The core uses 1080 LUTs out of the 7680 available (IceStorm unlocks the whole chip; the Lattice 4K FPGA actually has 8K LUTs on board).

 

@TheBF and @Lee Stewart should find this interesting... I am quite blown away myself.

 

The benchmark results are amazing: 100 iterations of the Fib2-bench completed in 13.5 seconds, so one iteration takes about 0.135 seconds. I couldn't time it without a loop. The BENCHME word was on the benchmark page. The FIND word was new to me, and it did not seem to work right, but the old-fashioned tick did the trick. This is 1000 times the performance of TurboForth (well, not quite, just over 700 times), on a single core taking a fraction of a low-end FPGA.

' FIB2-BENCH 100 BENCHME .................................................................................................... 100 Iterations.  ok

And below it the output of "see fib2". (I am trying to use the spoiler thing for the first time, let's see if it works).

Spoiler

see fib2 
13DE 8000       $0000 
13E0 8001       $0001 
13E2 40CA       C 0194 rot
13E4 8000       $0000 
13E6 445A       C 08B4 
13E8 6127       alu: 0127 
13EA 451D       C 0A3A over
13EC 44EB       C 09D6 +
13EE 450F       C 0A1E swap
13F0 6B1D       alu: 0B1D 
13F2 4486       C 090C 
13F4 29F4       Z 13E8 
13F6 43FA       C 07F4 
13F8 0518       J 0A30 drop
13FA 13D6       J 27AC 
13FC 460A       C 0C14 
13FE 4249       C 0492 
1400 2D32       Z 1A64 
1402 4542       C 0A84 
1404 434E       C 069C 
1406 FF48       $7F48 
1408 83E8       $03E8 
140A 8000       $0000 
140C 445A       C 08B4  ok

 

That's what you get when your machine code is Forth.
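
To put numbers on the "just over 700 times" comparison, here is a small sketch in standard Forth. It assumes the TurboForth and fbForth times quoted earlier in the thread are for one pass of the same benchmark, and it needs cells wider than 16 bits (so a desktop Forth such as gforth, not the TI). The word names M:S>MS, J1-MS, TF-MS, FB-MS and SPEEDUP are made up for this example.

```forth
\ All times in milliseconds to keep the math in integers.
: M:S>MS ( min sec -- ms )  SWAP 60 * + 1000 * ;

13500 100 /   CONSTANT J1-MS   \ 13.5 s for 100 passes = 135 ms per pass
1 39 M:S>MS   CONSTANT TF-MS   \ TurboForth 1.2, 1 m 39 s
1 44 M:S>MS   CONSTANT FB-MS   \ fbForth 2.0, 1 m 44 s

: SPEEDUP ( ms -- ratio )  J1-MS / ;   \ integer ratio, rounded down

TF-MS SPEEDUP .   \ prints 733
FB-MS SPEEDUP .   \ prints 770
```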

I did a simple benchmark run, as below, running 100 million iterations. This took just under 14 seconds, or about 7 million iterations per second.

: thousand 1000 0 do loop ;  ok
: million 1000 0 do thousand loop ;  ok
million  ok
' million 100 benchme .................................................................................................... 100 Iterations.  ok

And the code:

Spoiler

see thousand 
142C 83E8       $03E8 
142E 8000       $0000 
1430 445A       C 08B4 
1432 6127       alu: 0127 
1434 6B1D       alu: 0B1D 
1436 4486       C 090C 
1438 2A19       Z 1432 
143A 43FA       C 07F4 
143C 608C       alu: 008C 
143E 1420       J 2840 
1440 6D07       alu: 0D07 
1442 6C69       alu: 0C69 
1444 696C       alu: 096C 
1446 6E6F       alu: 0E6F 
1448 83E8       $03E8 
144A 8000       $0000 
144C 445A       C 08B4 
144E 6127       alu: 0127 
1450 4A16       C 142C thousand
1452 6B1D       alu: 0B1D 
1454 4486       C 090C 
1456 2A27       Z 144E 
1458 43FA       C 07F4 
145A 608C       alu: 008C  ok
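
The figures above also give a rough cost per empty loop iteration. This is a sketch only: it rounds the run time up to 14 seconds, ignores the outer-loop overhead, and needs cells of at least 32 bits (so it will not run on a 16-bit Forth). The constant and word names are invented for the example.

```forth
100000000 CONSTANT ITERATIONS   \ 100 runs of MILLION
14        CONSTANT SECONDS      \ "just under 14 seconds", rounded up
48000000  CONSTANT CLOCK-HZ     \ J1 clock on the BlackIce-II

: ITER/SEC ( -- n )  ITERATIONS SECONDS / ;

\ Clocks per iteration in tenths, to stay in integer math: 67 means 6.7.
: CLOCKS*10 ( -- n )  CLOCK-HZ SECONDS *  ITERATIONS 10 /  / ;

ITER/SEC .    \ prints 7142857
CLOCKS*10 .   \ prints 67
```

So each pass through an empty DO LOOP costs somewhere under 7 clocks at 48 MHz.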

 

 

Edited by speccery

Wow that's amazing!  

 

I bought a little RISC-V nano to see what all the fuss is about with RISC-V.

I get the part that it is an open-source instruction set for everyone to use, but to call it "RISC" compared to NOVIX or J1 is just wrong.  :) 

I do like the fact that RISC-V has 32 registers. However, what would be wrong with a 32-bit J1 and 64-element stacks? That would work pretty well too, I suspect.

 

Thanks for this info.  I am now beginning to drool at the thoughts of getting one running here.

 

BTW I used to try this test on any Forth I found in the old days.

: thousand 1000 0 do loop ;  ok
: million 1000 0 do thousand loop ;  ok

I remember trying it on an Amiga with dragon Forth in a store. The kid in the store who was used to BASIC said "That's going to take awhile"

For comparison I distinctly remember it took 19 seconds because he had barely finished speaking when the "ok" prompt popped onto the screen. :) 

 

So the J1 did 100 of those in 14 seconds! And the Amiga was clocking at about, what, 25 MHz? So only half of the J1's clock speed.

Totally cool.



That's weird - so you were testing with the same (or similar) benchmark :) Perhaps it's not so surprising, it's a simple thing to do to bring a computer to its knees...

I am aware of the RISC V development but haven't really dived in. Is the nano you mention a development board of some sort?

 

I haven't updated it recently, but I also have the StrangeCart thread discussing my proof-of-concept project of hooking a microcontroller to the TI and connecting to the bus purely in software. The microcontroller has two cores: one core is busy monitoring the TI bus, but the other, more powerful core could run Forth, for instance. At 100 or 150 MHz it would give decent performance. I am actually thinking of doing another iteration of that board, to also include an FPGA chip.

 

So let me ask you a question: would you be interested in a Forth accelerator for the TI? Either on the module port or the expansion bus. The module port is easier since no connector is needed. If that was another version of the StrangeCart it would be on the module port, and could be a pretty simple board, just what I have already done plus an FPGA running the J1.

 

Looking at my notes, I noticed that I have actually tested the J1B on the Papilio Duo board. It was two years ago. I also located that project, but haven't tried it again. It is a 32-bit version of the J1, running at 80 MHz.



Whenever I would say "great minds think alike"  my late mother would say "Or fools seldom differ". 

Not sure which one applies to our benchmark.  :) 

 

The Longan Nano is a RISC-V board from China for $4.99 USD, with 128K flash and 32K RAM. I need to get some time to write a program for it.

Sipeed Longan Nano - RISC-V GD32VF103CBT6 Development Board - Seeed Studio

 

The module port accelerator sounds interesting. It's a multi-processor TI-99!

If you build one I would enjoy making a compiler for it.

 

I am interested in that 32-bit J1 on the Papilio Duo. Is it published somewhere?

How hard is it to move the Verilog from the Duo to the Pro? Simple, difficult, or don't bother trying?

 

 

 



https://github.com/jamesbowman/swapforth is the repository of James Bowman, the original creator of SwapForth and the J1. The J1B is the 32-bit version. Actually, now looking at this, James targeted the Duo. I thought I did a port, but apparently not. I have ported it to the BlackIce-II board, but that's not a Xilinx project. Take a look at the PDF document in https://github.com/jamesbowman/swapforth/tree/master/j1b/doc; James has done a very good job of documenting it.

 

I took a quick look at the top-level Verilog file, Xilinx-top.v. It appears that if WANT_VGA is not defined, the design does not use the external SRAM. In that case it's pretty much only using the on-chip resources of the FPGA itself. Porting it to the Pro would be simple then: it's mostly a matter of making a user constraint file for the Pro board and adjusting the top-level Verilog code to match the Pro board. If the external clock frequency is the same as on the Duo, then there's no need to even modify the PLL settings. I assume the Pro also has an FTDI chip to provide a USB serial port.

 

So the steps would be something like:

1. Install Xilinx tools (I use ISE 14.7)

2. Find any existing demo project for the Pro board. Run something like "rebuild all" for the demo project to verify that the toolchain is working.

3. Load the bitstream generated in step 2 to the Pro board and check that the board does something.

 

It's easy to write the steps above, but 1 & 2 will take a bit of time and patience, and there is a learning curve. After that you would be able to actually port the project:

 

4. Use the Papilio Duo project as the basis.

5. Replace the UCF file to match the Pro board (probably the Pro's default UCF file is good to go). The same file used in step 2 should be good.

6. Modify the top-level Verilog file. If memory serves me right, the Pro has an SDRAM chip. You'd want to set its control signals to a known static state, to keep the SDRAM silent and avoid floating control signals. The same goes for any other unused peripherals on the board. A basic J1 environment really doesn't need much more than a serial port (TXD and RXD), an incoming clock signal and reset. The remaining pins can be left as inputs, or as outputs driving 0 or 1 as appropriate to any unused peripherals on the board to keep them out of the way.

7. Synthesise the modified design. Same process as in step 2.

8. Fix bugs to the point that the synthesis works. Basically rinse and repeat steps 6 and 7 as long as it takes :)

9. Load the generated bitstream, hope it works, and enjoy. If not, it's back to step 6 :)

 

I can also try to make a port. It shouldn't take more than an hour, and could be much less, but since I have no way to test it, it could be a bit frustrating and really shooting in the dark. I don't know when I could try this; hopefully next week, as I am away from home this weekend.


Tak 

This is a great start.

I found a port for the Pro on GitHub, but the finished image (or whatever it's called) does not seem to give me a prompt.

I will have to dig into it and follow your instructions.

I had the Xilinx tools on another machine that died so I need to get that again.

 

So much code so little time... :) 

 

 



I actually live in Finland - I assume the "Tak" in the beginning refers to "Tack" which would be thank you in Swedish :)

The Finnish word is weirder, it is "Kiitos". I used to be fluent in Swedish too, but nowadays everybody is using English...

 

After sending I realised that you probably are already familiar with the Xilinx tools, since you've got the board and all. Xilinx provides a VirtualBox image installer for Windows to support ISE 14.7. That's what I was mostly using, although I now have an ISE 14.7 installation on my Windows 10 box. It requires changing a DLL to run properly; the DLL comes with ISE 14.7. I don't know what operating system you're using, so again this might not be relevant for you.



FWIW, "Tak" is Thanks in Danish :)



"Tak" was because for some reason I thought you were in Denmark.

 

All good on Xilinx tools. I am feeling my way along with this FPGA stuff.

OK on Finland. Yes, Finnish words are unusual compared to the rest of the European languages. It's in a class by itself with Hungarian and Estonian (maybe?).

My step grand-father was from Finland. He told me interesting stories about Soviet era Russia, but that's way off topic. 

 


2 hours ago, mizapf said:

I think the Estonians and Finnish people almost understand each other (yksi, kaksi, kolme vs. üks, kaks, kolm (1,2,3)). Maybe similar to Dutch vs. German.

Off topic but I always found it fascinating how these things are not bi-directional.  Portuguese speakers can understand Spanish but not the reverse. Dutch speakers seem to get more German than Germans hear in Dutch.

For those crazy people like me, here is a series of videos where people all speak different related languages and try to understand each other.  Now I will stop commenting on this.  :) 

Ecolinguist - YouTube


Here's an obscure data point:

 

FIB2 running on Wycove Forth 3.0 (an extended FigForth) on a 16-bit console. This version of Forth benefits significantly from the memory upgrade. (Benchmark exactly as published):

 

1' 17"

 

For me the relevant comparisons (of the feverish retrocomputing variety) are with contemporaries of the TI:

 

C64 Forth64:   3' 50"

C64 durexForth:   1' 57"

 

Apple II v 3.2:   3' 56"

Apple II GraForth:   2' 19"

 

Z80 4 MHz FigForth:   1' 19"

Edited by Reciprocating Bill
Apple II numbers are OK

Interesting. Gotta love that 16-bit bus.

 

And just to show how much time is wasted in the Forth indirect-threaded interpreter (3 instructions on the 9900), here is a version that removes some of that overhead.

I have a utility that converts literal numbers and code words into "super-instructions" that are compiled in Low RAM as machine code, and the super-instructions replace the original code in the Forth word.

 


NEEDS INLINE[ FROM DSK1.INLINE

DECIMAL
: FIB3
  INLINE[ 0 1 ROT 0 ]
  DO
   INLINE[ OVER + SWAP ]
  LOOP
  DROP ;

: FIB3-BENCH 1000 0 DO
    I FIB3 DROP
  LOOP ;

1:46 is the regular Forth speed on my system. 

1:15.6  with some interpreter cycles removed.

 

[Image: fib2 with inline code.png]


  • 3 weeks later...

I contracted COVID some time ago and am still fighting the disease. It's now day 9. I think this is the mild version (so far), but with so many days on this, it is a bit consuming. At least I got my appetite back a couple of days ago; hopefully it's a sign of something. But the fever keeps me unable to do much, even if it isn't too high.
 

The only TI related thing I have done is that I received this week a hard copy of @Lee Stewart’s fbForth manual which I ordered sometime ago. I haven’t read it yet, but having a real manual is nice and very exceptional.


Thanks for the comments and your thoughts - now finally I am feeling better and believe recovery has started. Still fragile, but I feel much better than just a few days ago. Probably full recovery will take several weeks minimum. So happy to be able to exit the haze which feels to have merged the past two weeks into one never ending day of fever and aches.

