StrangeCart

+TheBF · November 24, 2021

1 hour ago, mizapf said:

Something like this from my recent SCSI emulation - I use it several times in each file.


line_state whtscsi_pld_device::scsi_cs()
{
    return (busen()
        && ((m_board->m_address & 0x0fe0)==0x0fe0)
        && (((m_board->m_address & 0x1000)!=0) == m_bank_swapped)
        && (((m_board->m_address & 1)==0) || m_word_transfer))? ASSERT_LINE : CLEAR_LINE;
}

I think the idea was pioneered in LISP: LISP - Cond Construct (tutorialspoint.com)

Does the C version generate better code than if/else?

+mizapf · November 24, 2021

Another good use for it is debugging output:


printf("Card is turned %s\n", m_selected? "on" : "off");

Can't live without it anymore.

speccery · November 25, 2021

21 hours ago, mizapf said:

Can't live without it anymore.

I use the ?: notation all the time. Here is the implementation of the SGN function in the Basic interpreter of the StrangeCart:

      case TOK_SGN:
        line++;
        expect(TOK_OPEN_PARENTHESIS);
        a = eval();
        expect(TOK_CLOSE_PARENTHESIS);
        return a < 0 ? -1 : (a == 0 ? 0 : 1);

I am starting to feel that I really need to add it to the Basic. Then, for example, factorials (the standard example of recursive functions) could be computed this way:

DEF FACT(N)= N>1 ? N*FACT(N-1) : 1
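As a rough idea of what adding ?: to an expression evaluator involves, here is a minimal recursive-descent sketch. It is not the StrangeCart code: the grammar, the class name `Evaluator` and the helper `evaluate` are assumptions made for illustration, and it only handles integer literals, + - *, < > and parentheses.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical mini evaluator. Assumed grammar:
//   ternary := compare [ '?' ternary ':' ternary ]
//   compare := sum [ ('<' | '>') sum ]
//   sum     := term { ('+' | '-') term }
//   term    := factor { '*' factor }
//   factor  := number | '(' ternary ')' | '-' factor
class Evaluator {
public:
    explicit Evaluator(const std::string& text) : s_(text), pos_(0) {}

    // Note: this sketch evaluates both branches and then picks one. An
    // interpreter running recursive DEF functions would have to skip the
    // untaken branch instead, or FACT would never stop recursing.
    double ternary() {
        double cond = compare();
        if (!accept('?')) return cond;
        double if_true = ternary();
        expect(':');
        double if_false = ternary();
        return cond != 0 ? if_true : if_false;
    }

private:
    double compare() {
        double left = sum();
        if (accept('<')) return left < sum() ? -1 : 0;   // TI BASIC style: true is -1
        if (accept('>')) return left > sum() ? -1 : 0;
        return left;
    }
    double sum() {
        double v = term();
        for (;;) {
            if (accept('+')) v += term();
            else if (accept('-')) v -= term();
            else return v;
        }
    }
    double term() {
        double v = factor();
        while (accept('*')) v *= factor();
        return v;
    }
    double factor() {
        if (accept('-')) return -factor();
        if (accept('(')) { double v = ternary(); expect(')'); return v; }
        skip_spaces();
        std::size_t start = pos_;
        while (pos_ < s_.size() && std::isdigit(static_cast<unsigned char>(s_[pos_]))) ++pos_;
        assert(pos_ > start);   // a number was expected here
        return std::atof(s_.substr(start, pos_ - start).c_str());
    }
    void skip_spaces() { while (pos_ < s_.size() && s_[pos_] == ' ') ++pos_; }
    bool accept(char c) {
        skip_spaces();
        if (pos_ < s_.size() && s_[pos_] == c) { ++pos_; return true; }
        return false;
    }
    void expect(char c) { bool ok = accept(c); assert(ok); (void)ok; }

    std::string s_;
    std::size_t pos_;
};

// Convenience wrapper: evaluate a whole expression string.
inline double evaluate(const std::string& text) { return Evaluator(text).ternary(); }
```

Because the ?: rule sits at the top of the grammar and recurses into itself, nested forms such as the SGN expression above parse without any extra work.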

+TheBF · November 25, 2021

3 hours ago, speccery said:

I use the ?: notation all the time. Here is the implementation of the SGN function in the Basic interpreter of the StrangeCart:

      case TOK_SGN:
        line++;
        expect(TOK_OPEN_PARENTHESIS);
        a = eval();
        expect(TOK_CLOSE_PARENTHESIS);
        return a < 0 ? -1 : (a == 0 ? 0 : 1);

I am starting to feel that I really need to add it to the Basic. Then, for example, factorials (the standard example of recursive functions) could be computed this way:

DEF FACT(N)= N>1 ? N*FACT(N-1) : 1

I realize now that this is sometimes called the "Elvis" operator. Think emoticon with hair ?:-)

So I am still curious: is it just a syntax convenience, or does it generate different code?

Tursi · November 26, 2021

1 hour ago, TheBF said:

I realize now that this is sometimes called the "Elvis" operator. Think emoticon with hair ?:-)

So I am still curious: is it just a syntax convenience, or does it generate different code?

The intermediate code is similar, so it optimizes down the same way. If you think about the assembly language, there aren't really too many ways to do "use A or B depending on C".
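A small sketch of that point: the same SGN logic written once with ?: and once with if/else. The function names are made up for illustration. With optimization enabled, mainstream compilers usually emit the same or near-identical machine code for both, though that is something to check in the assembly output rather than take on faith.

```cpp
#include <cassert>

// Hypothetical helper names, for illustration only.

// SGN written with the conditional operator.
inline int sgn_ternary(int a)
{
    return a < 0 ? -1 : (a == 0 ? 0 : 1);
}

// The same logic written with if/else.
inline int sgn_ifelse(int a)
{
    if (a < 0)
        return -1;
    else if (a == 0)
        return 0;
    else
        return 1;
}

// Checks that both forms agree over a small range of inputs.
inline void compare_forms()
{
    for (int a = -3; a <= 3; ++a)
        assert(sgn_ternary(a) == sgn_ifelse(a));
}
```

To see what a particular compiler does, compile with something like g++ -O2 -S and compare the two function bodies.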

I've had trouble using it with heavily overloaded templates in C++, though, it seems the preprocessor can get confused about the type of operation.

speccery · November 26, 2021

6 hours ago, Tursi said:

The intermediate code is similar, so it optimizes down the same way. If you think about the assembly language, there aren't really too many ways to do "use A or B depending on C".

That's correct when you talk about compilers, but when talking about an interpreter, the speed is pretty much defined by how much fluff the interpreter has to work through. More compact expressions will mean more speed. In this specific example of recursive functions the interpreter can stay in expression evaluation code path, it does not need to proceed to the next line or something like that, so it should be faster. Of course a TI Basic function can call another function, which could recurse back to the first function, so you may need to change interpretation pointer location but you would not need to search for lines as you would if using GOTO for instance.

Having said that, I really want to make another version of the interpreter which would actually be a compiler. Probably not targeting ARM nor TMS9900 assembler in the first step, but instead some intermediate code. Need to try to stay focused on the interpreter first, painful as it is to see how much runtime processing it has to do to validate stuff which a compiler would only do once. Maybe I need to split the interpreter into two passes first: do syntax checking first, as TI Basic does, and then rely more on correctness when interpreting. But even then there is a ton of stuff the interpreter needs to check when executing.

6 hours ago, Tursi said:

I've had trouble using it with heavily overloaded templates in C++, though, it seems the preprocessor can get confused about the type of operation.

As a small stupid comment regarding C++, my understanding is that templates are handled by the C++ compiler, not by the preprocessor. The type system is of course intimately involved when resolving templates, and I believe the type system is pretty much completely done within the compiler.

Tursi · November 26, 2021

13 hours ago, speccery said:

That's correct when you talk about compilers, but when talking about an interpreter, the speed is pretty much defined by how much fluff the interpreter has to work through.

Interpreters don't generate code, so that's not the question I was answering.

As a small stupid comment regarding C++, my understanding is that templates are handled by the C++ compiler, not by the preprocessor.

Yes, even I make mistakes.

speccery · November 30, 2021

In the aftermath of last weekend's pandemic call @SteveB released code to his Basic game Rescuer.

That was interesting test material, as it uses BREAK statements in the code, as well as the VAL function. Testing it resulted in a few evenings' worth of coding to get it running, as I was also in the process of adding DEF.

Support for BREAK is now added, as is VAL, although the implementation of VAL is still limited to numbers only, as that's what the game uses. I also added UNBREAK. Supporting BREAK and UNBREAK needed the addition of an augmented line number table, which is only partially in use, but once fully in use will speed up interpretation.
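To illustrate why such a table can speed things up, here is a minimal sketch. It is not the StrangeCart implementation: the entry layout, field names and functions are all assumptions. The idea is that a table sorted by line number lets GOTO, GOSUB, BREAK and UNBREAK find a line by binary search instead of walking the program text.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical table entry: one per program line, kept sorted by line number.
struct LineEntry {
    std::uint16_t number;      // BASIC line number
    std::uint16_t offset;      // where the tokenized line starts (assumed layout)
    bool          breakpoint;  // set by BREAK, cleared by UNBREAK
};

using LineTable = std::vector<LineEntry>;

// Ordering used to keep the table sorted.
inline bool line_less(const LineEntry& a, const LineEntry& b)
{
    return a.number < b.number;
}

// Binary search for a line; returns nullptr if the line does not exist.
inline LineEntry* find_line(LineTable& table, std::uint16_t number)
{
    assert(std::is_sorted(table.begin(), table.end(), line_less));
    auto it = std::lower_bound(table.begin(), table.end(), number,
        [](const LineEntry& e, std::uint16_t n) { return e.number < n; });
    return (it != table.end() && it->number == number) ? &*it : nullptr;
}

// BREAK (on = true) or UNBREAK (on = false) for one line.
// Returns false if the line number is not in the program.
inline bool set_breakpoint(LineTable& table, std::uint16_t number, bool on)
{
    LineEntry* entry = find_line(table, number);
    if (entry == nullptr) return false;
    entry->breakpoint = on;
    return true;
}
```

The interpreter's main loop would then only need to test the breakpoint flag of the entry it is about to execute.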

I added full support of DEF. This is actually somewhat hairy, since there are many different cases:

DEF PI=4*ATN(1) : no argument given
DEF SQ(X)=X*X : numeric argument numeric function
DEF N$(X$)=SEG$(X$,2,100) & SEG$(X$,1,1) : string argument and string function
DEF B$(X)=">"&STR$(X+21)&"<" : numeric argument and string result
DEF AK(A$)=POS(A$,"X",1) : string argument, numeric result

User defined functions are in my opinion a bit weird since they can refer to normal (global) variables too.

I think I still have a special case which does not work properly, since I implemented arguments so that the argument variable is temporarily defined and inserted into the front of the symbol search chain. Thus it will be found before global variables, but if the definition calls another function which is expecting to refer to a global variable, it will actually find the variable defined by the first function like so:

X=3 (global)

DEF F(Y)=Y*X

DEF G(X)=X+F(X)

PRINT G(2)

Here the print should give 2+2*3, but in my implementation it will give 2+2*2, since the function F(Y) will find G's X before the global X. I didn't test this, but that's the way it's written.
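The difference can be reproduced with a small model of the symbol search. This is only a sketch under assumptions: the `Env` structure, the `dynamic` switch and the hard-coded `call_F` and `call_G` are invented for illustration. With `dynamic` set, lookups walk every active argument (the behaviour described above); without it, only the innermost function's own argument is visible before the globals.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of the symbol search chain.
struct Binding { std::string name; double value; };

struct Env {
    std::map<std::string, double> globals;
    std::vector<Binding> args;   // argument of the innermost call is at the back
    bool dynamic;                // true: search all active arguments
                                 // false: search only the innermost argument

    double lookup(const std::string& name) const {
        if (!args.empty()) {
            if (dynamic) {
                for (auto it = args.rbegin(); it != args.rend(); ++it)
                    if (it->name == name) return it->value;
            } else if (args.back().name == name) {
                return args.back().value;
            }
        }
        auto g = globals.find(name);
        assert(g != globals.end());   // undefined variables are not modelled
        return g->second;
    }
};

// DEF F(Y)=Y*X
inline double call_F(Env& env, double arg) {
    env.args.push_back({"Y", arg});
    double result = env.lookup("Y") * env.lookup("X");
    env.args.pop_back();
    return result;
}

// DEF G(X)=X+F(X)
inline double call_G(Env& env, double arg) {
    env.args.push_back({"X", arg});
    double result = env.lookup("X") + call_F(env, env.lookup("X"));
    env.args.pop_back();
    return result;
}

// X=3 (global), then PRINT G(2)
inline double print_G2(bool dynamic) {
    Env env{{{"X", 3.0}}, {}, dynamic};
    return call_G(env, 2.0);
}
```

In this model `print_G2(true)` evaluates 2+2*2 and `print_G2(false)` evaluates 2+2*3, so limiting the search to the innermost argument is one possible way to get the expected result.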

I have also added LET, which was trivial. The updated list of missing stuff is thus shorter:

GO and SUB (GOTO and GOSUB are there, but GO TO and GO SUB not yet, none of the existing programs I've tested use them)
CALL SOUND is decoded but sound is not working yet
DELETE (token 0x99) is not supported, and it does not need to be supported by TI BASIC either?
OPEN, CLOSE, DISPLAY, EOF, REC, VARIABLE, RELATIVE, INTERNAL, SEQUENTIAL, OUTPUT, UPDATE, APPEND, FIXED, PERMANENT, # not done yet
VAL only supports numeric strings as arguments, the tokenizer is very limited at the moment
CALL KEY is only partially done, as is INPUT. CALL KEY currently only supports one of the scanning modes, and INPUT is really stupid in that it only waits for a single character

RXB · November 30, 2021

3 hours ago, speccery said:

CALL KEY is only partially done, as is INPUT. CALL KEY currently only supports one of the scanning modes, and INPUT is really stupid in that it only waits for a single character

Take a look at the RXB CALL KEY routines; they are similar to XB 3 or SXB, but they do not have CALL ONKEY either.

CALL ONKEY is like ON KEY GOTO built into a single command.

Example: 200 CALL ONKEY("yYnN",5,K,S) GOTO 300,300,400,400 :: GOTO 200

If y or Y is pressed, go to 300; if n or N, go to 400; if no key is pressed, go to 200.

CALL ONKEY is pass thru like normal CALL KEY in XB, but RXB has both pass thru or single key.

Example: 200 CALL KEY("0123456789",5,K,S)

Program will only advance if a number key is pressed.

RXB also has CALL JOYMOTION and CALL JOYLOCATE, which build JOYST plus MOTION or LOCATE into a single joystick command.

SteveB · November 30, 2021

4 hours ago, speccery said:

In the aftermath of last weekend's pandemic call @SteveB released code to his Basic game Rescuer.

That was interesting test material, as it uses BREAK statements in the code, as well as the VAL function. Testing it resulted in a few evenings worth of coding to get it running, as I was also in the process of adding DEF.

My 14 year old me is not available to be asked, why the hell he used BREAK back then. And twice! Pointing to a REM ...

But my present me is glad that the program was of help for the Strange Cart Project.

Steve

speccery · November 30, 2021

32 minutes ago, SteveB said:

My 14 year old me is not available to be asked, why the hell he used BREAK back then. And twice! Pointing to a REM ...

But my present me is glad that the program was of help for the Strange Cart Project.

Steve

I am looking forward to playing this game! A couple of weeks ago I added a new feature, CALL VSYNC, which will allow me to slow down the game to a reasonable level when running on the StrangeCart. Currently there's just some flickering and the fuel runs out in a matter of 10 seconds or something like that.

I like the look of the game. Impressive for TI Basic. It requires a joystick. I don't have one, or actually I just remembered that I built a TI compatible thing with arcade buttons a few years ago, so I need to try with it. Before that I need to check if CALL JOYST actually goes somewhere.

speccery · December 6, 2021

A few more features added while working on Rescuer to get it working:

Joystick support, quite simple addition. The TMS9900 just does a few CRU accesses, and the ARM handles the rest.
Scanning keyboard in the other modes. Since I'm using ROM keyboard scanning routine, this was also a simple modification.
Some other clean up of the code.

Rescuer now works with joystick on the StrangeCart, although completely unplayable since everything happens so fast. Need to modify the game next a little with delays in the right places.

With Escape99 a delay loop of 10000 iterations per game loop did the trick, but since I now have CALL VSYNC that's a much better way to slow down games.

kl99 · December 19, 2021

Sounds like yet more improvements. Very nice.
You mentioned that now you need to introduce delays at the right places.

I wonder if you can automate that step by adding an optional delay, based on an input value passed with the CALL RUN command.

If set, you delay the execution between each LINE by the provided delay time.

Having a generic option like this would allow you to test many more TI Basic programs without having to think about where delays are needed in each one.

This might not be needed in the final interpreter but might save you a lot of time while tweaking it for compatibility.

Have some nice Christmas holidays.

speccery · January 31, 2022

Almost two months without an update! That doesn't mean I have not worked on the StrangeCart. I now have what I believe is working audio.

I also started to wonder how fast the StrangeCart could run TMS9900 code through emulation. That project took on a life of its own, as I wrote an emulator for the whole TI. Or almost the whole TI: it currently supports most of the computer, enough to run TI Invaders. Or TI Basic, but without a keyboard in the devices pictured below this is a bit moot.

The reason I went for my own emulator was to create something that can run with a relatively slow CPU and limited RAM. The two devices pictured below are from Pimoroni: their 32blit console, running an STM32H750 CPU (a quite fast processor), and the smaller one is their PicoSystem console, sporting the Raspberry Pi RP2040 chip. The RP2040 is Raspberry Pi's own silicon, with Cortex-M0+ cores. This is about the lowest end Cortex-M core there is, although on the PicoSystem the firmware runs it at 250MHz so it isn't exactly slow. The RP2040 has about 250k of RAM and the emulator runs in that. I don't think other emulators can run in that small a memory footprint. Actually the framebuffer alone occupies 115K (the LCD is 240x240 at 16 bpp).

Pimoroni's 32blit SDK is nice in that it can target both of these embedded devices, but also Linux and macOS. I guess it also supports Windows. All from the same codebase. So I develop the code on my Mac, test it on the Mac, and if things work I can very quickly cross compile and run it on these two devices.

While working on those, I have not yet been able to answer how fast the StrangeCart could run TMS9900 code, as I went from CPU emulation to other hardware emulation. I need to port this code to the StrangeCart, to be able to find out what I was trying to learn...

Code for this emulator is here on GitHub.

SteveB · January 31, 2022

This looks fantastic!

I always wanted to have a TI on my Nintendo DS just like I have an Atari 800XL emulator ... The advantage of the NDS is the lower touch-screen, where you can show a keyboard ...

speccery · January 31, 2022

50 minutes ago, SteveB said:

This looks fantastic!

I always wanted to have a TI on my Nintendo DS just like I have an Atari 800XL emulator ... The advantage of the NDS is the lower touch-screen, where you can show a keyboard ...

Thanks - how do you get the Nintendo DS to run Atari 800XL? Is it a special cartridge of some sort? I have DS as well, just wanting to be upgraded to a 16-bit TI...

SteveB · January 31, 2022

I have a DS One card for running homebrew on my old NDS, for 3DS there is even a software solution http://smealum.github.io/3ds/

I have the PokeyDS Emulator and it is fun to play the old games from my disk-images.

Edited January 31, 2022 by SteveB

Tursi · January 31, 2022

3 hours ago, speccery said:

I also started to wonder how fast the StrangeCart could run TMS9900 code through emulation. That project took on a life of its own, as I wrote an emulator for the whole TI. Or almost the whole TI: it currently supports most of the computer, enough to run TI Invaders. Or TI Basic, but without a keyboard in the devices pictured below this is a bit moot.

That's what I've been waiting for.

Once it's ported, you can then load XB (or RXB which was the target I was hoping for), and run it on the cartridge to provide a Turbo Extended BASIC

speccery · February 1, 2022

17 hours ago, Tursi said:

That's what I've been waiting for.

Once it's ported, you can then load XB (or RXB which was the target I was hoping for), and run it on the cartridge to provide a Turbo Extended BASIC

Yes, that would be cool. I really need to run the benchmark to get an idea what the performance would be. Once ported, performance can be optimised by either writing in native code some parts of the ROMs or compiling from TMS9900 machine code to Cortex M4 machine code.

Emulating the TI on the TI is interesting in that in order to keep performance good, the ARM really needs to minimise transactions requiring the TMS9900 to do anything. For example, a copy of VDP memory needs to be kept in the ARM's internal memory, and only relevant writes to the VDP should be forwarded to the actual VDP.

Edited February 1, 2022 by speccery

Tursi · February 1, 2022

8 hours ago, speccery said:

Yes, that would be cool. I really need to run the benchmark to get an idea what the performance would be. Once ported, performance can be optimised by either writing in native code some parts of the ROMs or compiling from TMS9900 machine code to Cortex M4 machine code.

Emulating the TI on the TI is interesting in that in order to keep performance good, the ARM really needs to minimise transactions requiring the TMS9900 to do anything. For example, a copy of VDP memory needs to be kept in the ARM's internal memory, and only relevant writes to the VDP should be forwarded to the actual VDP.

I've sketched out dozens of ideas for that over the last decade or so... caching schemes, double buffering, dumb terminal mode... that's why I'm excited you're getting closer, so I don't have to!

I foresaw keeping the VDP updated with an output FIFO, and double buffering the VRAM inside the emulation so you don't have to wait for it. (So, write to emulated VRAM, and queue up a write to real hardware to happen in parallel at whatever rate can be maintained.) Reads can then only worry about the emulated VRAM. In order to support both 32k and AMS memory, I felt that CPU RAM reads and writes would go to real hardware. Yes, this slows things down, but the acceleration of all the GROM interpretation should more than make up for it. VDP status and DSRs need to be real hardware access, and assembly language calls need some special consideration to get in and out. But it should work.
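The shadow VRAM plus output FIFO idea can be sketched roughly as follows. Everything here is an assumption made for illustration: the class name `ShadowVdp`, the FIFO depth, and the choice to drop writes that do not change the shadow copy (which is only safe if the real VRAM starts out matching the shadow).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: sizes and names are assumptions.
constexpr std::size_t kVramSize = 16 * 1024;   // the TI's VDP has 16K of VRAM
constexpr std::size_t kFifoSize = 256;         // arbitrary depth for the sketch

struct VdpWrite { std::uint16_t addr; std::uint8_t value; };

class ShadowVdp {
public:
    // Emulated code writes here: the shadow copy is updated at once and the
    // real write is queued. Returns false if the FIFO is full, in which case
    // nothing is changed and the caller has to drain the FIFO and retry.
    bool write(std::uint16_t addr, std::uint8_t value) {
        addr = static_cast<std::uint16_t>(addr & (kVramSize - 1));
        if (shadow_[addr] == value) return true;   // unchanged: no real write needed
        if (count_ == kFifoSize) return false;
        assert(count_ < kFifoSize);
        shadow_[addr] = value;
        fifo_[(head_ + count_) % kFifoSize] = VdpWrite{addr, value};
        ++count_;
        return true;
    }

    // Reads are served from the shadow copy and never touch real hardware.
    std::uint8_t read(std::uint16_t addr) const {
        return shadow_[addr & (kVramSize - 1)];
    }

    // Called at whatever rate the real VDP can accept writes.
    // Returns false when there is nothing left to forward.
    bool pop(VdpWrite& out) {
        if (count_ == 0) return false;
        out = fifo_[head_];
        head_ = (head_ + 1) % kFifoSize;
        --count_;
        return true;
    }

    std::size_t pending() const { return count_; }

private:
    std::array<std::uint8_t, kVramSize> shadow_{};
    std::array<VdpWrite, kFifoSize> fifo_{};
    std::size_t head_ = 0;
    std::size_t count_ = 0;
};
```

A real version would also have to decide what to do when the FIFO fills up, for example stalling the emulated CPU until the TMS9900 side catches up.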

oddemann · February 4, 2022

Was reading about speech synth, your concept could fit a modern speech synth box.

Using an ARM CPU and then integrating all of the speech synth possibilities in one box. I was also thinking about PixelPedant's YouTube video about a text-to-speech synth song maker. With modern parts, all this could be integrated into a modern speech synth and it might be a great upgrade. Not saying you should, but more a question, "Would your use of a modern ARM CPU fit this kind of a project?"

Looking forward to more progress on your StrangeCart!

speccery · February 5, 2022

Well, now I know the answer to how quickly the StrangeCart can emulate the TI-99/4A: 3.5MHz CPU equivalent. Barely faster than the real iron.

Obviously this was not acceptable, way too slow. After a few hours of optimisation it's now at 8.8MHz which is more acceptable. This requires some explanation.

I have been running the StrangeCart ARM cores at 96MHz lately, since that's very stable. But some months ago I did spend quite a bit of time making it run at 150MHz. That does work, but the performance gain was very small, so I reverted back to 96MHz. I knew the problem is that the number of wait states for the MCU's flash accesses becomes ridiculously high at that maximum clock speed: 7 wait states per access. The LPC54114 microcontroller I use does not have a cache for on-chip flash, so code execution is at the mercy of the flash access speed. The slowness of the on-chip flash is partially mitigated by what they call a "flash accelerator", which is a hardware structure reading flash memory over an on-chip 128-bit wide bus into a set of eight 128-bit wide buffers, and the ARM Cortex-M4 core reads data from there. Anyway, that thing and the flash access become saturated near 90 MHz, and thus higher clock speeds only increase speed marginally when running from Flash.

The obvious solution is to run time sensitive code from RAM instead of Flash. So I played with linker control files and memory map so that I can now easily move selected functions to RAM. This requires that the ARM code is compiled to run from RAM but the code is actually initially located in Flash, and the C/C++ runtime start function then copies that code to RAM.
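With GCC-style toolchains, moving a function to RAM is typically done by tagging it with a section attribute and letting the linker script and startup code do the rest. The sketch below is only an illustration of that pattern: the `USE_RAMFUNC` build flag, the `.ramfunc` section name, and the `read_word` and `write_word` helpers are assumptions, not the actual StrangeCart setup.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: the embedded build defines USE_RAMFUNC and its linker script has
// an output section named ".ramfunc" that is stored in Flash but linked to run
// from RAM; the startup code copies it there before main(). Both names are
// made up for this sketch.
#if defined(USE_RAMFUNC) && defined(__GNUC__)
#define RAMFUNC __attribute__((section(".ramfunc"), noinline))
#else
#define RAMFUNC   /* desktop build: leave the function in normal code memory */
#endif

// Hypothetical flat 64K emulated address space, for illustration only.
constexpr std::size_t kMemSize = 0x10000;
static std::uint8_t g_memory[kMemSize];

// Hot path: emulated TMS9900 word read (big-endian, word aligned).
RAMFUNC std::uint16_t read_word(std::uint16_t addr)
{
    addr = static_cast<std::uint16_t>(addr & 0xFFFE);
    assert(static_cast<std::size_t>(addr) + 1 < kMemSize);
    return static_cast<std::uint16_t>((g_memory[addr] << 8) | g_memory[addr + 1]);
}

// Hot path: emulated TMS9900 word write.
RAMFUNC void write_word(std::uint16_t addr, std::uint16_t value)
{
    addr = static_cast<std::uint16_t>(addr & 0xFFFE);
    assert(static_cast<std::size_t>(addr) + 1 < kMemSize);
    g_memory[addr]     = static_cast<std::uint8_t>(value >> 8);
    g_memory[addr + 1] = static_cast<std::uint8_t>(value & 0xFF);
}
```

Keeping the placement behind one macro makes it easy to try different sets of functions in RAM and measure each variant.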

I have tried that before, but the gains seemed small. However, with the TMS9900 CPU emulator there are a few really hot spots (think memory read and write) and moving them to RAM for execution helped a lot, and I now have nearly linear acceleration for the emulated TMS9900 from 96MHz to 150MHz.

The additional complication is that the LPC54114 has several bus masters: the Cortex-M4 core, Cortex-M0 core and a 20-channel DMA controller. Further, the M4 has several buses to memory, and can simultaneously read instructions and data. To support this, the LPC54114 has a nice bus crossbar matrix, and it can support many memory accesses per cycle as long as they are to different memory ports. The on-chip 192K RAM is divided into 4 blocks which can be accessed concurrently. I have already in the past partitioned things in a way that the Cortex-M0 runs from RAM, using a block rarely used by the M4 so that those two do not fight for memory bandwidth too much. Now I figured out a better partitioning to put the time critical blocks of Cortex-M4 code in RAM, and that did boost performance.

I also refactored the C++ code a bit. I added more jump tables, analysed the assembler code generated by the C++ compiler, and noticed that using static member functions enables much faster code. It does prevent instantiating multiple TMS9900 cores, but I don't care about that for now. The code generated by the C++ compiler is very good indeed; using hand optimised assembly would not increase performance much (unless DSP instructions could be used, but that's another story, not a good fit here).
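As an illustration of that combination, here is a heavily simplified sketch of a jump table of static member functions. The `Cpu` class, the three handlers and the "top four bits select the handler" decoding are invented for the example and do not follow the real TMS9900 instruction encoding. The point is that static handlers are plain function pointers with no `this` argument, and all state lives at fixed addresses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified core for illustration only.
class Cpu {
public:
    using Handler = void (*)(std::uint16_t opcode);   // plain function pointer

    static void reset() { acc_ = 0; cycles_ = 0; }

    // Dispatch on the top 4 bits of the opcode through a 16-entry table.
    static void step(std::uint16_t opcode) {
        assert(table_[opcode >> 12] != nullptr);
        table_[opcode >> 12](opcode);
    }

    static std::uint16_t acc()    { return acc_; }
    static std::uint32_t cycles() { return cycles_; }

private:
    static void op_nop(std::uint16_t) { cycles_ += 2; }
    static void op_add(std::uint16_t opcode) {
        acc_ = static_cast<std::uint16_t>(acc_ + (opcode & 0x0FFF));
        cycles_ += 4;
    }
    static void op_clr(std::uint16_t) { acc_ = 0; cycles_ += 4; }

    static const Handler table_[16];
    static std::uint16_t acc_;      // all state is static, so only one core can exist
    static std::uint32_t cycles_;
};

std::uint16_t Cpu::acc_ = 0;
std::uint32_t Cpu::cycles_ = 0;

const Cpu::Handler Cpu::table_[16] = {
    &Cpu::op_nop, &Cpu::op_add, &Cpu::op_clr, &Cpu::op_nop,
    &Cpu::op_nop, &Cpu::op_nop, &Cpu::op_nop, &Cpu::op_nop,
    &Cpu::op_nop, &Cpu::op_nop, &Cpu::op_nop, &Cpu::op_nop,
    &Cpu::op_nop, &Cpu::op_nop, &Cpu::op_nop, &Cpu::op_nop,
};
```

The trade-off is the one mentioned above: with every member static there can be only one emulated core per program.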

It is interesting that with the current code the 150MHz ARM core only manages to emulate the TI at around 3x original speed. I am sure there are still plenty of ways to optimize performance more, as always with software. There are things that are surprisingly involved, such as computing the TMS9900 flags for each instruction.

Anyway, with this level of performance interesting things can already be done. Currently I am just running a benchmark, i.e. booting the TI-99/4A inside the StrangeCart and emulating the computer for 2 seconds, which ends up being 17,699,820 TMS9900 clock cycles. The output of the emulated TI is not going anywhere; the results would need to be copied to the actual hardware. Of course the real way to speed up would be to actually write some hot spots in native ARM code instead of emulating the TMS9900, for example the GPL interpreter. That should increase performance by several orders of magnitude for those parts of code.

+TheBF · February 5, 2022

This gives me even more admiration for Apple's Rosetta emulation.

They seem to be able to emulate Intel on ARM with at most a 2X slowdown.

And they did it way back emulating Power PC on Intel.

Makes me wonder how.

I know the 9900 is not like normal machines but it seems like it is a real challenge for ARM to emulate based on your results.

I wonder if Apple is using something like a huge jump table (because they have the RAM) so that instructions compute the address of the emulation code. ??

Way above my pay grade.

speccery · February 5, 2022

1 hour ago, TheBF said:

This gives me even more admiration for Apple's Rosetta emulation.

They seem to be able to emulate Intel on ARM with at most a 2X slowdown.

And they did it way back emulating Power PC on Intel.

Makes me wonder how.

I know the 9900 is not like normal machines but it seems like it is a real challenge for ARM to emulate based on your results.

I wonder if Apple is using something like a huge jump table (because they have the RAM) so that instructions compute the address of the emulation code. ??

Way above my pay grade.

Well a few things:

They have invested a bit more in that work than a few hours of hobby time here and there
The M1 chips apparently have hardware acceleration for the translation
They do not interpret the machine code, at least the original Rosetta was doing JIT compilation and interpretation both. A huge jump table can help if interpreting, but on processors with cache memories you may end up with inefficient cache usage patterns.

On an embedded processor with quite minimal RAM and program memory it is not practical to do all the possible things. I have thought about a compiler to compile on the fly the TMS9900 code to ARM machine code, at least for the most often used constructs. But that quickly runs out of RAM. I am also already using 210K of the 256K of Flash, although I could easily free up on-chip Flash since I have the external serial Flash available. For instance GROM stuff could all be there. The compilation would work so that the TMS9900 code would be decompiled into an intermediate format, think tree representations of what the instructions actually do. The intermediate trees could then be compiled to whatever runtime architecture, for example Forth. Even a very simple compiler generating bad ARM code would be hugely faster than the current interpretation approach.

PeteE · February 5, 2022

2 hours ago, speccery said:

There are things that are surprisingly involved such as computing the TMS9900 flags for each instruction.

One trick I saw in Tursi's and Rasmus' emulators is to use a status flags lookup table for comparison to zero for every possible word or byte value, if you have enough RAM to spare to store the tables. Another untested idea I have is for instructions to store a pair of values for each status flag that they modify, and later lazily compare the values when the flag is needed. The latter would use less RAM, and could possibly use a vector intrinsic like _mm_shuffle_epi32 to store multiple value pairs at once.
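The lookup table variant might look roughly like this. It is a sketch under assumptions, not code from any of the emulators mentioned: the function names are invented, and only the three status bits affected by a compare to zero (L>, A> and EQ) are modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// TMS9900 status register bits (bit 0 is the most significant bit).
constexpr std::uint16_t ST_LGT = 0x8000;  // logical (unsigned) greater than
constexpr std::uint16_t ST_AGT = 0x4000;  // arithmetic (signed) greater than
constexpr std::uint16_t ST_EQ  = 0x2000;  // equal

// Flags for "value compared to zero", computed the slow way.
inline std::uint16_t flags_vs_zero(std::uint16_t value)
{
    if (value == 0) return ST_EQ;
    std::uint16_t flags = ST_LGT;               // any non-zero value is logically > 0
    if ((value & 0x8000) == 0) flags |= ST_AGT; // positive as a signed word
    return flags;
}

// Build the 64K-entry table once. Only the top three status bits are used, so
// each entry fits in a byte (status >> 8), for 64 KB in total.
inline std::vector<std::uint8_t> build_word_flag_table()
{
    std::vector<std::uint8_t> table(0x10000);
    for (std::uint32_t v = 0; v < 0x10000; ++v)
        table[v] = static_cast<std::uint8_t>(
            flags_vs_zero(static_cast<std::uint16_t>(v)) >> 8);
    return table;
}

// Replace the L>, A> and EQ bits of `status` with the table entry for `result`,
// leaving the other status bits untouched.
inline std::uint16_t update_status(const std::vector<std::uint8_t>& table,
                                   std::uint16_t status, std::uint16_t result)
{
    assert(table.size() == 0x10000);
    const std::uint16_t mask = ST_LGT | ST_AGT | ST_EQ;
    return static_cast<std::uint16_t>((status & ~mask) | (table[result] << 8));
}
```

A 64 KB word table is a large share of a microcontroller's RAM, so on a small target the 256-entry byte version of the same table may be the only one that is affordable.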
