
Geneve 9640 benchmarks


vol


It seems I have gotten very strange results.  I can't understand them. :(
I used a simple Basic program which plots the Mandelbrot set in ASCII.  The disk image with this program is attached.  To run it, type OLD DSK#.MANDEL and then RUN.  The size of the plot is 12x12; it can be changed in lines 250 and 260.
Let me present the results:

TI99-4A                     - 145s
Geneve 9640 (GPL speed = 1) - 147s
Geneve 9640 (GPL speed = 2) - 108s
Geneve 9640 (GPL speed = 3) -  97s
Geneve 9640 (GPL speed = 4) -  88s
Geneve 9640 (GPL speed = 5) -  77s

I used XB (option D in the Geneve MDOS menu) and started MAME with the following command line:

mess geneve -peb:slot8 hfdc -peb:slot8:hfdc:h1 generic -hard1 mdos_gpl_util.hd -flop1 mandel.dsk

The hd image from ftp.whtech.com is used.

So how can this be?!  It seems impossible to me.  Maybe the Geneve works at 3 MHz in GPL mode?  Is it possible to switch it to 12 MHz?  However, even at the same frequency the TMS9995 should be faster than the TMS9900...  Or maybe the MAME emulation of the Geneve is very inaccurate?  Results from real hardware would help a lot.

I attached screenshots with the results.  Please help me figure out why I got such unexpected numbers.  Thank you.

BTW, I have noticed that AUTOEXEC on the boot disk doesn't contain the TIMODE invocation...  I tried adding it, but it didn't help.

 

ti99-4a.png

gplspeed1.png

gplspeed2.png

gplspeed3.png

gplspeed4.png

gplspeed5.png

mandel.zip


Ehm, no, the Geneve emulation is also extremely accurate in MAME, take my word for it.

 

I'm not sure what you actually expected. The GPL speed 1 is designed to decelerate the system so much that you get a similar speed as the original TI console, even a bit slower. Otherwise, you would not be able to play the old games.

 

And GPL speed 5 is fastest. As expected.

 

The Geneve is clearly not 4 times faster than the TI. The external speed is 3 MHz as with the TMS9900; the 12 MHz are only inside the CPU.


OK then, I ran MANDEL in Extended Basic/GPL mode on my real Geneve.

 

GPL speed 1: 148.47 sec

GPL speed 2: 108.02 sec

GPL speed 3: 97.26 sec

GPL speed 4: 88.78 sec

GPL speed 5: 78.08 sec

 

Those numbers look much better and match the emulated times quite well, I dare to say.
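
To put a number on "quite well", here is a small Python sketch (not part of the original measurements) that compares the real-hardware timings above with the MAME timings from the first post. The MAME figures were reported in whole seconds, so deltas of up to about one second may be nothing more than rounding. The variable and function names are made up for this illustration.

```python
# Timings in seconds, copied from the posts above.
MAME = {1: 147, 2: 108, 3: 97, 4: 88, 5: 77}                      # emulated Geneve
REAL = {1: 148.47, 2: 108.02, 3: 97.26, 4: 88.78, 5: 78.08}       # real Geneve


def relative_delta(emulated, measured):
    """Difference between real and emulated time, in percent of the real time."""
    return 100.0 * (measured - emulated) / measured


deltas = {speed: relative_delta(MAME[speed], REAL[speed]) for speed in MAME}

for speed in sorted(deltas):
    print(f"GPL speed {speed}: MAME {MAME[speed]:3d} s, "
          f"real {REAL[speed]:6.2f} s, delta {deltas[speed]:+.2f} %")
```

With these inputs every delta stays below 1.5 percent.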

 

Tim, I just copied the files MANDEL and MANDEL40 from the disk image to my Geneve, no problem with loading. MANDEL40 does not run on my ABASIC4, "name not in table"; it seems to rely on GPL mode, and I guess ABASIC4 runs in native mode.

Edited by mizapf

9 minutes ago, mizapf said:

Tim, I just copied the files MANDEL and MANDEL40 from the disk image to my Geneve, no problem with loading. MANDEL40 does not run on my ABASIC4, "name not in table"; it seems to rely on GPL mode, and I guess ABASIC4 runs in native mode.

Sorry, my post was not clear: I am able to LOAD (via OLD) MANDEL and MANDEL40 within XB and ABASIC; however, LISTing MANDEL in ABASIC crashes the interpreter and abruptly exits to the Geneve command line.  XB can LIST and RUN MANDEL, but I cannot edit or duplicate line 164 and the other long CALL LOAD statements without decreasing the parameter count.  I was going to edit the program lines to create a copy of the program that would LIST and RUN in Advanced BASIC. Alas, I've run out of time to play for the time being ;) 

 


 

3 hours ago, mizapf said:

Ehm, no, the Geneve emulation is also extremely accurate in MAME, take my word for it.

 

I'm not sure what you actually expected. The GPL speed 1 is designed to decelerate the system so much that you get a similar speed as the original TI console, even a bit slower. Otherwise, you would not be able to play the old games.

 

And GPL speed 5 is fastest. As expected.

 

The Geneve is clearly not 4 times faster than the TI. The external speed is 3 MHz as with the TMS9900; the 12 MHz are only inside the CPU.

You know that the MYARC 9640 User's Manual claims:

The Speed can be set at any one of the GPL-speed values listed below:
1 - Normal TI-99/4A in Basic -Slowest
2 - The TI-99/4A in Extended Basic
3 - Approximately 2 times faster than the TI-99/4A.
4 - Approximately 3 times faster than the TI-99/4A.
5 - Approximately 3 1/4 times faster than the TI-99/4A.

So I expected at least a 3x speedup for GPL speed = 5, but we get only about 2x.  In any case, an ML program which sits in scratchpad RAM should be much faster on the TMS9995@12MHz than on the TMS9900@3MHz...  However, my results with my implementation of pi-spigot show that we still get only a 2x speedup.

The computation of 1000 digits of the number π takes about 68s on the TI-99/4A and about 31s on the Geneve...  Both results are taken from MAME.
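
For reference, the speedups implied by the timings posted so far can be worked out with a few lines of Python. This is only a sketch: the numbers are copied from the posts above, and the dictionary and function names are invented for illustration.

```python
# MANDEL timings in seconds (MAME), from the first post.
TI_MANDEL = 145
GENEVE_MANDEL = {1: 147, 2: 108, 3: 97, 4: 88, 5: 77}

# Speedup factors the MYARC 9640 User's Manual gives for GPL speeds 3 to 5.
MANUAL_CLAIM = {3: 2.0, 4: 3.0, 5: 3.25}

# pi-spigot timings in seconds for 1000 digits (MAME).
TI_PI, GENEVE_PI = 68, 31


def speedup(baseline_seconds, seconds):
    """How many times faster a run is than the baseline run."""
    return baseline_seconds / seconds


mandel_speedup = {s: speedup(TI_MANDEL, t) for s, t in GENEVE_MANDEL.items()}
pi_speedup = speedup(TI_PI, GENEVE_PI)

for s in sorted(mandel_speedup):
    claim = MANUAL_CLAIM.get(s)
    claim_text = f"{claim:.2f}x claimed" if claim is not None else "no factor given"
    print(f"GPL speed {s}: {mandel_speedup[s]:.2f}x measured, {claim_text}")
print(f"pi-spigot: {pi_speedup:.2f}x measured")
```

For GPL speeds 3 to 5 the measured MANDEL factor stays below the manual's figure in every case.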

I attached a new disk image which may be used with standard Basic (E/A cartridge is required too) or with XB.  Load PIEA for the former case and PIXB for the latter.  This disk also contains the MANDEL program. 
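
For readers who have not met the term: a spigot algorithm produces the digits of π one after another using only integer arithmetic. The Python sketch below uses Gibbons' unbounded spigot purely as an illustration of the idea. It is not the code of PIXB or PIEA, and nothing here should be read as a description of how those programs work internally.

```python
from itertools import islice


def pi_digits():
    """Yield the decimal digits of pi (3, 1, 4, 1, 5, ...) one at a time.

    Gibbons' unbounded spigot: all state is kept in arbitrary-size integers.
    """
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit is settled: emit it and rescale the state.
            yield n
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            # Not enough precision yet: consume one more term of the series.
            q, r, t, k, n, l = (
                q * k,
                (2 * q + r) * l,
                t * l,
                k + 1,
                (q * (7 * k + 2) + r * l) // (t * l),
                l + 2,
            )


first_digits = list(islice(pi_digits(), 10))
print("".join(str(d) for d in first_digits))
```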

3 hours ago, mizapf said:

OK, "very" accurate, not "extremely" accurate ... will have to check a bit further how to improve it. But good enough. The first picture is with GPL speed 5, the second is with GPL speed 1.

Thank you very much for these photos!  I feel like I have touched your hardware. :) Would you like to run PIXB or PIEA too? ;) Results for 100 and 3000 digits would also be very useful to me.  However, it seems that your system is European, and this may cause an issue.  I assume that the raster interrupt frequency is 60 Hz, but maybe your system uses 50 Hz.  This can be corrected in line 240, where you can change the divider to the proper value.  BTW, the MANDEL timing calculation doesn't depend on the PAL/NTSC frequency.  IMHO, even if American and European models use the same 3 MHz for the CPU, the PAL system should be slightly faster because the NTSC system spends more time handling raster interrupts.  Anyway, it would be interesting to know whether there is a way to distinguish a PAL system from an NTSC one.
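
The effect of the wrong divider is easy to see in a small Python sketch. It assumes, as described above, that the timer simply counts raster interrupts and divides by the interrupt frequency. The tick count is a made-up example value, and the function name is hypothetical.

```python
NTSC_HZ = 60  # raster interrupts per second on an NTSC system
PAL_HZ = 50   # raster interrupts per second on a PAL system


def ticks_to_seconds(ticks, interrupt_hz):
    """Convert a raster interrupt count into elapsed seconds."""
    return ticks / interrupt_hz


ticks = 3000  # hypothetical interrupt count collected during a run

reported = ticks_to_seconds(ticks, NTSC_HZ)  # divider 60 used on a PAL machine
actual = ticks_to_seconds(ticks, PAL_HZ)     # what really elapsed on that machine

print(f"reported {reported:.1f} s, actual {actual:.1f} s, "
      f"factor {actual / reported:.2f}")
```

So a program that divides by 60 on a 50 Hz machine would report times that are too short by a factor of 1.2.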

2 hours ago, InsaneMultitasker said:

I tried to load the program into Advanced BASIC; listing the program crashes the system.  I loaded the file into Extended BASIC; however, lines 164-166 cannot be edited or duplicated - XB reports "LINE TOO LONG".    How was this program entered?

These lines are just too long for the Basic editor, but the Basic interpreter can handle them.  I used a cross-development environment to enter this text.  However, I have just updated the MANDEL program and shortened the 2 long lines.  Please use the new disk image below.

1 hour ago, mizapf said:

OK then, I ran MANDEL in Extended Basic/GPL mode on my real Geneve.

 

GPL speed 1: 148.47 sec

GPL speed 2: 108.02 sec

GPL speed 3: 97.26 sec

GPL speed 4: 88.78 sec

GPL speed 5: 78.08 sec

 

Those numbers look much better and match the emulated times quite well, I dare to say.

 

Tim, I just copied the files MANDEL and MANDEL40 from the disk image to my Geneve, no problem with loading. MANDEL40 does not run on my ABASIC4, "name not in table"; it seems to rely on GPL mode, and I guess ABASIC4 runs in native mode.

Thank you very much again!  Your results are almost identical to MAME.  Maybe the tiny difference is due to the PAL/NTSC difference.

MANDEL40 requires T40 to be loaded first.  It allows us to use the 40-column screen mode - look here. :) 

47 minutes ago, InsaneMultitasker said:

Sorry, my post was not clear: I am able to LOAD (via OLD) MANDEL and MANDEL40 within XB and ABASIC; however, LISTing MANDEL in ABASIC crashes the interpreter and abruptly exits to the Geneve command line.  XB can LIST and RUN MANDEL, but I cannot edit or duplicate line 164 and the other long CALL LOAD statements without decreasing the parameter count.  I was going to edit the program lines to create a copy of the program that would LIST and RUN in Advanced BASIC. Alas, I've run out of time to play for the time being ;) 

 

I'm not sure about ABasic; I can't start this Basic from the mdos_gpl_util hard disk image. :( It is possible that this Basic uses the same scratchpad RAM locations as MANDEL.  But the updated version of MANDEL should work well in the XB editor.

bm.zip

Edited by vol

1 minute ago, vol said:

Thank you very much again!  Your results are almost identical to MAME.  Maybe tiny difference presents because of PAL/NTSC difference.

One thing that is still inaccurate are the VDP wait states; I documented them on https://www.ninerpedia.org/wiki/Geneve_video_wait_states

The bad thing is that we recently got the PAL equations, and after putting that into my emulation, things became slightly worse.

 

Second, you always have tolerances for the clock, so you won't get the same numbers for different systems anyway. I suppose we can live with those small deltas.

 

If you are interested in technical details, have a look at Ninerpedia, e.g. https://www.ninerpedia.org/wiki/Geneve_GPL_Interpreter and other articles.

 

Also, as for the speed promises of the manual, I doubt that they ever ran some serious benchmarks.

 

One of my first projects with the Geneve was to write a Fractal (Mandelbrot) set generator for Graphics mode 6. I put as much time-critical stuff as possible into the on-chip memory, in particular the fixed-point addition and multiplication. The program should be around here somewhere.


9 minutes ago, mizapf said:

One thing that is still inaccurate are the VDP wait states; I documented them on https://www.ninerpedia.org/wiki/Geneve_video_wait_states

The bad thing is that we recently got the PAL equations, and after putting that into my emulation, things became slightly worse.

 

Second, you always have tolerances for the clock, so you won't get the same numbers for different systems anyway. I suppose we can live with those small deltas.

 

If you are interested in technical details, have a look at Ninerpedia, e.g. https://www.ninerpedia.org/wiki/Geneve_GPL_Interpreter and other articles.

 

Also, as for the speed promises of the manual, I doubt that they ever ran some serious benchmarks.

 

One of my first projects with the Geneve was to write a Fractal (Mandelbrot) set generator for Graphics mode 6. I put as much time-critical stuff as possible into the on-chip memory, in particular the fixed-point addition and multiplication. The program should be around here somewhere.

Thank you very much!  So I have to write a custom program for the Geneve which will use the on-chip RAM at >F000...


Here it is. In the MAME startup line you have to add "-colorbus busmouse" to enable the mouse, also "-mouse" to have it captured by MAME (otherwise it may only function as long as your desktop pointer is above the MAME window). You will also have to set the mouse functions in the OSD menu ("Input (this machine)", mouse buttons and analog x and y). I recommend using a new config directory (-cfg_directory cfgmouse) so that the settings won't be lost when you run the Geneve without mouse next time. MAME has an ugly habit of killing settings of devices that it does not find on the next run.

 

There is a selection German / English after the splash screen; if you see a yellow text cursor, the mouse is working. In the program, use the right mouse button for the menu (sorry, I had an Amiga back in those times).
 

 

fractals20.dsk


I was able to make custom variants of pi-spigot which take advantage of the fast RAM of the Geneve 9640.  I got the following results (in seconds) for the computation of 1000 digits of pi.

               8300     F000     F200
TI-99/4A       69.1    123.8    123.8 
Geneve 9640    30.7     21.8     56.1

The first column shows timings for the TI-99/4A version, which puts all code and data in scratchpad RAM.  The second column shows timings for the version which puts all code and data on the page at 0xF000, where the TMS9995 has its fast RAM.  The third column shows timings for the version which puts all code and data on the page at 0xF200; it is the worst case for both systems.  Any version can be used on either the TI-99/4A or the Geneve 9640.  I used the XB versions, but the E/A versions can be used as well; the timings should be the same for XB and E/A.  The F000 and F200 versions use code which is about 5-10% slower than it could be.  This is because the fast RAM at 0xF000 splits the upper RAM block, and there is not enough contiguous RAM to keep 3000 pi digits there.  The program therefore has to merge two memory segments dynamically, and this has overhead.
The results show that in the best cases the TMS9995@12MHz is about 3.3 times faster than the TMS9900@3MHz.  This seems implausible to me.  The TMS9995 works without wait states when it uses its fast RAM, so it should be at least 4 times faster.  Moreover, according to the datasheets, the TMS9995 has better instruction execution timings than the TMS9900, at least 2 times better.  So the TMS9995 should be at least 8 times faster than the TMS9900, but we measure only 3.3 times.  I got all results from MAME, so again I suspect that this emulator doesn't emulate the TMS9995 fast RAM properly.  Results from real hardware would be very helpful in clarifying this situation.
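
As a cross-check, the ratios in the table can be recomputed with a short Python sketch. The numbers are copied from the table above; the names are invented for this illustration. The raw best-case ratio computed this way comes out slightly below the figure quoted in the text, which may already allow for the 5-10% overhead of the F000 version.

```python
# Seconds for 1000 digits, per machine and load address (all from MAME).
TIMINGS = {
    "TI-99/4A":    {"8300": 69.1, "F000": 123.8, "F200": 123.8},
    "Geneve 9640": {"8300": 30.7, "F000": 21.8, "F200": 56.1},
}


def ratio(slow_seconds, fast_seconds):
    """How many times faster the second timing is than the first."""
    return slow_seconds / fast_seconds


# Best timing of each machine, whichever load address achieves it.
best = {machine: min(cols.values()) for machine, cols in TIMINGS.items()}
best_case_ratio = ratio(best["TI-99/4A"], best["Geneve 9640"])

# Same program variant on both machines, column by column.
per_column = {
    col: ratio(TIMINGS["TI-99/4A"][col], TIMINGS["Geneve 9640"][col])
    for col in TIMINGS["TI-99/4A"]
}

print(f"best case: {best_case_ratio:.2f}x")
for col, value in per_column.items():
    print(f"variant {col}: {value:.2f}x")
```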
The results also show that the Geneve SRAM at 0x8300 accelerates program execution about 2 times.  However, I don't understand the idea behind it. :( If the TMS9995 works at 3 MHz with its external RAM, then why does it use SRAM?  32 KB of DRAM at 3 MHz was quite cheap in 1986...  Any hint?  BTW, is it possible to move SRAM from 0x0000-3FFF to 0xC000-FFFF?  This could slightly speed up the π computation.
All programs are available on the attached disk image.  XB program files have names PIXB, PIXBF0, PIXBF2.  E/A Basic program files have names PIEA, PIEAF0, PIEAF2.  BTW I fixed the timer routine so it is PAL/NTSC independent now.  The ASCII Mandelbrot programs are also on this image. 

bmx.zip

On 4/3/2021 at 9:11 PM, mizapf said:

Here it is. In the MAME startup line you have to add "-colorbus busmouse" to enable the mouse, also "-mouse" to have it captured by MAME (otherwise it may only function as long as your desktop pointer is above the MAME window). You will also have to set the mouse functions in the OSD menu ("Input (this machine)", mouse buttons and analog x and y). I recommend using a new config directory (-cfg_directory cfgmouse) so that the settings won't be lost when you run the Geneve without mouse next time. MAME has an ugly habit of killing settings of devices that it does not find on the next run.

 

There is a selection German / English after the splash screen; if you see a yellow text cursor, the mouse is working. In the program, use the right mouse button for the menu (sorry, I had an Amiga back in those times).

Thank you very much for your disk image.  I have to confess I was not able to find this program anywhere else, although I searched this forum and the whtech FTP.  Thanks also for the help with the mouse configuration; it is not easy.  Eventually I could run Fractals!  I am very impressed by it.  Thanks again.  I attached several beautiful screenshots I got with it.

0001.png

0002.png


As for Fractals, you should start by entering the following parameters (Parameter->Enter values):

 

xmin = -2.6

xmax = 0.9

ymin = -1.25

ymax = 1.25

Iterations = 100

 

Then transfer them to the picture data (Parameter->Transfer), let it paint (Picture->Global start), and after that you can use the mouse to pick an area (Parameter->Zoom window), possibly edit the parameters, and then transfer them again to let them be rendered as a full picture. You can stop the generation by pressing the left mouse button when the generation reaches the right screen edge.
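
To get a feeling for what these start parameters cover, here is a rough ASCII rendering of the same window in Python. It is only a sketch of the usual escape-time method, not the Fractals program itself: the grid size, the character palette and the function names are arbitrary choices for this example.

```python
def escape_count(c, max_iter):
    """Iterations of z -> z*z + c before |z| exceeds 2, or max_iter if it never does."""
    z = 0j
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return n
    return max_iter


def render(xmin, xmax, ymin, ymax, max_iter, cols=64, rows=24):
    """Return the window as a list of text lines, top row first."""
    shades = " .:-=+*%"  # characters for points that escape
    lines = []
    for row in range(rows):
        y = ymax - (ymax - ymin) * row / (rows - 1)
        chars = []
        for col in range(cols):
            x = xmin + (xmax - xmin) * col / (cols - 1)
            n = escape_count(complex(x, y), max_iter)
            # '#' marks points that did not escape within max_iter iterations.
            chars.append("#" if n == max_iter else shades[n % len(shades)])
        lines.append("".join(chars))
    return lines


# Window and iteration limit as recommended above.
picture = render(-2.6, 0.9, -1.25, 1.25, 100)
print("\n".join(picture))
```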

 

You have two color modes (menu Miscellaneous); I prefer the "semimonotonous mode".

 

The Fractals file format saves the parameters in the file.


21 hours ago, mizapf said:

BTW, I'm away from my real hardware until Friday, so someone else with a real Geneve may want to run the Pi calculator and check out the times. Or - you'll have to wait.

It is nice to get some results from such interesting hardware.  BTW, I have just checked the document GENEVE FAQ 05.19.2001, compiled by Dan H. Eicher (here); it claims that the TMS9995 uses a 3 MHz clock, not 12 MHz!  This perfectly explains my results, which show that the TMS9995 is about 3.3 times faster than the TMS9900 at the same clock frequency, which corresponds well to the datasheets.  But why do we have these 12 MHz in Wikipedia or MAME?  Myarc ads also claim 12 MHz... Maybe it is the frequency of the quartz crystal, which is divided by 4 to feed the CPU?

Edited by vol

In case you have not found it yet ... https://www.ninerpedia.org/wiki/Geneve_wait_state_generation

 

(my own graphics; look at the upper left part)

 

The point is: The quartz crystal is 12 MHz, no doubt. The specifications for the 9995 say that you have to connect a clock with 10.7-12 MHz. Hence, this is expressed in this way in MAME and elsewhere. I believe that the 12 MHz do have an effect internally because the machine cycles have much more activity in the 9995 than in the 9900, in particular with the pipelining. That is, the microprograms may actually require the higher frequency, but externally, the 3 MHz are propagated.

 

See also: https://ftp.whtech.com/datasheets and manuals/Datasheets - TI/TMS9995.pdf

page 55 (as printed on the document; PDF page 59)

Edited by mizapf

Compare the instruction execution times of the TMS9900 with the TMS9995:

 

https://ftp.whtech.com/datasheets and manuals/Datasheets - TI/TMS9900_DataManual.pdf

(page 28; PDF page 30)

 

https://ftp.whtech.com/datasheets and manuals/Datasheets - TI/TMS9995.pdf

(page 48+; PDF pages 52+)

 

It is interesting to see that the TMS9995 instruction timing is calculated by the CLKOUT rate, not the CLKIN rate. As said above, CLKOUT = CLKIN/4.

 

But the speed-up is impressive. The A operation (add) takes 4 cycles on the TMS9995, but 14 cycles (plus memory access) on the TMS9900. One thing to consider is that the TMS9900 takes 2 clock cycles per machine cycle, so its speed is already halved. Apart from that, it is easy to follow what happens in these 7 machine cycles:

 

https://ftp.whtech.com/datasheets and manuals/Datasheets - TI/9900-FamilySystemsDesign-1stEdition/9900-FamilySystemsDesign-04-Hardware Design.pdf

(page 4-92, PDF page 92)
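
Plugging the cycle counts quoted above into the clock rates gives a rough idea of the difference. The Python sketch below assumes a 3 MHz clock for the TMS9900 and CLKOUT = 12 MHz / 4 for the TMS9995, as discussed in this thread, and it ignores the additional memory access time on the TMS9900 and any wait states. The names are made up for the example.

```python
CLOCK_HZ_9900 = 3_000_000            # TMS9900 clock in the TI-99/4A
CLKIN_HZ_9995 = 12_000_000           # crystal frequency on the Geneve
CLKOUT_HZ_9995 = CLKIN_HZ_9995 / 4   # the rate the TMS9995 timings refer to


def exec_time_us(cycles, clock_hz):
    """Execution time in microseconds for a given number of clock cycles."""
    return cycles / clock_hz * 1e6


a_9900_us = exec_time_us(14, CLOCK_HZ_9900)   # A instruction, 14 clock cycles
a_9995_us = exec_time_us(4, CLKOUT_HZ_9995)   # A instruction, 4 CLKOUT cycles
a_ratio = a_9900_us / a_9995_us

print(f"TMS9900: {a_9900_us:.3f} us, TMS9995: {a_9995_us:.3f} us, "
      f"ratio {a_ratio:.2f}")
```

Under these assumptions the A instruction alone comes out 3.5 times faster on the TMS9995, without any difference in the cycle rate.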

 

I designed the MAME emulation of the 9900 and 9995 actually as microprogram interpreters, which was easier for the 9900 than for the 9995. Unfortunately, we don't have the corresponding microcode explanation for the 9995 (at least that I know of), so I had to guess. And I found it pretty challenging to let the CPU perform the same amount of operations in just 4 cycles. You'll reach the point where you have to assume that some things happen in parallel, or that two operations are done on the same cycle.

 

Thus, I suspect that the 12 MHz are not just divided by 4 right at the start, but that the microcode processor inside the 9995 needs these subcycles for its operation to achieve these 4 cycles for the A instruction.

 

I mean, I always tell the students that a mere comparison of clock rates is futile, and this is a good example.


17 hours ago, mizapf said:

In case you have not found it yet ... https://www.ninerpedia.org/wiki/Geneve_wait_state_generation

 

(my own graphics; look at the upper left part)

 

The point is: The quartz crystal is 12 MHz, no doubt. The specifications for the 9995 say that you have to connect a clock with 10.7-12 MHz. Hence, this is expressed in this way in MAME and elsewhere. I believe that the 12 MHz do have an effect internally because the machine cycles have much more activity in the 9995 than in the 9900, in particular with the pipelining. That is, the microprograms may actually require the higher frequency, but externally, the 3 MHz are propagated.

 

See also: https://ftp.whtech.com/datasheets and manuals/Datasheets - TI/TMS9995.pdf

page 55 (as printed on the document; PDF page 59)

Of course I know about the TMS9995 datasheet, but I missed the Ninerpedia page.  Thank you.  But the content of this page raises questions.  The schematic there shows a clock divider located right at the CLKIN input.  As you have noticed, according to the TMS9995 datasheet all instruction timings depend only on CLKOUT.  So the internal fast memory definitely works at 3 MHz.  There is no 12 MHz in any official data relating to instruction timings.
However, the SRAM is also a mystery to me.  The BBC Micro, a 1982 computer, uses 4 MHz DRAM.  So why doesn't the Geneve, a 1987 computer, use 3 MHz DRAM?  I can suggest an answer.  Maybe 640 KB of DRAM at 3 MHz was too expensive in 1986, and it was easier and cheaper then to add 32 KB of fast SRAM than 32 KB of fast DRAM?


9 hours ago, vol said:

Of course I know about the TMS9995 datasheet, but I missed the Ninerpedia page.  Thank you.  But the content of this page raises questions.  The schematic there shows a clock divider located right at the CLKIN input.  As you have noticed, according to the TMS9995 datasheet all instruction timings depend only on CLKOUT.  So the internal fast memory definitely works at 3 MHz.  There is no 12 MHz in any official data relating to instruction timings.

 

May I point you at page 55 (PDF page 59) of the linked TMS9995 document that says "Crystal frequency MIN=8 NOM=12 MAX=12.1 UNIT=MHz"? You can surely find more information on that if you just browse that document.

 

The input frequency is 12 MHz on the Geneve; there is no need to discuss this, a simple look at the quartz crystal will tell you. The cycle time is 333 ns, as a result of dividing the 12 MHz by 4. I never said that the internal memory works at 12 MHz. I just wanted to explain above why I believe the TMS9995 has a built-in divider instead of just requiring the 3 MHz rate directly.
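
The 333 ns figure, and the corresponding values for the other crystal frequencies in the datasheet line quoted above, follow directly from the divide-by-4. A minimal Python sketch (the function name is made up):

```python
def cycle_ns(crystal_mhz, divider=4):
    """Machine cycle time in nanoseconds for a crystal frequency in MHz."""
    return 1000.0 * divider / crystal_mhz


# MIN / NOM / MAX crystal frequencies as quoted from the datasheet above.
for label, mhz in (("MIN", 8.0), ("NOM", 12.0), ("MAX", 12.1)):
    print(f"{label}: {mhz:5.1f} MHz crystal -> "
          f"CLKOUT {mhz / 4:.3f} MHz, cycle {cycle_ns(mhz):.1f} ns")
```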

 

9 hours ago, vol said:

However, the SRAM is also a mystery to me.  The BBC Micro, a 1982 computer, uses 4 MHz DRAM.  So why doesn't the Geneve, a 1987 computer, use 3 MHz DRAM?  I can suggest an answer.  Maybe 640 KB of DRAM at 3 MHz was too expensive in 1986, and it was easier and cheaper then to add 32 KB of fast SRAM than 32 KB of fast DRAM?

I am afraid the relevant people of Myarc who designed the Geneve are not available for asking.

 

I don't know the cycle times of the DRAM. The SRAM is usually much faster than DRAM; just consider that today's CPU caches are all SRAM. The SRAM in the Geneve is accessed at 0 wait states; I cannot say whether a 4 MHz DRAM can be driven at 0 wait states. This is the relevant question - do we need wait states or not? If yes, the SRAM is faster.


On 4/7/2021 at 3:18 AM, vol said:

Any hint?  BTW Is it possible to move SRAM from 0x0000-3FFF to 0xC000-FFFF?

It would technically be possible to map SRAM pages >E9, >EA, or >EB, provided no other task is using them.  You would need to incorporate the mapping into your program.  Be advised that page >03 is mapped into the >C000 space; sound and GROM functionality depend on this in /4A emulation mode.   Geneve GPL Interpreter - Ninerpedia makes reference to page >03, though I think it is partly incorrect.
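
As a small illustration of what such a remapping would touch, the Python sketch below works out which mapper windows the two address ranges from the question fall into. It assumes that the 64 KB logical address space is divided into eight 8 KB windows, each of which can be assigned one page; the function names are hypothetical, and the actual mapper programming is not shown.

```python
WINDOW_SIZE = 0x2000  # assumption: eight 8 KB windows in the 64 KB logical space


def window_index(address):
    """Index (0-7) of the mapper window that contains a logical address."""
    return address // WINDOW_SIZE


def windows_for_range(start, end):
    """All window indexes touched by the inclusive address range start..end."""
    return list(range(window_index(start), window_index(end) + 1))


current = windows_for_range(0x0000, 0x3FFF)  # where the question places the SRAM now
wanted = windows_for_range(0xC000, 0xFFFF)   # where the question wants it

print("current windows:", current)
print("wanted windows: ", wanted)
```

Under this assumption the >C000 window is one of the two that would have to change, which is where the caveat about page >03 applies.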

 

image.thumb.png.fa4fba2917bb529fd5579ff5ac296a9f.png

 

Excerpt above is from the following linked document which may be of value to you.   Geneve programming dev notes

 

Edited by InsaneMultitasker
Updated incorrect statement regarding page >03

16 hours ago, mizapf said:

May I point you at page 55 (PDF page 59) of the linked TMS9995 document that says "Crystal frequency MIN=8 NOM=12 MAX=12.1 UNIT=MHz"? You can surely find more information on that if you just browse that document.

 

The input frequency is 12 MHz on the Geneve; there is no need to discuss this, a simple look at the quartz crystal will tell you. The cycle time is 333 ns, as a result of dividing the 12 MHz by 4. I never said that the internal memory works at 12 MHz. I just wanted to explain above why I believe the TMS9995 has a built-in divider instead of just requiring the 3 MHz rate directly.

 

I am afraid the relevant people of Myarc who designed the Geneve are not available for asking.

 

I don't know the cycle times of the DRAM. The SRAM is usually much faster than DRAM; just consider that today's CPU caches are all SRAM. The SRAM in the Geneve is accessed at 0 wait states; I cannot say whether a 4 MHz DRAM can be driven at 0 wait states. This is the relevant question - do we need wait states or not? If yes, the SRAM is faster.

This page is about input frequency but all instruction timings are based on CLKOUT frequency.

I can't agree that DRAM is just slower than SRAM.  It depends on their types.  SRAM is easier to connect and use but it is usually more costly.  So a small amount of SRAM in a system can be cheaper than the same amount of DRAM.  The classical example for this is the TI99/4A killer, the VIC-20 which uses SRAM, not DRAM.
