Posts posted by vol
-
-
I have a problem with output redirection.
I don't understand why my programs do not redirect their output to a file when I use the > sign. Standard utilities work quite well; for instance, dir >file works fine. But when I type pi-st >file it doesn't work. It just creates an empty file.
I use standard GEMDOS functions for screen output in my programs... GEMDOS functions correspond directly to MS-DOS functions, which are always easy to redirect using the > sign. Please help me with this issue. Thank you.
-
On 4/26/2021 at 9:11 PM, moulinaie said:
Hi,
I have written a similar program for the 68030 some years ago.
It doesn't display anything on screen; it just computes and then saves the digits into a file.
It displays the calculation time, the saving time and total time.
You'll find it here, it is SUPERPI.TOS :
https://gtello.pagesperso-orange.fr/kronos_soft.htm
You'll be able to compare the results of my page with what HATARI gives you (the results given are for 16384 digits)
Guillaume.
Thank you very much. Your programs and results are very interesting. Your code uses a different algorithm. The pi-spigot is easier but of course slower. IMHO the pi-spigot is the easiest algorithm for computing pi, so it is well suited to implementation on any system, even an 8-bit one.
It allows the use of 16-bit numbers, which often simplifies calculations but also limits the number of digits to 9400. My implementation of the pi-spigot prints digits on screen because this usually takes less than 1% of the overall time for 3000 digits. Some of my table entries have results for cases where output was redirected to a file, which gives results similar to yours. BTW I started my project in 2015, 3 years after you finished yours.
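For readers who have not met a pi-spigot before, here is a minimal Python sketch of the widely circulated four-digits-per-pass variant. It is not taken from PI-ST: the function name `pi_spigot` and the sizing of 14 series terms per 4 digits are assumptions borrowed from the classic short C program, which is known to work for 800 digits; other lengths are not checked here. In this variant every stored remainder stays below 2*c = 7*digits, so 16-bit storage would cap the digit count at roughly 65536/7, which seems consistent with the limit of about 9400 mentioned above (an inference, not a statement about the real code).

```python
def pi_spigot(n_digits):
    """Intended to return the first n_digits decimal digits of pi as a string.

    n_digits should be a multiple of 4, since each outer pass emits 4 digits.
    """
    base = 10000                  # four decimal digits per pass
    c = (n_digits // 4) * 14      # number of series terms kept (assumed sizing)
    f = [base // 5] * c + [0]     # remainder array; f[c] starts at 0 as in the C original
    carry = 0                     # low part carried over from the previous pass
    out = []
    while c > 0:
        d = 0
        g = 2 * c
        b = c
        while True:
            d += f[b] * base
            g -= 1                # g is now the odd denominator 2*b - 1
            f[b] = d % g          # remainder is always below 2*c
            d //= g
            g -= 1
            b -= 1
            if b == 0:
                break
            d *= b
        out.append("%04d" % (carry + d // base))
        carry = d % base
        c -= 14                   # the tail terms are no longer needed
    return "".join(out)


if __name__ == "__main__":
    print(pi_spigot(800)[:50])
```

The inner loop does one multiplication and one division per term, which is why the cost of the division instruction dominates the timings discussed in these posts.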
Hatari (the TT without TTRAM) executes SP16K in 33.58 seconds, while your table (the TT without TTRAM) lists 249.05 seconds, so this emu is very inaccurate.
Your table shows that TTRAM makes a system about 10% faster, but that depends on the algorithm used. So it would still be interesting to get results from your system with TTRAM disabled. Indeed, it would be very interesting to get results from some of the systems you listed in your tables.
Do you have any explanation why the Atari Stacy PAK68 / 3 utilizes its 68030 less efficiently than the TT? Is it possible to download your program for the x86?
-
It is quite easy to show that the TMS9995 uses 12 MHz internally. Let's just analyse its timings. Its memory access cycle is 1 clock, which matches the 6502, ARM and 80486. The TMS9995 is a CISC processor, and a CISC processor could not have had such timings in 1980. So it is quite certain that the TMS9995 actually uses 4 clocks for a memory access, the same number as the 8086 and 68000. Indeed, this is not absolute proof, but it is quite firm.
Maybe it is more correct to consider that the [email protected] is actually working at 12 MHz too? This is also Stuart's point.
-
On 4/11/2021 at 4:46 AM, Ksarul said:
Actually, in Assembly, you can easily have 40K of RAM available to you, as long as you are using a Supercart. The TI 32K card gives you 24K in the high memory area and 8K in the low memory area. The supercart gives you an additional 8K in the cartridge space. Note that a lot of the 16K of VDP memory is also usable for variable storage, so you can actually have a lot of usable space with the TI. If you are using a SAMS card, it allows you to swap pages in and out of the 32K memory space and provides 1M of space in most configurations, and up to 4M on the newest boards. On the Geneve 9640, you always have a lot of space available to you. . .as it has a minimum of 512K of RAM.
Sorry, it is impossible to port Xlife-8 to the TI99/4A. Besides the memory issue, which can be overcome, we also have problems with graphics: Xlife-8 requires at least 4 free colors. Indeed, the Geneve 9640 is ok. So maybe one day I will try to port Xlife-8 to it. It is a rare platform but very unusual and interesting.
-
I have improved the PI-ST30 code. Now it is actually faster than the code for PI-ST, but the speed difference is almost invisible. IMHO it is impossible to get a visible advantage from the additional 68020/30 instructions in this case.
The new programs are attached. The code for PI-ST was not changed.
37 minutes ago, ParanoidLittleMan said:
Blitter field below Print Screen will appear only when there is blitter in machine, or it is activated in emulator.
And min TOS 1.02.
Thank you. How could I miss it?!
However, I can't notice any speed difference for the pi-calculator.
-
2 hours ago, ParanoidLittleMan said:
And couple suggestions: should not do any screen write, that can make little difference between blitter off and blitter on - and text print on screen goes via blitter if it is activated in Desktop Options.
Sorry, I still can't find how to activate the blitter. I checked Set preferences... and Desktop configuration... and found nothing about the blitter there.
-
4 hours ago, moulinaie said:
Hello,
For your last question: you can increase a bit the speed on a TT by setting flags to force it to load, run and allocate from TT-RAM (a faster ram only usable by the CPU and that is not accessed by the shifter or DMA sound etc...). There is a CPX for that.
Else, I made some tests for 3000 digits on my machines and they are surprising...: (I set the flags on the TT!)
TT 32MHz + TTRam
With PI-ST.TOS : 15,30 sec
With PI-ST30.TOS: 17,64 sec
The 030 version is slower than the standard one...
MegaSTE 16MHz:
With PI-ST.TOS : 70,66 sec.
STE 8MHz:
With PI-ST.TOS : 139,32 sec
Have you got the source code for the TT version? I'd like to understand why it can be slower than the ST version...
Thank you very much for your results. I always doubted that my pi-spigot implementation could be slower on the 68020/30 than on the 68000, because all the emulators I use (FS-UAE, Hatari) show that the 68020/30 version is faster.
For Hatari it is even much faster. Someone reported such results to me from an Amiga 1200, but I thought he had just mixed up the programs.
Your results show that Hatari is quite accurate for the Mega STE: the emu is only about 2% faster. For the ST/STE it is nearly cycle exact, the difference is under 0.2%! For the TT, Hatari shows approximately 35% faster speed than real hardware on PI-ST, and 186% (!) faster on PI-ST30. It seems that DIVUL takes far fewer cycles in the emu than in hardware. Could you run PI-ST without TTRAM? It would be very interesting to know the difference. It would also be good to get a screenshot.
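The percentages above can be recomputed from the raw timings posted in this thread. The sketch below is only arithmetic; the dictionary names and the helper `percent_faster` are made up for illustration, and it assumes the Hatari and real-hardware runs are directly comparable (the real TT run used TT-RAM, and the real 8 MHz figure is from an STE, while the Hatari figure is listed for the ST).

```python
# Seconds for 3000 digits, copied from the posts in this thread.
hatari = {
    "ST/STE PI-ST": 139.55,
    "Mega STE PI-ST": 68.95,
    "TT PI-ST": 11.36,
    "TT PI-ST30": 6.17,
}
real = {
    "ST/STE PI-ST": 139.32,
    "Mega STE PI-ST": 70.66,
    "TT PI-ST": 15.30,
    "TT PI-ST30": 17.64,
}


def percent_faster(emulated, measured):
    """How much faster (in percent) the emulator finished than real hardware.

    A negative value means the emulator was slower.
    """
    return (measured / emulated - 1.0) * 100.0


speedup = {name: percent_faster(hatari[name], real[name]) for name in hatari}

if __name__ == "__main__":
    for name, pct in speedup.items():
        print(f"{name}: {pct:+.1f}%")
```

A positive number means Hatari finished sooner than the real machine, so the TT rows are where the emulated timing drifts furthest from hardware.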
The 68000 and 68020/30 versions are in the same file. I use conditional assembly. Check the __VASM variable. The only difference between the versions is the implementation of 32-bit division. This is the 68020/30 code

     divul.l d4,d7:d6
     move d7,(a3)

And this is its equivalent for the 68000 (it is faster)

     moveq.l #0,d7
     divu d4,d6
     bvc .div32no\@
     swap d6
     move d6,d7
     divu d4,d7
     swap d7
     move d7,d6
     swap d6
     divu d4,d6
.div32no\@
     move d6,d7
     clr d6
     swap d6
     move d6,(a3)

The code for the 68000 is faster because the branch to .div32no\@ is almost always taken in the pi-spigot algo.
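To make the 68000 sequence easier to follow, here is a small Python model of the same idea: try a single 32/16-bit division first, and only when the quotient does not fit in 16 bits fall back to dividing the high and low words separately. The function names are hypothetical, the model assumes a non-zero 16-bit divisor, and it ignores registers and flags other than the overflow case.

```python
def divu_w(dividend, divisor):
    """Model of the 68000 DIVU.W: 32-bit dividend, 16-bit divisor.

    Returns (quotient, remainder), or None when the quotient does not
    fit in 16 bits (the overflow case, where the CPU sets the V flag).
    """
    q, r = divmod(dividend & 0xFFFFFFFF, divisor & 0xFFFF)
    if q > 0xFFFF:
        return None
    return q, r


def div32_by_16(dividend, divisor):
    """32-bit quotient and 16-bit remainder built from 16-bit divisions."""
    fast = divu_w(dividend, divisor)
    if fast is not None:              # the common path in the pi-spigot
        return fast
    hi, lo = dividend >> 16, dividend & 0xFFFF
    q_hi, r_hi = divu_w(hi, divisor)                # high word first
    q_lo, r = divu_w((r_hi << 16) | lo, divisor)    # cannot overflow: r_hi < divisor
    return (q_hi << 16) | q_lo, r


if __name__ == "__main__":
    print(div32_by_16(0xFFFFFFFF, 3))
```

When the fast path succeeds, only one division is executed, which matches the remark that the branch is taken almost always.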
1 hour ago, ParanoidLittleMan said:
I got practically same results with Mega STE as moulinaie .
Considering file sizes, it is coded in ASM, I guess.
And couple suggestions: should not do any screen write, that can make little difference between blitter off and blitter on - and text print on screen goes via blitter if it is activated in Desktop Options.
There should be stop before exit, so can read result of test. I needed to look in code to realize that it at all self measures time, because it just immediately returns to Desktop and deletes screen. Sorry, not everyone using MagiC or some TOS/Desktop extension. And maybe add new line before time show, "seconds" after it 🙂
Thank you. Sorry, I thought a command line interface was more common on TOS. I have just made my programs more user friendly: they now wait for a key press before they finish. The updated programs are attached and my git repo is also updated.
-
15 hours ago, ParanoidLittleMan said:
Of course, you can inc. CPU performance in different ways, but CPU clock must then be more than 8 MHz.
Yes, in case of Falcon it matters in which video mode you run tests like this.
Thank you. Would you like to provide some hints for me? How can I increase the CPU performance?
I run tests under EmuCON, and IMHO it uses quite a simple video mode which doesn't slow down the CPU.
7 hours ago, DarkLord said:
Okay, there is definitely something screwy going on.
I have a Pak 68/3 equipped Atari STacy. The Pak is an accelerator
board, 68030, running at 40mhz on my STacy, with a 40mhz FPU
(68882). It's TOS v3.06 with 4 megs of ST-RAM (no FAST/TT/ALT
RAM).
A few questions first... Your program did not give me a result. It
simply dropped back to the desktop when done. I timed it using
a stopwatch to get results. I'm not sure how totally accurate that
is.
Does your software support a math chip? (FPU). That could
make a huge difference.
Anyway, running it here on the setup above, I timed it at 16.5
seconds.
I don't see any possible way a stock Falcon could beat that score.
I also find it hard to believe that a 32mhz TT beat my score.
Using Chris Swinson's GemBench 6, my STacy is registered as
faster than a stock Falcon or TT030.
Hope this helps.
Thank you very much! Your results confirm my point that Hatari is very inaccurate for timings of the top Atari models. Please run the pi calculator from EmuCON or another command line interface: the program prints the digits of pi and then the calculation time. The calculator code uses only integers, so no FPU is required. Which program showed 16.5 s, PI-ST or PI-ST30?
Your board is a very interesting one for me. It would be great if you could provide me with results for 100, 1000, and 3000 digits. I can put them in my research table then.
-
21 hours ago, Stuart said:
Thank you very much for such an interesting table! I am also curious about the difference between the first two items. What do PROM/EPROM mean here?
-
I have run my π calculator on various Atari models. The results are the seconds required to compute 3000 digits of this number. The calculator program for TOS is attached. It has two variants:
1) pure 68000 code which works on every model - PI-ST.TOS;
2) 68030 code which requires a 68020 or later 68k processor - PI-ST30.TOS.
I use Hatari v1.4 emu. Results for the Amiga 1200 were taken from real hardware, results for the Amiga 500 are taken from FS-UAE 3.0.5.

                                 68000    68020/30
ST [email protected]               139.55   -
Mega STE [email protected]          68.95   -
TT [email protected]                 11.36    6.17
Falcon [email protected]             22.90   12.47
Amiga 500 [email protected]        167.00   -
Amiga 500 Fast RAM               164.00   -
Amiga 1200 [email protected]        43.00   37.00
Amiga 1200 Fast Ram              n/d      34.00

Some results for the Atari look rather implausible to me. It is slightly odd that the Mega STE is more than two times faster than the ST. IMHO it should be exactly two times faster. Maybe it is because the Mega STE has a cache which is not affected by video?
However, the main issue is the 68030 code results. They look completely wrong to me. How can the Falcon be almost 3 times faster than the Amiga 1200?! I am sure that there is a great inaccuracy in the 68030 timing emulation in Hatari.
I don't know an emulator which is better than Hatari.
Please help me to get correct results. If anyone has a 68030-based Atari, please run PI-ST.TOS and PI-ST30.TOS for me. Many thanks in advance.
One more question. Are there ways to increase the processor performance on the ST/STE/TT/Falcon? For example, the Commodore 64 is faster when its screen is blank.
-
-
Sorry I missed "The address range from 0 to $800 (2048) can be accessed only in the supervisor mode".
But is there a way to get a timer value from an application program?
It was unknown to me that the ST actually has an MMU which can protect some memory areas. The Amiga doesn't have any such things despite having a multitasking OS. It is an interesting discovery for me.
-
On 4/22/2021 at 9:01 PM, JB said:
Similarly, the 9995 takes a 12 MHz input and divides it by four internally. We can only speculate as to why, but it clearly doesn't "run at" 12 MHz.
We still don't have proof of it.
On 4/22/2021 at 9:47 PM, Stuart said:
Vol, have you read post #20 of
Thank you for this link to interesting information. I have corrected my blog entry a bit.
-
I just want to read a timer value with

     move.l $4ba,d0

and I get "Panic: bus error".
What is it? Documentation shows that it is a long value. What is wrong? Please help me to read the timer. Thank you.
-
5 hours ago, mizapf said:
Yes, I just wanted to point out that this seems intentional to me, in order to calculate that ratio of instruction execution rate (MIPS) by input clock rate (MHz). As said, you can calculate a lot of ratios when the day is long.
1 hour ago, apersson850 said:
Yes, it may be intentional. But to make that correct you have to look at how many phases the clock cycle has too, since they don't do all these phases just because they like complexity. They actually perform something for each phase, so from that point of view, the TMS 9900 is a 12 MHz device too.
Thanks a lot for the information. However, I am rather baffled now. My point is quite simple: according to official data, the TMS9900 runs at 3 MHz in the TI99/4A and the TMS9995 at 12 MHz in the Geneve 9640. Indeed, the Geneve processor is much faster, but it uses 4 times more clock cycles. So if somebody wants to find out how fast both processors are at the same clock frequency, he gets the obvious result that the TMS9900 is faster. Maybe it is possible to think that the TMS9900 actually uses 6 or 12 MHz in the TI99/4A, or that the TMS9995 uses 3 MHz in the Geneve. But I just took the numbers from open specs. I added a comment to the pi-spigot result table: "all official instruction timings use 1/4 of the CPU freq as the base, so ER may be regarded as 4 times less". I added it because the matter is really complex for me. However, I have to confess that I prefer to think that the TMS9995 uses 12 MHz internally rather than just dividing it by 4...
-
20 hours ago, mizapf said:
I'm just a bit too busy right now, will take a note of your inquiry.
Anyway, we have enough Geneve users here who may run your tests, if you still don't trust MAME's accuracy.
Thank you. I trust MAME. It is great for the Geneve and TI99/4A. Thank you very much for them. I just want to get 100% accurate results. If you check the pi-spigot result table you find that most results came from real hardware...
-
My request for help is still open. Please run PI-EXE on a real Geneve 9640 for me! Help me with my research. MAME is very good at Geneve 9640 emulation, but it is still not 100% accurate. I need the timings (they are printed) for 100, 1000, and 3000 digits. A screenshot or two would be a good supplement too. Thanks a lot in advance.
A disk image is here.
-
On 4/19/2021 at 11:30 PM, mizapf said:
I cannot really follow the argument that the 9995 is slower than the 9900 in any respect. How do you get to that conclusion? Did you compare the instruction execution times from the tables in the specification documents?
Running benchmarks on the Geneve and the TI-99/4A does not prove much, as the systems add their specific amount of wait states. GPL mode 1 is strongly slowed down by additional wait states to achieve a comparable speed as the TI-99/4A. The fact that GPL speed 1 is slightly slower than the execution on the TI-99/4A does not entail that the 9995 is slower.
One example from the tables, let's take A (Add words), registers in external memory, instruction in external memory, no wait states.
TMS9900: 14 clock cycles @ 333 ns = 4.662 µs
TMS9995: 8 clock cycles @ 333 ns = 2.664 µs
The base cycles for Add for the 9995 are 4 cycles; due to the 8 data lines the number of cycles is twice, i.e. we get 8 cycles. This is still almost twice as fast as the TMS9900. As I said, this is all without wait states.
The base speed of the 9900 is half the speed of the 9995, since it takes two clock cycles for a single machine cycle, while the 9995 takes one clock cycle per machine cycle.
8 TMS9995 cycles in your example are actually 32 input (CLKIN) clock cycles. So you have proved that the [email protected] is about 50% faster than the [email protected]
It seems rather easy to prove my point that the TMS9900 is faster than the TMS9995 at the same input clock frequency.
Let's analyse the timing of the A (Add) instruction. In the best case it takes 4 cycles on the TMS9995, but these are CLKOUT cycles, which are 4 times longer than CLKIN cycles. So it actually takes at least 16 input clock cycles to do an addition on the TMS9995. The TMS9900 needs only 14 input cycles for this operation, so it is about 14% faster in this case.
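The conversion between CLKOUT and CLKIN cycles used in this argument can be written out as a small Python sketch. The constant and function names are made up for illustration; the cycle counts are the ones quoted in this exchange, and the 3 MHz and 12 MHz input clocks are the official figures for the TI-99/4A and the Geneve mentioned elsewhere in these posts.

```python
CLKIN_PER_CLKOUT_9995 = 4   # the TMS9995 divides its input clock by four


def to_input_clocks_9995(clkout_cycles):
    """Convert TMS9995 CLKOUT (machine) cycles to CLKIN (input) cycles."""
    return clkout_cycles * CLKIN_PER_CLKOUT_9995


def microseconds(input_clocks, input_mhz):
    """Execution time for a given number of input clocks."""
    return input_clocks / input_mhz


add_9900 = 14                              # input clocks for A on the TMS9900
add_9995_best = to_input_clocks_9995(4)    # A with code and data in internal memory
add_9995_ext = to_input_clocks_9995(8)     # A with external memory, as in the quote

# Same input clock for both chips: how many clocks the 9995 needs per 9900 clock.
ratio_same_clock = add_9995_best / add_9900

if __name__ == "__main__":
    print(add_9995_best, add_9995_ext, round(ratio_same_clock, 2))
    print(microseconds(add_9900, 3.0), microseconds(add_9995_ext, 12.0))
```

Both readings come out of the same numbers: per input clock the TMS9900 needs fewer cycles for A, while in wall-clock time the 12 MHz TMS9995 is still ahead of the 3 MHz TMS9900.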
Let me present a short table which lists timings for some TMS9900/9995 instructions:

        TMS9900   TMS9995
ABS     12-14     12
AI      14        16
B       8         12
BL      12        16
BLWP    26        44
DIV     92-124    112
INC     10        12
JMP     8-10      12
LDCR    22        44
LI      12        12
LIMI    16        20
MOV     14        12
MPY     52        92
RTWP    14        24
SLA     14        24

It is interesting that in some cases (ABS, DIV, MOV) the TMS9995 can be a bit faster than the TMS9900. However, all these numbers are for the case when the TMS9995 uses its fast internal memory for code and data. This memory is small, so in general the TMS9900, which has a 16-bit data bus, is faster. However, the TMS9995 has instructions for signed division and multiplication, so the TMS9900 is indeed slower in those cases. IMHO it is surprising that shift instructions are so slow on the TMS9995: they are two times slower than MOV.
The fact that the TMS9995 uses one frequency for input and another for instruction timings is really confusing. Some processors (the R800, 486DX2, ...) use higher internal frequencies but the TMS9995 does something rather opposite to this.
-
Thank you all for your interesting comments.
On 4/19/2021 at 10:58 PM, Stuart said:
"This is a very unusual processor. The external data bus is 8-bit." Not very unusual I think. The TMS9980/81 had the same - 16 bit internally, 8-bit external data bus. Intel done the same on some processors.
Indeed, processors with a 16-bit ALU and an 8-bit data bus were quite well known. Besides the 8088 (the IBM PC), we have the 68008 (the Sinclair QL) and the 65816 (the Apple IIgs). I wrote that the TMS9995 is unusual with its other features in mind: its internal memory, divided clock frequency and internal timer.
-
On 4/16/2021 at 7:41 AM, InsaneMultitasker said:
Yes, much fun can be had with the 9900/9995
I'm sure I learned how to do that from others and/or from sample [Geneve] code, which is not as abundant as in days past.
I have used your BBS. Thank you very much. It reminded me of my experience in the early 90s, when I was an active BBS user. It is sad that a BBS doesn't allow running a program the way telnet or ssh do.
-
On 4/15/2021 at 5:29 PM, InsaneMultitasker said:
That is most curious; without any investigation, my guess is that it has to do with the interpreter needing to bank in a page of memory for one or more operations.
It is quite easy to check. Start MAME with the -debug option. Let the system load (XB for instance), then open a memory window and check address >8005. Two numbers, >31 and >EF, are shown at this address.
-
I have added a summary about the TMS9995. I hope I wrote correct information. Please let me know if there is anything wrong or if it just can be improved. Thank you.
-
On 4/10/2021 at 6:35 PM, InsaneMultitasker said:
I attached source for a program I wrote some time ago that is self-contained and uses a few XOPs. The debug flag (near the end of the program) appears to be enabled, so the self-check CRC shouldn't hinder your running the program. One word of caution: this program uses sector IO to read/write from sector-based, floppy-drive devices. If you wish to disable this, find the XOP calls to "@IO" and hop/skip/jump accordingly.
Thank you very much. I am very impressed by your TTYOUT routine. IMHO the TMS9900 provides a way to write the shortest routine of this kind.
-
On 4/8/2021 at 6:24 PM, mizapf said:
I mean, I always tell the students that a mere comparison of clock rates is futile, and this is a good example.
So I invented the ER value, which shows how efficiently CPU electronics can convert clock cycles into performance.
A similar measure is called MIPS/MHz.
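The exact ER formula is not spelled out in these posts, so the Python sketch below only illustrates the general idea behind such per-clock measures. It uses a hypothetical stand-in metric, the total number of clock cycles a run consumed (seconds times MHz), together with the real-hardware timings for 3000 digits and the clock rates reported for the Atari machines elsewhere in these posts.

```python
def megacycles(seconds, mhz):
    """Millions of clock cycles consumed by a run (hypothetical metric, not ER)."""
    return seconds * mhz


# (seconds for 3000 digits with PI-ST.TOS, clock in MHz), as reported above.
runs = {
    "STE 8MHz": (139.32, 8),
    "MegaSTE 16MHz": (70.66, 16),
    "TT 32MHz + TTRam": (15.30, 32),
}

cost = {name: megacycles(seconds, mhz) for name, (seconds, mhz) in runs.items()}

if __name__ == "__main__":
    for name, value in cost.items():
        print(f"{name}: {value:.2f} million cycles")
```

Fewer cycles for the same work means the processor gets more done per clock, which is the kind of comparison a clock-rate figure alone cannot give.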
-

Benchmarking...
in Atari ST/TT/Falcon Computers
Posted
No, moulinaie's programs also don't work with redirection