Timing normal 32K RAM vs 32K 16bit RAM

+mizapf · June 15, 2022

Would it be helpful if we had a break of, say, one week on this topic? There may be lots of other points of interest that deserve our attention. And after that, we can carry on confusing each other.

Retrospect · June 15, 2022

7 hours ago, RXB said:

You can see the fire response is faster and the bullets move way faster.

I've not delved into what you guys have been testing, either incrementing numbers or something else, but I think both sides are "right".

Certain things we do gain only 2 percent or so, and others up to twice the speed, I would think?

When I looked at the Tombstone City source code, I was obviously out of my depth as it's not my native language but I did notice that the bullets were handled the same way I do them (most of the time) .... they weren't automotion, they were incremented. So I guess the 16-bit ram would play a part here and speed up quite a bit. Incrementation of a value would be the first thing that gets quicker with faster ram I would have thought. So the difference in Parsec for example, would be down to how long we wait for enemies to appear rather than their movement as they were automotion? Just my penny's worth to this thread.

Reciprocating Bill · June 16, 2022

1 hour ago, Retrospect said:

Certain things we do gain only 2 percent or so, and others up to twice the speed, I would think?

A few issues have become conflated in this thread.

The original (and primary) question concerned whether Extended BASIC derives much performance benefit when using expansion RAM rather than VDP RAM for program storage. The short answer is, "Not much."

A second question, sometimes conflated with the first, is the impact upon Extended BASIC performance of using 16-bit expansion RAM, relative to 8-bit expansion RAM. The answer there is, "even less." In any event, most users don't have that modification.

Tombstone City is written entirely in assembly language, so its performance has no bearing upon the above Extended BASIC performance questions.

But, by virtue of being written entirely in assembly, it does nicely show off the performance kick afforded by 16-bit RAM to pure assembly programs when running out of expansion RAM. In theory there can be a 50% or greater advantage, although in practice it tends to range from 20% to 35%, because most well-written assembly code employs scratchpad RAM for workspaces and small speed-sensitive loops, mitigating the slow main memory to some extent.

That impact will also vary by instruction, reflecting primarily the number of cycles consumed by memory accesses relative to cycles doing internal things.

Edited June 16, 2022 by Reciprocating Bill

apersson850 · June 16, 2022

Considering that the TMS 9900 architecture is as memory dependent as it is, it's not really any surprise that fast memory makes a big difference for assembly programs.

Interpreters frequently run in other memory, memory that's not affected by going to 16 bits for the main memory expansion, and already have their workspaces in fast RAM. They only read an occasional opcode from expansion RAM. That's why the difference is so small.

The more common instructions in the TMS 9900 use 10-14 cycles for the instruction itself. That can include four memory accesses (like for MOV and A, the add instruction): one memory cycle to read the instruction, three for data. The difference between 8-bit and 16-bit memory access is 4 cycles for a word read by the CPU. Thus, if data is already in fast memory and only the instruction is in the memory expansion, then the difference is 14 vs. 18 cycles. But if everything is in slow memory, we have 14 vs. 30 cycles. Even more memory accesses are needed for indexed addressing.
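To make that arithmetic explicit, here is a toy model in Python. It uses only the figures given above (a 14-cycle instruction with four memory accesses, and 4 extra cycles for every access that goes to 8-bit expansion RAM). The function name and its parameters are made up for illustration and are not from any TI tool.

```python
def instruction_cycles(base_cycles, slow_accesses, penalty_per_access=4):
    """Cycles for one instruction when `slow_accesses` of its memory
    accesses hit 8-bit expansion RAM (assumed 4 extra cycles each)."""
    return base_cycles + slow_accesses * penalty_per_access

BASE = 14      # MOV or A with everything on the 16-bit bus (figure from the post)
ACCESSES = 4   # one instruction fetch plus three data accesses

all_fast = instruction_cycles(BASE, 0)
only_opcode_slow = instruction_cycles(BASE, 1)
everything_slow = instruction_cycles(BASE, ACCESSES)

print(all_fast, only_opcode_slow, everything_slow)  # prints: 14 18 30
```

The middle case is roughly the interpreter situation: workspace and data are in fast RAM, and only the fetch from expansion RAM pays the penalty.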

Still, something to think about when optimizing TMS 9900 code is that the instructions themselves require quite a lot of cycles, even in their most basic form. Adding advanced addressing doesn't add that much, relatively speaking. This means it's almost always better to do the work with the most complicated addressing modes available, if that implies that you can use fewer instructions! The cost for advanced addressing is less than for more instructions.

Example:

MOV *R2,*R3

is faster than

MOV @ALPHA,@BETA

But

MOV @ALPHA,@BETA

is much faster than

LI R2,ALPHA
LI R3,BETA
MOV *R2,*R3

On the other hand, if you have frequent accesses to ALPHA, then it's better to load the address of ALPHA into a register and then always access ALPHA indirectly via that register, rather than using symbolic (memory) addressing over and over again.
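The three variants in the example can be compared with a rough cycle count in Python. The per-instruction and per-addressing-mode figures below are assumptions, quoted as I recall them from the TMS 9900 data manual's execution-time table (MOV 14 cycles and 4 accesses, LI 12 cycles and 3 accesses, indirect +4 cycles and 1 access, symbolic +8 cycles and 1 access), so check them against the manual before relying on them. The `wait_per_access` parameter assumes code, workspace, and data all live in the same kind of memory.

```python
# (cycles, memory accesses) with no wait states; assumed data-manual figures
MOV = (14, 4)
LI = (12, 3)
ADDRESSING = {
    "register": (0, 0),  # Rn
    "indirect": (4, 1),  # *Rn
    "symbolic": (8, 1),  # @LABEL
}

def cost(base, *operand_modes, wait_per_access=0):
    """Cycles for one instruction, adding the addressing overhead of each
    operand and `wait_per_access` extra cycles for every memory access."""
    cycles, accesses = base
    for mode in operand_modes:
        extra_cycles, extra_accesses = ADDRESSING[mode]
        cycles += extra_cycles
        accesses += extra_accesses
    return cycles + accesses * wait_per_access

def compare(wait_per_access):
    """Return cycles for: MOV *R2,*R3 / MOV @ALPHA,@BETA / LI+LI+MOV *R2,*R3."""
    w = wait_per_access
    indirect = cost(MOV, "indirect", "indirect", wait_per_access=w)
    symbolic = cost(MOV, "symbolic", "symbolic", wait_per_access=w)
    with_setup = 2 * cost(LI, wait_per_access=w) + indirect
    return indirect, symbolic, with_setup

print(compare(0))  # everything on the 16-bit bus
print(compare(4))  # everything in 8-bit expansion RAM
```

Under these assumptions the ordering is the same in both cases: the indirect MOV alone is cheapest, the symbolic MOV is next, and paying for two LI instructions first is the most expensive. With no wait states, an LI costs 12 cycles once and each later indirect access saves 4 cycles over a symbolic one, so keeping the address in a register only pays off after more than three accesses.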

RXB · June 16, 2022

7 hours ago, apersson850 said:

Considering that the TMS 9900 architecture is as memory dependent as it is, it's not really any surprise that fast memory makes a big difference for assembly programs.

Interpreters frequently run in other memory, memory that's not affected by going to 16 bits for the main memory expansion, and already have their workspaces in fast RAM. They only read an occasional opcode from expansion RAM. That's why the difference is so small.

The more common instructions in the TMS 9900 use 10-14 cycles for the instruction itself. That can include four memory accesses (like for MOV and A, the add instruction): one memory cycle to read the instruction, three for data. The difference between 8-bit and 16-bit memory access is 4 cycles for a word read by the CPU. Thus, if data is already in fast memory and only the instruction is in the memory expansion, then the difference is 14 vs. 18 cycles. But if everything is in slow memory, we have 14 vs. 30 cycles. Even more memory accesses are needed for indexed addressing.

Still, something to think about when optimizing TMS 9900 code is that the instructions themselves require quite a lot of cycles, even in their most basic form. Adding advanced addressing doesn't add that much, relatively speaking. This means it's almost always better to do the work with the most complicated addressing modes available, if that implies that you can use fewer instructions! The cost for advanced addressing is less than for more instructions.

Example:

MOV *R2,*R3

is faster than

MOV @ALPHA,@BETA

But

MOV @ALPHA,@BETA

is much faster than

LI R2,ALPHA
LI R3,BETA
MOV *R2,*R3

On the other hand, if you have frequent accesses to ALPHA, then it's better to load the address of ALPHA into a register and then always access ALPHA indirectly via that register, rather than using symbolic (memory) addressing over and over again.

I have the same issue. Lee and I, working on it, ran into the problem of getting the inputs from XB into scratchpad so everything is as fast as possible.

The only fast RAM in the console is just 256 bytes.

XB already has its registers at >83E0, so from XB the values are fetched and used where they are. The problem is that FAC is always used for getting values.

Using the exact same location every time a value is needed is really a bad design.

Just as bad as only having 256 bytes of fast RAM to begin with.

So the use of LI R2,ALPHA is restricted as much as possible and MOV @ALPHA,@BETA is used instead, but all in fast RAM.

This makes for much more work, as having only 256 bytes makes it tough to do.

Definitely we need more fast RAM.

apersson850 · June 17, 2022

13 hours ago, RXB said:

Definitely we need more fast RAM.

Yes. It would have been cool if there had ever been an official TI 99/4B with 80 K RAM (16 VDP + 64 CPU). Similar to the design I made myself. Just imagine what kind of programs could have been made for it then.

Anyway, even if you have to do MOV @ALPHA,@BETA in standard, slower memory, it's still faster than additional instructions to move data around via faster memory.

When I was programming the TI a lot, the real machine, I mean, that was one of the major benefits with my own internal memory expansion. It didn't matter what I did with workspaces and code placement in memory. It was at the highest speed possible anyway.

GDMike · June 17, 2022

Just think about the types of programs that can be made now, thankfully some are still writing..

Edited June 17, 2022 by GDMike

Reciprocating Bill · June 17, 2022

10 hours ago, apersson850 said:

It didn't matter what I did with workspaces and code placement in memory. It was at the highest speed possible anyway.

The Editor and Assembler of the eponymous Editor/Assembler both run noticeably faster themselves with all fast RAM, particularly the Editor, which is not IO bound. (But nothing like on Classic99 unthrottled.)

Edited June 17, 2022 by Reciprocating Bill

apersson850 · June 17, 2022

1 hour ago, GDMike said:

Just think about the types of programs that can be made now, thankfully some are still writing..

Yes, I know, there are some like you, Mike. But today the 99/4A can't compete in utilitarian value with today's machines, no matter how it's equipped. I was thinking about back then, 40 years ago. Compared to contemporary machines, that is.

GDMike · June 17, 2022

1 hour ago, apersson850 said:

Yes, I know, there are some like you, Mike. But today the 99/4A can't compete in utilitarian value with today's machines, no matter how it's equipped. I was thinking about back then, 40 years ago. Compared to contemporary machines, that is.

Not comparing to machines today, nobody said that. And it did quite well 40 years ago too. Depending on what the comparison is.

Retrospect · June 28, 2022

In the Classic99 updates thread I was asked if I would mention if I saw any speed differences between normal RAM and the 16-bit RAM mod. I think there is a speed difference, I just hope that my recording program shows this successfully.

Here is a project of mine, running under normal ram and abnormal ram.

Normal RAM:

Abnormal RAM:

It does seem somewhat peppier with 16-bit.

Reciprocating Bill · June 29, 2022

Demo on stock console versus demo on modified (16-bit) console:

RXB · June 29, 2022

A simple counter would show which one finished first along with access to clock.

The demo alone does not show much if the difference is in milliseconds.

Mainframe super computers brag about beating each other by trillionths of a second.

No human could tell, which is why they use math like pi to do this.

Or video tests but that stresses a separate Video processor not the CPU.

Reciprocating Bill · June 29, 2022

I'm a simple counter. I can tell which one finishes first.

Scrubbing the videos, I see that the stock console takes ~ 9' 42" to run the demo, while the 16-bit console finishes in ~ 6' 50".
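For anyone who wants the ratio rather than the raw times, a quick Python sketch using only the two approximate run times read off the videos above (the variable names are just for illustration):

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds reading to seconds."""
    return minutes * 60 + seconds

stock = to_seconds(9, 42)      # stock console, ~9' 42"
modified = to_seconds(6, 50)   # 16-bit console, ~6' 50"

saved = stock - modified                # seconds saved
speedup = stock / modified              # how many times faster
reduction_pct = 100 * saved / stock     # percent less run time

print(f"saved {saved} s, speedup {speedup:.2f}x, run time down {reduction_pct:.1f}%")
```

That works out to about 172 seconds saved, roughly a 1.42x speedup, or a bit under 30% less run time, which sits inside the 20% to 35% range mentioned earlier in the thread. The inputs are eyeballed from video, so the result is only as good as those readings.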

apersson850 · June 30, 2022

And, as we have concluded before, if the difference is in the millisecond range, then it doesn't matter.

RXB · June 30, 2022

6 hours ago, apersson850 said:

And, as we have concluded before, if the difference is in the millisecond range, then it doesn't matter.

LOL that is funny tell that to the Computer Geeks that milliseconds do not matter!

That is like telling race car drivers how fast you go does not matter!

Or a carpenter that an inch more does not matter.

Yea it does matter as computers work in smaller increments that add up over time.

Works the same way as the guy that stole 4 billion dollars from banks, .001 cent stolen per transaction added up to 4 billion over time. (5 months)

GDMike · June 30, 2022

But the slower the processor the less it matters.

Edited July 1, 2022 by GDMike

+TheBF · June 30, 2022

12 minutes ago, RXB said:

LOL that is funny tell that to the Computer Geeks that milliseconds do not matter!

That is like telling race car drivers how fast you go does not matter!

Or a carpenter that an inch more does not matter.

Yea it does matter as computers work in smaller increments that add up over time.

Works the same way as the guy that stole 4 billion dollars from banks, .001 cent stolen per transaction added up to 4 billion over time. (5 months)

It is all about appropriate units of measurement for the job.

The carpenter does not need to work to the nearest .0001 of an inch for example but a CNC mold maker does.

When the speed difference between two programs is more than 2.5 minutes (150,000 ms), one millisecond here or there is irrelevant.

Edited June 30, 2022 by TheBF
too many zeros. :-)

GDMike · June 30, 2022

3 hours ago, TheBF said:

The carpenter does not need to work to the nearest .0001 of an inch for example but a CNC mold maker does.

Just watch out for falling debris from above..

Edited June 30, 2022 by GDMike

Reciprocating Bill · June 30, 2022

At any rate, the purpose of my original post above (#362) is to display several examples of the performance boost afforded by the use of 16-bit RAM, so we can actually see it in running code. Much as Retrospect did with his compiled game demo.

I didn't cite any numbers at all.

Play them side by side and decide for yourselves if the difference is significant. In fact, scrub to individual segments within the demo and race them side by side. The impact varies from segment to segment.

+dhe · June 30, 2022

I'm over 50 so I am allowed to repeat myself. We *REALLY* need to get someone to reproduce the Tops Radio Supply 16-bit daughter board.

You unsolder the 9900, you put in a socket. You put the daughter board into the socket. Testing is as easy as unplugging the daughter board and putting a 9900 back in.

That was the same approach Don and Gary took with 'The Accelerator' - maybe we can get Gary to finish that in a year or so?

TheMole · July 1, 2022

6 hours ago, dhe said:

I'm over 50 so I am allowed to repeat myself. We *REALLY* need to get someone to reproduce the Tops Radio Supply 16-bit daughter board.

You unsolder the 9900, you put in a socket. You put the daughter board into the socket. Testing is as easy as unplugging the daughter board and putting a 9900 back in.

That was the same approach Don and Gary took with 'The Accelerator' - maybe we can get Gary to finish that in a year or so?

Wait, there was a project to create an accelerator for the TI? Is there any documentation available on it?

I had been mulling such a project myself, but I'm not a hardware guy so given that I will have a lot to learn throughout the project it would take me quite some time to get something useful.

I was thinking the following:

Accelerator board form factor: desolder the TMS9900, solder in an IC socket, plug the board into that socket and plug the CPU into the IC socket on the board; no other soldering needed
1/2/4 MB RAM onboard, with support for SAMS bank switching
Supports banking in RAM over the entire address space, perhaps with write-through support for the ROM areas, à la C64 or apersson's internal 16-bit memory
All RAM on the accelerator is on the 16-bit bus; if 'legacy' memory (anything on the motherboard or in the PEB ...) is paged in, normal wait states apply
Perhaps a "block transfer"/DMA feature to write to VDP at max bandwidth without holding up the CPU?
Perhaps variants that support the TMS9995 or TMS99105?

Anyway, probably a bit ambitious for a hardware novice, but I look at it more as a learning exercise than anything else...

+dhe · July 1, 2022

Don O'Neil did the initial design, Gary Bowser did some clean-up work. It was based upon the TMS99105. It actually ran, but there was some screen glitchiness in BASIC. But BASIC did run much faster. It didn't make it out of the prototype stage, but was a good proof of concept. The Tops Radio Supply board wasn't as ambitious: a TMS9900, a couple of LS chips and a couple of static RAM chips, but still a nice performance boost, and you could easily test every chip (except the TMS9900) with a TL-866.

apersson850 · July 1, 2022

On 6/30/2022 at 6:53 PM, RXB said:

LOL that is funny tell that to the Computer Geeks that milliseconds do not matter!

That is like telling race car drivers how fast you go does not matter!

Yea it does matter as computers work in smaller increments that add up over time.

No, it doesn't matter, and most of the geeks will understand that. It's one millisecond more or less of the total time we're talking about here.

And we are not racing. We are sitting there waiting for the thing to get ready. Whether we wait three minutes or three minutes and one millisecond doesn't matter.

As you say, if milliseconds add up a million times, then they matter. But not a single one. Not in this case.

apersson850 · July 1, 2022

An accelerator board like the one you're describing will not fit inside the case, as far as I understand.
