
Technical Inquiry - 64K Memory/16 Bit Bus (Why it never happened)



 

 

If someone were to modify GPL you could certainly see a speedup there, but you'd be replacing the system GROMs (ROMs?).

 

GPL doesn't suffer from a lack of scratchpad RAM. The GPL interpreter is in the 16-bit ROMs and is also running at high speed. It's just very, very inefficient. A rewrite of GPL that used all the exact same memory configuration it does today could easily run much faster - remember that reading a byte from VDP RAM doesn't have to be any slower than reading a byte from CPU RAM -- it's only slower if you need to set the address first. An easy target (for example) is the MOVE instruction -- for every byte it fetches it reparses the memory opcodes, resets the memory pointers, and fetches the byte, then it does the same to write it. Lots of ways to optimize that, especially running between GROM and VDP (or other combination where the address pointers are preserved).
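For anyone who hasn't poked at the VDP, here's a minimal C sketch of that point, with the address-setup cost made explicit. The "VDP" is simulated with a plain array and an auto-incrementing read pointer, and the helper names are invented for illustration only, not real console ROM routines.

/* Simulated VDP: the only expensive part of a read is re-setting the address. */
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

static uint8_t  vdp_ram[16384];          /* pretend 16K of VDP RAM */
static uint16_t vdp_read_ptr;            /* the VDP's internal address register */
static unsigned address_setups;          /* how many times we paid the setup cost */

static void vdp_set_read_addr(uint16_t addr)
{
    vdp_read_ptr = addr;                 /* on real hardware: two writes to the address port */
    address_setups++;
}

static uint8_t vdp_read_data(void)
{
    return vdp_ram[vdp_read_ptr++];      /* the VDP auto-increments its pointer */
}

/* Naive per-byte loop, analogous to re-resolving the operands for every byte. */
static void move_slow(uint8_t *dst, uint16_t src, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        vdp_set_read_addr(src + i);      /* setup cost paid len times */
        dst[i] = vdp_read_data();
    }
}

/* Streaming loop: set the address once and rely on auto-increment. */
static void move_fast(uint8_t *dst, uint16_t src, size_t len)
{
    vdp_set_read_addr(src);              /* setup cost paid once */
    for (size_t i = 0; i < len; i++)
        dst[i] = vdp_read_data();        /* each read is now as cheap as a CPU RAM read */
}

int main(void)
{
    uint8_t buf[256];
    move_slow(buf, 0x1000, sizeof buf);
    move_fast(buf, 0x1000, sizeof buf);
    printf("address setups: %u\n", address_setups);   /* 256 + 1 */
    return 0;
}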

 

For larger scale banking, the system AMS uses is elegant and works well - and importantly already has some software support. Why design a new system with no software support?


Just imagine if, instead of designing GPL, TI had put a Forth interpreter in the console ROM and the GROMs were really FROMs running Forth instead of GPL.

 

(OK, getting off-topic here now, but these days we could easily replace the console ROMs with a whole new stock boot-up along with new simulated GROMs running the new code.)

 

Similar to how the 99/8 was to have P-Code built in, and how the P-Code Card on the TI99/4a used GROMs.

 

BTW, if you have not tried Willsy's TurboForth you should, as it's not at all like TI's original crappy one.

 

(OK, I'm really off-topic now; let's get back to this 64Kx16 design/kit idea. I think it is great if it can somehow be produced to be easy enough to install; that has been the problem with the idea from day one, I think.)

I think having a built-in P-Code-type interpreter was a good idea then and still would be.

It's easier to generate compiler output in P-Code than in native code, the source code is protected from prying eyes, it's much faster than BASIC, and it's portable.
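To make that concrete, here is a toy C sketch of a p-code style dispatch loop; the opcodes are invented for illustration, not UCSD's or TI's, but they show why bytecode is an easy compiler target and stays both compact and portable.

/* Toy bytecode interpreter: the compiler only has to emit these opcodes,
 * and the same bytes run anywhere this loop exists. */
#include <stdint.h>
#include <stdio.h>

enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };          /* hypothetical opcodes */

static void run(const uint8_t *code)
{
    int16_t stack[16];
    int sp = 0;

    for (;;) {
        switch (*code++) {
        case OP_PUSH:  stack[sp++] = *code++;            break;  /* next byte is an immediate */
        case OP_ADD:   sp--; stack[sp - 1] += stack[sp]; break;  /* pop two, push the sum */
        case OP_PRINT: printf("%d\n", stack[--sp]);      break;
        case OP_HALT:  return;
        }
    }
}

int main(void)
{
    /* "print 2 + 3" is just seven bytes of bytecode, with no source shipped. */
    static const uint8_t program[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
    run(program);
    return 0;
}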


GPL doesn't suffer from a lack of scratchpad RAM. The GPL interpreter is in the 16-bit ROMs and is also running at high speed. It's just very, very inefficient. A rewrite of GPL that used all the exact same memory configuration it does today could easily run much faster - remember that reading a byte from VDP RAM doesn't have to be any slower than reading a byte from CPU RAM -- it's only slower if you need to set the address first. An easy target (for example) is the MOVE instruction -- for every byte it fetches it reparses the memory opcodes, resets the memory pointers, and fetches the byte, then it does the same to write it. Lots of ways to optimize that, especially running between GROM and VDP (or other combination where the address pointers are preserved).

 

For larger scale banking, the system AMS uses is elegant and works well - and importantly already has some software support. Why design a new system with no software support?

But does GPL access more than one address when dealing with floating point?

Remember, floating point itself is multiple bytes and if you deal with multiple numbers you are accessing even more addresses.

I have a hunch it's constantly having to change the VDP address.

 

No question GPL could be optimized in other ways, as I said, that was the first thing that came to mind.


But does GPL access more than one address when dealing with floating point?

Remember, floating point itself is multiple bytes and if you deal with multiple numbers you are accessing even more addresses.

I have a hunch it's constantly having to change the VDP address.

 

GPL spends comparatively little time working with floating point numbers, and most of the floating point functions are written in assembly (I haven't looked at whether they are efficient). Even in TI BASIC, where all the numbers are floats, far more time is wasted in the GPL interpreter on housekeeping and basic interpretation. I'd be surprised if the difference was even visible if all you changed was the float storage location - I've been surprised a few times in simple GPL optimization attempts. ;)


The idea I'm noodling is that in branching to a subroutine in the bank area, the process of executing a BL (or BLWP, or XOP) against a specific address would cause the bank switch. If banking and landing on the exact routine is not possible, then maybe landing on a vector table in the banked-in memory to complete the jump to the proper routine... Maybe.

 

This was done in the later versions of 16-bit Unix (V7M, 2BSD). It split the 64K area into a base area (say 48K) and up to 15 overlays (say 16KB each). It worked like this:

- first you compiled/assembled your code into separate object files.

- then you linked it into the final program, specifying which object files went into the base and which into each overlay.

- the linker worked out when a function call went from the base to an overlay, or from one overlay to a different overlay.

- the linker then added little stub subroutines to the base, one for each such target function; the calls to the target function were patched up to be calls to the stub.

- the stub would check whether the currently mapped overlay was the correct one and, if so, jump to the real target subroutine; if not, it would first map in the right overlay and only then make the jump.

 

BSD2.11 (released in 1992) managed to run a 250KB kernel in a 64KB address space.
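A rough C sketch of that stub mechanism, assuming a hypothetical map_overlay() in place of the real bank-switch code (the actual 2BSD stubs were just a few PDP-11 instructions):

/* Each base-to-overlay call is patched to go through a stub like this one. */
#include <stdint.h>
#include <stdio.h>

static uint8_t current_overlay;          /* which overlay is mapped right now */

static void map_overlay(uint8_t n)       /* hypothetical stand-in for the real bank switch */
{
    current_overlay = n;
}

static void target_in_overlay_3(void)    /* the real routine, living in overlay 3 */
{
    puts("running code from overlay 3");
}

static void stub_target_in_overlay_3(void)
{
    if (current_overlay != 3)            /* is the right overlay mapped? */
        map_overlay(3);                  /* if not, map it in first */
    target_in_overlay_3();               /* then jump to the real routine */
}

int main(void)
{
    stub_target_in_overlay_3();          /* the caller never sees the switch */
    return 0;
}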


The 9900-based models had 64K.

I think the models that had over 64K used the 99000-series CPUs or discrete logic.

 

Are you sure? I thought the TMS9900-based TI990/4 had a 74612 mapper. The discrete-logic TI990/10 used an MMU with three "base+extent" segments. The TMS99000-based TI990/10A used the same MMU, but integrated into a single custom VLSI chip (referred to as "MAPPER"; it is a 40-pin DIP package, and I don't think TI ever sold this chip separately).
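For reference, base+extent mapping in general works like the C sketch below; the field sizes and fault handling here are made up and do not match the real 990/10 MMU registers.

/* Generic base+extent (base+limit) translation: three segments, each with a
 * physical base address and a size; anything outside the extent faults. */
#include <stdint.h>
#include <stdio.h>

struct segment {
    uint32_t base;                       /* physical start of the segment */
    uint32_t extent;                     /* size of the segment in bytes */
};

static struct segment map[3];            /* three segments, as on the 990/10 */

static int64_t translate(unsigned seg, uint32_t offset)
{
    if (seg >= 3 || offset >= map[seg].extent)
        return -1;                       /* outside the mapped extent: fault */
    return (int64_t)map[seg].base + offset;
}

int main(void)
{
    map[0] = (struct segment){ 0x20000, 0x10000 };       /* made-up values */
    printf("%lld\n", (long long)translate(0, 0x0042));   /* prints 131138 (= 0x20042) */
    printf("%lld\n", (long long)translate(0, 0x20000));  /* prints -1: beyond the extent */
    return 0;
}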


 

GPL spends comparatively little time working with floating point numbers, and most of the floating point functions are written in assembly (I haven't looked at whether they are efficient). Even in TI BASIC, where all the numbers are floats, far more time is wasted in the GPL interpreter on housekeeping and basic interpretation. I'd be surprised if the difference was even visible if all you changed was the float storage location - I've been surprised a few times in simple GPL optimization attempts. ;)

I had another look at the ROM disassembly.

What I had seen before was actually the float routines fetching from the VDP stack.

Scratchpad RAM probably wouldn't be sufficient to correct that, but if accesses to the VDP stack could be accesses to CPU RAM, that would speed everything up.

 

It doesn't surprise me at all that the housekeeping routines are slow.

At one time, Applesoft BASIC had a bug in its housekeeping routines, and it would cause programs to pause for 30 seconds to several minutes.

They patched it with a DOS update. The correct code was actually already in the ROM; somehow it hadn't been used.


This was done in the later versions of 16-bit Unix (V7M, 2BSD). It split the 64K area into a base area (say 48K) and up to 15 overlays (say 16KB each). It worked like this:

- first you compiled/assembled your code into separate object files.

- then you linked it into the final program, specifying which object files went into the base and which into each overlay.

- the linker worked out when a function call went from the base to an overlay, or from one overlay to a different overlay.

- the linker then added little stub subroutines to the base, one for each such target function; the calls to the target function were patched up to be calls to the stub.

- the stub would check whether the currently mapped overlay was the correct one and, if so, jump to the real target subroutine; if not, it would first map in the right overlay and only then make the jump.

 

BSD2.11 (released in 1992) managed to run a 250KB kernel in a 64KB address space.

That is an interesting technique and only requires a linker to transparently (from a programmer's perspective) use paged memory. Even the assembler is unaware and does not have to worry about it.

