TMS-9900 CP/M?


87 replies to this topic

#51 apersson850 OFFLINE  

apersson850

    Moonsweeper

  • 436 posts

Posted Thu Aug 31, 2017 3:16 PM

UCSD Pascal attempted to create a common development/target environment.

The biggest problem with UCSD Pascal is that it used a virtual machine.
That in itself isn't a horrible idea but the virtual CPU is stack based and most processors did not include stack relative instructions, plus the compiler wasn't a modern optimizing design.
I think Apple Pascal runs about 30% faster than Applesoft II BASIC which is about what you get from Applesoft BASIC compilers.

Apple added support for native code segments to add speed and to allow native hardware support but that isn't portable and a better design from the start would have made it faster without having to write native code.

The virtual machine is the biggest advantage of UCSD p-system, not the biggest problem.

The p-system is all about portability. That's the whole purpose of the system. And since it's mainly implementing Pascal, and is written in Pascal itself, the stack machine is obvious. The whole idea about a language like Pascal is its scope of variables and recursive capability. That lends itself well to a stack-based machine. A memory-to-memory architecture like the TMS 9900 can also operate on values on the stack directly, as well as address anything offset a certain amount from the top of stack.

The native code converter is as portable as anything else. You write your program in Pascal, then do the native conversion on the target machine, if you feel you need to. You don't have to, but if you then want to run the program on a different target computer, you simply move the Pascal code to that machine (remember, you don't have to re-compile) and apply the native code conversion on the new target, with the appropriate NCG software. They were available for all processors, but unfortunately TI didn't release any for the TI 99/4A.

 

Yes, of course, an emulated machine, like the PME, is slower than native code. Even when we let the architecture of the machine stay the same, I've shown to you that at the extreme, there are about eight instructions executed to do the one that's really needed. But more complex things, like procedure calls, have far less overhead, since they are more complex in nature. P-codes like CXG aren't converted by the NCG either.

But the TI 99/4A p-system does support native code generation. It just lacks the program for the automatic conversion. But once converted, the program works. I've tried, since I out of curiosity converted some small routine by hand, and patched the code file to make it look like what a NCG would create. It runs just as it should. I've played with the idea of making a NCG for the TI, but have written my own assembly routines instead, when needed.

 

To understand why going with a virtual machine, the PME, instead of a native compiler was the right trade-off between portability and speed, you have to consider exactly that: the time when the decision was taken.

At that time, there was a plethora of different computer systems. Many of them had similar capabilities, but they were still incompatible with each other. It was obvious that this state of affairs would continue for decades, since changing it would require some manufacturer to come up with a computer architecture that was good enough for everybody to embrace, and that was also fully open and documented, so that everyone could make hardware and software for this non-existent computer. But the only corporation that perhaps could pull off such a stunt would probably be Big Blue, IBM. And if you could take a few things for granted, it was that

  1. IBM would never introduce a computer based on a microprocessor.
  2. IBM would never tell you all the details about the innards of such a computer.
  3. IBM would never hire a failed university student, who was hardly dry behind his ears, to write the operating system for such a computer.

Hence, a snowball in hell would have better odds for survival than the thought that there would be a "universal computer architecture" that everyone could have, in the office and at home, for which programs could be developed with the aim to make them as efficient as possible, but without having to worry about these programs being possible to run on some other computer system.

 

I concur with the voices above that say that a CP/M based on a TMS 9900 is a nice exercise, but rather pointless today. There are quite a lot of programs in the old USUS base that could run on the 99/4A, if that's what you want to accomplish. The TI 99/4A is hardly used for productive things today anyway, so this is for fun. Why not take on the task of actually doing the NCG for the p-system in the 99/4A instead? That would speed it up, perhaps by a factor of five, which is quite a difference.



#52 Ksarul OFFLINE  

Ksarul

    River Patroller

  • 4,215 posts

Posted Thu Aug 31, 2017 6:06 PM

Excellent, @RickyDean--if you have access to a Kryoflux or to a SuperPro card, you can connect a 5.25 disk drive to them and do a flux-level copy of the disks. That then lets you look at the copies instead of the fragile originals--and you can write the images to real floppies again using the card too.

 

It is very good to see some of the software for these cards survived!



#53 JamesD OFFLINE  

JamesD

    Quadrunner

  • 7,734 posts
  • Location:Flyover State

Posted Thu Aug 31, 2017 10:17 PM

The virtual machine is the biggest advantage of UCSD p-system, not the biggest problem.

...

Notice what you quoted
 

 

UCSD Pascal attempted to create a common development/target environment.

The biggest problem with UCSD Pascal is that it used a virtual machine.
That in itself isn't a horrible idea but the virtual CPU is stack based and most processors did not include stack relative instructions, plus the compiler wasn't a modern optimizing design.
...

My problem isn't with a virtual machine, it's with the specific implementation.
It's easy to write a compiler for the stack machine, but it's not particularly fast on anything short of a mini-computer at the time it was created.

It also doesn't make for something that can be run through a simple translator to get a significant speed boost out of.  
You skip the decoding phase, but you are still stuck with a stack machine.
Sure, you might run the source through a compiler that generates native code for speed critical functions, but if you only have the executable, you can't do that.
And assembly certainly isn't portable which was one of the primary goals in the first place.

Had they used a register based virtual CPU, you could translate the code to fairly efficient assembly that operates on virtual registers at fixed locations in memory.
With a pass through a simple peephole optimizer, you would have pretty fast code without needing the original source or hand written assembly.

But I understand why they did it that way.  It's easy to generate code for the stack machine, it fits in 64K, and the virtual machine is probably easier to implement.
When interpreted, the register based virtual CPU probably wouldn't be any faster.  But translated to native code... it's close to speeds from a native code compiler.
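To make the stack-versus-register comparison concrete, here is a minimal sketch in Python. Both instruction sets (`PUSH`/`ADD`/`POP` and the three-operand `ADD`) are made up for illustration and are not the real p-code or any proposed design. It also assumes the operands already live in virtual registers, which is the favorable case for the register machine.

```python
# Toy comparison: the statement a := b + c on two hypothetical virtual machines.
# Neither instruction set is the real p-machine; both are illustrative only.

def run_stack(program, mem):
    """Stack VM: operands are pushed, operated on, and popped again."""
    stack = []
    for op, arg in program:
        if op == "PUSH":        # push the value of a variable
            stack.append(mem[arg])
        elif op == "ADD":       # pop two values, push their sum
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(lhs + rhs)
        elif op == "POP":       # pop the top of stack into a variable
            mem[arg] = stack.pop()
        else:
            raise ValueError(f"unknown stack op {op!r}")
    return len(program)         # number of virtual instructions executed


def run_register(program, regs):
    """Register VM: each instruction names its operands directly.

    The virtual registers are modeled as fixed slots in a dict,
    standing in for fixed memory locations.
    """
    for op, dst, src1, src2 in program:
        if op == "ADD":
            regs[dst] = regs[src1] + regs[src2]
        else:
            raise ValueError(f"unknown register op {op!r}")
    return len(program)


stack_mem = {"a": 0, "b": 2, "c": 3}
stack_steps = run_stack(
    [("PUSH", "b"), ("PUSH", "c"), ("ADD", None), ("POP", "a")], stack_mem
)

regs = {"a": 0, "b": 2, "c": 3}
reg_steps = run_register([("ADD", "a", "b", "c")], regs)

print(stack_mem["a"], stack_steps)  # 5 4
print(regs["a"], reg_steps)         # 5 1
```

The count only measures virtual instructions. How much each one costs on a real CPU depends on the addressing modes available, which is what the rest of this exchange is about.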



#54 apersson850 OFFLINE  

apersson850

    Moonsweeper

  • 436 posts

Posted Fri Sep 1, 2017 1:49 AM

Sure, you might run the source through a compiler that generates native code for speed critical functions, but if you only have the executable, you can't do that.
 

With the p-system, you can. You have to flag the procedure that's a candidate for native code conversion when you write the Pascal source, but once you've done that, the respective NCG software for the processor at hand works on the code file, not the source file, to convert the code file to native code. So it's still portable and it removes the interpreter overhead.

And that's the point. That's what you primarily want to do, not the peephole optimizing.

 

Look at the p-code for add integer. It's implemented as A *SP+,*SP. Of course this takes twice the time of A R3,R4, but that's not the culprit. The major slowdown comes from the seven or eight other instructions that have to run, in order to figure out what to do. You also have to consider that you pretty quickly run out of registers in the CPU, so you can't store everything there. In comparison, the stack is "infinite". Thus you sometimes need to generate instructions to move things in and out of registers, which is another kind of overhead.

Now the TMS 9900 has a bit of an advantage in this case, as it's actually an on-chip implementation of a minicomputer of that time.

To make it as fast as possible, the PME running on the TMS 9900 uses eight CPU registers for various data. Among them are the

  • Stack pointer
  • Current activation record frame pointer (local data of executing procedure)
  • Global data record frame pointer
  • Instruction pointer (next p-code to execute)
  • Pointer to p-machine inner interpreter
  • Code location flag (since the TI 99/4A can run p-code from three different memory pages, i.e. CPU RAM, VDP RAM and p-code card GROM).

The capability to access data both indirectly and index from a register is well used. 

 

You're right about that the p-system Pascal compiler is focused on compiling a full Pascal version into a small memory environment, not on optimizing for speed. You can optimize some things, but the programmer has to do that in the source.

 

For example:

i := succ(i);

is smaller and faster than

i := i+1;

 

x := 17;

is smaller and faster than

x := 5+4+8;

 

accessing a and b is faster if you declare

var

  a,b: integer;

  c: array[0..58] of integer;

than if you declare

var

  c: array[0..58] of integer;

  a,b: integer;

 

accessing the array is faster if you declare

array[0..58] of integer;

than if you declare

array[1..58] of integer;

 

and

fillchar(c,sizeof(c),chr(0));

is much faster than

for i := 0 to 58 do

  c[i] := 0;
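One plausible reason for the array example, sketched in Python: with a non-zero lower bound, every index has to be adjusted by that bound before it can be scaled to an offset. The function name and the 2-byte element size are assumptions for illustration. Whether the p-system compiler really emits an extra subtraction here is not confirmed by anything above.

```python
def element_offset(index, low, high, elem_size=2):
    """Byte offset of element `index` in an array declared as array[low..high].

    elem_size=2 assumes 16-bit integers; it is an illustrative default.
    """
    if not low <= index <= high:
        raise IndexError(f"index {index} outside {low}..{high}")
    # With low == 0 the subtraction is a no-op and can be dropped entirely.
    return (index - low) * elem_size


zero_based = element_offset(5, 0, 58)  # (5 - 0) * 2
one_based = element_offset(5, 1, 58)   # (5 - 1) * 2
```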

 

It's very easy to think that the p-system was designed according to wrong decisions, when we look at it today. And indeed it was, since it has literally died. But at design time, it didn't matter if a program took five hours instead of one hour to execute, since the major issue was that moving it to the next computer on the desk would take five days, or five weeks, if the two weren't identical. Besides, it was designed to run in an interactive manner (remember that such an idea was quite novel at the time), not in batch by entering punched cards. Thus, in many cases, a lot of the runtime was actually not runtime, but waiting for the user to press a button.

 

Today, I can take a program I wrote in Turbo Pascal, running under DOS in 1985, and run it on a PC running Windows XP. It doesn't work without tricks on later models, but even with that limitation, it's a portability nobody had heard of, or could even dream about, in the 1970's. A funny thing is that it actually runs on the TI 99/4A too, under the p-system, with minimal changes. They only take care of accessing the function keys on the keyboard. That's quite astonishing, actually. A good Windows XP computer will do in what's like an instant what an 80286 based PC AT did in three seconds and the TI 99/4A does in two minutes, but they do run the same program.

Since the code is 4000+ lines, a more manual conversion (like from one Forth to another) would take quite some time.



#55 RickyDean OFFLINE  

RickyDean

    Dragonstomper

  • 685 posts

Posted Fri Sep 1, 2017 3:12 PM

Excellent, @RickyDean--if you have access to a Kryoflux or to a SuperPro card, you can connect a 5.25 disk drive to them and do a flux-level copy of the disks. That then lets you look at the copies instead of the fragile originals--and you can write the images to real floppies again using the card too.

 

It is very good to see some of the software for these cards survived!

Well, I hope I didn't damage the disk beyond repair or use last night. I tried a drive I was given recently, a 360K Teac drive, and it had allowed me to run a couple of disks successfully, ones that I knew wouldn't hurt me if it damaged them. It seems, when I inserted this disk, that it made a woosh woosh noise, then started a small screechy sound. I pulled it out immediately, but there are some marks on the disk. I will look into the Kryoflux option, I have read about them, but I still think I need a reliable disk drive too. We'll see.



#56 JamesD OFFLINE  

JamesD

    Quadrunner

  • 7,734 posts
  • Location:Flyover State

Posted Fri Sep 1, 2017 8:46 PM

 

 

Sure, you might run the source through a compiler that generates native code for speed critical functions, but if you only have the executable, you can't do that.

With the p-system, you can. You have to flag the procedure that's a candidate for native code conversion when you write the Pascal source, but once you've done that, the respective NCG software for the processor at hand works on the code file, not the source file, to convert the code file to native code. So it's still portable and it removes the interpreter overhead.

And that's the point. That's what you primarily want to do, not the peephole optimizing.

...

If you don't have the source how do you flag the procedure?
If someone flagged a procedure for native code and you have a native code generator... you can generate native code. 
But I've seen the Z80 version.  It's not very efficient.

You quickly run out of registers?  Which is why stack based CPUs still rule today and register based ones went the way of the dinosaur?

And the things you bring up that are stored in registers on the 9900.

So?  Would a register based virtual CPU not load those into the real CPU registers when they are needed for some reason?

It's not the same thing as I'm talking about anyway.



#57 mozartpc27 OFFLINE  

mozartpc27

    Chopper Commander

  • 146 posts

Posted Fri Sep 1, 2017 9:17 PM

Way late to this party but I have to admit, as a Commodore user, the notion suggested by the OP two years ago of porting GEOS to the TI-99/4A is fascinating to me. Even as a hypothetical not sure how feasible it is, maybe with the memory expansion? Still, interesting.

#58 RXB ONLINE  

RXB

    River Patroller

  • 2,798 posts
  • Location:Vancouver, Washington, USA

Posted Fri Sep 1, 2017 10:19 PM

LOL the entire point of a memory mapped CPU chip like the 9900 was to have infinite REGISTERS using BLWP as that means every memory location could be a Register?

 

Example: 

 

FSTRAM  BLWP @FASTRAM
        A    R2,R4
        MOV  R4,*R12+NADDRS+14
NMRAM   BLWP @NRAM
        SLL  R7,5

 

Yea Normal RAM is slower but with tons of Registers instead of restricting all Registers to just FASTRAM seems self defeating for this chip design.
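For readers who have not used the 9900, the workspace mechanism can be modeled in a few lines of Python. This is a sketch, not an emulator: R0 to R15 are 16 consecutive words in memory starting at the workspace pointer (WP), `BLWP` loads a new WP and PC from a two-word vector and saves the old WP, PC and status in the new R13, R14 and R15, and `RTWP` restores them. The class name and the addresses in the usage part are chosen for illustration only.

```python
class WorkspaceModel:
    """Minimal model of TMS 9900 workspace registers (no instruction decoding)."""

    def __init__(self):
        self.mem = {}   # sparse memory: word address -> 16-bit value
        self.wp = 0     # workspace pointer
        self.pc = 0     # program counter
        self.st = 0     # status register

    def reg_addr(self, n):
        """Memory address of register Rn in the current workspace."""
        if not 0 <= n <= 15:
            raise ValueError("register number must be 0..15")
        return (self.wp + 2 * n) & 0xFFFF

    def get_reg(self, n):
        return self.mem.get(self.reg_addr(n), 0)

    def set_reg(self, n, value):
        self.mem[self.reg_addr(n)] = value & 0xFFFF

    def blwp(self, vector):
        """Context switch: new WP and PC come from the two-word vector."""
        old_wp, old_pc, old_st = self.wp, self.pc, self.st
        self.wp = self.mem[vector]
        self.pc = self.mem[vector + 2]
        # The old context is saved in the NEW workspace.
        self.set_reg(13, old_wp)
        self.set_reg(14, old_pc)
        self.set_reg(15, old_st)

    def rtwp(self):
        """Return from a BLWP: restore ST, PC and WP from R15, R14, R13."""
        st, pc, wp = self.get_reg(15), self.get_reg(14), self.get_reg(13)
        self.st, self.pc, self.wp = st, pc, wp


cpu = WorkspaceModel()
cpu.wp, cpu.pc = 0x8300, 0x6000   # illustrative: caller's workspace and PC
cpu.set_reg(2, 7)                 # R2 of the caller is just memory at 0x8304

cpu.mem[0x2000] = 0xA000          # illustrative vector: new workspace address
cpu.mem[0x2002] = 0xA100          # illustrative vector: entry point
cpu.blwp(0x2000)
callee_r2 = cpu.get_reg(2)        # a different R2, in the new workspace
saved_wp = cpu.get_reg(13)

cpu.rtwp()
caller_r2 = cpu.get_reg(2)        # the caller's R2 is untouched
```

The model shows why every workspace gets its own 16 registers, and also why each register access is a memory access, so the speed of a workspace depends on the RAM it sits in.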


Edited by RXB, Fri Sep 1, 2017 10:20 PM.


#59 RXB ONLINE  

RXB

    River Patroller

  • 2,798 posts
  • Location:Vancouver, Washington, USA

Posted Fri Sep 1, 2017 10:23 PM

If you don't have the source how do you flag the procedure?
If someone flagged a procedure for native code and you have a native code generator... you can generate native code. 
But I've seen the Z80 version.  It's not very efficient.

You quickly run out of registers?  Which is why stack based CPUs still rule today and register based ones went the way of the dinosaur?

And the things you bring up that are stored in registers on the 9900.

So?  Would a register based virtual CPU not load those into the real CPU registers when they are needed for some reason?

It's not the same thing as I'm talking about anyway.

Entire RAM memory is the Register set on a 9900. Every single memory in RAM could be a REGISTER with BLWP so maybe you forgot we have a Memory Mapped CPU?

 

Every other CPU made has a FINITE number of Registers and that is a big limitation for programming.

 

Imagine an INTEL CPU that was Memory Mapped: it would lose some speed, true, but overall it would outperform today's chips, as the code would be tighter and faster to run.

 

The amount of wasted time moving shit around all the time is a real drag on the number of lines of code.


Edited by RXB, Fri Sep 1, 2017 10:27 PM.


#60 apersson850 OFFLINE  

apersson850

    Moonsweeper

  • 436 posts

Posted Sat Sep 2, 2017 2:42 AM

If you don't have the source how do you flag the procedure?
If someone flagged a procedure for native code and you have a native code generator... you can generate native code. 
But I've seen the Z80 version.  It's not very efficient.

You quickly run out of registers?  Which is why stack based CPUs still rule today and register based ones went the way of the dinosaur?

And the things you bring up that are stored in registers on the 9900.

So?  Would a register based virtual CPU not load those into the real CPU registers when they are needed for some reason?

It's not the same thing as I'm talking about anyway.

Yes, it does require that the programmer was clever enough to realize which procedures could be good candidates for native conversion. But it's not too unlikely that he did that, if such a procedure was considered to be too time consuming.

 

I was referring to what happens if you try to stuff as much as possible of what you need to access (local variables, global variables, parameters...) into registers for max speed: that quickly adds up. I was involved in a project once, where we attempted to do parameter passing during procedure calls through registers. Lightning fast, but quite limited.

 

Storing things in registers in the current PME obviously doesn't prevent other systems/compilers from doing the same. I was just pointing out that the current PME, in the 99/4A, does use these tricks already.

 

I've not used any NCG for the Z80, so I can't comment on it. I don't have any for the 9900 either, but I've seen listings of what it does.

 

It does for example convert this code

MOVB *PC+,R1
SRL  R1,7
MOV  @OPCODE(R1),R2
MOV  *R2+,R0
B    *R0
A    *SP+,*SP
B    *R12

into this

A    *SP+,*SP

Whether that is efficient or not can be discussed, but it should be obvious that the second version executes faster.
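The same idea can be sketched in Python: an interpreter pays for fetch, decode and branch on every p-code, while a translator does the decoding once and leaves only the work. This is a toy model, not how any real NCG is implemented. The opcode label `ADI` is used here only as a name for "add integer", and the function names are invented for the example.

```python
def add_integer(stack):
    """The useful work: pop the top value and add it to the new top of stack."""
    top = stack.pop()
    stack[-1] += top


HANDLERS = {"ADI": add_integer}   # hypothetical one-entry opcode table


def interpret(pcode, stack):
    """Decode and dispatch every p-code at run time."""
    dispatches = 0
    for opcode in pcode:
        handler = HANDLERS[opcode]   # stands in for the fetch/decode/branch sequence
        dispatches += 1
        handler(stack)
    return dispatches


def translate(pcode):
    """Decode once, ahead of time; the result contains only the work routines."""
    return [HANDLERS[opcode] for opcode in pcode]


def run_translated(routines, stack):
    """Run the pre-decoded routines with no per-instruction lookup."""
    for routine in routines:
        routine(stack)


interpreted_stack = [2, 3]
dispatch_count = interpret(["ADI"], interpreted_stack)

translated_stack = [2, 3]
run_translated(translate(["ADI"]), translated_stack)
```

Both paths leave the same result on the stack. The difference is only in how often the decoding cost is paid, which is the point made with the listing above.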

 

So yes, things could always be done differently. The p-system didn't prioritize speed, but portability. At that time, it was probably a decision that would come out the same once again, even if it today could have been different. Do you know how Turbo Pascal 4.0 generates code? I don't, but I've used it, and the Pascal it supports is very similar to the p-system IV (same major edition as is used in the 99/4A). Turbo Pascal is very fast, in both compilation and execution, so they must have used the tricks in the book.

But I don't know anything about the inside strategies used for it.



#61 JamesD OFFLINE  

JamesD

    Quadrunner

  • 7,734 posts
  • Location:Flyover State

Posted Sat Sep 2, 2017 3:35 AM

Entire RAM memory is the Register set on a 9900. Every single memory in RAM could be a REGISTER with BLWP so maybe you forgot we have a Memory Mapped CPU?

 

Every other CPU made has a FINITE number of Registers and that is a big limitation for programming.

 

Imagine an INTEL CPU that was Memory Mapped: it would lose some speed, true, but overall it would outperform today's chips, as the code would be tighter and faster to run.

 

The amount of wasted time moving shit around all the time is a real drag on the number of lines of code.

I suggest you look at how GCC supports the 9900.  It's a good example of how a compiler allocates registers.
Versions of GCC for the 6809 and 68hc11 even use virtual registers to create faster code.

The thing is, you have to constantly swap things on and off the stack with the stack oriented CPU. 
These are accessed with indexed addressing, preferably stack relative addressing.
The 8080, 6800, 6502, Z80, 6803, 8085, etc... don't have stack relative addressing, so you have to at least copy the stack pointer to an index register or worse yet, maintain the stack manually like on the 6502.  

Virtual registers can be accessed with extended addressing.

Extended addressing is faster than indexed addressing on most CPUs, so virtual registers are generally faster if there are enough of them you don't have to constantly shuffle the contents.  

8 virtual 16 bit registers would only occupy 16 bytes of RAM, and one or two of those would probably be temporary locations that aren't preserved across function calls, and would hold the return value from procedures.   
Further, on the 6800, 6502, 6803, 6809, 65816, and 68hc11 you can place the virtual registers on the direct page (page 0 on the 6502) to use direct addressing, which saves a byte and a clock cycle for every instruction. GCC for the 6809 and 68hc11 uses this.  
Not only can the interpreter use extended or direct addressing, but the code generated by a translator can use those addressing modes too.

 

The 6809, 65816, 68000, etc... have stack relative addressing, so they are more efficient for implementing the P-Machine, but stack relative addressing still takes more clock cycles than extended or direct addressing due to having to load the offset value. 

I realize the P-Machine was created in the 70s and compiler technology was a bit primitive, but BASIC-09 on the OS-9 operating system used what they called an I-Machine which is similar, and it runs much faster.  So faster virtual machines existed even in the 80s well before the final version of UCSD Pascal.
And we certainly aren't in the 80s now.  I don't see why we can't point out shortcomings in the design based on what we know now.

 



#62 JamesD OFFLINE  

JamesD

    Quadrunner

  • 7,734 posts
  • Location:Flyover State

Posted Sat Sep 2, 2017 3:56 AM

Yes, it does require that the programmer was clever enough to realize which procedures could be good candidates for native conversion. But it's not too unlikely that he did that, if such a procedure was considered to be too time consuming.

 

I was referring to what happens if you try to stuff as much as possible of what you need to access (local variables, global variables, parameters...) into registers for max speed: that quickly adds up. I was involved in a project once, where we attempted to do parameter passing during procedure calls through registers. Lightning fast, but quite limited.

 

Storing things in registers in the current PME obviously doesn't prevent other systems/compilers from doing the same. I was just pointing out that the current PME, in the 99/4A, does use these tricks already.

 

I've not used any NCG for the Z80, so I can't comment on it. I don't have any for the 9900 either, but I've seen listings of what it does.

 

It does for example convert this code

MOVB *PC+,R1
SRL  R1,7
MOV  @OPCODE(R1),R2
MOV  *R2+,R0
B    *R0
A    *SP+,*SP
B    *R12

into this

A    *SP+,*SP

Whether that is efficient or not can be discussed, but it should be obvious that the second version executes faster.

 

So yes, things could always be done differently. The p-system didn't prioritize speed, but portability. At that time, it was probably a decision that would come out the same once again, even if it today could have been different. Do you know how Turbo Pascal 4.0 generates code? I don't, but I've used it, and the Pascal it supports is very similar to the p-system IV (same major edition as is used in the 99/4A). Turbo Pascal is very fast, in both compilation and execution, so they must have used the tricks in the book.

But I don't know anything about the inside strategies used for it.

The problem is, other CPUs don't have such a feature rich instruction set for using the stack.
On the 6502 that's 6+ opcodes using their slowest addressing mode.

Direct addressing with a virtual register saves something like 12+ clock cycles.
 

Using an index register on the Z80 takes something like 24 "T States" per access, fewer if you use HL but  direct addressing is... 6? 9?
I'd have to look it up.

I have spent some time porting the P-Machine to the 6803. direct addressing would make a huge difference.
 


Edited by JamesD, Sat Sep 2, 2017 3:56 AM.


#63 apersson850 OFFLINE  

apersson850

    Moonsweeper

  • 436 posts

Posted Sat Sep 2, 2017 8:50 AM

Is there a description of how GCC works, or do I have to look at the code it creates? How do I go about it?

I suggest you look at how GCC supports the 9900.  It's a good example of how a compiler allocates registers.

<snip>

I realize the P-Machine was created in the 70s and compiler technology was a bit primitive, but BASIC-09 on the OS-9 operating system used what they called an I-Machine which is similar, and it runs much faster.  So faster virtual machines existed even in the 80s well before the final version of UCSD Pascal.
And we certainly aren't in the 80s now.  I don't see why we can't point out shortcomings in the design based on what we know now.

Of course we can use today's knowledge when looking at old stuff. After all, there's a reason the p-system isn't around any longer.

But I consider it wrong to just say "that thing is bad" without asking oneself "why did they do it like that"?

 

As you point out, the TMS 9900 is actually one of the better CPUs to drive the PME, since its instruction repertoire was actually designed for a mini-computer in the early 70's.



#64 TheMole OFFLINE  

TheMole

    Dragonstomper

  • 760 posts
  • Location:Belgium

Posted Sat Sep 2, 2017 8:56 AM

Is there a description of how GCC works, or do I have to look at the code it creates? How do I go about it?

 

Insomnia has explained some of it in the gcc thread, specifically in the posts starting at number #16: http://atariage.com/...e-ti/?p=2033241

I think the actual register usage might have changed a little over time as he tweaked the design and fixed bugs, but the general principles still apply.


Edited by TheMole, Sat Sep 2, 2017 8:57 AM.


#65 apersson850 OFFLINE  

apersson850

    Moonsweeper

  • 436 posts

Posted Sat Sep 2, 2017 9:30 AM

Yes, thank you, I did look around and found that thread. It's a bit as I expected. The compiler has to handle cases where you run out of registers (like for passing arguments) in a different way.

I've been through the same issues when designing compilers.



#66 JamesD OFFLINE  

JamesD

    Quadrunner

  • 7,734 posts
  • Location:Flyover State

Posted Sat Sep 2, 2017 11:12 AM

...

Of course we can use today's knowledge when looking at old stuff. After all, there's a reason the p-system isn't around any longer.

But I consider it wrong to just say "that thing is bad" without asking oneself "why did they do it like that"?

 

As you point out, the TMS 9900 is actually one of the better CPUs to drive the PME, since its instruction repertoire was actually designed for a mini-computer in the early 70's.

Well, the first thing we have to recognize is that the P-Machine works.  There's no denying that.
But they had at least a decade to improve performance.

Why did they do it that way?

When they started, it was about teaching Pascal at colleges and universities.  
That means it was meant for mini-computers and mainframes.  I think when they started, micros didn't even exist yet.

Portability and ease of implementation was important but they didn't care about speed.

It wasn't intended to be a consumer product that everyone would use.
It only became a general consumer product after Apple licensed it.  Before that it was really expensive.

 

 

There were a lot of hurdles to overcome for a virtual register design.
The P-Machine was designed for 64K and it wasn't until 82 that 64K became the standard memory configuration for a pc from any manufacturer.
Register allocation and code optimization requires quite a bit of code. It's not suited for a 64K environment.  
But once they started supporting over 64K, it should have become possible.
The Apple IIe was the first 8 bit that had a 128K upgrade from the factory, and it's a weird memory layout.  
3rd party upgrades before that were mostly used as a RAM disk.  They were also rare and expensive.

Almost every machine had a compiler that generated native code by the early 80s.
Native code is faster than a P-Machine unless they have some sort of native code translation.  
By the time it was really feasible, 16/32 bit CPUs were taking over and GUIs were the standard.  

A text based virtual machine that only directly addresses 64K was primitive.
UCSD would have required some sort of graphics support, native GUI support, better support for extended memory, automatic native code generation...
A lot of the technology we use today just didn't exist.  The GCC compiler that supports the 68hc11 didn't come out until 2005 or 2006.(?)
CC65 uses the concept but I'm not sure if it originally did in the 80s.  It's not super efficient though.
 

 

Yes, thank you, I did look around and found that thread. It's a bit as I expected. The compiler has to handle cases where you run out of registers (like for passing arguments) in a different way.

I've been through the same issues when designing compilers.

But the important thing, is that variables accessed the most reside in registers when they are in use so they can be accessed without some sort of indexed addressing.
Immediate addressing takes half the clock cycles of indexed addressing when you can use it on most 8 bits.
Anything that can be translated to immediate addressing, extended addressing, or direct addressing is going to be significantly faster than indexed addressing.
I would think it's also faster on the 9900.

*edit*
Immediate addressing would mostly be when translated to native code


Edited by JamesD, Sat Sep 2, 2017 11:40 AM.

  • RXB likes this

#67 RXB ONLINE  

RXB

    River Patroller

  • 2,798 posts
  • Location:Vancouver, Washington, USA

Posted Sat Sep 2, 2017 5:09 PM

 

Insomnia has explained some of it in the gcc thread, specifically in the posts starting at number #16: http://atariage.com/...e-ti/?p=2033241

I think the actual register usage might have changed a little over time as he tweaked the design and fixed bugs, but the general principles still apply.

So let me get this straight....Registers 13,14 and 15 allow for BL or BLWP and MULTIPLE SETS of REGISTERS is less efficient than a limited number of Registers?

As each GOSUB is BL, or CALL / GOSUB is BLWP, that means each GOSUB has its own Register set to work with, and that is less effective than a stack or a limited set of Registers?

 

Just spitballing why I think multiple Registers are one hell of a lot better idea than a set limit of Registers and Stack also has limits, in that Stack is user unfriendly to use.

Like Forth is a really great language....for computers not so much for humans to use. Forth is 100% designed around Stack, while C is designed around limited registers.

 

Really no one BUT Texas Instruments thought about unlimited Registers instead.

 

Too bad no one ran with this since the 9900.


Edited by RXB, Sat Sep 2, 2017 5:18 PM.


#68 apersson850 OFFLINE  

apersson850

    Moonsweeper

  • 436 posts

Posted Sun Sep 3, 2017 11:00 AM

It's also a question about the speed of memory vs. the speed of internal registers. At the time when TI took the "register-in-memory" path, a memory chip that was about as fast as the CPU had just been invented. So when they designed the multi-chip, multi-card CPU, the one that the TMS 9900 later implemented in silicon in one chip, it made sense to utilize the fact that you could just as well use memory as a limited set of internal registers.

Ten years later, processor technology outpaced memory, and the decision was already a bad one, from a speed perspective. But there are still not too many processor architectures, and almost none that's contemporary, that's as good as the TI 990/9 for the programmer. Of those I've programmed in assembly language, it's only the Digital PDP-11/VAX 32-bit architecture I've liked better.

 

The DSK 86000 (a CPU dedicated to robotics applications) implemented a similar idea, but used 16 registers out of a larger file of registers inside the CPU. But that was used only for some internal robotic applications.
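
To make the register-in-memory idea concrete, here is a minimal Python sketch of how the TMS 9900 workspace pointer works: R0 to R15 are just the 16 memory words starting at WP, and BLWP switches to a new workspace while saving the old WP, PC and ST in R13 to R15 of the new one. This is a toy model under stated assumptions, not an emulator. The class name `WorkspaceSketch`, the addresses and the vector location are all made up for illustration, and the saved PC is simply whatever `pc` holds (on real hardware it would point past the BLWP instruction).

```python
class WorkspaceSketch:
    """Toy model of TMS 9900 workspace registers; no instruction decoding."""

    def __init__(self):
        self.mem = [0] * 0x8000   # 64 KB address space as 16-bit words
        self.wp = 0               # workspace pointer
        self.pc = 0               # program counter
        self.st = 0               # status register

    def read_word(self, addr):
        return self.mem[(addr & 0xFFFE) >> 1]

    def write_word(self, addr, value):
        self.mem[(addr & 0xFFFE) >> 1] = value & 0xFFFF

    def get_reg(self, n):
        # Rn is simply the memory word at WP + 2*n.
        return self.read_word(self.wp + 2 * n)

    def set_reg(self, n, value):
        self.write_word(self.wp + 2 * n, value)

    def blwp(self, vector_addr):
        # The vector holds the new WP followed by the new PC.
        new_wp = self.read_word(vector_addr)
        new_pc = self.read_word(vector_addr + 2)
        old_wp, old_pc, old_st = self.wp, self.pc, self.st
        self.wp = new_wp
        # The return context goes into R13-R15 of the NEW workspace.
        self.set_reg(13, old_wp)
        self.set_reg(14, old_pc)
        self.set_reg(15, old_st)
        self.pc = new_pc

    def rtwp(self):
        # Read all three before WP changes; they live in the current workspace.
        old_wp, old_pc, old_st = self.get_reg(13), self.get_reg(14), self.get_reg(15)
        self.wp, self.pc, self.st = old_wp, old_pc, old_st


cpu = WorkspaceSketch()
cpu.wp, cpu.pc = 0x8300, 0x6000      # caller's workspace and PC (made-up values)
cpu.set_reg(1, 0x1234)               # caller's R1
cpu.write_word(0x2000, 0x8320)       # hypothetical vector: new WP
cpu.write_word(0x2002, 0x7000)       #                      new PC

cpu.blwp(0x2000)
cpu.set_reg(1, 0xBEEF)               # callee's R1 is a different memory word
cpu.rtwp()
caller_r1 = cpu.get_reg(1)           # caller's R1 was never touched
```

The context switch copies only three words, no matter how many registers the caller was using. The flip side, which is what the speed argument is about, is that every `get_reg` and `set_reg` here is a memory access.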



#69 RXB ONLINE  

RXB

    River Patroller

  • 2,798 posts
  • Location:Vancouver, Washington, USA

Posted Sun Sep 3, 2017 11:37 AM

This limitation came from RAM being so much slower and sitting on a separate board with a whole set of other chips, so you could never put registers there, thanks to Intel, Motorola and others.

 

This bad layout forced a limited number of registers and a limited ability to create multiple paths, so today an insane amount of time is wasted by the CPU moving stuff around, versus all data and programs simply having a location, with registers set into any location. Overall that is a much faster way to process.

 

Picture it like building a log cabin: today's CPUs are playing Tower of Hanoi, constantly shuffling data around to page in whatever is needed next.

 

An updated CPU designed like the TI 9900 would have the same memory all plugged directly into the CPU, not in a separate box or away from the CPU like TI did with the TI-99/4A. It would be faster if only because it would not have to do the Tower of Hanoi shuffling of data that current CPUs are forced to do. After all, they hamstrung the memory into behaving like disk drive memory.



#71 JamesD OFFLINE  

JamesD

    Quadrunner

  • 7,734 posts
  • Location:Flyover State

Posted Sun Sep 3, 2017 9:57 PM

It's also a question about the speed of memory vs. the speed of internal registers. At the time when TI took the "register-in-memory" path, a memory chip that was about as fast as the CPU had just been invented. So when they designed the multi-chip, multi-card CPU, the one that the TMS 9900 later implemented in silicon in one chip, it made sense to utilize the fact that you could just as well use memory as a limited set of internal registers.

Ten years later, processor technology outpaced memory, and the decision was already a bad one, from a speed perspective. But there are still not too many processor architectures, and almost none that's contemporary, that's as good as the TI 990/9 for the programmer. Of those I've programmed in assembly language, it's only the Digital PDP-11/VAX 32-bit architecture I've liked better.

 

The DSK 86000 (a CPU dedicated to robotics applications) implemented a similar idea, but used 16 registers out of a larger file of registers inside the CPU. But that was used only for some internal robotic applications.

It's at this point in time that I'm going to say I think the speed issue is total BS.  
There were machines clocked faster than the TI minis with internal registers.  

Cache, pipelining, etc... already existed.  That indicates CPUs were running faster than the RAM.  
Can anyone even cite who originally said this?
 



#72 RXB ONLINE  

RXB

    River Patroller

  • 2,798 posts
  • Location:Vancouver, Washington, USA

Posted Sun Sep 3, 2017 10:15 PM

I think we are looking far into the future at a SYSTEM ON A SINGLE CHIP, that is, RAM + CPU + IO all on a single chip. This is the only way for the future.



#73 apersson850 OFFLINE  

apersson850

    Moonsweeper

  • 436 posts

Posted Mon Sep 4, 2017 2:38 PM

Can anyone even cite who originally said this?

 

I don't know. If we assume it ever was true, it must have been in the early 70's, as the TI 990/9 came out in the first half of the 1970's.

That was the first mini from TI with the memory-to-memory architecture.



#74 JamesD OFFLINE  

JamesD

    Quadrunner

  • 7,734 posts
  • Location:Flyover State

Posted Mon Sep 4, 2017 8:54 PM

I don't know. If we assume it ever was true, it must have been in the early 70's, as the TI 990/9 came out in the first half of the 1970's.

That was the first mini from TI with the memory-to-memory architecture.

I'll take that as a no.



#75 apersson850 OFFLINE  

apersson850

    Moonsweeper

  • 436 posts

Posted Tue Sep 5, 2017 3:18 AM

Suit yourself. I'm just saying that I don't know who came to that conclusion originally. I've just seen it repeated in many cases.



