
First steps in TI9900 assembly language with prior knowledge of 6502


Dexter


Google for "TH Nürnberg Michael Zapf" :)

 

As for the 8-bit data bus, I was referring to the 9995 (used in the Geneve and the TI-99/8) and to the fact that, back in the day and occasionally even today, people call the TI-99/4A an "8-bit computer" ... which makes me really, really upset. ;)

Edited by mizapf

Yes indeed, as I'm also studying Matthew's thread, I was referring to performance.

 

Note that you can use all 256 bytes of scratch pad (16-bit) RAM for your own programs if you turn off interrupts (LIMI 0) - this puzzled me when I started to write assembly. If you need disk access or other 'OS features' it's a different story, but you don't need any of this to write a game. If you want to write high-performance game code, my advice is:

1. Always keep your workspace and main variables in scratch pad.
2. Make all transfers to the VDP in very tight, unrolled loops.
3. Don't use the built-in KSCAN routine - it takes ages, and it's quite simple to read the keyboard and joysticks using direct I/O (CRU).
4. Use a sprite rotating scheme (aka flicker algorithm).
5. Use the debugger in Classic99 to time your code and find any bottlenecks.


To the earlier question, I/O space. I would say this refers to the range >8000 - >9FFF, but it's all technically 8-bit. It's a little more than that on the schematic, but from the software side it doesn't matter, because the entire range triggers the wait states and the hardware is all meant to be accessed as bytes.

 

The other 16-bit memory in the system besides the scratchpad is the ROM, from >0000 - >1FFF.
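To summarize the two fast regions named above, here is a small Python model (not 9900 code) that classifies a CPU address. The helper name `is_16bit` is hypothetical; the scratchpad is the 256 bytes at >8300 - >83FF, and this sketch also treats >8000 - >82FF as 16-bit because that range is commonly documented as a mirror of the same RAM. Everything else goes through the 8-bit multiplexer with wait states.

```python
# Hypothetical helper: which TI-99/4A CPU addresses sit on the 16-bit bus?

def is_16bit(addr: int) -> bool:
    """True for console ROM and scratchpad RAM; everything else is 8-bit."""
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("TMS9900 addresses are 16 bits")
    console_rom = 0x0000 <= addr <= 0x1FFF
    # 256 bytes at >8300->83FF, assumed mirrored across >8000->83FF
    scratchpad = 0x8000 <= addr <= 0x83FF
    return console_rom or scratchpad

for a in (0x0000, 0x2000, 0x6000, 0x8300, 0x8800, 0xA000):
    print(f">{a:04X}: {'16-bit' if is_16bit(a) else '8-bit via multiplexer'}")
```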


I know of a CPU that can do relative branches through the entire address range; that’s a very nice feature.

But you increase the instruction size. There is always a trade-off. On the TMS9900 all instruction opcodes fit in a single 16-bit word, with some instructions having one or two additional words in memory following the opcode (depending on the addressing of the instruction or if it was an immediate instruction). The relative branch opcodes and their offsets fit in a single 16-bit word and are therefore more memory efficient. The larger a CPU's address range, the more efficient relative branch instructions become because they don't have the overhead of requiring a full address stored with the opcode.
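To make the trade-off concrete, here is a Python sketch of how a TMS9900 relative jump fits in one word: the high byte is the opcode and the low byte is a signed displacement counted in words from the next instruction (PC + 2), so the reach is -128 to +127 words. The opcode constants are the standard JMP/JEQ/JNE values; the function names are hypothetical.

```python
# Sketch: encode/decode a one-word TMS9900 relative jump.
JMP, JEQ, JNE = 0x1000, 0x1300, 0x1600  # opcode words with a zero displacement

def encode_jump(opcode: int, pc: int, target: int) -> int:
    if (pc | target) & 1:
        raise ValueError("instructions must sit on even (word) addresses")
    disp = (target - (pc + 2)) // 2  # displacement in words, not bytes
    if not -128 <= disp <= 127:
        raise ValueError("out of range: use B @target (two words) instead")
    return opcode | (disp & 0xFF)

def decode_target(word: int, pc: int) -> int:
    disp = word & 0xFF
    if disp >= 0x80:
        disp -= 0x100  # sign-extend the 8-bit displacement
    return (pc + 2 + 2 * disp) & 0xFFFF

print(hex(encode_jump(JMP, 0xA000, 0xA000)))  # 0x10ff, i.e. "JMP $"
```

By contrast, an absolute `B @target` needs two words: the opcode >0460 followed by the full 16-bit address.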


It is indeed interesting to learn how other platforms handle certain issues. I learned a bit about the MIPS platform, since it is one of the topics taught to first-semester students.

 

The MIPS platform is a 32-bit RISC platform. Its instruction width is always 32 bits; there are no operands outside the instruction word. It has only three instruction formats: one for three registers (2 source, 1 destination), one for immediate operations, and one for jumps. The first six bits are always reserved for the opcode.

 

The immediate value of an immediate operation is stored in the lower 16 bits. Relative branches in MIPS use the immediate format, so the lower 16 bits hold the branch offset. Just like the offset we know from the TMS9900, it is a signed integer that counts words (4-byte words on MIPS), not bytes, but here it is 16 bits wide instead of 8. Since the MIPS address space uses 32 bits, there is a similar issue as on our TI platform: a relative branch cannot reach every address.
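As a worked example of the relative-branch rule just described, here is a Python sketch of how a MIPS BEQ/BNE target is computed: the 16-bit offset is sign-extended, multiplied by 4 (word to byte), and added to the address of the following instruction (PC + 4). The function names are hypothetical.

```python
def sign_extend16(v: int) -> int:
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v

def branch_target(pc: int, imm16: int) -> int:
    """Target of a MIPS conditional branch located at address pc."""
    return (pc + 4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF

print(hex(branch_target(0x00400000, 0xFFFF)))  # 0x400000: branch to itself
```

With 16 signed bits the reach is roughly +/-128 KiB around the branch, which is far more than the 9900's +/-256 bytes but still not the full 32-bit space.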

 

The jump instruction (as on many other platforms, "branch" and "jump" swap roles compared to the TMS9900, where JMP is relative and B is absolute) has its own format, with the first six bits reserved for the opcode and the remaining 26 bits for the absolute jump address. However, 26 bits are not enough to reach all addresses.

 

Since jump addresses must be word-aligned, two zero bits are appended to the 26 bits, giving 28 bits. Still, this covers only a sixteenth of the address space, because the leftmost four bits are missing. MIPS takes them from the address where the jump instruction is located. This, on the other hand, means that you may fail to jump forward a single word when your jump instruction happens to sit on the very last word of that segment. You can use relative branches there, though.
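The jump-target construction above can be written as a few bit operations. This is a Python sketch with a hypothetical function name; note that the MIPS32 manuals take the upper four bits from the address of the instruction after the jump (the delay slot, PC + 4), which equals the jump's own upper bits except when the jump sits on the last word of a 256 MiB segment.

```python
def jump_target(pc: int, index26: int) -> int:
    """Target of a MIPS J/JAL at address pc (sketch of the MIPS32 rule)."""
    low28 = (index26 & 0x03FFFFFF) << 2   # word index -> byte address, 28 bits
    high4 = (pc + 4) & 0xF0000000         # upper 4 bits of the delay-slot address
    return high4 | low28

print(hex(jump_target(0x00400000, 0x00100008)))  # 0x400020
```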

 

Of course, there is a way to reach any address in the space, but it requires a register holding the address. The instruction JR (jump register) resembles our B *Rx.


MIPS... I enjoyed the hell out of that. I still have SPIM installed on a couple of my machines.

 

In one assignment we had to write code for a complex project with a penalty of an "F" on the assignment if we stalled the pipeline at all.


The one I'd love to play with more is conditional execution, so that you can do away with all the jumping over a single instruction: the instruction executes or not depending on the condition bits. I can't remember what CPU had that, was it ARM? (I don't do as much modern assembly language as I should ;) ).


Google for "TH Nürnberg Michael Zapf" :)

Very interesting “Fachgebiete” (fields of expertise)! Your TI must be connected to your home WLAN.

As for the 8-bit data bus, I was referring to the 9995

Ah OK, I didn’t know that, so we’re lucky to have the TMS9900 instead of the 9995 in our computers. :) (I read that the 9995 was not ready in time for the 4A.)

 

The ’816 even has a 24-bit address bus, albeit multiplexed, and an 8-bit data bus. It’s also little-endian instead of big-endian. On the software side, it can run in 65C02 emulation mode or in 16-bit native mode. Well, that's too much to put in a nutshell; see for yourself: http://en.wikipedia.org/wiki/WDC_65816/65802
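For anyone coming from the 6502, the byte-order difference mentioned above is worth seeing once. This Python sketch uses the standard `struct` module to lay out the word >1234 both ways: the TMS9900 stores the high byte at the even address, while the 6502/65816 store the low byte first.

```python
import struct

value = 0x1234
big = struct.pack(">H", value)     # TMS9900 order: >12 at the even address
little = struct.pack("<H", value)  # 6502/65816 order: low byte first
print(big.hex(), little.hex())     # 1234 3412

# A 9900 byte instruction on a register (e.g. MOVB) uses its high byte,
# i.e. the byte at the register's even address: >12 here.
print(hex(big[0]))                 # 0x12
```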

 

and the fact that back in the past and obviously occasionally today as well, people talked about the TI-99/4A as an "8-bit computer" ... which makes me really, really upset. ;)

Pure jealousy. :)

 

 

Note that you can use all 256 bytes of scratch pad (16-bit) RAM for your own programs

Yeah, I’m now fully aware of those tiny 128 words; perhaps they could even hold small temporary loops for copying or moving data. (I read about the 32 KiB 16-bit RAM upgrade a few days ago.) :-o

If you want to write high-performance game code, my advice is: 1. Always keep your workspace and main variables in scratch pad. 2. Make all transfers to the VDP in very tight, unrolled loops. 3. Don't use the built-in KSCAN routine - it takes ages, and it's quite simple to read the keyboard and joysticks using direct I/O (CRU). 4. Use a sprite rotating scheme (aka flicker algorithm). 5. Use the debugger in Classic99 to time your code and find any bottlenecks.

I really appreciate those guidelines, especially knowing the four (or more) excellent games you’ve written.

 

By sprite rotating scheme, you probably mean dealing with the 5th sprite on the same scanline not being shown? By rotating them, all 5 are shown, but a little bit dimmer?
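Here is a Python model of one simple rotation scheme. It is not necessarily the scheme used in the games mentioned. The TMS9918A draws at most four sprites on a scanline, taking them in sprite-table order, so any further sprites on that line are dropped. If the program rotates the order in which it writes sprites to the attribute table by one position each frame, no sprite is hidden permanently. Each one just drops out on some frames, which shows up as flicker. The function names are hypothetical.

```python
MAX_PER_LINE = 4  # TMS9918A: at most four sprites drawn per scanline

def rotated_order(num_sprites: int, frame: int) -> list[int]:
    """Sprite numbers in table order for this frame, rotated by one per frame."""
    start = frame % num_sprites
    return [(start + i) % num_sprites for i in range(num_sprites)]

def drawn_on_line(order: list[int], on_line: set[int]) -> list[int]:
    """The VDP scans the table from the first entry and draws only the first
    MAX_PER_LINE sprites that touch the scanline."""
    drawn = []
    for sprite in order:
        if sprite in on_line:
            drawn.append(sprite)
            if len(drawn) == MAX_PER_LINE:
                break
    return drawn

# Five sprites on the same scanline, simulated over five frames.
visible_frames = [0] * 5
for frame in range(5):
    for sprite in drawn_on_line(rotated_order(5, frame), set(range(5))):
        visible_frames[sprite] += 1
print(visible_frames)  # [4, 4, 4, 4, 4]
```

Without the rotation, sprite 4 would never be drawn. With it, each sprite is missing one frame in five.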

 

 

To the earlier question, I/O space. I would say this refers to the range >8000 - >9FFF, but it's all technically 8-bit. It's a little more than that on the schematic, but from the software side it doesn't matter, because the entire range triggers the wait states and the hardware is all meant to be accessed as bytes.

 

The other 16-bit memory in the system besides the scratchpad is the ROM, from >0000 - >1FFF

No, I should have expressed myself more precisely. I wasn't sure what routines were stored at >0000 - >1FFF.

But with this overview, it’s clearer now.

[Attached image: memory map overview]

However, I’m still confused about the 3 “slots” :) console GROMs and console ROM at >0000 - >1FFF. If I’m counting right, that’s 4K of 16-bit ROM and 3 x 6KiB GROM? But it’s probably described in TI Intern.

 

But you increase the instruction size. There is always a trade-off. On the TMS9900 all instruction opcodes fit in a single 16-bit word, with some instructions having one or two additional words in memory following the opcode (depending on the addressing of the instruction or if it was an immediate instruction). The relative branch opcodes and their offsets fit in a single 16-bit word and are therefore more memory efficient. The larger a CPU's address range, the more efficient relative branch instructions become because they don't have the overhead of requiring a full address stored with the opcode.

I see what you mean; I actually had it wrong about the relative branches, as you'll see further on. :)

 

It is indeed interesting to learn how other platforms handle certain issues. I learned a bit about the MIPS platform, since this is part of the topics for the students in the first semester.

 


 

That’s a coincidence: my son had a class where he was introduced to an educational CPU called PP2. That was the processor I was referring to. It’s an FPGA soft processor. I don’t recall all the details, but it’s designed to fit in a Spartan-3E, and it has 20K 18-bit words of RAM and 8 general-purpose registers. Code *and* data are relocatable! It’s a module placed on a board with a few I/Os, like buttons and LEDs, and you can build it yourself if you really want to. It sounds less interesting the way I explain it, but I’ve attached the main manual to this post to give you the idea. Perhaps you’ll enjoy it. For those who are interested, I have permission to provide all the documentation, the emulator and the cross assembler (in Java): everything necessary to recreate the whole system, except for a PCB, but that’s not really essential.

 

I misremembered that it could branch across the whole address range, although it can branch pretty far. Because its RAM is so limited, it actually can branch across *that* range.

 

I’m not exactly sure, but I think the instruction size is limited to 18 bits. So that is in line with the trade-off matthew180 wrote about.

 

I don’t know the MIPS platform, but it looks like a similar concept. (Any specific link?)

 

 

Anyhow, I've learned a great deal in this thread! :thumbsup:

 

Tonight I hope to learn a few new instructions...

[Attachment: rh299 The PP2 Practicum Processor Architecture and Instruction Set.pdf]


...

However, I’m still confused about the 3 “slots” :) console GROM’s and console ROM at >0000 - >1fff. If I’m counting right, that’s 4K of 16-bit ROM and 3 x 6KiB GROM? But it’s probably described in TI Intern.

...

 

GROM space is accessed through memory-mapped ports in the CPU address space: >9800 (GROM read) and >9C00 (GROM write), with the GROM address ports at >9802 and >9C02. There is quite a bit about this in the UberGROM thread, et al.
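Here is a simplified Python model of the access pattern the ports imply. The program writes the 16-bit GROM address to the write-address port (>9C02) as two bytes, high byte first. It then reads bytes from the read-data port (>9800), and each read advances the GROM's internal address. The class and method names are hypothetical. The model also leaves out details of real GROMs such as prefetching, where reading the address back through >9802 returns the internal address, one ahead.

```python
class GromPort:
    """Hypothetical model of one set of GROM ports (a single base)."""

    def __init__(self, contents: bytes):
        self.mem = bytearray(0x10000)        # 64 KiB of GROM address space
        self.mem[: len(contents)] = contents
        self.addr = 0
        self._pending_high = None

    def write_address_port(self, byte: int) -> None:  # like MOVB to >9C02
        if self._pending_high is None:
            self._pending_high = byte & 0xFF            # first write: high byte
        else:
            self.addr = (self._pending_high << 8) | (byte & 0xFF)
            self._pending_high = None

    def read_data_port(self) -> int:                    # like MOVB from >9800
        value = self.mem[self.addr]
        self.addr = (self.addr + 1) & 0xFFFF            # auto-increment
        return value

g = GromPort(bytes([0xAA, 0x01, 0x02, 0x03]))
g.write_address_port(0x00)
g.write_address_port(0x01)
print(g.read_data_port(), g.read_data_port())  # 1 2
```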

 

...lee


Very interesting “Fachgebiete”! Your TI must be connected with your home WLAN.

 

No, it has not come that far, but indeed I'm quite lucky that my hobby contributes to my work and vice versa. :)

 

 

Ah OK, didn’t know that, so we’re lucky to have the TMS9900 instead of the 9995 in our computers. :) (read about the development not being ready in time for the 4A)

 

No, no, quite the opposite. We would have been lucky with the 9995, as we have it in the Geneve. The TMS9900 has another problem: it can only address full 16-bit words. The byte operations (like AB, MOVB, etc.) differ from the word operations only inside the CPU. Ultimately this means that when you want to change a single byte, the CPU must first read the complete word, then modify the byte, then write back the complete word. This read-before-write takes an awful lot of cycles, and it is one of the reasons why the 9995 is so much faster.
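Here is a minimal Python model of the byte store described above. It assumes a memory that can only be read or written as whole words, and it counts bus cycles. Storing one byte costs a word read plus a word write, because the neighbouring byte has to be preserved. The class and function names are hypothetical, and real TMS9900 cycle counts are not modelled.

```python
class WordOnlyBus:
    """Memory accessible only as whole 16-bit words; counts bus cycles."""

    def __init__(self, size_words: int):
        self.words = [0] * size_words
        self.reads = 0
        self.writes = 0

    def read_word(self, addr: int) -> int:
        self.reads += 1
        return self.words[addr >> 1]

    def write_word(self, addr: int, value: int) -> None:
        self.writes += 1
        self.words[addr >> 1] = value & 0xFFFF

def movb_store(bus: WordOnlyBus, addr: int, byte: int) -> None:
    """Store one byte: read the whole word, merge the byte, write it back."""
    word = bus.read_word(addr)                      # the extra read-before-write
    if addr & 1:                                    # odd address: low byte
        word = (word & 0xFF00) | (byte & 0xFF)
    else:                                           # even address: high byte
        word = (word & 0x00FF) | ((byte & 0xFF) << 8)
    bus.write_word(addr, word)

bus = WordOnlyBus(4)
bus.words[1] = 0x1234
movb_store(bus, 3, 0xAB)
print(hex(bus.words[1]), bus.reads, bus.writes)  # 0x12ab 1 1
```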

 

As I said, I had some opportunities to learn about other platforms, and this can be quite insightful. For example, after I understood why the TMS9900 uses this read-before-write, I assumed that this was necessarily the case for all processors with byte instructions.

 

Later I saw how the 8086 and later x86 processors dealt with this problem. These processors have separate control lines that enable or disable parts of the data bus. The 80386, for example, has 30 address lines (A31-A2), no A1 and no A0 (note that TI used the inverse numbering). So it looks as if it could only address full 32-bit words. However, it also has four lines BE3-BE0 ("byte enable"), each of which enables one byte lane of the data bus.
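The byte-enable idea can be shown as a mask computation. This Python sketch returns which of the four byte lanes an access of 1, 2 or 4 bytes would activate. The mask is active-high for readability; the real 80386 BE pins are active low. Accesses that cross a 32-bit boundary are rejected here (the real CPU splits them into two bus cycles). The function name is hypothetical.

```python
def byte_enables(addr: int, size: int) -> int:
    """4-bit lane mask (bit i = BEi active) for an access within one dword."""
    if size not in (1, 2, 4) or (addr & 3) + size > 4:
        raise ValueError("access must fit in one 32-bit word")
    return ((1 << size) - 1) << (addr & 3)

# The dword itself is selected by A31-A2, i.e. addr >> 2.
print(f"{byte_enables(0x1001, 1):04b}")  # 0010: only lane 1, no read-before-write
```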


We would have been lucky with the 9995 in the 99/4A's architecture. If the 99/4A had not been broken in so many ways to make the 9900 work in its 8-bit design, the system could have been really nice.

 

The 9995 only has a few features that give it an advantage over the 9900:

 

* Opcode prefetch (however this only saves one cycle per instruction and is more cute than anything)

* No read before write

* Faster clock

 

It is really the last two that make the most difference, and all three can be implemented in an FPGA-based 9900.

 

It would have been nice if the 9900 could have provided the UB/LB (upper-byte / lower-byte) signals like other processors to avoid the read-before-write. Also note that it is not just the CPU that needs to provide the signals; the memory subsystem needs to support them as well. If you look at the datasheets for modern SRAM and SDRAM memories, you will see that most have a 16-bit data bus and support the UB/LB masks.


 

 

No, I had to express myself a bit more precisely. I was not sure about what routines were stored at >0000 - >1fff.

But according to this overview, it’s more clear now.

 

There are only a few things you can actually use; it wasn't set up to provide user-callable utilities. It's mostly the GPL interpreter. (Even the hardware initialization happens through GPL.)

 

 

 

However, I’m still confused about the 3 “slots” :) console GROM’s and console ROM at >0000 - >1fff. If I’m counting right, that’s 4K of 16-bit ROM and 3 x 6KiB GROM? But it’s probably described in TI Intern.

 

Uh oh.. my terminology is spreading, there'll be hell to pay now. ;)

 

It's probably best to keep memory sizes in bytes, to avoid confusion. We don't tend to count the words on the TI... so, there is 8K of ROM and yes, three 6K GROMs. (And yes, they are all nicely documented in TI Intern! I still use that book frequently.)

 

The ROM is pretty much just the interrupt handler (which is pretty locked down: if it's not a vertical blank, cassette, or DSR interrupt, it takes some trickery to receive it) and the GPL interpreter. As far as I am aware, there's only one published version of the ROM (even on the 99/4!).

 

GROM0, the one at >0000, contains the boot code and the menuing system, as well as lookup tables for KSCAN and a few other helper functions.

 

GROM1 and GROM2, at >2000 and >4000 respectively, contain TI BASIC. The rest of GROM space (>6000 to >FFFF) is open. The other gotcha of GROM is that it's accessed through predefined memory addresses, sometimes referred to as a memory base or GROM base (>9800, for instance). Incrementing that base by 4 (>9804) selects a different bank of GROMs, following the scheme defined by the console. However, a GROM needs extra hardware to tell which base is in use, and the console doesn't have it, so those three console GROMs respond to all bases. Most cartridges do as well.
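The layout above follows a simple pattern. Each GROM occupies an 8 KiB window of GROM address space but contains only 6 KiB, so within each window only offsets >0000 - >17FF hold data. This Python sketch maps a GROM address to its slot and offset. The names and descriptions in `CONSOLE` are illustrative labels taken from the text above, and what a real GROM returns in the >1800 - >1FFF gap is not modelled.

```python
SLOT_SIZE = 0x2000   # each GROM is mapped into an 8 KiB window
GROM_SIZE = 0x1800   # ... but contains only 6 KiB

CONSOLE = {0: "GROM0: boot code, menu, KSCAN tables",
           1: "GROM1: TI BASIC",
           2: "GROM2: TI BASIC"}   # slots 3-7 (>6000->FFFF) are open

def locate(grom_addr: int) -> tuple[int, int, bool]:
    """Return (slot, offset, inside_6k) for a 16-bit GROM address."""
    slot, offset = divmod(grom_addr & 0xFFFF, SLOT_SIZE)
    return slot, offset, offset < GROM_SIZE

for a in (0x0000, 0x17FF, 0x1800, 0x2000, 0x6000):
    slot, off, inside = locate(a)
    print(f">{a:04X}: slot {slot}, offset >{off:04X}, in 6K area: {inside},",
          CONSOLE.get(slot, "cartridge/open"))
```

Eight 8 KiB slots per base, minus the three console slots, leaves five slots for cartridge GROMs.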


Sorry Lee, again I failed to say that I wanted to know what’s *IN* those (G)ROM areas. Tursi explains it further on, and it’s indeed all perfectly laid out in TI-99 Intern.

 

OK, I had to look at the 9995’s datasheet, and indeed, I can see clearly now. :)

 

I was indeed wondering about the 15 address lines, but that, and the read-before-write, is cleared up now too. The 80386 example shows it even more clearly.

 

So the TMS9995 is something to wish for. Besides the Geneve, I came across two nice computers:

 

This one was brought to my attention by Ksarul:

http://www.powertrancortex.com/index.html

 

And a thread here in the forum led me to this one:

http://www.avjd51.dsl.pipex.com/tms9995_breadboard/tms9995_breadboard.htm

 

 

The other gotcha of GROM is it's accessed through pre-defined memory addresses, sometimes referred to as a memory base. (>9800, for instance). Incrementing that base by 4 (>9804) opens up a new bank of GROMs by the system defined inside the console. However, the cartridge needs hardware to differentiate the memory base in use, and the console doesn't, so those three GROMs respond to all bases.

 

 

If I understand correctly, this is done by the 4 address lines A10..A13 attached to the ATmega1284P on the UberGROM cartridge?

 

Pffff, that’s a lot of enlightening information. :) I hope to be able to put it into practice soon.


To see how the GROMs function in general, you might want to look at the discrete logic GROM simulation circuit described in the Bunyard Book (I just went looking online and didn't find a copy of it in either of the main book repositories, so I will have to dig mine out to scan it sometime). I may also have a schematic I updated for the Wiesbaden Supermodul which also outlines the logic behind it.


Thanks Jim, I guess I just found it on the web, and might have saved you the trouble of scanning it. :)

If it's a problem that I've uploaded it here, I'll delete it immediately.

 

Is it the one you're suggesting?

edit

Hardware Manual for the Texas Instruments 99/4A Home Computer by Michael L. Bunyard, PE

 

Oh well, perhaps it's better to do it the other way around. If it's OK, I'll upload it here or find some other solution...

/edit

Edited by Dexter

That is the one--there is a nice appendix at the back with the circuitry used to simulate a GROM. Once you've followed the signaling through, the way a GROM works becomes a LOT easier to follow. Read Appendix D to see what I'm talking about here. . .I just found it on Archive.org. . .


That is the one--there is a nice appendix at the back with the circuitry used to simulate a GROM. Once you've followed the signaling through, the way a GROM works becomes a LOT easier to follow. Read Appendix D to see what I'm talking about here. . .I just found it on Archive.org. . .

 

Is it the one that is very clear and OCR’d?

 

...lee


Yes, it is!

Wow, it's a marvelous piece of documentation. Many thanks for the suggestion. :thumbsup:

 

I had two reasons for asking. One you answered. The other is to let you know that this particular PDF (if, in fact, it is the one I have) has Appendices F & G not only in reverse order but also stuck between pages 8-6 & 8-7! :-o

 

...lee


If I understand correctly, this is done by the 4 address lines A10..A13 attached to the ATmega1284P on the UberGROM cartridge?

 

That's correct. When the GROM select line goes high, we can look at the lower bits of the address bus (remember TI numbers backwards, A0 is the MSB and A15 is the LSB) to see which "base" the program was looking for. We skip A15 and A14 because we want offsets of 4 bytes. Checking 4 lines gives us 16 bases, which is all that the console routines check for, but Thierry Nouspikel (who documents a crapton of the system on his site) notes there is room in the memory map for 256 bases. :)
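The decoding described above maps onto bit positions as follows. Because TI numbers the lines backwards, A14 is the bit worth 2 and separates the data port from the address port. A13 and upward (worth 4, 8, ...) form the base number, and the bit worth >0400 separates the write ports (>9C00) from the read ports (>9800). This Python sketch, with a hypothetical function name, decodes a port address using either the 4 lines the console menu relies on or all 8 (A6-A13).

```python
def decode_grom_port(addr: int, base_lines: int = 4) -> tuple[bool, bool, int]:
    """Return (is_write, is_address_port, base) for a GROM port address."""
    if not 0x9800 <= addr <= 0x9FFF:
        raise ValueError("not a GROM port address")
    is_write = bool(addr & 0x0400)        # >9Cxx ports are the write side
    is_address_port = bool(addr & 0x0002) # A14: >xxx2 is the address port
    base = (addr >> 2) & ((1 << base_lines) - 1)  # A13 upward: base number
    return is_write, is_address_port, base

print(decode_grom_port(0x9804))                # (False, False, 1)
print(decode_grom_port(0x9840))                # 4 lines: aliases to base 0
print(decode_grom_port(0x9840, base_lines=8))  # 8 lines: base 16
```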

 

Since I mentioned A15, I'll also note (and I think you already saw that) that A15 is a 'fake' address line created by the multiplexer for 8-bit access -- the CPU doesn't have it. This means that for any access on the 8-bit side, two bytes are always accessed, and A15 will always toggle. This is why even hardware addresses that are close together in memory (like the bank switching we use) are spaced 2 bytes apart instead of one.
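The consequence of the missing A15 can be shown in a few lines. The CPU only puts out a word address (A0-A14). The multiplexer then generates one byte access with A15 = 0 and one with A15 = 1, so addresses 2k and 2k+1 always occur together. This Python sketch uses a hypothetical function name and does not model the order of the two byte accesses.

```python
def bytes_touched(addr: int) -> tuple[int, int]:
    """Byte addresses accessed on the 8-bit side for one CPU access."""
    even = addr & 0xFFFE    # the word address the CPU actually drives
    return even, even | 1   # the multiplexer supplies A15 = 0 and A15 = 1

print([hex(a) for a in bytes_touched(0x6001)])  # ['0x6000', '0x6001']
```

A device that decodes only down to A14 therefore cannot tell >6000 from >6001, so its registers end up spaced 2 bytes apart.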


And a little side note on GROM base checks through the console's cartridge port: the routines in the 99/4A check for 16 bases, as Tursi said, with the rest of the 256 possible bases only available on the expansion bus--but the TI design for the 99/8 didn't follow the same schema. It still uses 256 possible GROM bases, but only 4 of those bases are available on the cartridge port. This isn't so important to most TI users due to the rarity of the 99/8, but it is a potential compatibility issue to remember.


To see how the GROMs function in General, you might want to look at the discrete logic GROM simulation circuit described in the Bunyard Book.

Yes, I have a pretty good idea of how they work now, in general and in some details. As I understand it, GROMs hold only data or GPL programs, not machine code that the CPU can execute directly. They’re also rather slow, which is why one can be emulated by an AVR running on only its internal 8 MHz clock source. On the other hand, they can hold a LOT of data, e.g. 16 (bases) x 5 (slots) x 6 KiB = 480 KiB, since the first three slots are always occupied.

 

 

 

Checking 4 lines gives us 16 bases, which is all that the console routines check for, but Thierry Nouspikel (who documents a crapton of the system on his site) notes there is room in the memory map for 256 bases. :)

 

 

And a little side note on GROM base checks through the console's cartridge port: the routines in the 99/4A check for 16 bases, as Tursi said, with the rest of the 256 possible bases only available on the expansion bus

So the other 240 bases are usable, but won’t be detected by the OS, are only available on the I/O port, and are not available on the UberGROM cart / AVR?


The other 240 GROM bases are usable through the I/O port. The p-Code card for the PEB (or the ultra-rare sidecar version of the same) is the only device manufactured by TI that ever took advantage of this (it has 8 GROMs on it and takes over the system during the boot process), but the capability is also used by the Mechatronics GRAM Karte, the Horizon p-GRAM card, and the SNUG HSGPL card.


Um, that's not true, Ksarul. All differentiating the bases requires is access to the address lines. To get 256 bases (all that is decoded in the hardware), you only need 8 lines, from A13 - A6 -- and those are most certainly available at the cartridge port.

 

The "16" comes from what the software in the console /actually/ searches when it builds the menu, there's no hardware reason for that limitation.

 

I can't speak to the 99/8 from experience, but unless the way GROMs work changed substantially, it should be true there too.

