Assembly on the 99/4A

Airshack · September 8, 2017

Answering rather late, sorry.

I always put my workspace at >8300. No particular reason other than I have always done it that way and it is out of the way of using the rest of the scratch pad RAM for variable storage. IMO you should *always* use a workspace in the 16-bit scratch pad RAM, otherwise you pay a hefty performance penalty. As others have mentioned, if you are going to use console routines (ROM or GROM) or allow the console ISR to run, then you will need to know and respect the use of scratch pad based on those services.

That's pretty much what I gathered regarding >8300 being a choice because "I have always done it." I'm relieved there wasn't something I was missing. Of course, scratchpad RAM is the way to go for speed.

Assumption: Most guys leave LIMI 0 and poll the CRU bit for timing so...

May I assume the rest of the scratchpad is safe for variables? How about the ISR's Workspace at the end of the Scratchpad (>83E0 - >83FF)? Assuming it is usable as long as LIMI 0 (interrupts disabled)?

Edited September 8, 2017 by Airshack

sometimes99er · September 8, 2017

Assumption: Most guys leave LIMI 0 and poll the CRU bit for timing so...

Yes.

May I assume the rest of the scratchpad is safe for variables? How about the ISR's Workspace at the end of the Scratchpad (>83E0 - >83FF)? Assuming it is usable as long as LIMI 0 (interrupts disabled)?

Yes.

+Lee Stewart · September 8, 2017

... How about the ISR's Workspace at the end of the Scratchpad (>83E0 - >83FF)? ...

Not to put too fine a point on it, but >83E0 – >83FF is GPL workspace. ISR workspace is >83C0 – >83DF.

...lee

Airshack · September 8, 2017

Hey Lee,

Anything you have to add is ALWAYS appreciated. The details matter. Thank you for today's ounce of enlightenment.

I had no idea about ISR/GPL Workspace differences. So... I'm thinking with interrupts disabled, they're both areas I may use for variable space without concern? -j

+Lee Stewart · September 8, 2017

Yes.

You will only have a problem if you decide to enable interrupts at any time in your program. Then, there are a few(?) registers in ISR WS that probably need particular values—notably, R1 (various flags) and R2 (ISR hook). Since the ISR also uses (or causes use of) the GPL WS, you will also need to be sure R13 and R15 are restored to >9800 (or whatever different GROM base your program might need) and >8C02 (VDP RAM Write Address Register), respectively. This last comment applies also if you call any GPL routines, even without enabling interrupts. The graphic in post #364 shows details for the registers in both workspaces.
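
A minimal sketch of the restore step described above, assuming a stand-alone program that has used both workspaces as scratch storage and now wants to let the console ISR run once. The addresses follow from the workspace bases given earlier (GPL WS R13 is >83E0 + 26 = >83FA, GPL WS R15 is >83FE, ISR WS R2 is >83C4). Clearing the hook is an assumption for a program with no user ISR; the R1 flags and anything else the ISR expects are not covered here.

```asm
* Restore the GPL workspace registers the console ISR relies on,
* then open a brief interrupt window.
       LI   R0,>9800        GROM base (change if your program uses another)
       MOV  R0,@>83FA       GPL WS R13
       LI   R0,>8C02        VDP RAM write address register
       MOV  R0,@>83FE       GPL WS R15
       CLR  @>83C4          ISR WS R2: no user ISR hook (assumption)
       LIMI 2               let a pending interrupt be serviced
       LIMI 0               interrupts off again
```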

...lee

Asmusr · September 8, 2017

limi 0

lwpi >8300

do whatever you want

apersson850 · September 9, 2017

There is sort of a conflict of interest here.

The simplest assembly language programs to write are those that are completely self-supporting. They start, they run and do their own thing and they end by a system reset. Hence as long as you take control (don't allow interrupts), you can use all resources in the machine and do whatever you like.

Things get a bit more complex if you want to use routines in the machine as subroutines for your own code. If you want to use the floating point math, for example, then you must adapt to how they are written. If they use GPL WS, FAC and ARG areas in scratch pad RAM, then you can't store your things there and expect them to survive.

It gets even more complex if you want to add assembly support routines to another environment, like Extended BASIC or Pascal. In such a case, you must respect the fact that these environments have mapped all of scratch pad RAM, and some other RAM too, for their own purpose, and only selected parts can be modified without ruining the capability to return to the calling environment.

For a game or major other application (like TI-Writer), the self-supporting route is no problem. But I've almost always found the best use of assembly to augment Extended BASIC or Pascal, and then you need to be careful. That's when consoles with all 16-bit RAM are good, as there's no speed penalty for me, regardless of where the workspace is or the code runs.

+TheBF · September 9, 2017

That is excellent advice Apersson850.

It's also important to balance the expected speed of your assembly code versus your XB or Pascal application speed.

My point is, even if you use 8-bit RAM for your workspace, your assembly code will still be many, many times faster than your high-level language code.

So I like the Steve Jobs approach:

"Make it work, then make it better"

B

apersson850 · September 9, 2017

Yes, and that pretty much describes my "TI life". When I bought the 99/4A, I was still in school. At that time, Pascal was THE language. C had just emerged and wasn't used for anything important. Algol had just stepped down from the top for technical/scientific applications.

So I pretty quickly acquired the expansion box, including the p-code card. After a couple of years, I had upgraded to four DS/DD disks, built my own clock and I/O card and installed 64 K 16-bit wide RAM inside the console. That of course required quite a lot of assembly programming to make it work, but on the other hand, I've hardly written any stand-alone assembly program. Some smaller ones, but mainly for technical applications.

But I have made a large number of assembly support routines, both for the p-system and also for Extended BASIC, but then mainly with other people in mind, since many more had Extended BASIC than the p-system. Still, I had half a dozen or so friends who all used Pascal on the 99. I've understood that it was a bit rare at that time that so many Pascal users actually knew each other, and three of us lived within a few hundred meters of each other.

Most of what I've used the TI for has been text based, and there the p-system has provided a 40-column-wide window onto an 80-column-wide screen from the beginning. By using assembly I've, for example, implemented the ability to display pop-up messages on the screen without disturbing what's already there. That's not too difficult in theory, since the 80-column screen is in low RAM, but the visible window is in VDP RAM. So just write to VDP RAM to display the pop-up window, then restore the screen and the original text is back. This also means that the pop-up will be visible regardless of which of the three windows the user is displaying.

The logic in the assembly routine that wraps words in strings, so they are not cut at the end of the window, is quite complex, but doing it in assembly gives an instant display of the message.

All my own hardware has RAM in DSR space. If they aren't battery backed, I need to reload the programs on startup. To make that easy, I wrote a loader (in Pascal), which loads object files into memory at any location, even if a CRU bit must be set to enable that memory. The loader benefits from the p-system's assembler's capability to generate code files with multiple procedures in one segment. Thus I can simply assemble a code file containing all the different code files I want loaded in one file. I included extra header information about memory and CRU address to load to, so the different procedures can end up on different cards, and also in my own internal RAM expansion (which is paged with CRU bits) in one single run.

But to write to a word in memory that's paged in by a CRU bit requires assembly. Thus there's the function crupoke(value,addr,crubase: integer): boolean; external; linked to that Pascal program. But all other stuff is done by Pascal, since it's fast enough in that case. Pascal is also able to read directly from any sector in a file, or any sector on a unit, so no assembly is needed to access the code file in my special way. Crupoke is a function instead of a procedure, since it not only writes to memory, but also reads back and verifies that there is actually RAM at the accessed location. If there isn't, the program simply skips to the next procedure to load in the code file. By specifying a CRU address of 4000H (they can't be larger than 3FFFH), it loads to normal RAM instead. If I set the CRU address to 8000H, it loads the code, or rather data, to VDP RAM.

This is what I consider a good example of where almost all of the logic of a program is in high level language, which is easier to write and debug, but that critical part that can't be done in Pascal is written in assembly.

MueThor · September 11, 2017

Hello all,

Unfortunately, none of you wanted to answer the questions I posed in my message at http://atariage.com/forums/topic/218904-playground/page-6. I would be delighted if one of you would be so kind as to answer them, or give an outline and/or links on the topic of short machine language programs to be executed in scratchpad RAM. Hence I repeat that message here:

Here (in the program Playground) a way was found to access scratchpad RAM directly from TI BASIC. Surely it is possible to access scratchpad RAM from XB and EA too? If so, can you return from scratchpad RAM to XB and EA? BTW, is there an overview of the machine language addresses that can be accessed from scratchpad RAM, depending on the programming language used to get there, i.e. TI BASIC, XB or EA?

Regards

Edited September 11, 2017 by MueThor

+Lee Stewart · September 12, 2017

Hello all, ...

Unfortunately, none of you wanted to answer the questions I posed in my message at http://atariage.com/forums/topic/218904-playground/page-6. ...

See my response in the above link.

...lee

Airshack · September 24, 2017

On p.404-406 of the EA manual we see a detailed description of the Scratchpad usage with TI BASIC, console floating-point routines, GPL, interpreter, KSCAN, etc.

A question I have for Matthew is just how much of that CPU RAM is free for variable usage while using this thread's approach to speed optimized game programming?

More specifically, may I use >8300->83FF as needed if I avoid interfacing with BASIC and the console routines?

Thanks for your patience on this line of questioning. It's confusing me a bit.

-james

Sent from my iPhone using Tapatalk Pro

Tursi · September 24, 2017

If you avoid BASIC and the console routines then you can use all of >8300 - >83FF. Note that this includes the interrupt handler, so you'd have to leave interrupts off and instead poll for end of frame (if you need it).

Airshack · September 25, 2017

...and, as Lee says, >83C0 - >83DF are off limits if I elect to use LIMI 2 for the smooth-scrolling CRU bit timing methodology, right?

Vs using the less reliable VDP-centric timing methodology with LIMI 0, where >8300 - >83FF are free and clear of disruption?

Tursi · September 25, 2017

I'm a little unclear of the connection there... In fact I would say that your question addresses three separate (but related) concepts.

If you use LIMI 2 to enable interrupts, then /all/ scratchpad memory used by the console interrupt function is off limits. Not only >83C0 through >83DF, but also the GPL Workspace from >83E0 through >83FF. Setting certain bits in there just right can actually lock the console up in the interrupt routine.

If you use LIMI 0 and thus never enable interrupts, you can use all of scratchpad.

Now, this is (slightly) unrelated to how you poll the VDP for the end of frame. If you enable LIMI 2, of course, you should not poll the VDP at all - if an interrupt is active when you LIMI 2, then it will automatically jump to the interrupt code and run it for you.

If you run without interrupts, you can either use CRU bit 2 to check for an active interrupt pending from the VDP, or poll the VDP Status register. Due to race conditions inside the VDP, we tend to recommend the CRU method.

Now, whether it's less reliable is an interesting discussion. I would say it's not only NOT less reliable to poll, but faster overall. To defend that argument, I need to look at the organization of a typical TI program.

Typically, TI programs run with the interrupt disabled (LIMI 0). Then, at a central point of the program, interrupts are briefly enabled and then disabled. (LIMI 2/LIMI 0). If the VDP has an interrupt pending, the LIMI 2 will enable a vector to the interrupt handler.

If the program is structured in this way, then it is arguably equivalent to polling. In fact, if you have your own interrupt that you want to run, it's faster to poll, because there's no overhead used running the console code. Instead of "LIMI 2/LIMI 0", you would just have "TB 2/JNE xxx" (assuming you can keep R12 zeroed for the CRU).

Both approaches let interrupts run with the same frequency and at the same point in the code. If the code runs for longer than one frame before enabling interrupts, then it will skip frames, the same as the TB version does. The only difference is whether it's the code or the 9900 interrupt circuit that looks at that CRU bit. (and, of course, in the TB version you jump directly to your own handler, instead of the ROM code running first).
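
The two structures compared above might look like this side by side. This is a rough sketch, assuming the 9901 is in its default interrupt mode and the VDP's interrupt output is enabled. WORK and MYISR are hypothetical labels: WORK is one pass of program logic that returns through R11, and MYISR is your own per-frame handler.

```asm
* (a) Brief interrupt window: the console ISR runs if a VDP interrupt
*     is pending when LIMI 2 executes.
LOOPA  BL   @WORK           hypothetical: one pass of program logic
       LIMI 2
       LIMI 0
       JMP  LOOPA

* (b) Polled equivalent: test the 9901 input directly and skip the
*     console ISR altogether.
LOOPB  BL   @WORK
       CLR  R12             CRU base 0 selects the 9901
       TB   2               VDP interrupt line, active low
       JNE  MYISR           bit = 0 means an interrupt is pending
       JMP  LOOPB
* MYISR must read the VDP status (>8802) to clear the interrupt and
* then jump back to LOOPB.
```

If nothing else in the program touches R12, the CLR can be moved out of the loop, which is the "keep R12 zeroed" case mentioned above.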

Some systems and programmers DO run the other way, with the code running freely with interrupts enabled, and jumping to the interrupt handler any time they like. On the TI this is typically discouraged, though, since you can't safely access VDP memory in that case (as an interrupt will corrupt the VDP address without notifying you). It's totally possible to structure a program this way, however. In that case, polling is actually harder to do well. In fairness, though, I don't think many people code this way on purpose just because of the issues it causes with VDP access.

+Lee Stewart · September 25, 2017

... Some systems and programmers DO run the other way, with the code running freely with interrupts enabled, and jumping to the interrupt handler any time they like. On the TI this is typically discouraged, though, since you can't safely access VDP memory in that case (as an interrupt will corrupt the VDP address without notifying you). It's totally possible to structure a program this way, however. In that case, polling is actually harder to do well. In fairness, though, I don't think many people code this way on purpose just because of the issues it causes with VDP access.

TI Forth and fbForth run with interrupts enabled except when accessing the VDP or running any other selfish processes.

...lee

Airshack · September 25, 2017

I'm a little unclear of the connection there... In fact I would say that your question addresses three separate (but related) concepts.

>>>> Your patience with my multiple misunderstandings is appreciated.

If you use LIMI 2 to enable interrupts, then /all/ scratchpad memory used by the console interrupt function is off limits. Not only >83C0 through >83DF, but also the GPL Workspace from >83E0 through >83FF. Setting certain bits in there just right can actually lock the console up in the interrupt routine.

>>>> Got it! Two separate Workspace areas (16 registers * 2 bytes each) are reserved if I use interrupts: >83C0 through >83DF as well as >83E0 through >83FF. That's a total of 64 bytes of the 256-byte Scratchpad unusable.

>>>> So... I can use the remaining 192 bytes as I wish, and I'll want 32 bytes, usually beginning at >8300, for my program's own 32-byte Workspace.

>>>> This leaves 160 bytes of fast CPU RAM available for my program's variables when using LIMI 2, and an additional 64 bytes available if I choose to just LIMI 0 the whole deal.
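
Written out as equates, the arithmetic above gives a layout along these lines. This is only a sketch of the byte counting in this post: the variable names are hypothetical, and the console ISR may touch other scratchpad bytes as well (the EA manual pages cited earlier list the reserved locations), so treat the 160 bytes as an upper bound when running with LIMI 2.

```asm
* Scratchpad layout when the console ISR is allowed to run (LIMI 2)
WRKSP  EQU  >8300           program workspace, 32 bytes (>8300->831F)
VARS   EQU  >8320           candidate variable area, 160 bytes (>8320->83BF)
ISRWS  EQU  >83C0           console ISR workspace (>83C0->83DF), reserved
GPLWS  EQU  >83E0           GPL workspace (>83E0->83FF), reserved

* Example variable assignments (hypothetical names)
SCORE  EQU  VARS            word
LIVES  EQU  VARS+2          byte
FRAME  EQU  VARS+4          word
```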

If you use LIMI 0 and thus never enable interrupts, you can use all of scratchpad.

>>>> And LIMI 0 is the simpler way to go until you consider timing issues associated with polling the VDP. It's my understanding now that polling the VDP status register can result in missed end of frames. As in, it's an inaccurate method.

Now, this is (slightly) unrelated to how you poll the VDP for the end of frame. If you enable LIMI 2, of course, you should not poll the VDP at all - if an interrupt is active when you LIMI 2, then it will automatically jump to the interrupt code and run it for you.

>>>> Understood^^^^^.

If you run without interrupts, you can either use CRU bit 2 to check for an active interrupt pending from the VDP, or poll the VDP Status register.

>>> Yet the CRU Bit 2 method is superior to polling the VDP Status register method for smooth scrolling and music, right?

Due to race conditions inside the VDP, we tend to recommend the CRU method.

>>>> Understood^^^^. I recall it's something to do with the VDP Status being cleared temporarily which leads to skipped frames? Matthew addressed this point earlier in this thread.

Now, whether it's less reliable is an interesting discussion. I would say it's not only NOT less reliable to poll, but faster overall. To defend that argument, I need to look at the organization of a typical TI program.

Typically, TI programs run with the interrupt disabled (LIMI 0). Then, at a central point of the program, interrupts are briefly enabled and then disabled. (LIMI 2/LIMI 0). If the VDP has an interrupt pending, the LIMI 2 will enable a vector to the interrupt handler.

If the program is structured in this way, then it is arguably equivalent to polling. In fact, if you have your own interrupt that you want to run, it's faster to poll, because there's no overhead used running the console code. Instead of "LIMI 2/LIMI 0", you would just have "TB 2/JNE xxx" (assuming you can keep R12 zeroed for the CRU).

>>>> This part is debatable I believe.

Both approaches let interrupts run with the same frequency and at the same point in the code. If the code runs for longer than one frame before enabling interrupts, then it will skip frames, the same as the TB version does.

>>>> Oh! I see your point! ^^^^ Skipped frames are difficult to avoid no matter which way you go.

The only difference is whether it's the code or the 9900 interrupt circuit that looks at that CRU bit. (and, of course, in the TB version you jump directly to your own handler, instead of the ROM code running first).

Some systems and programmers DO run the other way, with the code running freely with interrupts enabled, and jumping to the interrupt handler any time they like. On the TI this is typically discouraged, though, since you can't safely access VDP memory in that case (as an interrupt will corrupt the VDP address without notifying you).

>>>> You're going to have to LIMI 0 any time you write/read the VDP, and then LIMI 2 when done. So... more opportunity for missed frames.

It's totally possible to structure a program this way, however. In that case, polling is actually harder to do well. In fairness, though, I don't think many people code this way on purpose just because of the issues it causes with VDP access.

This conversation is adding clarity to the cloudy situation between my ears. Thank you! Of course, any additional clarification efforts are encouraged and appreciated in advance. -j

Edited September 25, 2017 by Airshack

Tursi · September 25, 2017

>>>> Got it! Two separate Workspace areas (16 registers * 2 bytes each) are reserved if I use interrupts: >83C0 through >83DF as well as >83E0 through >83FF. That's a total of 64 bytes of the 256-byte Scratchpad unusable.

>>>> So... I can use the remaining 192 bytes as I wish, and I'll want 32 bytes, usually beginning at >8300, for my program's own 32-byte Workspace.

That's only if you don't use any of the other ROM or GPL routines, of course. There are other reserved areas in scratchpad for those.

Due to race conditions inside the VDP, we tend to recommend the CRU method.

>>>> Understood^^^^. I recall it's something to do with the VDP Status being cleared temporarily which leads to skipped frames? Matthew addressed this point earlier in this thread.

There's an apparent race in the VDP that if you read the status byte exactly as it's setting the end of frame bit, it can clear it without ever reporting it to the CPU. As such, if you are checking for end of frame, it's probably better to check externally for interrupt and /then/ read the status register. (You have to anyway, to clear the interrupt).

If the program is structured in this way, then it is arguably equivalent to polling. In fact, if you have your own interrupt that you want to run, it's faster to poll, because there's no overhead used running the console code. Instead of "LIMI 2/LIMI 0", you would just have "TB 2/JNE xxx" (assuming you can keep R12 zeroed for the CRU).

>>>> This part is debatable I believe.

The reason I spelled it out so carefully was to try to make it not debatable, but there's always another view.

If you are using your own interrupt routine, the minimum number of instructions before the console interrupt handler will call your code is 21 instructions. You can probably figure out how to branch in fewer than 21 instructions. That's the only point I was making.

>>>> Oh! I see your point! ^^^^ Skipped frames are difficult to avoid no matter which way you go.

>>>> You're going to have to LIMI 0 any time you write/read the VDP, and then LIMI 2 when done. So... more opportunity for missed frames.

I think you might be misunderstanding -- interrupts do not need to be enabled at the exact moment of end of frame -- the VDP will hold the signal until you explicitly clear it from the CPU. So as long as you always handle it before the /next/ end of frame, you won't miss one, even if your timing is inconsistent from frame to frame. You only miss one if another one comes along before you process it -- because the VDP only reserves one bit for end of frame. So it can't tell you how many elapsed.

Airshack · September 25, 2017

Yes, I was misunderstanding this point. Thanks for recognizing my error. I thought with the interrupt ON method we were somehow attempting to capture an end of frame as it happens.

Airshack · September 26, 2017

"So there is an apparent race in the VDP..."

WHAT'S A "RACE IN THE VDP" SUPPOSED TO MEAN?

Well... this PM helped clear things up:

In software (and in hardware too, I guess), a "race condition" exists when you have multiple contexts (for lack of a better word) interacting with the same data structure -- and correct operation of the system depends on the order in which the actions occur.

To give an example that is relevant to this case, the "end of frame" bit on the VDP has two separate contexts (this is my word, not terminology) acting on it:

First, when the TV picture reaches the bottom of the screen, this is the "end of frame". The VDP does a single action:

E1 - Set EOF bit in the status word

The second "context" is the CPU reading the status word. In that case, the VDP takes a sequence like this:

C1 - Copy the status bit to the CPU register

C2 - Clear the EOF bit in the status word

In normal circumstances, this works fine, but consider the case where they both happen at the same time. In particular, consider the case where "E1" happens immediately after "C1":

C1 - copy the status bit to the CPU register

E1 - Set EOF bit in the Status word

C2 - clear the EOF bit in the status word

You can see that, just because of the order in which the tasks interleaved, you actually lose the EOF bit for that frame, because it's accidentally cleared by the CPU read handling.

This kind of thing happens in software ALL the time, and so it's a very common condition. The real time, truly parallel nature of hardware makes it a little less common, but it can still happen. And while I'm guessing at the internal sequence of the VDP there, based on the output we get, something along those lines is probably happening.

It may also be true for the sprite collision and 5 on a line bits, too, but testing has been inconclusive. It's safest to assume that it can always happen.

Oh, and to be clear, it's called a "race" because the two contexts are seen as racing to finish first.

Edited September 26, 2017 by Airshack

matthew180 · September 27, 2017

Well, it looks like Tursi and Lee have taken care of your questions, so I'm mostly just saying the same thing in a different way (not sure why, it is late, but I feel compelled to write something...)

The "race condition" in the VDP that Tursi mentioned is usually called a "hazard" in hardware, and we have some information from Karl Guttag (one of the 9918A designers who has graciously answered many of our questions via email) that the 9918A most likely does not deal with the hazard. Here is his reply:

Matthew,

I'm not sure I can be much help. I don't have the schematics and I did not work directly on that section of the 9918.

It was my first design at TI and I know that it was not robust in terms of dealing with asynchronous events (on my next project, the 9995, I remember spending some considerable effort on asynchronous hazards). To get the design to fit we did all the logic as minimally as possible. So it is possible that the designer did not worry about someone reading and resetting at the same time so long as it did not cause the chip to melt.

Somewhere deep in my mind, I think there might have been an issue with polling (but I really don't remember).

Sorry I couldn't be of more help,

Karl

In hardware the interrupt status bit would most likely be implemented as a form of an RS-flip-flop with asynchronous reset. Since hardware is truly parallel, the circuit that sets the interrupt bit (called the "Frame" flag) is completely separate from the hardware that implements the host CPU interface (i.e. the circuit that resets the interrupt bit). Typically in flip-flops the reset overrides any other action on the flip-flop (i.e. setting the flip-flop to '1'), so if the flip-flop is being set to '1' by the end-of-frame circuit, but also being reset by a host CPU status read, the reset will win and the end-of-frame status bit will be lost. This is the hazard that is probably not accounted for in the 9918A silicon.

To avoid this hazard, it is recommended that the host CPU only read the status bit after receiving the VDP's interrupt. Because the interrupt is generated by the Frame flag itself, by following this rule the host CPU should never be reading the status at the same time the Frame flag is being set. The host CPU should also read the status as quickly as possible after receiving the interrupt so it does not miss the next frame interrupt signal.

However, if the host CPU is constantly reading the VDP status in a loop (polling), it is possible to expose the hazard. Note that the F18A avoids this hazard.
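
For reference, the tight status-polling loop that can expose the hazard looks roughly like this. It is a sketch only, assuming the console's VDP status read address (>8802) and the Frame flag in the most significant bit of the status byte.

```asm
* Direct VDP status polling: every read also clears the Frame flag, so
* a read that coincides with end of frame may lose that frame.
WAIT   MOVB @>8802,R0       read VDP status into the high byte of R0
       ANDI R0,>8000        keep only the Frame flag
       JEQ  WAIT            not set yet, so read again
* Falls through here once per frame, unless the hazard swallowed the flag.
```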

In the 99/4A there are several ways to read and control the VDP interrupt:

First, the 9918A VDP itself can be set to generate, or not generate, the Frame interrupt. Regardless of whether the VDP is set to generate the interrupt, the Frame bit in the VDP status register still works the same, i.e. it will be set at the end of the frame and reset when the status register is read, even if the interrupt output from the VDP is disabled. If you disable the interrupt output then you have no choice but to poll the VDP status register.

Assuming the VDP interrupt is enabled, the interrupt itself goes to the 9901 I/O chip in the 99/4A. It is the 9901 that then sends the interrupt to the 9900 CPU. In this case there are two conditions based on the CPU's interrupt mask (set by LIMI 0 or LIMI 2).

If the CPU mask is set to allow interrupts, i.e. LIMI 2 has been used, then the VDP interrupt goes to the 9901, then to the CPU, and the console Interrupt Service Routine (ISR) runs. You can disable a lot of the standard ISR functions, but you still have some overhead (I think Tursi said 22-ish instructions before your code is even executed).

If the CPU mask is set to ignore interrupts, i.e. LIMI 0 has been used, then you have to poll for the VDP status. However, you have two choices for how to poll the VDP status, and this is where the nuance comes in. The VDP still generates its interrupt, and it still goes to the 9901, but if you poll the 9918A itself then you risk triggering the hazard described previously.

However, instead of polling the 9918A, you can poll the 9901 to see if it has received the VDP interrupt, and if it has then you can then safely read the VDP status. Since reading the 9901 will not cause the 9918A interrupt status to change, polling the 9901 is safe. To remove the interrupt from the 9901, the source of the interrupt (in this case the VDP) has to remove its input from the 9901, which is accomplished when you read the VDP status.

So your code loop is something like:

turn off CPU interrupts
running:

  do stuff

  if 9901 interrupt
    read VDP status
    set end-of-frame flag for rest of program
  end if

  goto running

Usually I don't like to block on waiting for the interrupt. If my main loop is faster than the frame and the interrupt has not happened, I may still have work I can do. When the interrupt does come, all that happens is a flag is set that the rest of the code uses in the "do stuff" section of the program (which also resets the flag). Keeping your response to interrupts as small and fast as possible is a good practice, especially since in an ideal system this code would be a real ISR.
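
In 9900 assembly, the non-blocking loop outlined above could be sketched as follows. The names are assumptions for illustration: STUFF is a hypothetical routine holding the "do stuff" work (it tests and clears the flag itself and returns through R11), and EOFFLG is a hypothetical flag word placed in scratchpad.

```asm
EOFFLG EQU  >8320           hypothetical end-of-frame flag word

       LIMI 0               CPU interrupts stay off
       LWPI >8300           workspace in scratchpad
       CLR  @EOFFLG
RUN    BL   @STUFF          hypothetical: does work, tests and clears EOFFLG
       CLR  R12             CRU base 0 selects the 9901
       TB   2               VDP interrupt pending? (line is active low)
       JEQ  RUN             bit = 1: nothing pending, keep working
       MOVB @>8802,R1       read VDP status, which clears the interrupt
       SETO @EOFFLG         tell the rest of the program a frame ended
       JMP  RUN
```

The status read happens only after the 9901 reports the interrupt, so the loop never reads the VDP status at the moment the Frame flag is being set.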

If you keep your "do stuff" part fast and break up long running tasks, you will have a more responsive program. Albeit, probably a more complicated program, but it is up to you how to structure your code.

Asmusr · September 27, 2017

TI Scramble and Road Hunter are both polling the VDP directly (in a tight loop), whereas my later games are polling the 9901. I have not tested any of them intensively on the old hardware (my usual test machine has an F18A), so the question is if there is any evidence that the race condition has a negative effect on the former games?

Tursi · September 27, 2017

TI Scramble and Road Hunter are both polling the VDP directly (in a tight loop), whereas my later games are polling the 9901. I have not tested any of them intensively on the old hardware (my usual test machine has an F18A), so the question is if there is any evidence that the race condition has a negative effect on the former games?

There is evidence in deliberate test applications that you can miss it -- but a single frame here and there at 60 Hz is pretty hard to perceive as a human. I once took advantage of that on a PlayStation experiment by blocking every 5th (6th?) frame so that my title ran at 50 Hz even on an NTSC machine, so it'd be consistent across NTSC and PAL. I also had a MOD player on the Amiga that did the same thing. You couldn't tell in either case.

It's Sparky · October 26, 2017

Hey Guys/Girls

I really love this topic. Different authors, different angles on the same topic.

I fell in love with the 9900 years ago (yeah I'm too old to specifically write the exact number) and I am happy I'm not the only one.

Really like to contribute some stuff here, and got some questions too.

Maybe start with a simple one:

Let's assume I have r1 pointing to a byte in memory, and I want to test it for being zero (the byte that is!). What would be the shortest (fastest) way of doing that?

A possible solution might be something like:

MOVB  *R1,R2
JEQ   ITSZERO

This will (obviously) store the byte in R2 and set the flags. Apart from wasting a register (which is not that bad) I'm not really interested in writing the byte (back) to memory and don't want to spoil the cycles involved with it.

The PDP-11 (a mini/micro processor similar to the 9900) had an instruction called tstb which actually tested a byte (compared it to zero):

tstb   (r1)
beq    itszero

Wondering if someone can shed some light on this!

PS:

Some funny details: the PDP-11 used the term 'branch' for short jumps (the 9900 calls them 'jumps') and 'jump' for long jumps (the 9900 calls them 'branches')!

Furthermore (r1) means *R1, just a different notation.

Way back when I first wrote my TI99/4 assembly program I loved the capitals; right now I write my 9900 programs in lowercase!

Edited October 26, 2017 by It's Sparky

jens-eike · October 26, 2017

Maybe start with a simple one:

Let's assume I have r1 pointing to a byte in memory, and I want to test it for being zero (the byte that is!). What would be the shortest (fastest) way of doing that?

A possible solution might be something like:

MOVB  *R1,R2
JEQ   ITSZERO

This will (obviously) store the byte in R2 and set the flags. Apart from wasting a register (which is not that bad) I'm not really interested in writing the byte (back) to memory and don't want to spoil the cycles involved with it.

How about MOVB *R1,*R1 ?

Still testing for 0 without using any extra register, but slower (about 2 cycles).
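
For comparison, here are the two variants side by side, as a sketch reusing the ITSZERO label from the question. Both work because MOVB compares the byte it moves against zero and sets the EQ status bit accordingly, so JEQ can follow directly.

```
* Variant 1: copy the byte into a scratch register.
* Overwrites the high byte of R2.
       MOVB *R1,R2
       JEQ  ITSZERO

* Variant 2: move the byte onto itself.
* No scratch register, but the byte is written back to memory.
       MOVB *R1,*R1
       JEQ  ITSZERO
```

Since the second variant writes to the address in R1, it is probably best kept to plain RAM. Writing back to a memory-mapped port could have side effects.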
