Time is tight. Really tight! The general approach has been to think of the TV frame as the limiting factor for the capabilities of the machine. Whatever you can do in "one frame" (i.e., nominally @60Hz on NTSC or @50Hz on PAL)... that's IT. So in fact you can work out exactly how much time you have to do stuff. As we've seen in earlier tutorials, the '2600 programmer has to pump data out to the TIA in sync with the TV as it's drawing scanlines -- you need to feed the TV every scanline to draw a proper picture. There are 76 cycles per scanline, and 262 scanlines per standard TV frame (312 for PAL). So 76 * 262 = 19912 cycles per frame. Multiply that by the NTSC frame rate (actually 59.94Hz) and you get... 1193525.28 (i.e., there's our 1.19MHz CPU clock speed). It all makes sense.
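If you want to sanity-check that arithmetic yourself, it's a two-liner (the constants are just the NTSC numbers quoted above):

```python
# Verify the NTSC frame/clock arithmetic from the text.
CYCLES_PER_LINE = 76     # CPU cycles per scanline
NTSC_LINES = 262         # scanlines per NTSC frame (PAL: 312)
NTSC_HZ = 59.94          # actual NTSC frame rate

cycles_per_frame = CYCLES_PER_LINE * NTSC_LINES
print(cycles_per_frame)            # 19912
print(cycles_per_frame * NTSC_HZ)  # ~1193525.28, i.e. our 1.19MHz CPU clock
```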
So, just 262 lines. The visible screen is smaller than that, of course (usually 192 scanlines of actual graphics), so we only need to pump data to the screen for that smaller number of lines. The rest is black; nothing to see. See http://www.atariage....timing-diagram/
for a good visual diagram of where the time goes. During those blank lines, the CPU doesn't have to pump data to the screen. In fact these two major areas of "blackness" (that is, the vertical blank and the overscan) account for 37 scanlines (37 * 76 = 2812 cycles) and 30 scanlines (30 * 76 = 2280 cycles) respectively. Now that's not exactly swimming in available CPU capacity, but it's better than nothing. So the general usage of these blank areas has been to whack in "stuff" that takes a fair bit of time to do.
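Adding up that spare-time budget (numbers straight from the text):

```python
# Free CPU time per frame from the two blank regions.
CYCLES_PER_LINE = 76
vblank = 37 * CYCLES_PER_LINE     # vertical blank: 2812 cycles
overscan = 30 * CYCLES_PER_LINE   # overscan: 2280 cycles
print(vblank, overscan, vblank + overscan)  # 2812 2280 5092
```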
The problem is, you can't whack in too MUCH stuff. Because when those 37 scanlines of time have elapsed, you MUST be writing to the TIA again to make sure the next frame displays properly. Same for the 30 lines of overscan. There's no getting around it; take too much time and you stuff up the timing, and consequently the TV picture will roll, judder and basically look horrible. The hard and fast rule has been to simply stay within the limitation, or to reduce the number of visible scanlines to give more processing time for doing more complex STUFF. For each scanline of visible data you sacrificed, you got 76 cycles of available time to do your stuff. A compromise.
Fortunately, we have the timer registers. These are countdown registers that regularly decrement a value written to them. I only use TIM64T -- this one counts down in 64-cycle blocks. If I write 10 to it, then I would expect it to reach 0 some 640 cycles later. So, the usage has been to calculate the amount of time before the screen drawing has to (re)commence, divide by 64, and put that value in TIM64T. By reading INTIM and waiting until that reaches 0, you effectively wait the right number of cycles. You can do your (variable time) "stuff" and not really care about how long it takes (as long as it doesn't take TOO long), and after it's finished you enter a tight loop just reading INTIM and waiting for it to go to 0. When it does, fire off a WSYNC and then begin the TV frame drawing once again.
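If you've never used the timer, here's a toy Python model of that TIM64T/INTIM behaviour (simplified: it only models whole 64-cycle blocks, and `RiotTimer` is a made-up name for illustration, not real hardware API):

```python
class RiotTimer:
    """Toy model of the RIOT interval timer, 64-cycle mode only."""
    def __init__(self):
        self._cycles = 0      # total CPU cycles elapsed so far
        self._start = 0
        self._value = 0

    def tim64t(self, value):  # writing TIM64T starts the countdown
        self._value = value
        self._start = self._cycles

    def tick(self, cpu_cycles):  # the CPU spends some cycles doing "stuff"
        self._cycles += cpu_cycles

    @property
    def intim(self):          # reading INTIM: 64-cycle blocks remaining
        elapsed_blocks = (self._cycles - self._start) // 64
        return max(0, self._value - elapsed_blocks)

timer = RiotTimer()
timer.tim64t(10)      # expect 0 some 10 * 64 = 640 cycles later
timer.tick(639)
print(timer.intim)    # 1 -- not quite there yet
timer.tick(1)
print(timer.intim)    # 0 -- fire off a WSYNC and start drawing the frame
```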
That's how it's BEEN done, but that's not how I did it in Boulder Dash!
The INTIM register effectively tells you not only if you're out of time, but also exactly how MUCH time you have remaining (in blocks of 64 cycles, if you're using TIM64T). So, if you think about it, you can actually make decisions about whether to call a subroutine based on this value. For example, say you had a small routine which you know takes (say) 1000 cycles to run. That's 1000/64 units (= 15.625). So, if INTIM reads 16 or greater you KNOW you can call that subroutine and not run out of time! This gets rather nice. Given a guaranteed maximum run-time for any subroutine (and you get this by cycle-counting the subroutine very, very carefully), you can use this knowledge to determine if/when it's appropriate to call that subroutine. Furthermore, after you HAVE called the subroutine, you can repeat the process -- look at INTIM and determine if there's enough time to run OTHER subroutines.
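The check itself is trivial -- here it is as a Python sketch (`fits` is a hypothetical helper name, not from the actual source; the key point is rounding the cycle cost UP to whole 64-cycle blocks):

```python
import math

def fits(intim, worst_case_cycles):
    """Can a routine with this worst-case cost run before INTIM hits 0?"""
    # INTIM counts 64-cycle blocks, so round the cost up:
    # a 1000-cycle routine needs ceil(1000 / 64) = 16 blocks.
    return intim >= math.ceil(worst_case_cycles / 64)

print(fits(16, 1000))  # True  -- safe to call the subroutine
print(fits(15, 1000))  # False -- skip it; we'd run out of time
```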
So the whole concept of '2600 programming basically changes here. Now we have an asynchronous system, where you have a queue of "tasks" that you have to do. These tasks in Boulder Dash are generally creature logic (process a boulder, the amoeba, etc). Each of these tasks is cycle-counted so we know exactly how long the worst case is. And each of these tasks is only run if there's available time. If not, it simply returns, and in the next chunk of available time it will be called again.
So, this is how the timeslicing engine works! Every part of the game logic is broken down into as small (quick) units of code as practicable. Rather than have the whole processing for an object in a single huge and costly block of code, where possible these are broken down into even smaller "sub-tasks". And those tasks are effectively placed in a queue which is processed by the task manager. The task manager is a tight loop which pulls a task off the task stack, vectors to the appropriate handler for the task, and repeats. The tasks themselves are responsible for deciding if there's enough time for them to do their own stuff (i.e., fairly object-oriented in that regard). If a task doesn't think there's enough time (again, by simply reading INTIM and comparing with its own timing equate), it simply returns. If it has enough time to do its stuff, it does so and makes sure that it's no longer on the task queue. Tasks can even add other tasks to the queue, for later processing!
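To make the idea concrete, here's a toy sketch of that task-manager loop in Python (the real thing is 6502 assembly, of course; `Task`, `FrameTimer` and the 1000-cycle cost are made-up illustrations, not from the actual Boulder Dash source):

```python
import math

class FrameTimer:
    """Stands in for INTIM: remaining time, in 64-cycle blocks."""
    def __init__(self, blocks):
        self.intim = blocks
    def spend(self, cycles):
        self.intim -= math.ceil(cycles / 64)

class Task:
    def __init__(self, name, worst_case_cycles):
        self.name = name
        self.worst_case_cycles = worst_case_cycles
    def fits(self, timer):
        # compare INTIM against this task's own cycle-counted worst case
        return timer.intim >= math.ceil(self.worst_case_cycles / 64)
    def run(self, queue):
        pass  # creature logic goes here; a task may enqueue sub-tasks

def task_manager(queue, timer):
    """Tight loop: pull a task off the queue, vector to it, repeat.
    Tasks that don't fit stay queued for the next frame's spare time."""
    deferred = []
    while queue:
        task = queue.pop()
        if task.fits(timer):
            timer.spend(task.worst_case_cycles)
            task.run(queue)
        else:
            deferred.append(task)
    queue.extend(deferred)

# One frame's worth: five 1000-cycle tasks, but only the ~2812-cycle
# vertical blank budget (43 blocks of 64 cycles) to spend.
queue = [Task(f"boulder{i}", 1000) for i in range(5)]
task_manager(queue, FrameTimer(2812 // 64))
print(len(queue))  # 3 -- the tasks that didn't fit wait for the next frame
```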
The upshot of all this is that a game doesn't have to be able to handle the very worst-case, most expensive thing ever in a single frame. Tasks are split across multiple frames, if needed. In other words, there's now a separation between game logic (running over multiple frames if required) and the frame display (running exactly at the TV frame rate). Yes, Virginia, '2600 games can slow down. Now for most situations this isn't ideal -- but in reality it doesn't really matter. Most of the gameplay for the '2600 Boulder Dash just never slows down. But occasionally, very occasionally (say, when an amoeba turns into 200 boulders and they all start falling at the same time) -- well, the system can handle it. Because although it may only have enough processing power to handle (say) 20 boulders in a single frame, that's OK, because the other boulders are effectively stacked and processed the next frame. And the queue may be really big for a few game loops, and the game will lag... probably not very noticeably... but when the queue is empty again, everything is back to running full speed.
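And just to show the queue really does drain, here's a back-of-envelope simulation of that boulder avalanche (200 boulders and ~20 per frame are the illustrative numbers from above, not measured figures):

```python
# How long does the backlog last if 200 boulders drop at once
# and we can only afford ~20 per frame?
BOULDERS = 200
PER_FRAME_BUDGET = 20

queue = BOULDERS
frames = 0
while queue > 0:
    queue -= min(PER_FRAME_BUDGET, queue)  # process what fits this frame
    frames += 1
print(frames)  # 10 -- the logic lags briefly; the display never misses a beat
```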
So the above is the secret to making much more complex games than have heretofore been produced on the machine. You CAN keep the TV display going full speed (60Hz) while doing processing-intensive game logic. And you CAN do very very very complex game logic taking absolutely heaps of processing time. The trick, as noted, is to separate out the two so they are not synchronous -- and to divide the complex logic into discrete, very quick, sub-components.
Divide and conquer!