Session 25: Advanced Timeslicing

Andrew Davie · May 20, 2012

Time is tight. Really tight! The general approach has been to think of the TV frame as the limiting factor for the capabilities of the machine. Whatever you can do in "one frame" (i.e., nominally 60Hz on NTSC or 50Hz on PAL)... that's IT. So in fact you can work out exactly how much time you have to do stuff. As we've seen in earlier tutorials, the '2600 programmer has to pump data out to the TIA in sync with the TV as it's drawing scanlines. You need to feed the TV scanlines to draw a proper picture. There are 76 cycles per scanline, and 262 scanlines per standard TV frame (312 for PAL). So 76 * 262 = 19912 cycles per frame. Multiply that by the NTSC frame rate (actually 59.94Hz) and you get... 1193525.28 (i.e., there's our 1.19MHz CPU clock speed). It all makes sense.
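The arithmetic above is easy to check. The following is only a sketch in Python (a model of the numbers, not 2600 code), and the constant and function names are my own:

```python
CYCLES_PER_SCANLINE = 76     # CPU cycles per scanline, from the text
NTSC_SCANLINES = 262         # scanlines per standard NTSC frame
NTSC_FRAME_RATE_HZ = 59.94   # actual NTSC frame rate

def cycles_per_frame(scanlines):
    """CPU cycles available in one TV frame."""
    return scanlines * CYCLES_PER_SCANLINE

def cpu_clock_hz(scanlines, frame_rate_hz):
    """Cycles per frame times frames per second gives the CPU clock."""
    return cycles_per_frame(scanlines) * frame_rate_hz

print(cycles_per_frame(NTSC_SCANLINES))
print(round(cpu_clock_hz(NTSC_SCANLINES, NTSC_FRAME_RATE_HZ), 2))
```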

So, just 262 lines. The visible screen is smaller than that, of course (usually 192 scanlines of actual graphics) -- so we only need to pump data to the screen for a smaller number of lines. The rest is black, nothing to see. See http://www.atariage.com/forums/topic/27192-session-6-tv-timing-diagram/ for a good visual diagram of where the time goes. So, during those blank lines, the CPU doesn't have to pump data to the screen. In fact these two major areas of "blackness" (that is, the vertical blank and the overscan) account for 37 scanlines (37 * 76 = 2812 cycles) and 30 scanlines (30 * 76 = 2280 cycles) respectively. Now that's not exactly swimming in available CPU capacity, but it's better than nothing. So the general usage of these blank areas has been to whack in "stuff" that takes a fair bit of time to do.

The problem is, you can't whack in too MUCH stuff. Because when those 37 scanlines of time have elapsed, you MUST be writing to the TIA again to make sure the next frame is displaying properly. Same for the 30 lines of overscan. There's no getting around it; you take too much time, and you stuff up the timing, and consequently the TV picture will roll, judder and basically look horrible. The hard and fast rule has been to simply stay within the limit, or to reduce the number of visible scanlines to give more processing time for doing more complex STUFF. For each scanline of visible data you sacrificed, you got 76 cycles of available time to do your stuff. A compromise.

Fortunately, we have the timer registers. These are simple countdown registers that regularly decrement a value written to them. I only use TIM64T -- this one counts in 64-cycle blocks. If I write 10 to it, then I would expect it to reach 0 some 640 cycles later. So, the usage has been to calculate the amount of time before the screen drawing has to (re)commence, divide by 64, and put that value in TIM64T. By reading INTIM and waiting until that reaches 0, you effectively wait the right number of cycles. You can do your (variable time) "stuff" and not really care about how long it takes (as long as it doesn't take TOO long), and after it's finished you enter a tight loop just reading INTIM and waiting for it to go to 0. When it goes to 0, fire off a WSYNC and then begin the TV frame drawing once again.
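As a rough illustration of the "divide by 64" step, here is a small Python sketch that turns a number of blank scanlines into a TIM64T value. The function name is hypothetical, and the model ignores setup overhead and the exact hardware behaviour of the timer, so a real program may well load a slightly smaller value to leave a margin:

```python
CYCLES_PER_SCANLINE = 76
TIM64T_CYCLES_PER_TICK = 64   # TIM64T counts in blocks of 64 cycles

def tim64t_value(scanlines):
    """Largest whole number of 64-cycle ticks that fits in the given scanlines."""
    return (scanlines * CYCLES_PER_SCANLINE) // TIM64T_CYCLES_PER_TICK

print(tim64t_value(37))   # vertical blank: 2812 cycles
print(tim64t_value(30))   # overscan: 2280 cycles
```

Rounding down matters here: rounding up would let the timer run past the point where drawing has to restart.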

That's how it's BEEN done, but that's not how I did it in Boulder Dash!

The INTIM register effectively tells you not only if you're out of time, but also exactly how MUCH time you have remaining (in blocks of 64 cycles if you're using TIM64T). So, if you think about it, you can actually make decisions about if you should call a subroutine based on this value. For example, say you had a small routine which you know takes (say) 1000 cycles to run. That's 1000/64 units (= 15.625). So, if INTIM was reading 16 or greater you KNOW you can call that subroutine and not run out of time! This gets rather nice. Given a guaranteed maximum run-time for any subroutine (and you get this by cycle-counting the subroutine very very carefully), you can use this knowledge to determine if/when it's appropriate to call that subroutine. Furthermore, after you HAVE called the subroutine, you can repeat the process -- look at INTIM and determine if there's enough time to run OTHER subroutines.
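The decision rule in that paragraph can be written down as a tiny Python model. The names `min_intim_for` and `can_run` are made up for this sketch. It follows the rule as stated in the text, so a cautious implementation might add one extra tick of margin:

```python
CYCLES_PER_TICK = 64  # TIM64T granularity

def min_intim_for(worst_case_cycles):
    """Smallest INTIM reading that covers the routine's worst case (rounds up)."""
    return (worst_case_cycles + CYCLES_PER_TICK - 1) // CYCLES_PER_TICK

def can_run(intim, worst_case_cycles):
    """True if the remaining time, as read from INTIM, covers the worst case."""
    return intim >= min_intim_for(worst_case_cycles)

print(min_intim_for(1000))   # 1000 / 64 = 15.625, so 16 ticks are needed
print(can_run(16, 1000))
print(can_run(15, 1000))
```

On the machine itself the threshold would be a constant worked out at assembly time, so the run-time cost is just a read of INTIM, a compare and a branch.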

So the whole concept of '2600 programming basically changes here. Now we have an asynchronous system, where you have a queue of "tasks" that you have to do. These tasks in Boulder Dash are generally creature logic (process a boulder, the amoeba, etc). Each of these tasks is cycle-counted so we know exactly how long the worst case is. And each of these tasks is only run if there's available time. If not, then they simply return and in the next chunk of available time, they will be called again.

So, this is how the timeslicing engine works! Every part of the game logic is broken down into as small (quick) units of code as practicable. Rather than have the whole processing for an object in a single huge and costly block of code, where possible these are broken down into even smaller "sub-tasks". And those tasks are effectively placed in a queue which is processed by the task manager. The task manager is a tight loop which pulls a task off the task stack, vectors to the appropriate handler for the task, and repeats. The tasks themselves are responsible for deciding if there's enough time for them to do their own stuff (i.e., fairly object-oriented in that regard). If a task doesn't think there's enough time (again, by simply reading INTIM and comparing with its own timing equate), it simply returns. If it has enough time to do its stuff, it does so and makes sure that it's no longer on the task queue. Tasks can even add other tasks to the queue, for later processing!
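To make the control flow concrete, here is a minimal Python model of such a task manager. It is a sketch under my own assumptions, not the Boulder Dash code: the `Timer` class stands in for INTIM, each task is a hypothetical `(name, worst_case_ticks, actual_cycles)` tuple, and the time check that each task would do for itself is folded into the loop.

```python
CYCLES_PER_TICK = 64  # TIM64T counts in blocks of 64 cycles

class Timer:
    """Stand-in for the hardware timer; intim mimics reading INTIM."""
    def __init__(self, ticks):
        self.cycles_left = ticks * CYCLES_PER_TICK

    @property
    def intim(self):
        return max(self.cycles_left, 0) // CYCLES_PER_TICK

    def burn(self, cycles):
        self.cycles_left -= cycles

def run_tasks(stack, timer):
    """Run tasks from the top of the stack until one declines for lack of time."""
    ran = []
    while stack:
        name, worst_case_ticks, actual_cycles = stack[-1]
        if timer.intim < worst_case_ticks:
            break                  # not enough time: task stays on the stack
        stack.pop()                # the task removes itself once it runs
        timer.burn(actual_cycles)  # model the time the task really took
        ran.append(name)
    return ran

# The top of the stack is the last list element.
stack = [("amoeba", 24, 1200), ("boulder", 16, 900), ("player", 10, 500)]
ran = run_tasks(stack, Timer(43))
print(ran)    # ['player', 'boulder']
print(stack)  # [('amoeba', 24, 1200)]
```

In this run the amoeba task needs 24 ticks but only 21 remain after the first two tasks, so it stays on the stack for the next slice of free time.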

The upshot of all this is that a game doesn't have to be able to handle the very worst case most expensive thing ever in a single frame. The tasks split across multiple frames, if needed. In other words, there's now a separation between game logic (running over multiple frames if required) and the frame display (running exactly at the TV frame rate). Yes, Virginia, '2600 games can slow down. Now for most situations this isn't ideal -- but in reality it doesn't really matter. Most of the gameplay for the '2600 Boulder Dash just never slows down. But occasionally, very occasionally (say, when an amoeba turns into 200 boulders and they all start falling at the same time) -- well, the system can handle it. Because although it may only have enough processing power to handle (say) 20 boulders in a single frame, that's OK, because the other boulders are effectively stacked and processed the next frame. And the queue may be really big for a few game loops, and the game will lag... probably not very noticeably... but when the queue is empty again, everything is back to running full speed.
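Using the illustrative figures from that paragraph (200 boulders, roughly 20 handled per frame), a short Python sketch shows how long the backlog lasts. The function name is hypothetical and the fixed per-frame capacity is a simplification:

```python
def frames_to_drain(pending_tasks, per_frame_capacity):
    """Count the TV frames needed to work through a backlog of tasks."""
    frames = 0
    while pending_tasks > 0:
        pending_tasks -= min(pending_tasks, per_frame_capacity)
        frames += 1
    return frames

print(frames_to_drain(200, 20))
```

With those numbers the game logic lags for 10 frames, a fraction of a second at 60Hz, while the display itself never misses a frame.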

So the above is the secret to making much more complex games than have heretofore been produced on the machine. You CAN keep the TV display going full speed (60Hz) while doing processing-intensive game logic. And you CAN do very very very complex game logic taking absolutely heaps of processing time. The trick, as noted, is to separate out the two so they are not synchronous -- and to divide the complex logic into discrete, very quick, sub-components.

Divide and conquer!

tokumaru · May 20, 2012

That's a pretty good idea! My only concern is that this might cause inconsistent images to be generated in some cases... Since the 2600 has barely enough RAM to keep track of the game's state (positions of objects, physics parameters, scores, etc.) and its view (sprite coordinates, graphics and color pointers, etc.), it isn't really possible to double buffer the display, so when one object is updated for the next frame but another isn't, it might be a problem depending on the interaction between these objects.

For example, if you have a character riding a floating platform that waves sideways, ideally they would move in perfect sync every frame. However, if one of them decides to postpone its updates for the next time, the next rendered picture will show one object in the new position and the other in the old position, which represents an inconsistent game state. In this particular case, either the platform or the character would appear to be jittering relative to the other (if CPU time is scarce for several consecutive frames), while they should be moving together smoothly.

I can't think of many cases when this would be a problem though, especially if updates aren't postponed too frequently. This should be really useful for anyone trying to write more complex game logic who was too afraid of exceeding the time reserved for it. I will probably be doing something like this in my game, so thanks for the tip! =)

Random Terrain · May 20, 2012

Here's my first try at adapting Session 25:

www.randomterrain.com/atari-2600-memories-tutorial-andrew-davie-25.html

SeaGtGruff · May 20, 2012

For example, if you have a character riding a floating platform that waves sideways, ideally they would move in perfect sync every frame. However, if one of them decides to postpone its updates for the next time, the next rendered picture will show one object in the new position and the other in the old position, which represents an inconsistent game state. In this particular case, either the platform or the character would appear to be jittering relative to the other (if CPU time is scarce for several consecutive frames), while they should be moving together smoothly.

If you have two things that absolutely need to be handled together-- as in your example-- then don't do one unless there's time to do both. Also, you can set higher priorities on things that absolutely must be done every frame.
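One way to picture the "both or neither" rule is the Python sketch below. It is only a model: `run_group` and the tick costs are invented for illustration, and it charges each task its full worst case.

```python
def run_group(intim, group):
    """Run every task in the group, or none of them.

    group is a list of (name, worst_case_ticks) pairs.
    Returns (names_run, remaining_ticks).
    """
    needed = sum(ticks for _, ticks in group)
    if intim < needed:
        return [], intim   # postpone the whole group together
    return [name for name, _ in group], intim - needed

platform_and_rider = [("platform", 4), ("rider", 5)]
print(run_group(10, platform_and_rider))   # both fit
print(run_group(8, platform_and_rider))    # neither runs
```

Because the check uses the combined worst case, the two objects are always updated in the same frame, so they cannot drift out of step with each other.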

tokumaru · May 21, 2012

If you have two things that absolutely need to be handled together-- as in your example-- then don't do one unless there's time to do both.

True, we should always make sure to group together tasks that are dependent on each other.

Also, you can set higher priorities on things that absolutely must be done every frame.

Yeah, I imagine that first of all we'd do all the stuff that absolutely must be done, and then the remaining time would be distributed like Andrew suggested. This really is a very graceful way to handle occasional slowdowns.

Gemintronic · May 22, 2012

Any more detail on how the task queue works? Is it an array with process names? It seems shorter tasks would get run more often thus messing with game dynamics. Also, what would prevent tasks earlier in the code from getting executed more often?

Is this an example of protothreads?

http://en.wikipedia.org/wiki/Protothreads

Edited May 22, 2012 by theloon

Andrew Davie · May 22, 2012

The task queue will be specific to a game. But I can speak to the engine used for Boulder Dash, if you like.

First there are a number of "objects" or creatures. These include the player, boulders, diamonds, etc. They are all processed from an "object stack". Only the top entry on the stack is processed, and if there's not enough time for it, then the task manager simply aborts. So the object queue is processed in order, effectively LIFO. But there are other things in the game that need to happen, too. Stuff like drawing objects on the screen. So there's another task queue for drawing stuff. If the object queue doesn't have enough time to do something, then the draw queue has a go at doing stuff (drawing stuff is generally quick and only requires small timeslices). Once the draw queue runs out of time, well, there are other queues which kick in. Stuff like sorting the objects, for example. You can't do much of a sort in the vertical blank, but you can do a little bit. A simple bubble sort is very easy to break down into very small independent sections, so that's what's done there.
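The idea of a bubble sort split into very small sections can be sketched in Python with a generator that does one compare-and-swap per step. This only illustrates the technique and is not the Boulder Dash code. The names are my own, and on the 2600 the sort position would live in a few RAM variables, not in a generator.

```python
def bubble_sort_steps(items):
    """Sort items in place, pausing after every single compare-and-swap."""
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(items) - 1):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
            yield  # hand control back; resume here in the next timeslice

def run_slice(sorter, max_steps):
    """Do up to max_steps of sorting; return False once the sort is finished."""
    for _ in range(max_steps):
        try:
            next(sorter)
        except StopIteration:
            return False
    return True

objects = [5, 2, 4, 1, 3]
sorter = bubble_sort_steps(objects)
slices = 0
while run_slice(sorter, 3):   # pretend each timeslice has room for 3 steps
    slices += 1
print(objects, "sorted over", slices + 1, "slices")
```

Each slice leaves the list in a valid, partly sorted state, which is what makes the work safe to spread over several blanking periods.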

What I'm basically saying is that depending on the game, some things will run every frame, some things will run in order, some may run out of order only when there's time available. It's up to you.

In my engine, objects are processed ASAP, and in between those, the character drawing takes the spare time, getting a look-in where it can. These are totally asynchronous events; neither depends on the other. Shorter tasks don't get run "more often" if you only put tasks on a queue when it's their turn to run. And again, putting everything on a queue makes sure that stuff that shouldn't run more often... doesn't. Some stuff you want to run as often as possible... e.g., screen draw updates.

Cheers

A

RevEng · May 22, 2012

Is this an example of proto threads?
http://en.wikipedia....ki/Protothreads

Protothreads is another specific implementation of time slicing, but it's not really well suited to the 2600.

For beginners reading this tutorial, it's worth mentioning that the basic (traditional, less-flexible, synchronous) way of time-slicing on the 2600 is just hand-picking which tasks will happen on which frames...

	lda framecounter
	and #%00000011 ; Only use the lower 2 bits of the frame counter. The result will be a count in the sequence 0,1,2,3,0,1,2,3,...
	cmp #0 ; Not required, but included for clarity
	beq checkcollisions ; Check collisions on the first frame of every 4 frame set.
	cmp #1
	beq updateplayfield ; Update the playfield on the second frame of every 4 frame set.
	cmp #2
	beq throwbanana ; Ditch the banana on the third frame of every 4 frame set.
	; The kill-the-witch routine follows, which happens on the fourth frame of every 4 frame set...

zilog_z80a · October 24, 2017

Hi, where can this example code be placed? In vblank and overscan too?

When we talk about "multiple frames", is there any file to look at to see this method in use?

pls, is there some kind of example about this?

ty in advance.

zilog_z80a · October 12, 2020

On 10/24/2017 at 5:15 PM, zilog_z80a said:

Hi, where can this example code be placed? In vblank and overscan too?

When we talk about "multiple frames", is there any file to look at to see this method in use?

pls, is there some kind of example about this?

ty in advance.

lol

Andrew Davie · October 13, 2020

19 hours ago, zilog_z80a said:

lol

I hesitate to reply to a "lol" but...
It's complex to find where everything is, but the Sokoboo source code is a complete implementation of the described method.

It might be a bit much -- not packaged as a stand-alone.... but anyway have at it...

https://github.com/andrew-davie/Sokoboo

zilog_z80a · October 13, 2020

Hi @Andrew Davie, I was laughing at myself: the example was right in front of my eyes in 2017, and there I was asking for an example.

cheers.

Sokoboo ty for that!!

Edited October 13, 2020 by zilog_z80a
