
Assembly on the 99/4A


matthew180


On another level, the p-system can do the same thing, but also allows cooperation between assembly and Pascal programs.

The assembler lets you define procedures and functions that can be called from other assembly programs as well as from Pascal. It also lets you define data that is accessible from the outside (from Pascal), and reference global data declared in a Pascal program from the assembly level.

Making this possible does require that the linking is done separately, before loading, because there is no linking loader that can handle these kinds of cross-references. You link the Pascal and assembly programs together to produce one code file containing them all, and that is the code file you execute.

 

The p-system does have the same kind of link and load (actually load and link) capability too, but then it's between a Pascal program and separately compiled units. These units are resident in some library file, which is referenced from the main program. The referenced units will then be loaded by the operating system, as much as is necessary to find them, at load time. The code in the units will only be loaded when it's actually used, and can be rolled out from memory again, if other code segments need the space (and the first one isn't used any more, of course). It's also possible to break up both units and your own programs into segments, which will be loaded when referenced.


That's why I liked it best for program development. A standard program isn't as fast as Forth, but much faster than Extended BASIC, when it executes. But if you time the total time from idea until debugged implementation, then it competes favorably.

Since it's easy to convert whatever procedure is needed to assembly, and you can have these assembly routines inside a precompiled unit, it's also a pretty slick way to develop a working solution using a Pascal-only program and library first. If you want to run it many times, you can then start converting critical procedures to assembly, in your main program, in the library, or in both, as you prefer.


As for the EA3 and EA5 topic, and the relationships to exe/elf files vs. com files, we learned from our TI systems that the work is not done until you have an EA5 program file. All the EA3 files somehow look as if someone was unable or unwilling to do this last step, and we wondered why TI actually offered a loader for EA3 in the first place.

 

Comparing with the PC world, I found it interesting to see that the EA3 format makes much more sense. The only (and of course relevant) reason to go to EA5 is loading speed. If EA3 loaded at a much faster pace, there would be no reason for EA5 at all.

 

Also, in PC operating systems, processes have segments like text segment (program), data segment (memory locations with given values), stack segment, heap segment, and BSS segment. The latter indeed reminds us of the BSS assembly directive. In fact, the exe/elf and other relocatable code files just define the lengths of reserved memory blocks (BSS) and skip this amount of memory on loading, without having to store the whole reserved area in the file.


 

You don't have to decide to go with one or the other. Initially you might want to use EA3 because converting to EA5 is often an extra step (depending on the assembler). If you want to end up with a cartridge you need to convert to EA5 at some point anyway, but as long as loading time is not an issue EA3 is fine for development.

Thanks! That’s a relief and I understand things a little better after re-reading the EA manual, 24.5, p.420 — SAVE Utility.

 

EA5 is simply loading a memory image formatted file. No need to differentiate between the two during development as you can always save the file for EA5 at the end.

 

I’ve been developing using the EA3 option and ignoring EA5, so basically just curious.

 

 


What is the technique to synchronize screen updates with the vertical sync?

To avoid tearing by trying to draw in between video frames.


My game loop is like this:

LOOP
enable interrupts briefly
write screen and sprites to VDP
KSCAN
update internals
B @LOOP


I'm not sure whether the VDP writes still take more than 1/60th of a second, but I'm working on that.

3 address changes for 3 blocks of 128 consecutive bytes seems like it could fit into one refresh cycle.

2 writes to Bitmap name table and another to sprite attribute table.

 

 

I honestly wouldn't worry about vertical sync issues until you observe them. Until you're trying to blast bitmap graphics in real-time you don't really have an issue with this in assembly. (Coming from Basic and Extended BASIC it's easy to think "But sprites and graphics are SLOW!")

 

Your best method of handling the VDP is to treat it like an output device, like a printer. Don't try to figure out WHAT you're writing in the midst of writing to VDP. Use buffers in CPU RAM and pre-calculate so you can just blast it all out in one large block without stopping. You can also use an alternate screen in VDP and do swaps with a single VDP register change.


 

I think I am seeing some tearing. And I am enjoying unrolling loops to just blast it all out!

 

 

Hmm, I forgot that I could move the name table from >1800 to >3800 to do page flipping.

 

But isn't there something with reading VDPSTA? I did something with the 9938 back in the day to synchronize page flipping (register write) with vertical retrace, because it was tearing even more obviously during a page flip.
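
Moving the name table is a single write to VDP register 2, which holds the name table address divided by >400, so >1800 is >06 and >3800 is >0E. A minimal sketch, assuming nothing else in your VDP layout lives at >3800; the label names are made up for illustration:

```asm
vdpwa  equ  >8c02                      ; VDP write address / register port

* Show the name table at >3800 (register 2 = >3800 / >400 = >0E)
show_page1:
       li   r0,>0e82                   ; High byte = value, low byte = >80 + register 2
       jmp  write_reg
* Show the name table at >1800 (register 2 = >1800 / >400 = >06)
show_page0:
       li   r0,>0682
write_reg:
       limi 0                          ; Keep the console ISR from splitting the two writes
       movb r0,@vdpwa                  ; First byte: register value
       swpb r0
       movb r0,@vdpwa                  ; Second byte: >80 + register number
       rt                              ; Interrupts are left disabled
```

Call it with `bl @show_page0` or `bl @show_page1`. The flip takes effect as soon as the second byte is written, so it should be done right after the vertical retrace has been detected to avoid a visible tear.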


 

What exactly are you writing to VDP? Can you share some code?


There are two ways of polling for vertical sync. The first is by reading the most significant bit of the VDP status register. The second is by reading CRU bit 2.

 

I have used vsync in all my games and usually also some kind of double buffering/page flipping. For scrolling I find it essential in order to avoid screen tearing.
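
For the first method, a minimal sketch of polling the status register might look like this. It assumes the usual 99/4A address >8802 for the VDP status port and that interrupts are disabled (`limi 0`), so the console interrupt routine does not read the status first. The label names are made up for illustration.

```asm
vdpsta equ  >8802                      ; VDP status port (read only)

wait_frame:
       movb @vdpsta,r0                 ; Bit >80 is the frame flag, reading clears it
       jlt  wait_frame_1               ; Byte is negative when the flag was set
       jmp  wait_frame
wait_frame_1:
       rt
```

Every status read also clears the other status flags, so a program that needs the sprite collision flag should pick it up from the byte left in r0 rather than read the port again.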


 

What exactly are you writing to VDP? Can you share some code?

 

I've got 2 ideas going:

 

one is re-writing a 16x16 area in the bitmap name tables to cycle through 8 versions of the characters, to make it scroll one pixel at a time. This is 16 writes of 16 bytes each to the name table.

 

the other is rewriting the corresponding pattern table (hopefully leaving color alone) which is 2k of data.

I've organized the name table so that it increments characters going down, not across. That way I can blast 64 bytes for a precomputed column of 8x1, then do the next, without setting up the address again.

 

Doing the second, it takes more time than one refresh to write 2k, so yes it's tearing obviously. It's going at about 4 frames per second.


Okay so you ARE blasting bitmap then. :)

 

What exactly are the name tables? There is the pattern table, the color table, the screen table, sprite pattern table and sprite attribute table.

 

Regardless, the first thing I would do is make sure my screen table is right next to the pattern table at >1800. Then I would have a 2.75k buffer in CPU and make sure all my changes are there. I'd then use 16 bit registers in scratch pad and put the VDP ports IN registers for maximum speed and just write the 2.75k in one linear write.
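
The linear write described above can be sketched as follows. This is only an illustration under stated assumptions: the workspace is in scratchpad RAM, interrupts are disabled, and the routine name is made up.

```asm
vdpwd  equ  >8c00                      ; VDP write data port
vdpwa  equ  >8c02                      ; VDP write address port

* Copy r2 bytes from the CPU buffer at r1 to VDP address r0.
* Assumes workspace in scratchpad and interrupts off (limi 0).
vmbw_fast:
       ori  r0,>4000                   ; Set the write flag in the address
       swpb r0
       movb r0,@vdpwa                  ; Low byte of the address first
       swpb r0
       movb r0,@vdpwa                  ; High byte (with write flag) second
       li   r15,vdpwd                  ; Keep the data port address in a register
vmbw_fast_1:
       movb *r1+,*r15                  ; One byte per pass, VDP address auto-increments
       dec  r2
       jne  vmbw_fast_1
       rt
```

To send the 2.75K (2816 bytes) mentioned above, load r0 with the VDP start address, r1 with the buffer address and r2 with 2816, then `bl @vmbw_fast`. Unrolling the `movb` several times per pass trades code size for fewer `dec`/`jne` pairs.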



You prefer reading CRU bit 2 over reading the VDP status register, due to the hardware race or hazard (which results in occasionally missing a vsync) mentioned earlier in this thread, right?

 

 

 

 


 

Yes, I believe that was the conclusion.

So let’s say I successfully poll the CRU for vsync and eventually detect one...

 

In my code, I now know I’ve hit a 1/60th of a second NTSC (or 1/50th second, PAL) event. I believe that’s it.

 

This vsync event now triggers a decision tree gate which runs a specific segment of code. Perhaps I decide to do three things when vsync is detected:

 

A. Update the player character position.

B. Hit my roll-ur-own sound routine to update background music.

C. Refresh the game scoreboard.

 

How do I know there’s enough time to do these things between vsyncs?

 

How many tasks may I tie to the vsync event?

 

I’m thinking about counting vsyncs and enabling sub-events based on determined vsync counts...

 

Am I attempting to process as much as I can before the CRT beam travels from bottom right to top left on the screen?

 

I guess what I’m looking for is a few tips and techniques on the topic.

 

The idea of using the vsync signal as a constant interval timer is manageable.

 

If I try to do too much after a vsync is detected I’m thinking I’ll miss the next vsync, and my interval timer will then become inconsistent and result in stuttering video/audio?

 

 

 

 


Edited by Airshack

A game like TI Scramble runs the entire main loop between two vsyncs, so you have more time than you may think.

 

You have about 50,000 CPU clock cycles between two vsyncs in NTSC (the 3 MHz CPU clock divided by 60 frames per second), and you can use Classic99 to measure how much time you have used: Add a breakpoint with the start and end addresses of your main loop, e.g. T(A024-A038).

 

You should try to finish updates to the display as soon as possible after vsync. If you use two name tables as a double buffer you can update alternating tables each frame and just do a page flip right after vsync. I have even double buffered the sprite attribute table this way.

 

Timing of sound playing is less important, I often do it after all the other work has been done.

 

If you run over the frame, perhaps deliberately, remember to read the VDP status register once to clear the interrupt before you wait for the next one. Otherwise your polling loop will terminate immediately in the middle of a frame. You must also read the VDP status after detecting the interrupt, like this:

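vsync:
       movb @vdpsta,r12                ; Read status to clear any pending interrupt
       clr  r12                        ; CRU base address 0
vsync_1:
       tb   2                          ; Test CRU bit for VDP interrupt
       jeq  vsync_1                    ; Bit is 1 while no interrupt is pending
       movb @vdpsta,r12                ; Read status again to clear the interrupt
       rt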

In Flying Shark the main loop takes up to 4 frames depending on the complexity of the graphics. In that game I had to count the frames and add delays in order to obtain an even pace, which was a real pain. If possible try to do approximately the same amount of work each frame. That doesn't mean you have to do the same things each frame. One frame can move the sprites while another scrolls the screen, for instance.
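
One way to count frames and pad out a game step could be sketched like this. It is not taken from Flying Shark. It assumes interrupts are disabled, that `poll_vsync` is called at least once per frame from inside the work so no vsync goes uncounted, and that the scratchpad word at >8320 is free. All names and the target of 4 frames are made up for illustration.

```asm
vdpsta equ  >8802                      ; VDP status port
frames equ  >8320                      ; Hypothetical scratchpad word used as frame counter
target equ  4                          ; Assumed number of frames per game step

* Non-blocking check: count a frame if the VDP interrupt is pending.
poll_vsync:
       clr  r12                        ; CRU base address 0
       tb   2                          ; VDP interrupt line, 0 = pending
       jeq  poll_vsync_1               ; 1 = nothing pending
       movb @vdpsta,r12                ; Reading the status clears the interrupt
       inc  @frames
poll_vsync_1:
       rt

* Call at the end of a game step: wait until 'target' frames have passed.
pace:
       mov  r11,r10                    ; Save return address, bl below overwrites r11
pace_1:
       bl   @poll_vsync
       mov  @frames,r0
       ci   r0,target
       jl   pace_1                     ; Fewer than target frames so far
       clr  @frames
       b    *r10
```

A step that finishes early then idles in `pace` until the fourth vsync, while a step that already used four or more frames returns immediately.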
