
Display list question



I'm finally porting some code to the Atari 8-bit and want to be sure I understand what I'm reading about display lists.
The book I'm reading doesn't really spell this out or show a good example.

If I use mode 8 graphics, do I need an entry setting mode 8 for every line of pixels on the display?
I have 3 blank-line commands, then mode 8 + LMS, the screen address, 191 mode 8 commands, 3 blank lines, JVB, list address.
And if the list crosses a page boundary, a jump to the next page... which I shouldn't need since my list is just over 200 bytes.
Is that correct?

Edited by JamesD

Normally you need a DList entry for each occurrence of the mode line. Blank lines aren't needed at the bottom; the JVB can occur anywhere in the normal display area without anything extra.

 

The DList itself can't cross a 1K boundary. You can use a jump (JMP) instruction to get past one, but that's rarely done because the jump creates a blank scanline, which usually isn't wanted.
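A quick way to check the 1K rule before placing a list is sketched below in Python (not Atari code, just the arithmetic). It assumes the usual description of ANTIC's display-list counter, where only the low 10 bits increment, so a list that runs past a 1K boundary wraps to the start of the same 1K block. The addresses in the example are hypothetical.

```python
# Sketch: does a display list fit inside a single 1K block?
# ANTIC's display-list counter only increments its low 10 bits, so a list
# that runs past a 1K boundary wraps to the start of the same block
# instead of continuing into the next one.

DL_BLOCK = 1024  # size of the block a display list must stay inside


def dlist_crosses_1k(start, length):
    """Return True if bytes start .. start+length-1 span two 1K blocks."""
    if length <= 0:
        raise ValueError("length must be positive")
    first_block = start // DL_BLOCK
    last_block = (start + length - 1) // DL_BLOCK
    return first_block != last_block


if __name__ == "__main__":
    # A 202-byte list at a hypothetical $2000 fits; at $23F0 it would not.
    print(dlist_crosses_1k(0x2000, 202))  # False
    print(dlist_crosses_1k(0x23F0, 202))  # True
```

If the check fails, moving the list is usually simpler than splitting it with a JMP, for the blank-scanline reason given above.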

 

Screen memory can't cross a 4K boundary, so to get past one you have to make sure the last line before it doesn't overlap the boundary, then use LMS + mode, just like on the first line, to set the address counter again. It's an obvious advantage if the previous line ends at exactly the last byte of the lower 4K block, so that memory use is contiguous, which makes drawing lines etc. easier.

 

The case where you can get away without repeating mode lines in a DList is usually when you set a mode and then disable DList DMA within a DLI. It's rarely used and not really worth the trouble just to save a few bytes; it's more of a specialist trick for when you absolutely need the CPU cycles for a demo or picture, or are doing some other tricky effect.


Mode 8 requires 40 * 192 bytes, or 7680 bytes.
So a mode 8 screen will cross a 4K boundary, which makes my list a little more complex.
To have contiguous screen RAM, I have to align the end of a 40-byte line with the end of the first 4K block.
4K = 4096 bytes, and 4096 / 40 = 102.4, so 102 lines fit in the first 4K block.
So I find the 4K boundary I want and subtract 102 * 40 = 4080, which gives me the screen address.
Which means (if I follow correctly) my DLIST needs 3 blank lines, mode 8 + LMS, screen address, 101 mode 8s, mode 8 + LMS, next screen address (the 4K boundary), 89 mode 8s, JVB dlist. That's 1 + 101 + 1 + 89 = 192 lines.
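Here's a sketch in Python that generates exactly the list described above, which is handy for checking the counts before typing .BYTE lines. The opcode values are ANTIC's ($70 = 8 blank lines, $0F = mode F, i.e. BASIC GRAPHICS 8, $40 = LMS flag, $41 = JVB); the function name and the addresses in the example ($8000 as the 4K boundary, $9C00 for the list) are hypothetical.

```python
# Sketch: build the split mode 8 display list described above.
BLANK8 = 0x70          # 8 blank scanlines
MODE_F = 0x0F          # ANTIC mode F (BASIC GRAPHICS 8)
LMS = 0x40             # Load Memory Scan flag, followed by a 16-bit address
JVB = 0x41             # jump and wait for vertical blank
BYTES_PER_LINE = 40
TOTAL_LINES = 192


def word(addr):
    """Little-endian LSB, MSB pair as ANTIC expects."""
    return [addr & 0xFF, (addr >> 8) & 0xFF]


def mode8_dlist(boundary, dlist_addr, lines_before=102):
    """Return (screen_start, display_list_bytes).

    lines_before mode lines sit below the 4K boundary and end exactly at it;
    the remaining lines start at the boundary with a second LMS.
    """
    if boundary % 4096:
        raise ValueError("boundary must be a multiple of 4096")
    lines_after = TOTAL_LINES - lines_before
    if max(lines_before, lines_after) * BYTES_PER_LINE > 4096:
        raise ValueError("each part must fit in one 4K block")
    screen = boundary - lines_before * BYTES_PER_LINE

    dl = [BLANK8] * 3                          # 24 blank scanlines at the top
    dl += [MODE_F | LMS] + word(screen)        # first line loads the counter
    dl += [MODE_F] * (lines_before - 1)
    dl += [MODE_F | LMS] + word(boundary)      # reload at the 4K boundary
    dl += [MODE_F] * (lines_after - 1)
    dl += [JVB] + word(dlist_addr)             # wait for VBLANK, restart list
    return screen, bytes(dl)


if __name__ == "__main__":
    screen, dl = mode8_dlist(boundary=0x8000, dlist_addr=0x9C00)
    print(hex(screen), len(dl))  # 0x7010 202
```

Passing `lines_before=90` gives the "work backwards" layout suggested in the reply, with the 102 full lines in the upper block and the screen starting at $71F0.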

Edited by JamesD

Sort of... generally you'd work backwards: fill the second 4K block with the 102 scanlines and the first 4K block with the remainder, so you don't end up with spare memory at the end. In theory you could put the DList at the end, but you'd still end up with about 300 bytes wasted.

 

Where you can run into problems is when your screen is over 8K. That means 3 LMS instructions are needed, and there will be a memory hole if you're using the standard 40-byte width, since 4,096 isn't an integer multiple of 40.

In cases like that you'd need tables to keep track of where each scanline is in memory, although that's usual practice anyway if you want fast plot/line routines.
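A sketch of that bookkeeping in Python: lay out scanlines so none straddles a 4K boundary, flag the lines that need an LMS, and split the addresses into the LSB/MSB tables a 6502 plot routine would index. It assumes ANTIC's screen-memory counter only carries within a 4K block; the start address and the 240-line, 40-byte screen (9600 bytes, i.e. over 8K) are hypothetical.

```python
BLOCK = 4096  # ANTIC's screen-memory counter only carries within a 4K block


def layout_lines(start, lines, width=40):
    """Place each scanline so none straddles a 4K boundary.

    Returns (addresses, lms_flags). A line that would straddle a boundary is
    moved up to the boundary, leaving a small hole, and flagged for an LMS.
    """
    if width > BLOCK:
        raise ValueError("a single line cannot exceed 4K")
    addrs, lms = [], []
    addr = start
    for i in range(lines):
        if addr // BLOCK != (addr + width - 1) // BLOCK:
            addr = (addr // BLOCK + 1) * BLOCK  # skip the hole
        # LMS on the first line and on every line that begins a new 4K block
        # (otherwise the counter would wrap to the bottom of the old block).
        lms.append(i == 0 or addr % BLOCK == 0)
        addrs.append(addr)
        addr += width
    return addrs, lms


def lsb_msb_tables(addrs):
    """Split line addresses into the LSB/MSB tables a plot routine indexes."""
    return [a & 0xFF for a in addrs], [(a >> 8) & 0xFF for a in addrs]


if __name__ == "__main__":
    addrs, lms = layout_lines(0x6000, 240)
    print([hex(a) for a, f in zip(addrs, lms) if f])  # ['0x6000', '0x7000', '0x8000']
    lo, hi = lsb_msb_tables(addrs)
```

With 40-byte lines this gives three LMS lines (0, 102 and 204) and a 16-byte hole at the top of each of the first two 4K blocks, which is the situation described above.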



Ok, so just swap the numbers of mode 8 lines and change the screen addresses. No problem.

 


LOL, that memory hole reminds me of an embedded system I worked on.

There was unused RAM at the end of every display line.

I put the I/O buffers there.

 

*edit*

 

Thanks for your help!

Edited by JamesD

FWIW, I have my 64 column text code running on the Atari's narrow playfield.
I introduced a bug in the scroll when I added 80 column + an extra row support so I have to track that down. oops.

No big deal but I'll have to fix that before I start messing with normal and wide playfields.

Right now I have an issue with the display list when it switches to the second 4K block.
It's entering the wrong graphics mode... so I have some counting and math to do to fix that.
But I'm also experiencing a crash that looks like an interrupt issue.
I'm guessing I'm stomping on some system interrupt code.
Is there any good reading material on this out there?


I ended up just bumping the start address for my page 0 use and the program runs fine with no other changes.

It runs faster than the Acorn Atom, but when I set the Atom emulator to 1.5x speed, the Atom is definitely faster.

Other than the display setup, display list, and a few address changes, the code is the same.
Based on what I see, it's about 15-20% faster than a 1 MHz Atom. I had hoped for a little better given the Atari's faster CPU clock.
The speed is very similar to the MC-10 version but not quite as fast, even though the MC-10 version still uses a multiply where the 6502 code avoids one by using multiple tables for the font.

*edit*
FWIW, the screen scroll hurts the 6502 versions.
The 16 bit D register and stack pointer make the 6803 much more suited to the task and specialized looping instructions on the Z80 really help it out.
I think the 6502 probably does quite well just drawing the characters.

Edited by JamesD

Musing more than helping. :) What I always do is call the OS to set up the graphics mode within the constraints of what I want to do. From there, I just copy the display list for inclusion in my program. That makes it easier to do things like putting your character sets in high memory above the screen, and AFAIK the OS never gets it wrong.


@ClausB

That's an interesting approach but it would definitely be Atari specific. I may look at doing that in the future.

I'd probably use LSB MSB tables of addresses and then load the LMS pointers based on an index.

That would be sooooo much faster. That's what? 72 loads/stores + a little management overhead vs 14,000+ loads/stores.
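A Python sketch of the table-driven LMS idea, to show why it touches so few bytes: give each text row its own LMS, then "scroll" by rewriting each LMS operand from LSB/MSB tables indexed by (top + row) mod rows. Assumptions not in the original code: text rows are 8 scanlines of ANTIC mode F, each row's bitmap is contiguous and doesn't cross a 4K boundary, and the function names and addresses are hypothetical.

```python
BLANK8, MODE_F, LMS, JVB = 0x70, 0x0F, 0x40, 0x41


def build_row_dlist(row_addrs, dlist_addr, scanlines_per_row=8):
    """Display list with one LMS per text row; returns (dl, lms_operand_offsets)."""
    dl = bytearray([BLANK8] * 3)
    offsets = []
    for addr in row_addrs:
        offsets.append(len(dl) + 1)  # position of this row's LMS address LSB
        dl += bytes([MODE_F | LMS, addr & 0xFF, (addr >> 8) & 0xFF])
        dl += bytes([MODE_F] * (scanlines_per_row - 1))  # counter just continues
    dl += bytes([JVB, dlist_addr & 0xFF, (dlist_addr >> 8) & 0xFF])
    return dl, offsets


def scroll_rows(dl, offsets, lsb, msb, top):
    """Point display row r at buffer row (top + r) mod n: two stores per row."""
    n = len(offsets)
    for r, off in enumerate(offsets):
        i = (top + r) % n
        dl[off] = lsb[i]
        dl[off + 1] = msb[i]
    # The buffer row that wrapped to the bottom still holds old text and must
    # be cleared separately: one row of bytes instead of the whole screen.


if __name__ == "__main__":
    rows = [0x4000 + r * 320 for r in range(12)]  # 12 rows of 40 x 8 bytes
    lsb = [a & 0xFF for a in rows]
    msb = [a >> 8 for a in rows]
    dl, offs = build_row_dlist(rows, 0x3000)
    scroll_rows(dl, offs, lsb, msb, top=1)  # scroll up one text row
```

On the Atari the same loop would be an indexed LDA from each table and an STA into the display list per row.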

I have to rewrite the 6502 scroll already. It certainly works and it's pretty fast for how much work it's doing, but the screen is unreadable a lot of the time on the Atari.

On the other machines, I do a line of text at a time thanks to the 16 bit index registers. On the 6502 I do columns due to the faster indexing.

If I were using this in a terminal program, I could kinda see what was scrolling by on the other versions... but not on this.

This will work well for something like a text adventure where the scrolling would stop after a few lines.

 

As far as processor speed comparisons go that change wouldn't really be fair, but for machine A vs machine B it's certainly interesting.

The Plus/4 will require less code for drawing text due to the different memory layout of its bitmap.

I think it's 3 instructions shorter per line of the font written to the screen, so it should be faster drawing text by ~ 10 cycles per byte.

But the Atari would kick its butt any time a scroll is involved, using your approach.

 

The 65802/16 MVN instruction would make this fast and trivial.

The scroll could then be faster than on other CPUs, except the 6309 or Z180/HD64180, which have their own fast memory moves.

The VZ200 requires moving a couple lines of text across video memory pages so it would be at a disadvantage.

Edited by JamesD


I've already gotten the hang of it, it's just a little math.

 

 

 

FWIW, if I implement a wide playfield version, I'll probably use LMS manipulation to scroll.

After thinking about it for 2 seconds, I realized that the 80 column code is already using a lookup table for the base address for the line... so I already have a lot of the work done anyway. Drawing graphics might be a bit awkward though.

 

It would be nice if I could get the .align directive on ca65 to work so I didn't have to assemble the code twice every time I make a significant change.

I fought with that assembler's features for 3 hours trying to get the 80 column version to assemble. I got it working, but what a PITA!



Just a little correction here... not really a disadvantage unless things are clocked at close to the same speed.

The VZ200 is clocked at 3.5+ MHz.

Oops



The 65802/16 MVN instruction requires 7 clock cycles per byte. Kinda disappointing really. I was hoping for 4 or less.

 

The 6309 TFM instruction requires 3 clock cycles per byte. Told you it was fast! :grin:

 

I couldn't find specs on the Z180/HD64180 DMA, but in its fastest mode I would definitely expect it to be faster than the 65802/16, possibly 1 byte every 2 cycles.

One very interesting feature of the DMA controller is that you can set it to lock out the CPU, or allow the CPU to steal a cycle for each byte transferred.

If the CPU is running internal cycles while the DMA takes place in the cycle steal mode, it is kinda working in parallel.

You could have the Z80 continue processing for the last part of the screen while the scroll finished on its own, making internal cycles pretty much free compared with the old way.

At least I think that's how it works.

The one drawback here is the CPU isn't a pin for pin replacement.

 

In spite of the 65802/65816 seemingly being the slowpoke here, it's still better than the long list of LDA ****,X / STA ****,X instructions I'm using now, which take at least 8 cycles per byte plus some loop overhead.

Even if it's only saving one clock cycle per byte, that saves over 6100 clock cycles per scroll.
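To put rough numbers on that comparison, here's a small Python calculation. The per-byte costs are assumptions taken as follows: 7 cycles for MVN and 3 for TFM as stated in this thread, and 4 + 5 = 9 for an unrolled 6502 LDA abs,X / STA abs,X pair with no page-crossing penalty (standard 6502 timings). It ignores setup, loop overhead and ANTIC DMA cycle stealing, and the 6144-byte size (32 x 192, a full narrow-playfield mode F screen) is only illustrative.

```python
# Approximate cycles per byte moved (setup, loop overhead and DMA ignored).
CYCLES_PER_BYTE = {
    "6502 LDA abs,X + STA abs,X": 4 + 5,  # unrolled, no page-crossing penalty
    "65802/65816 MVN": 7,
    "6309 TFM": 3,                         # plus a small fixed start-up cost
}


def scroll_cycles(bytes_moved):
    """Total cycles each method needs to move bytes_moved bytes."""
    return {name: per * bytes_moved for name, per in CYCLES_PER_BYTE.items()}


if __name__ == "__main__":
    # 6144 = 32 x 192, an illustrative narrow-playfield screen size.
    for name, total in scroll_cycles(6144).items():
        print(f"{name:28s} {total:6d}")
```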

 

The MVN code is pretty simple, and it's portable to any 65802/65816 system.

I'll see if I can run a test today.

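	.p816				; ca65: enable 65816 opcodes
	rep	#$30			; 16-bit accumulator and index (CPU must be in native mode)
	.a16				; tell the assembler the new register sizes
	.i16
	LDX	#StartAddress		; source address
	LDY	#DestAddress		; destination address
	LDA	#BytesToMove-1		; MVN moves A+1 bytes
	MVN	SourceBank,DestBank	; RAM banks 00, 00
	sep	#$30			; back to 8-bit accumulator and index
	.a8
	.i8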
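Why the count is loaded as BytesToMove-1: MVN copies a byte from X to Y, increments both, and decrements A, repeating until A wraps from $0000 to $FFFF, so it moves A+1 bytes. The Python model below is only an illustration of that behaviour (bank bytes ignored, bank 0 only), not emulator code. Because MVN increments, it is the right choice when the destination is below the source, as in a scroll up, even if the blocks overlap.

```python
def mvn(mem, a, x, y):
    """Model of MVN: move A+1 bytes from mem[X] to mem[Y], incrementing X and Y.

    Returns the final (A, X, Y). The real instruction repeats until A
    decrements from $0000 to $FFFF; this model uses bank 0 only.
    """
    while True:
        mem[y] = mem[x]
        x = (x + 1) & 0xFFFF
        y = (y + 1) & 0xFFFF
        a = (a - 1) & 0xFFFF
        if a == 0xFFFF:
            return a, x, y


if __name__ == "__main__":
    mem = bytearray(32)
    mem[0:5] = b"HELLO"
    print(mvn(mem, 5 - 1, 0, 16))  # (65535, 5, 21)
    print(bytes(mem[16:21]))       # b'HELLO'
```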
 

 

What problem do you have with the .align directive?

Actually, after looking into it more, it's the linker.

The generated XEX file doesn't have the required padding before the display list. I'm having to assemble the code, look at how much padding is needed based on the .lst file, and reserve bytes accordingly.
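The manual step boils down to one line of arithmetic: the number of bytes needed so that the next address lands on an `align`-byte boundary. A Python sketch (the function name is hypothetical); as a cross-check, in the hexdump later in the thread the 4-byte ENTRY code ends at $2E03, and the linker inserts 252 zero bytes so that HLIST lands at $2F00.

```python
def align_padding(addr, align=256):
    """Bytes of .res needed so that the next byte is at a multiple of align."""
    if align <= 0:
        raise ValueError("align must be positive")
    return (-addr) % align


if __name__ == "__main__":
    print(align_padding(0x2E04))  # 252
```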

 

I'll have to spend some time with it to figure out exactly what's happening.

There is a very real chance it's due to my lack of experience with the tools rather than it being a bug.

The documentation is a bit brief.

 



 

 

Can you give a (simplified) example?


This is what I had. It's in a data segment.

 

;
	RTS

.data
	.align  256,0

;	.res	172
;
;Atari Antic Display List
; Screen Mode 8, 256/320/384 pixels wide x 192 high, 1 bit per pixel
;
HLIST:
	.BYTE	$70,$70,$70		; 3 BLANK LINES
...
I'm going to start a separate topic on the assembler. This isn't the only problem I've had with it.

Edited by JamesD

You are right that this is more a "problem" of the linker.

 

If you want to align to some boundary, the output section itself has to be aligned to at least that boundary.

 

You didn't write your compile and link commands, so I'm assuming you are using the atari-asm.cfg linker config file for now.

 

What you have to do is add an "align = 256" attribute to the DATA segment in the config file:

$ diff -u ~/atari800/cc65-git/cc65/cfg/atari-asm.cfg my-atari-asm.cfg 
--- /data/home/chris/atari800/cc65-git/cc65/cfg/atari-asm.cfg   2016-05-25 01:22:36.519147078 +0200
+++ my-atari-asm.cfg    2016-08-29 20:12:48.006714981 +0200
@@ -24,7 +24,7 @@
     MAINHDR:  load = MAINHDR, type = ro,  optional = yes;
     CODE:     load = MAIN,    type = rw,                  define = yes;
     RODATA:   load = MAIN,    type = ro   optional = yes;
-    DATA:     load = MAIN,    type = rw   optional = yes;
+    DATA:     load = MAIN,    type = rw   optional = yes, align = 256;
     BSS:      load = MAIN,    type = bss, optional = yes, define = yes;
     AUTOSTRT: load = TRAILER, type = ro,  optional = yes;
 }

I've extended your snippet to be compilable:

$ cat jamestest.s 
.include "atari.inc"
        .export         __AUTOSTART__: absolute = 1
        .export         __EXEHDR__: absolute = 1
        .import         __MAIN_START__, __BSS_LOAD__

.segment        "EXEHDR"
        .word   $FFFF

.segment        "MAINHDR"
        .word   __MAIN_START__
        .word   __BSS_LOAD__ - 1

.segment "AUTOSTRT"
        .word   RUNAD                   ; defined in atari.inc
        .word   RUNAD+1
        .word   ENTRY

.code
;
ENTRY:
        LDA     HLIST
        RTS

.data
        .align  256,0

;       .res    172
;
;Atari Antic Display List
; Screen Mode 8, 256/320/384 pixels wide x 192 high, 1 bit per pixel
;
HLIST:
        .BYTE   $70,$70,$70             ; 3 BLANK LINES

$

Then compile (assemble) and link:

$ ca65 -tatari -o jamestest.o jamestest.s 
$ ld65  -C my-atari-asm.cfg -o jamestest.com jamestest.o
$

In the resulting file HLIST is properly aligned:

$ hexdump -C jamestest.com 
00000000  ff ff 00 2e 02 2f ad 00  2f 60 00 00 00 00 00 00  |...../../`......|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000100  00 00 00 00 00 00 70 70  70 e0 02 e1 02 00 2e     |......ppp......|
0000010f
$

If you have more questions just ask...

