
VBXE speed



As I understand it: the blitter can use the full 14MHz bandwidth of VBXE local memory (8x Atari bus speed), but it has lowest priority relative to anything else including MEMAC. In general, it will fill at 1 cycle/byte, copy at 2 cycles/byte, and do read/modify/write or collision check operations at 3 cycles/byte. The blitter can skip cycles and run faster if the source is constant (AND mask = $00), if it's doing a RMW operation and the source byte is $00, or when repeating bytes with X zoom. Y zoom is not optimized and will re-read the source bytes.
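Here is a rough model of those per-byte costs. It takes the 1/2/3 cycles-per-byte figures above at face value and ignores both the faster skip cases and all DMA contention. The helper names are made up for illustration.

```python
# Best-case VBXE blitter cost model using the per-byte figures quoted above.
# Hypothetical helpers; skip optimizations and DMA contention are ignored.

VRAM_CYCLES_PER_CPU_CYCLE = 8  # blitter clock is 8x the Atari bus clock

COST_PER_BYTE = {
    "fill": 1,  # constant source (AND mask = $00): write only
    "copy": 2,  # read source, write destination
    "rmw": 3,   # read source, read destination, write destination
}


def blit_vram_cycles(width: int, height: int, mode: str) -> int:
    """VRAM cycles for a width x height blit in the given mode."""
    return width * height * COST_PER_BYTE[mode]


def blit_cpu_cycles(width: int, height: int, mode: str) -> float:
    """The same cost expressed in 6502 cycles."""
    return blit_vram_cycles(width, height, mode) / VRAM_CYCLES_PER_CPU_CYCLE


if __name__ == "__main__":
    for mode in COST_PER_BYTE:
        # A full 320x192 overlay screen: fill, copy, read/modify/write.
        print(mode, blit_cpu_cycles(320, 192, mode))
```

For a full 320x192 screen this gives 7680, 15360 and 23040 CPU cycles for fill, copy and read/modify/write respectively, before any contention is taken into account.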

 

The blitter can slow down to as little as quarter speed depending on the amount of DMA contention involved, particularly from the overlay. A 320x192 standard overlay, for instance, will consume 20-25% of total bandwidth. Running code out of MEMAC costs up to one-eighth of total bandwidth. Still, the blitter is fast enough to redraw the entire screen every frame if you keep overdraw low. Blitter lists can help you do this. It is feasible to have the blitter draw sprites with automatic background save/restore: put your sprite position tables in one of the MEMAC windows and use the blitter to blit the positions into the save/restore blit lists. Similarly, you can emulate a tilemap by constructing a huge blit list that first blits from the tilemap into the source addresses of the rest of the blit list, which then copies one tile at a time.
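To sanity-check the "whole screen every frame" claim, here is a back-of-the-envelope budget. It assumes a PAL frame of 312 lines x 114 CPU cycles and treats the overlay and MEMAC figures above as flat fractions of VRAM bandwidth. Real contention is burstier than this, and the function names are invented.

```python
# Rough per-frame VRAM budget for the blitter (assumed flat contention model).

CPU_CYCLES_PER_LINE = 114
VRAM_CYCLES_PER_CPU_CYCLE = 8


def frame_vram_cycles(lines_per_frame: int = 312) -> int:
    """Total VRAM cycles in one frame (312 lines = PAL)."""
    return lines_per_frame * CPU_CYCLES_PER_LINE * VRAM_CYCLES_PER_CPU_CYCLE


def blitter_budget(overlay_share: float = 0.25, memac_code_share: float = 0.0,
                   lines_per_frame: int = 312) -> float:
    """VRAM cycles left for the blitter after overlay and MEMAC code fetches."""
    share = 1.0 - overlay_share - memac_code_share
    return frame_vram_cycles(lines_per_frame) * share


def redraw_fits(screen_bytes: int, overdraw: float, cost_per_byte: int,
                **budget_kwargs) -> bool:
    """True if redrawing screen_bytes with the given overdraw fits in a frame."""
    needed = screen_bytes * overdraw * cost_per_byte
    return needed <= blitter_budget(**budget_kwargs)


if __name__ == "__main__":
    screen = 320 * 192
    for overdraw in (1.0, 1.5, 2.0):
        print(overdraw, redraw_fits(screen, overdraw, cost_per_byte=2))
```

With a 25% overlay share, a full-screen copy fits with up to roughly 1.7x overdraw. Also running code out of MEMAC (another 12.5%) pushes 1.5x over budget, which is the "keep overdraw low" point in numbers.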


Another consideration: as with the legacy hardware, NTSC has fewer cycles per frame available than PAL, in the same ratio.
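The ratio comes from the line count, since both systems have 114 CPU cycles per scanline. Here is a tiny sketch, assuming 312 PAL and 262 NTSC lines per frame:

```python
# Cycles per frame for PAL vs NTSC, assuming 114 CPU cycles per scanline.

CPU_CYCLES_PER_LINE = 114
LINES_PER_FRAME = {"PAL": 312, "NTSC": 262}


def cycles_per_frame(system: str) -> tuple[int, int]:
    """Return (CPU cycles, VRAM cycles) per frame for the given TV system."""
    cpu = LINES_PER_FRAME[system] * CPU_CYCLES_PER_LINE
    return cpu, cpu * 8  # VBXE local memory runs at 8x the CPU clock


if __name__ == "__main__":
    for system in LINES_PER_FRAME:
        print(system, cycles_per_frame(system))
    print("NTSC/PAL ratio:", round(262 / 312, 3))
```

NTSC gets roughly 84% of the PAL per-frame budget, for the CPU and the blitter alike.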

 

For squeezing out every last cycle, though, the key thing is to keep CPU and ANTIC access to VRAM to a minimum. Consider the screen architecture: where attribute maps are or aren't needed, where narrow mode might be sufficient, and where text mode might be sufficient.

 

I'm not sure whether ANTIC refresh cycles can generate a wait state for VBXE. In theory VBXE could just ignore them, since VRAM is static RAM and never needs refreshing.


I might be wrong, but I don't think refresh cycles count because they have no address to decode to a MEMAC window.

 

CPU accesses to VRAM are relatively cheap if you don't execute code from the window, as then it's likely to be <5% total local bandwidth. That's a fairly low cost to be able to do things like place MEMAC A at 0, which lets you context switch quickly and also store to VRAM at 3 cycles/byte. ANTIC, on the other hand, should just be switched off to let the CPU run faster.


I am running into the following issue (compared to my Lynx demos ;)): I don't get the same kind of speed when blitting polygon spans.

My render loop does one blit per span, meaning only one BCB per blit, including a wait for the blitter to stop.

I'm still not satisfied with the speed...


What's a spanlist?

 

BCBs just execute one after another sequentially until one with the "NEXT" bit cleared in its BCB finishes, which signifies the end of processing.

It is a bit annoying... what would have been nice is a skip command so you could leave objects defined but selectively not display them instead of having to modify the BCB so it doesn't render.
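The sequencing rule can be modelled in a few lines. This sketch assumes 21-byte BCBs laid out back to back (matching the hline BCB posted later in the thread) and that NEXT is bit 3 of the last (control) byte. Check the VBXE documentation before relying on that bit position. The function name is hypothetical.

```python
# Model of how the blitter walks a chain of BCBs in VRAM.

BCB_SIZE = 21        # bytes per blitter control block
CONTROL_OFFSET = 20  # last byte of the BCB: mode + NEXT flag
NEXT_BIT = 0x08      # assumption: NEXT is bit 3 of the control byte


def walk_blit_list(vram: bytes, start: int) -> list[int]:
    """Return the addresses of the BCBs executed, in order, from start."""
    executed = []
    addr = start
    while True:
        executed.append(addr)
        if not vram[addr + CONTROL_OFFSET] & NEXT_BIT:
            return executed  # NEXT clear: this was the last blit
        addr += BCB_SIZE     # NEXT set: continue with the following BCB


if __name__ == "__main__":
    vram = bytearray(BCB_SIZE * 5)
    for i in range(2):                     # BCBs 0 and 1 chain onward
        vram[i * BCB_SIZE + CONTROL_OFFSET] = NEXT_BIT
    print(walk_blit_list(vram, 0))         # BCB 2 ends the list
```

Starting from a later address simply skips the earlier BCBs. The start-at-21*y idea later in the thread relies on exactly that.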

 

One solution I found is to use an initial blit or two that populates parameters within the string of BCBs. It's just way faster to do minimal CPU processing and have blits do much of the pre-processing, since the blitter moves data around so quickly.
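Here is a model of that trick, as a sketch under assumptions. VRAM is a flat bytearray, and the blit is a simple copy with signed per-byte steps. The BCB field offset (6, the low byte of the destination address) is taken from the hline BCB posted later in the thread. One setup blit with a destination step x of 21 scatters a table of X positions into every BCB of the list. The `blit_copy` helper is hypothetical and treats width/height as plain counts.

```python
# One "setup" blit that writes a parameter into every BCB of a list.

BCB_SIZE = 21
DEST_ADDR_LO = 6  # offset of the destination address low byte in a BCB


def blit_copy(vram, src, src_step_y, src_step_x,
              dst, dst_step_y, dst_step_x, width, height):
    """Copy-mode blit model: width x height bytes with independent steps."""
    for y in range(height):
        for x in range(width):
            s = src + y * src_step_y + x * src_step_x
            d = dst + y * dst_step_y + x * dst_step_x
            vram[d] = vram[s]


if __name__ == "__main__":
    vram = bytearray(0x1000)
    table, bcb_list, count = 0x100, 0x200, 8
    vram[table:table + count] = bytes([10, 12, 15, 19, 22, 24, 25, 25])
    # Source walks the table 1 byte at a time; destination jumps one BCB each.
    blit_copy(vram, table, 0, 1, bcb_list + DEST_ADDR_LO, 0, BCB_SIZE, count, 1)
    print([vram[bcb_list + i * BCB_SIZE + DEST_ADDR_LO] for i in range(count)])
```

In a real list the setup BCB would presumably sit first with NEXT set, so the span BCBs that follow run with the freshly written values.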

 

Another timesaver: if you have a large object whose shape has lots of blank or common space, consider breaking it into smaller objects to save unnecessary blitting. Using constant data mode has its cycle savings as well.


Think of an n-sided polygon (not a triangle, but the same idea applies)...

I calculate two buffers on the CPU (left edge and right edge, with miny and maxy variables to track which lines are covered on screen).

Then I fill those spans.

 

So roughly:

for y = miny to maxy-1
    set span_xpos in BCB to rightedge(y)
    set span_size_x in BCB to rightedge(y) - leftedge(y)
    blit span
    wait for blit
next y

 

The BCB sets the blitter to copy mode 0 with AND mask $00, XOR = span color and dest step x = -1.
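A software model of the span loop and BCB settings above can help check the edge buffers before blaming the blitter. It assumes 256-byte scanlines (as used later in the thread) and that the BCB size field holds width minus one, so `redge - ledge` covers both edge pixels inclusively. It also skips spans whose right edge is left of the left edge. Names are illustrative.

```python
# Software model of the span fill: dest starts at the right edge, step x = -1,
# AND mask $00 (source ignored) and XOR = span color.

PITCH = 256  # bytes per scanline: address = base + y * 256 + x


def fill_spans(vram, base, ledge, redge, miny, maxy, color):
    """Fill spans for y in [miny, maxy) like one blit per scanline."""
    for y in range(miny, maxy):
        size_x = redge[y] - ledge[y]       # value the loop stores in the BCB
        if size_x < 0:
            continue                       # degenerate span: nothing to draw
        dst = base + y * PITCH + redge[y]  # span_xpos = right edge
        for i in range(size_x + 1):        # assumption: width = size_x + 1
            vram[dst - i] = (vram[dst - i] & 0x00) ^ color


if __name__ == "__main__":
    vram = bytearray(PITCH * 4)
    ledge = [2, 1, 0, 5]
    redge = [4, 5, 3, 4]  # row 3 is degenerate and gets skipped
    fill_spans(vram, 0, ledge, redge, 0, 4, color=7)
    for y in range(4):
        print(y, list(vram[y * PITCH:y * PITCH + 8]))
```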

 

 

So one idea was to have, say, 200 BCBs (one per scanline, like unrolled code). The CPU sets the blitter start address based on miny (basically 21*y, since each BCB is 21 bytes), sets all positions and sizes between miny and maxy, and clears the NEXT bit in the maxy BCB.

It's just an unproven idea of having one big polygon BCB list... Stuff like that helped in the Elements Lynx demo, as the CPU doesn't need to wait for each span's blit to finish.
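Here is a sketch of that unrolled-list idea, modelled in Python. It assumes one 21-byte BCB per scanline at `list_base + 21*y`, the field offsets of the hline BCB posted later in the thread, destination $01yyxx (256-byte lines) and NEXT as bit 3 of the control byte. It also assumes every span in range is non-degenerate: a list cannot skip a BCB, so this is meant for convex polygons. NEXT is cleared on the last span actually drawn (row maxy-1, matching the loop above), and the function returns the address to load into the blitter address registers. Other BCB fields are assumed to be pre-initialised like `hline_bcb`. `prepare_span_list` is a hypothetical name.

```python
# Build one BCB per scanline and chain miny..maxy-1 into a single blit list.

BCB_SIZE = 21
DST_ADDR = 6     # 3-byte destination address
SIZE_X = 12      # 2-byte size x
XOR_MASK = 16    # span color
CONTROL = 20
NEXT_BIT = 0x08  # assumption: NEXT is bit 3 of the control byte


def prepare_span_list(vram, list_base, ledge, redge, miny, maxy, color):
    """Patch the BCBs for rows miny..maxy-1; return the list start address."""
    for y in range(miny, maxy):
        bcb = list_base + y * BCB_SIZE
        dst = 0x010000 | (y << 8) | redge[y]  # $01yyxx, right edge
        vram[bcb + DST_ADDR:bcb + DST_ADDR + 3] = dst.to_bytes(3, "little")
        size_x = redge[y] - ledge[y]
        vram[bcb + SIZE_X:bcb + SIZE_X + 2] = size_x.to_bytes(2, "little")
        vram[bcb + XOR_MASK] = color
        vram[bcb + CONTROL] = NEXT_BIT  # mode 0, keep going
    last = list_base + (maxy - 1) * BCB_SIZE
    vram[last + CONTROL] &= ~NEXT_BIT & 0xFF  # last span ends the list
    return list_base + miny * BCB_SIZE


if __name__ == "__main__":
    vram = bytearray(BCB_SIZE * 200)
    start = prepare_span_list(vram, 0, [20] * 200, [40] * 200, 30, 90, color=4)
    print(hex(start))  # 30 * 21 = 630 = 0x276
```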

Edited by Heaven/TQA

If possible set all the blits up first and run them in one go.

 

Running single blits or small groups of them, with CPU intervention in between where it waits for an IRQ or the busy flag and then starts the next lot, would be somewhat wasteful.

Also don't forget: for some stuff you can use the blitter for normal ANTIC graphics.

That's what I did with Quadrillion. I initially converted the game with the graphics remapped from the cell layout to a linear one, but nasty bugs crept in and I had to start again.

So I went with the idea of leaving the rendering mostly alone and using the blitter to convert the entire 8K bitmap from Plus4 mapping to Atari mapping every frame.
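For anyone curious how a cell-to-linear conversion can map onto blits, here is one way to do it. It is not necessarily how Quadrillion does it. It assumes the Plus4 hires bitmap uses C64-style 8x8 cells (8 consecutive bytes per cell, 40 cells per row) and a linear Atari-style bitmap of 40 bytes per line. Each of the 25 cell rows then becomes one 40x8 blit with source step x = 8, source step y = 1, dest step x = 1 and dest step y = 40. The `blit_copy` model and the other names are hypothetical.

```python
# Cell-ordered (Plus4/C64-style) bitmap -> linear 40-bytes-per-line bitmap,
# done as 25 blits of 40x8 bytes each. Software model only.

CELLS_X, CELL_ROWS = 40, 25
ROW_BYTES = CELLS_X * 8  # 320 bytes per cell row in both layouts


def blit_copy(src, src_addr, src_step_y, src_step_x,
              dst, dst_addr, dst_step_y, dst_step_x, width, height):
    """Copy-mode blit model between two buffers with independent steps."""
    for y in range(height):
        for x in range(width):
            dst[dst_addr + y * dst_step_y + x * dst_step_x] = \
                src[src_addr + y * src_step_y + x * src_step_x]


def cells_to_linear(cell_bitmap: bytes) -> bytearray:
    linear = bytearray(len(cell_bitmap))
    for row in range(CELL_ROWS):
        base = row * ROW_BYTES  # same offset in both layouts
        # Source: +8 to the next cell, +1 to the next line inside a cell.
        # Dest:   +1 to the next byte, +40 to the next scanline.
        blit_copy(cell_bitmap, base, 1, 8, linear, base, CELLS_X, 1, CELLS_X, 8)
    return linear


def cell_offset(line: int, column: int) -> int:
    """Direct formula for the byte holding (line, column) in cell order."""
    return (line // 8) * ROW_BYTES + column * 8 + line % 8


if __name__ == "__main__":
    cells = bytes((i * 7) & 0xFF for i in range(CELL_ROWS * ROW_BYTES))
    linear = cells_to_linear(cells)
    ok = all(linear[line * CELLS_X + col] == cells[cell_offset(line, col)]
             for line in range(200) for col in range(CELLS_X))
    print("matches direct mapping:", ok)
```

At 2 VRAM cycles per byte, converting the 8000-byte bitmap costs about 16000 VRAM cycles plus 25 BCB fetches. That is well under a tenth of a PAL frame's VRAM cycles, contention aside.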

 

If you can live with ANTIC graphics for certain stuff, the blitter can potentially shift 4-8 times as many pixels, since an ANTIC bitmap byte holds 4 or 8 pixels where a VBXE 8-bit overlay byte holds one.


So here are some Altirra screenshots. The yellow color appears when I start a blitter operation, and black when it has finished.

 

 

 

vbxe_blitter_face_nowait_span.png

This draws one span without waiting for the blitter to finish.

vbxe_blitter_face_nowait.png

Same as the span one, but drawing one whole face.

vbxe_blitter_face.png

Same as above, but with the wait.

vbxe_blitter_face_nowait_200span.png

This one blits 200 spans in one blitter list.

 

What makes me wonder:

Is that not "fast"?

Why does it always start at nearly the same screen position? Is there some "align" or "sync" happening?

$D400 (DMACTL) is 0, and the blitter is set to fill mode.

        lda #15
        sta $d01a       ;background marker on: blitter running
        lda #1
        sta $D653       ;start blitter (draw span)
@       lda $D653       ;wait until not-busy
        bne @-
        sta $d01a       ;A = 0 here: background black = blitter finished


A 100-byte fill should need 100 blitter cycles in fill mode.

100/8 = 12.5 CPU cycles (the blitter runs 8x faster than the CPU).
So in my world the CPU would get control back after about 12.5 cycles, and the Atari has 114 CPU cycles per scanline, so the yellow bars should be much thinner. Where is my misunderstanding?
I have not checked on real hardware yet.
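Here are numbers for that reasoning: a sketch that converts one span's blitter work into CPU cycles. It assumes the 21-byte BCB fetch costs one VRAM cycle per byte and lets you dial in the "down to quarter speed" contention mentioned earlier. `span_cpu_cycles` is a made-up name.

```python
# How long should one span fill keep the CPU waiting?

CPU_CYCLES_PER_LINE = 114
VRAM_CYCLES_PER_CPU_CYCLE = 8
BCB_BYTES = 21  # assumption: one VRAM cycle per BCB byte fetched


def span_cpu_cycles(nbytes, cost_per_byte=1, speed=1.0, bcb_fetch=True):
    """CPU cycles for one blit; speed=0.25 models heavy DMA contention."""
    vram = nbytes * cost_per_byte + (BCB_BYTES if bcb_fetch else 0)
    return vram / (VRAM_CYCLES_PER_CPU_CYCLE * speed)


if __name__ == "__main__":
    print(span_cpu_cycles(100, bcb_fetch=False))  # 12.5, the figure above
    print(span_cpu_cycles(100))                   # with the BCB fetch
    print(span_cpu_cycles(100, speed=0.25))       # worst-case contention
```

Even the pessimistic case (60.5 CPU cycles) stays under one 114-cycle scanline. So by this model, a single short span should not produce a multi-line bar on its own.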
Share on other sites

Are you doing lots of single blits? I would think that's a big problem, especially considering some line draws are like 10 pixels wide.

 

Consider that VBXE reads the BCB and starts executing it in less than 3 CPU cycles. A 10-pixel line is another 2 cycles with some to spare. The overhead of setting up individual blits, monitoring them and starting the next one could see the blitter spending more time idle than actually working.
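Here is the same point as a quick model. With one blit per span, the blitter sits idle for however many CPU cycles the 6502 spends between "blit finished" and "next blit started". The overhead figure is a free parameter you would have to count from your own loop; 30 cycles below is just an example, not a measurement. The model also assumes setup does not overlap the previous blit.

```python
# Fraction of time the blitter idles when the CPU starts one blit at a time.

VRAM_CYCLES_PER_CPU_CYCLE = 8
BCB_BYTES = 21  # BCB fetch, assumed one VRAM cycle per byte


def blitter_idle_fraction(span_bytes, cpu_overhead_cycles, cost_per_byte=1):
    """Idle share when each blit is followed by CPU setup before the next."""
    busy = (BCB_BYTES + span_bytes * cost_per_byte) / VRAM_CYCLES_PER_CPU_CYCLE
    return cpu_overhead_cycles / (cpu_overhead_cycles + busy)


if __name__ == "__main__":
    for width in (10, 32, 100):
        print(width, round(blitter_idle_fraction(width, 30), 2))
```

For 10-pixel spans this comes out well above 80% idle, which is why batching blits into one list matters.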

 

 

Pretty good-looking sequence, BTW. Another optimization you might try: in standard mode the scanlines are 320 bytes apart. Depending on how you do your calculations, if you can spare some VRAM, put the scanlines 512 bytes apart, which for some graphical stuff can speed things up. I'm fairly sure I did that in Moon Cresta, so all the 6502 had to do was some bit-shifting to calculate the sprite start addresses.

Edited by Rybags

I use 256-byte scanlines (even easier: the address is $baseYYXX).
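Here is the appeal of power-of-two line lengths in one small sketch. With a 256-byte pitch, the Y coordinate is simply the middle address byte, which is why the render loop can just `sty` it into the BCB. A 512-byte pitch is a single shift, and 320 needs a shift-and-add. The function names are illustrative.

```python
# Scanline address calculation for different line pitches.


def addr_256(base: int, x: int, y: int) -> int:
    return base + (y << 8) + x             # $baseYYXX: y is the middle byte


def addr_512(base: int, x: int, y: int) -> int:
    return base + (y << 9) + x             # one shift


def addr_320(base: int, x: int, y: int) -> int:
    return base + (y << 8) + (y << 6) + x  # y*320 = y*256 + y*64


if __name__ == "__main__":
    print(hex(addr_256(0x010000, 0x34, 0x12)))  # 0x11234
```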

 

But that's why I thought using the blitter list with 200 BCBs would gain something, yet the yellow areas are a similar size.

WTF... this does not look good in terms of copy speed. Looking at the Lamer's demo, I thought there was more potential in VBXE. But it could be my code, or Altirra, or whatever :D



The yellow is where I am waiting for the blitter to finish, so the blitter cannot be idle there. To me it looks more like it's holding up the CPU for longer than I had expected. If the spans are small, shouldn't it be a mess of small yellow stripes?


I was thinking about doing a test case for the refresh thing...

 

For your problem, maybe do a dump of all the data going into the BCBs for a worst-case situation.

Then work out how many BCBs there are, how many pixels per BCB, etc.

Then calculate how many cycles are required. Then add the cycles where the 6502 is dragging the chain while the blitter sits idle and waiting. Then compare that to what you're witnessing onscreen.
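Here is that recipe as a small calculator, in sketch form. Feed it (width, height, cost per byte) for each BCB from a dump, plus an assumed CPU gap per blit, and it returns the expected time in scanlines. The 21-byte BCB fetch cost, the contention factor and all names are assumptions.

```python
# Estimate scanlines for a dumped blit list.

CPU_CYCLES_PER_LINE = 114
VRAM_CYCLES_PER_CPU_CYCLE = 8
BCB_BYTES = 21  # BCB fetch, assumed one VRAM cycle per byte


def estimate_scanlines(bcbs, cpu_gap_per_blit=0.0, speed=1.0):
    """bcbs: iterable of (width, height, cost_per_byte) tuples."""
    bcbs = list(bcbs)
    vram = sum(BCB_BYTES + w * h * cost for w, h, cost in bcbs)
    cpu = vram / (VRAM_CYCLES_PER_CPU_CYCLE * speed)
    cpu += cpu_gap_per_blit * len(bcbs)  # 6502 dragging the chain
    return cpu / CPU_CYCLES_PER_LINE


if __name__ == "__main__":
    spans = [(32, 1, 1)] * 200  # e.g. 200 fill spans of 32 pixels
    print(round(estimate_scanlines(spans), 1))
    print(round(estimate_scanlines(spans, cpu_gap_per_blit=30), 1))
```

Comparing such an estimate with the height of the colored bars shows whether the time goes into the blits themselves or somewhere else.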

 

What you're drawing there, is it all done with horizontal line segments?

Are you using the mode 0 blitter command, without collision detection or any other time sapping stuff?


Rybags...

But as you can see, most of the time we are talking about a single scanline being processed. I could be measuring it wrong (but I posted the waiting chunk of code).

So I really wonder: even if I blit one horizontal line (the Oxygene logo faces here are at most about 32 pixels wide), look at the big yellow area.

(Check the filenames to see what each test does.) "nowait" means I start the blitter without waiting for it to finish its work.

I had assumed that the yellow chunks would be

a) smaller in height and length

b) randomly spread over the screen

So I still wonder why the blitter takes so much time. (As I said, it could be my code; I'll show you later.)


That's my hline blit object (not talking about the list):



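hline_bcb:
        .long $000000   ;source address
        .word 0         ;source step y
        .byte 0         ;source step x
        .long $010000   ;destination address ($01yyxx, 256-byte lines)
        .word 256       ;dest. step y
        .byte -1        ;dest. step x (draw right to left)
        .word 0         ;size x (patched per span)
        .byte 0         ;size y
        .byte $00       ;AND mask ($00: source ignored, fill)
        .byte $00       ;XOR mask (span color, patched at runtime)
        .byte 0         ;collision AND
        .byte $00       ;zoom
        .byte 0         ;pattern
        .byte 0         ;control (mode 0, NEXT clear)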
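To double-check the field offsets the render loop pokes ($06/$07 for the destination, $0C for size x, $10 for the XOR color), here is the same 21-byte layout packed in Python. It assumes MADS `.long` emits 3 little-endian bytes and that the step fields are signed. `pack_bcb` is a hypothetical helper.

```python
# Pack the hline BCB layout and confirm the offsets used by the render loop.


def pack_bcb(src_addr, src_step_y, src_step_x, dst_addr, dst_step_y,
             dst_step_x, size_x, size_y, and_mask, xor_mask, coll_and,
             zoom, pattern, control):
    out = bytearray()
    out += src_addr.to_bytes(3, "little")                  # $00 source address
    out += src_step_y.to_bytes(2, "little", signed=True)   # $03
    out += src_step_x.to_bytes(1, "little", signed=True)   # $05
    out += dst_addr.to_bytes(3, "little")                  # $06 dest address
    out += dst_step_y.to_bytes(2, "little", signed=True)   # $09
    out += dst_step_x.to_bytes(1, "little", signed=True)   # $0B
    out += size_x.to_bytes(2, "little")                    # $0C size x
    out += bytes([size_y, and_mask, xor_mask, coll_and,    # $0E..$14
                  zoom, pattern, control])
    assert len(out) == 21
    return out


if __name__ == "__main__":
    bcb = pack_bcb(0, 0, 0, 0x010000, 256, -1, 0, 0, 0, 0, 0, 0, 0, 0)
    bcb[0x06], bcb[0x07] = 0x34, 0x12  # like "sta $4306" / "sty $4307"
    bcb[0x10] = 4                      # like "sta $4310" (span color)
    print(hex(int.from_bytes(bcb[0x06:0x09], "little")))  # 0x11234
```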



And that's the render loop:


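render_scene
        ldy miny
        lda #$80        ;bank 0
        sta $d65d       ;cpu-vram access window at $4000

polycol lda #4
        sta $4310       ;color (XOR byte of the BCB)
        lda #$03        ;$000300 = $4300 in bank #0
        sta $D651       ;blitter BCB address, middle byte ($D650/$D652 assumed 0)

_drwply
        lda redge,y
        sta $4306       ;xpos (dest address low byte)
        sty $4307       ;ypos (dest address middle byte, 256-byte lines)
        sec
        sbc ledge,y     ;A = right edge - left edge
        bcc _drw2       ;right edge left of left edge: skip this span
        sta $430c       ;sizex
        lda #15
        sta $d01a       ;background marker on while waiting
@       lda $D653       ;wait until not-busy
        bne @-

        lda #1
        sta $D653       ;start blitter (draw span)
        sta $d01a       ;background marker off (A = 1)
_drw2   iny
        cpy maxy
        bne _drwply
        lda #$00        ;close the window again
        sta $d65d       ;cpu-vram access window at $4000
        rts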




Edited by Heaven/TQA