GR8 Pixel Plotting (assembly)

damosan · December 22, 2020

See attached code below. I'm calling this from a C program - it uses graphics mode 8 but only plots 256x192 so I can use bytes. YB, XB, and ARGUMENT are zero page. yindexhi, yindexlo, byteoffset256_table and bitmask_table are lookups. It works pretty well plotting 49k pixels in about 106 jiffies.

I can replace the JSR/RTS with two JMPs (saving about 6 cycles per pixel - a little less if I do a jump indirect back to the caller).

I've been staring at this for a while so I might be overlooking something short of inlining this.

Thanks.

;;;
;;; _plot_pixel_256 should be called the first time we plot on a
;;; new row.  As long as we're plotting on the same row we can
;;; call _plot_pixel_256_fast as the only item that changes is the
;;; column.
;;; 
_plot_pixel_256:
	ldy	yb		      ; load row
	lda	yindexhi,y	      ; get row address
	sta	argument+1	      ; argument = memory to write to
	lda	yindexlo,y
	sta	argument
_plot_pixel_256_fast:		      ; call this if we're writing to same row
	ldx	xb		      ; load column
	ldy	byteoffset256_table,x ; get byte offset (4 - 35)
	lda	(argument),y	      ; load screen byte
	eor	bitmask_table,x	      ; xor it with pixel bitmask
	sta	(argument),y	      ; store it back to screen byte
	rts
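
The four lookup tables are not shown in the post. A minimal host-side C sketch of how they could be built follows, assuming a 40-byte GR8 row stride, the leftmost pixel in bit 7 of each byte, and a hypothetical screen base `SCREEN_BASE` (the real address depends on the display list). `plot_pixel_256` here is only a model of the assembly routine acting on a memory image, not the original code.

```c
#include <assert.h>
#include <stdint.h>

#define SCREEN_BASE 0x8000u  /* hypothetical; depends on the display list */
#define ROW_BYTES   40u      /* GR8 standard playfield: 320 pixels / 8 */
#define ROWS        192u

static uint8_t yindexlo[ROWS], yindexhi[ROWS];
static uint8_t byteoffset256_table[256], bitmask_table[256];

/* Fill the tables that the plot routine indexes with yb and xb. */
static void build_tables(void)
{
    unsigned i;
    for (i = 0; i < ROWS; ++i) {
        unsigned addr = SCREEN_BASE + i * ROW_BYTES;
        yindexlo[i] = (uint8_t)(addr & 0xFFu);
        yindexhi[i] = (uint8_t)(addr >> 8);
    }
    for (i = 0; i < 256u; ++i) {
        /* centre 256 pixels in a 320-pixel row: skip 4 bytes, giving 4..35 */
        byteoffset256_table[i] = (uint8_t)(4u + i / 8u);
        bitmask_table[i] = (uint8_t)(0x80u >> (i & 7u));
    }
}

/* Model of _plot_pixel_256 acting on a 64K memory image. */
static void plot_pixel_256(uint8_t *mem, uint8_t xb, uint8_t yb)
{
    unsigned row;
    assert(yb < ROWS);
    row = ((unsigned)yindexhi[yb] << 8) | yindexlo[yb];
    mem[row + byteoffset256_table[xb]] ^= bitmask_table[xb];
}
```

Because the routine uses EOR, plotting the same pixel twice restores the original screen byte.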

drac030 · December 22, 2020

Having each display line at address $yy00 one could perhaps shorten the first 5 instructions.
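
To make the trade-off concrete, here is a small host-side C sketch of the `$yy00` layout. It assumes one 256-byte page per display line and a hypothetical first page `FIRST_PAGE`; neither value comes from the thread, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define FIRST_PAGE  0x10u   /* hypothetical page of row 0 ($1000) */
#define ROWS        192u
#define ROW_BYTES   40u     /* bytes displayed per GR8 row (standard width) */

/* With every row at $yy00 the low byte is always 0, so only the
   high byte depends on the row and the yindexlo lookup disappears. */
static uint16_t row_address(uint8_t row)
{
    assert(row < ROWS);
    return (uint16_t)((FIRST_PAGE + row) << 8);
}

/* Address space spanned by the rows versus bytes actually displayed. */
static unsigned bytes_spanned(void)   { return ROWS * 256u; }
static unsigned bytes_displayed(void) { return ROWS * ROW_BYTES; }
```

Under these assumptions the 192 rows span 49,152 bytes of address space for 7,680 displayed bytes, so the shorter setup code comes at a steep memory cost.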

damosan · December 22, 2020

30 minutes ago, drac030 said:

Having each display line at address $yy00 one could perhaps shorten the first 5 instructions.

Using cc65 I believe I'd have to edit the config file to create a number of special 256-byte (0x100) segments, then define the tables to reside in these segments. Then it should be possible to do a 2-byte vs. 3-byte lookup?

danwinslow · December 22, 2020

I don't think you'd HAVE to do all the config stuff. You could just lay out a larger space and do your addressing inside of that, I think. Even if you made the segments you'd still waste some memory, because the linker most likely couldn't pack other data around them all anyway.

damosan · December 23, 2020

As an aside I'm pushing 17k pixels per second - though I'm limiting it to 256x192 so I can push bytes. I modified the config file to put the buffers into page-aligned memory. I'm going to mess around with the ZP loads.

ivop · December 23, 2020

Perhaps you can also try using narrow playfield (see $D400/DMACTL). It also saves on Antic DMA cycles.
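
As a rough illustration of the narrow-playfield idea, the C sketch below only models the register arithmetic on a host machine. It assumes the usual DMACTL layout, where bits 0-1 select the playfield width (1 = narrow, 2 = standard, 3 = wide); the helper names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define PLAYFIELD_MASK   0x03u
#define PLAYFIELD_NARROW 0x01u  /* 256 hi-res pixels, 32 bytes per GR8 row */

/* Return a DMACTL value with the width bits set to narrow,
   leaving the other DMA enable bits untouched. */
static uint8_t dmactl_narrow(uint8_t dmactl)
{
    return (uint8_t)((dmactl & ~PLAYFIELD_MASK) | PLAYFIELD_NARROW);
}

/* Bytes ANTIC fetches per GR8 row for a given width setting. */
static unsigned gr8_row_bytes(uint8_t dmactl)
{
    assert((dmactl & PLAYFIELD_MASK) <= 3u);
    switch (dmactl & PLAYFIELD_MASK) {
    case 1u:  return 32u;  /* narrow */
    case 2u:  return 40u;  /* standard */
    case 3u:  return 48u;  /* wide */
    default:  return 0u;   /* playfield DMA off */
    }
}
```

On the machine itself the value would normally be written through the OS shadow register rather than straight to $D400. With a narrow playfield the row stride in the `yindex` tables would drop from 40 to 32 bytes, and the byte offsets would run from 0 to 31 instead of 4 to 35.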

+Stephen · December 23, 2020

1 hour ago, damosan said:

As an aside I'm pushing 17k pixels per second - though I'm limiting it to 256x192 so I can push bytes. I modified the config file to put the buffers into page-aligned memory. I'm going to mess around with the ZP loads.

Sounds cool - can you share what you're working on yet?

damosan · December 23, 2020

17 minutes ago, Stephen said:

Sounds cool - can you share what you're working on yet?

Nothing in particular - just seeing how fast I can paint a screen with pixels, painting one per pass (vs doing 8 pixels at a time by copying 0xff). The general-use routine above is slower than this code. This can probably be made faster by incrementing argument by 40 after the first load.

;;; _speedtest
;;;
;;; at present this will paint a 256x192 Gr.8 screen at about 780 pixels
;;; per jiffy (about 46.8k per second at 60 jiffies per second).
;;;
;;; runtime = 63 jiffies
;;; 
_speedtest:
	lda	#0
	sta	yb
paint_row:
	;;
	;;  get row offset
	;; 
	ldy	yb
	lda	yindexhi,y	      ; get row address
	sta	argument+1	      ; argument = memory to write to
	lda	yindexlo,y
	sta	argument
	;;
	;; init column to 0
	;; 
	ldx	#0
paint_col:	
	;;
	;; plot pixel
	;;
	ldy	byteoffset256_table,x ; (4/5) get byte offset (4 - 35)
	lda	(argument),y	      ; (5/6) load screen byte
	ora	bitmask_table,x	      ; (4/5) OR it with pixel bitmask
	sta	(argument),y	      ; (6) store it back to screen byte
	;; 
	;; increment column
	;;
	inx			; (2) increment X
	bne	paint_col	; (3 taken, 2 not) $ff + 1 wraps to $00, ending the row
end_of_cols:
	;; 
	;; increment row
	;;
	lda	yb
	cmp	#191
	beq	end_of_rows
	inc	yb
	jmp	paint_row
end_of_rows:	
	rts

rensoup · December 23, 2020

57 minutes ago, damosan said:

Nothing in particular - just seeing how fast I can paint a screen with pixels, painting one per pass (vs doing 8 pixels at a time by copying 0xff). The general-use routine above is slower than this code. This can probably be made faster by incrementing argument by 40 after the first load.

It's difficult to optimize something that is so generic, it all depends on your use case...

you could have 256 plot routines with #Imm instead of using those 2 tables, the problem would be selecting between those routines quickly enough

xxl · December 23, 2020

1 hour ago, damosan said:

Nothing in particular - just seeing how fast I can paint a screen with pixels, painting one per pass (vs doing 8 pixels at a time by copying 0xff). The general-use routine above is slower than this code. This can probably be made faster by incrementing argument by 40 after the first load.



5 less cycles per loop if you use the Y register to carry values from the end of the loop

--

and 3 per init

--

and 4 cycles less per loop if the plot routine is placed in ZP

Edited December 23, 2020 by xxl

Rybags · December 23, 2020

A big saving could be had by just embedding it in the program rather than having it as a sub... though seeing it's used with C that mightn't be possible.

Also, the position variables - if one or both could be used directly instead of copying.

damosan · December 24, 2020

2 hours ago, xxl said:

5 less cycles per loop if you use the Y register to carry values from the end of the loop

--

and 3 per init

--

and 4 cycles less per loop if the plot routine is placed in ZP

yb is a zp byte. The lookup tables are page aligned.

How would you rewrite the above to use X and Y based on Y being required the way it is?

xxl · December 24, 2020

1 hour ago, damosan said:

yb is a zp byte. The lookup tables are page aligned.

How would you rewrite the above to use X and Y based on Y being required the way it is?

ldy yb becomes:

	ldy	#
yb	equ	*-1

1 cycle less

	sta	argument
	sta	argument+1

from ABS (4 cycles) becomes ZP (3 cycles)

2 less

	lda	(argument),y	; 5 cycles

becomes

	lda	$ffff,y		; 4 cycles
argument equ	*-2

1 less

	lda	#0
	sta	yb
paint_row:
	;;
	;;  get row offset
	;; 
	ldy	yb

becomes

ldy #0
paint_row
sty yb

3 cycle less

	lda	yb
	cmp	#191
	beq	end_of_rows
	inc	yb
	jmp	paint_row

becomes

	ldy	#
yb	equ	*-1
	iny
	cpy	#192
	bcc	paint_row

6 cycles less?
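
One way to check that the rewritten row loop still visits the same rows is to model both versions in C. This is only a host-side sketch with made-up function names. The cycle totals in the comments use standard 6502 timings and assume `yb` starts out in zero page and that the branch does not cross a page.

```c
#include <assert.h>

/* Original tail: lda yb / cmp #191 / beq / inc yb / jmp
   3 + 2 + 2 + 5 + 3 = 15 cycles per row while the loop continues. */
static unsigned rows_original(void)
{
    unsigned char yb = 0;
    unsigned rows = 0;
    for (;;) {
        ++rows;                 /* paint row yb */
        if (yb == 191) break;   /* cmp #191 / beq end_of_rows */
        ++yb;                   /* inc yb */
    }
    return rows;
}

/* Rewritten tail: ldy #imm / iny / cpy #192 / bcc
   2 + 2 + 2 + 3 = 9 cycles per row while the branch is taken. */
static unsigned rows_rewritten(void)
{
    unsigned char y = 0, yb;
    unsigned rows = 0;
    do {
        yb = y;                 /* sty yb patches the ldy # operand */
        assert(yb < 192);
        ++rows;                 /* paint row yb; Y is clobbered here */
        y = yb;                 /* ldy #imm reloads the saved row */
        ++y;                    /* iny */
    } while (y < 192);          /* cpy #192 / bcc paint_row */
    return rows;
}
```

Both functions return 192, and the tallies in the comments differ by 15 - 9 = 6 cycles per row, which matches the estimate.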

Edited December 24, 2020 by xxl

rensoup · December 24, 2020

3 hours ago, xxl said:

	lda	(argument),y	; 5 cycles

becomes

	lda	$ffff,y		; 4 cycles
argument equ	*-2

Oh yeah that works!

Edited December 24, 2020 by rensoup

damosan · December 24, 2020

12 hours ago, Rybags said:

A big saving could be had by just embedding it in the program rather than having it as a sub... though seeing it's used with C that mightn't be possible.

Also, the position variables - if one or both could be used directly instead of copying.

It's possible though it's kind of a PITA to embed assembly directly into C code - it's very easy, of course, to create separate assembly routines and let the linker figure it out.

Estece · December 25, 2020

Using all of the above suggestions and MADS, I got a full 256x192 pixel fill down to 42 PAL or 53 NTSC frames.

fastantF.xex

Edited December 25, 2020 by Estece

xxl · December 25, 2020

36 minutes ago, Estece said:

Using all of the above suggestions and MADS, I got a full 256x192 pixel fill down to 42 PAL or 53 NTSC frames.

fastantF.xex

hmmmmmm

0084: 84 8D STY $8D

0086: 84 93 STY $93 - DELETE
0088: BC 00 07 LDY $0700,X
008B: B9 00 CA LDA $xx00,Y

here equ *-2
008E: 1D 00 06 ORA $0600,X
0091: 99 00 CA STA $xx00,Y - REPLACE: STA (here),Y

Check how much slower that is (for a single point it can be faster).

Edited December 25, 2020 by xxl

Estece · December 25, 2020

Plus 2 frames :)

plus2frames.xex

damosan · December 28, 2020

I got it down to 58 jiffies with this.

paint_row:
	;;
	;;  get row offset
	;;
	lda	yindexhi,y	      ; get row address
	sta	ld+2		      ; write high byte to LD/WR so we...
	sta	wr+2		      ; ...can use an absolute version.
	lda	yindexlo,y
	sta	ld+1		      ; ditto...
	sta	wr+1
	;;
	;; init column to 0
	;; 
	ldx	#0
paint_col:	
	;;
	;; plot pixel
	;;
	ldy	byteoffset256_table,x ; (4) get byte offset (4 - 35)
ld:	lda	$ffff,y		      ; (4, +1 if the access crosses a page)
	ora	bitmask_table,x	      ; (4) OR it with pixel bitmask
wr:	sta	$ffff,y	       	      ; (5)
	;; 
	;; increment column
	;;
	inx			; (2) increment X
	bne	paint_col	; (3 taken, 2 not) $ff + 1 wraps to $00, ending the row
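
The two inner loops can be compared by adding up standard 6502 cycle counts. The C sketch below does that arithmetic. It assumes page-aligned tables, no page crossings on the indexed accesses, and a taken `bne` at 3 cycles, and it ignores ANTIC DMA and per-row overhead, so treat the result as a rough estimate. All names are made up for this example.

```c
#include <assert.h>

#define PIXELS (256u * 192u)

/* ldy abs,x / lda (zp),y / ora abs,x / sta (zp),y / inx / bne */
static const unsigned char indirect_loop[] = { 4, 5, 4, 6, 2, 3 };
/* ldy abs,x / lda abs,y / ora abs,x / sta abs,y / inx / bne */
static const unsigned char absolute_loop[] = { 4, 4, 4, 5, 2, 3 };

/* Sum an array of per-instruction cycle counts. */
static unsigned sum_cycles(const unsigned char *c, unsigned n)
{
    unsigned total = 0, i;
    assert(c != 0);
    for (i = 0; i < n; ++i)
        total += c[i];
    return total;
}

/* CPU cycles spent in the inner loop for a full 256x192 fill. */
static unsigned long total_cycles(unsigned per_pixel)
{
    return (unsigned long)PIXELS * per_pixel;
}

/* Scale a measured jiffy count by the ratio of cycles per pixel,
   rounded to the nearest jiffy. */
static unsigned scale_jiffies(unsigned jiffies, unsigned new_c, unsigned old_c)
{
    return (jiffies * new_c + old_c / 2u) / old_c;
}
```

With these assumptions the loops cost 24 and 22 cycles per pixel, and scaling the earlier 63-jiffy result by 22/24 gives roughly 58 jiffies, in line with the measured figure.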
