
Anyone see a faster way to do this?


JamesD


This is my code to print a character to the screen in my 64 column text code for the Atom.
It will eventually be 80 columns for other machines but this is my current target.
The first line of the font is always blank so it writes a zero to the screen for it without masking.

The font is broken up by all of the first bytes in a table, all of the 2nd bytes in a table, etc...
This eliminates having to perform a multiply in the code leading up to this (as discussed in the topic I started on multiplying by 7).
I also don't have to reuse indirect indexed addressing so I don't have to manipulate y as much.
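
To illustrate the table layout described above, here is a small Python model (not Atom code). The glyph data and the names `glyphs`, `interleaved` and `planes` are made up for the illustration, and the 96-character set is an assumption. The only point is that a per-row table can be indexed with the character offset directly, while a glyph-after-glyph layout needs `char * 7 + row`.

```python
ROWS = 7          # font rows stored per glyph (assumed, matching the 7-row font)
NUM_CHARS = 96    # printable ASCII 32..127 (assumed)

# Hypothetical glyph data: row r of glyph c is just a recognisable number.
glyphs = [[(c * 3 + r) & 0xFF for r in range(ROWS)] for c in range(NUM_CHARS)]

# Conventional layout: glyph after glyph, needs a multiply to index.
interleaved = [byte for glyph in glyphs for byte in glyph]

# Layout from the post: one table per row (FCol1, FCol2, ... in the listing).
planes = [[glyphs[c][r] for c in range(NUM_CHARS)] for r in range(ROWS)]

def row_interleaved(c, r):
    return interleaved[c * ROWS + r]   # multiply by 7 on every lookup

def row_planar(c, r):
    return planes[r][c]                # plain "lda FColN,X" with X = c

same = all(row_interleaved(c, r) == row_planar(c, r)
           for c in range(NUM_CHARS) for r in range(ROWS))
print(same)
```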

The left and right nibble in the font are the same to keep from having to bit shift.

The code does work but I'm open to suggestions for more speed.
This is just the code to write the left nibbles, there is separate code for right nibbles.

	lda		#0
	sta		(fscreen),y			; write to the screen

	lda		FCol1,X				; get the 2nd byte of the font
	and		#%11110000			; mask area used by new character
	sta		temp0				; save it
	ldy		#32				;point to next screen byte
	lda		(fscreen),y			; read byte at destination
	and		#%00001111			; mask off unused half
	ora		temp0				; combine background with font data
	sta		(fscreen),y			; write back to the screen

	lda		FCol2,X				; get the 3rd byte of the font
	and		#%11110000			; mask area used by new character
	sta		temp0				; save it
	ldy		#64				;point to next screen byte
	lda		(fscreen),y			; read byte at destination
	and		#%00001111			; mask off unused half
	ora		temp0				; combine background with font data
	sta		(fscreen),y			; write back to the screen

	lda		FCol3,X				; get the 4th byte of the font
	and		#%11110000			; mask area used by new character
	sta		temp0				; save it
	ldy		#96				;point to next screen byte
	lda		(fscreen),y			; read byte at destination
	and		#%00001111			; mask off unused half
	ora		temp0				; combine background with font data
	sta		(fscreen),y			; write back to the screen

	lda		FCol4,X				; get the 5th byte of the font
	and		#%11110000			; mask area used by new character
	sta		temp0				; save it
	ldy		#128				;point to next screen byte
	lda		(fscreen),y			; read byte at destination
	and		#%00001111			; mask off unused half
	ora		temp0				; combine background with font data
	sta		(fscreen),y			; write back to the screen

	lda		FCol5,X				; get the 6th byte of the font
	and		#%11110000			; mask area used by new character
	sta		temp0				; save it
	ldy		#160				;point to next screen byte
	lda		(fscreen),y			; read byte at destination
	and		#%00001111			; mask off unused half
	ora		temp0				; combine background with font data
	sta		(fscreen),y			; write back to the screen
	
	lda		FCol6,X				; get the 7th byte of the font
	and		#%11110000			; mask area used by new character
	sta		temp0				; save it
	ldy		#192				;point to next screen byte
	lda		(fscreen),y			; read byte at destination
	and		#%00001111			; mask off unused half
	ora		temp0				; combine background with font data
	sta		(fscreen),y			; write back to the screen

	lda		FCol7,X				; get the 8th byte of the font
	and		#%11110000			; mask area used by new character
	sta		temp0				; save it
	ldy		#224				;point to next screen byte
	lda		(fscreen),y			; read byte at destination
	and		#%00001111			; mask off unused half
	ora		temp0				; combine background with font data
	sta		(fscreen),y			; write back to the screen
	
	rts

Pre-mask FCOLxx per nibble and then:

 

lda (fscreen),y ; read byte at destination
and #%00001111 ; mask off unused half
ora FCOL1,x ; combine background with font data

sta (fscreen),y

 

Also depending on the environment you may get rid of the

ldy #32

ldy #64

ldy #96

by having 8 screen pointers instead of one, each offset from the previous one by 32 bytes. But there is a trade-off regarding how and when to update 8 pointers instead of 1.

 

Additionally, if the output is line-wise, it makes sense to first clear a full line and then spare the "and #%00001111 ; mask off unused half"
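
A rough Python model of that suggestion, assuming the font stores the same 4-pixel pattern in both nibbles as described in the first post. `left_tbl` and `right_tbl` are hypothetical names for the pre-masked copies, and `font_row` is made-up sample data.

```python
def duplicated(nibble):
    """Font byte as stored in the thread: same 4 pixels in both halves."""
    return (nibble << 4) | nibble

font_row = [duplicated(n) for n in range(16)]   # hypothetical sample data

# Pre-masked copies, built once, so no "and #%11110000" at draw time.
left_tbl = [b & 0xF0 for b in font_row]
right_tbl = [b & 0x0F for b in font_row]

def draw_masked(screen, glyph, left):
    """Draw over existing pixels: keep the other half of the screen byte."""
    if left:
        return (screen & 0x0F) | left_tbl[glyph]
    return (screen & 0xF0) | right_tbl[glyph]

def draw_on_cleared(screen, glyph, left):
    """If the target half is known to be zero, a plain OR is enough."""
    return screen | (left_tbl[glyph] if left else right_tbl[glyph])

# Two draws onto a cleared byte fill both halves.
byte = draw_on_cleared(0x00, 0x9, left=True)
byte = draw_on_cleared(byte, 0x6, left=False)
print(hex(byte))
```

The OR-only path is only safe when the target half is already zero; printing over existing text still needs the mask.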


I think updating 8 pointers is going to be worse than the LDY since these change every character

 


Use eor to write data to screen, like this:

 

lda data
eor (screen),y
and mask
eor (screen),y
sta (screen),y

Actually, I think it would have to be

        lda	FCol6,X
	ldy	#32
	eor	(fscreen),y
	and	#%00001111
	eor	(fscreen),y	
	sta	(fscreen),y	

It's 2 instructions shorter but I'll have to count cycles to see how it compares since it uses one more complex addressing instruction rather than simple 2 cycle instructions
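
For anyone wondering why the EOR/AND/EOR sequence merges two bytes at all, here is a quick exhaustive check in Python. The function names are mine, not from the thread. It compares the three-instruction merge against the obvious AND/OR version for every byte pair and both nibble masks.

```python
def merge_and_or(first, second, mask):
    """Reference: take mask bits from `first`, the rest from `second`."""
    return (first & mask) | (second & ~mask & 0xFF)

def merge_eor(first, second, mask):
    """lda first / eor second / and #mask / eor second"""
    a = first
    a ^= second      # bits that differ between the two bytes
    a &= mask        # keep the differences only where the mask is set
    a ^= second      # flip those bits in `second`, so they now match `first`
    return a

ok = all(merge_eor(f, s, m) == merge_and_or(f, s, m)
         for f in range(256) for s in range(256) for m in (0x0F, 0xF0))
print(ok)
```

So the bits set in the mask come from whichever byte is loaded first, and the cleared bits come from the byte that is EORed twice. With the font byte loaded first and `#%00001111`, it is the low nibble that is taken from the font; taking the high nibble from the font needs either `#%11110000` or the screen byte loaded first.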

 

That is what I was looking for though! Thanks!

*edit*

Well, it produced interesting results but not the desired result so it's not quite right.

*edit*

I copied over the data source. It works now. I'll just benchmark it.

 

*edit*

It saves 4 clock cycles per line of the character. So 28 per character for a 7 pixel high font like I'm using or 32 per character for an 8 pixel font.

For a full text screen of 64x24 that's 6,144 clock cycles for the 7 pixel high font.

That'll do!


FWIW the first version saved 1 instruction on the 6803 and 3 on the Z80. It wasn't a big deal for the 6803 but that makes a big difference on the Z80 with its larger number of cycles per instruction.
I couldn't see a lot of difference visually on the 6502 code but every cycle helps.
The bottleneck on the 6502 code is probably the scroll at this point.
I'm not sure what I can do with an unrolled loop of this.
lda (add1),y
sta (add2),y
iny

ldy #32
lda (fscreen),y
eor FCol6,x
and #%11110000
eor FCol6,x
sta (fscreen),y

should do as well, saving an additional cycle per byte written to screen memory.

 

The only problem with that is that it doubles the number of lines of code I have to modify to switch fonts.
The 6803 and Z80 require changing 1 line. This requires changing 28. Oh well, it was 14 already, what's a few more? :D


The bottleneck on the 6502 code is probably the scroll at this point.

I'm not sure what I can do with an unrolled loop of this.

lda (add1),y

sta (add2),y

iny

This is the only thing I've come up with to speed this up.

 

start:
S1: LDA Source1,X
D1: STA Dest1,X
LDA Source2,X
STA Dest2,X
LDA Source3,X
STA Dest3,X
LDA Source4,X
STA Dest4,X
LDA Source5,X
STA Dest5,X
LDA Source6,X
STA Dest6,X
LDA Source7,X
STA Dest7,X
LDA Source8,X
STA Dest8,X
INX
BNE start
;test for end of screen here
B?? exit
INC S1+2
INC D1+2
etc...

 


Actually, I think I can do something like this and scroll several lines of data in the loop

start:
LDA PTR2,X
STA PTR1,X
LDA PTR3,X
STA PTR2,X
LDA PTR4,X
STA PTR3,X
LDA PTR5,X
STA PTR4,X
LDA PTR6,X
STA PTR5,X
LDA PTR7,X
STA PTR6,X
LDA PTR8,X
STA PTR7,X
INX
BNE start

This made a huge difference to my text scroll. 80 columns instead of 64 would be more difficult.

scroll2:
	ldx		#0
@sloop:
	lda		$8100,x
	sta		$8000,x
	lda		$8200,x
	sta		$8100,x
	lda		$8300,x
	sta		$8200,x
	lda		$8400,x
	sta		$8300,x
	lda		$8500,x
	sta		$8400,x
	lda		$8600,x
	sta		$8500,x
	lda		$8700,x
	sta		$8600,x
	lda		$8800,x
	sta		$8700,x
	lda		$8900,x
	sta		$8800,x
	lda		$8A00,x
	sta		$8900,x
	lda		$8B00,x
	sta		$8A00,x
	lda		$8C00,x
	sta		$8B00,x
	lda		$8D00,x
	sta		$8C00,x
	lda		$8E00,x
	sta		$8D00,x
	lda		$8F00,x
	sta		$8E00,x
	lda		$9000,x
	sta		$8F00,x
	lda		$9100,x
	sta		$9000,x
	lda		$9200,x
	sta		$9100,x
	lda		$9300,x
	sta		$9200,x
	lda		$9400,x
	sta		$9300,x
	lda		$9500,x
	sta		$9400,x
	lda		$9600,x
	sta		$9500,x
	lda		$9700,x
	sta		$9600,x
	inx
	beq		@nxt
	jmp		@sloop
@nxt:
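
As a sanity check on the unrolled scroll above, here is a Python model of what it does to memory. It assumes what the listings imply: the screen starts at $8000, each scan line is 32 bytes, and a text row is 8 scan lines, so one text row is exactly one 256-byte page. The test pattern written into `mem` is arbitrary.

```python
SCREEN_BASE = 0x8000
PAGE = 256                 # one text row: 8 scan lines x 32 bytes
TEXT_ROWS = 24

def scroll_pages(mem):
    """Mirror the unrolled loop: for each x, copy page n+1 to page n."""
    for x in range(PAGE):                       # ldx #0 ... inx until wrap
        for n in range(TEXT_ROWS - 1):          # $8100->$8000 ... $9700->$9600
            src = SCREEN_BASE + (n + 1) * PAGE + x
            dst = SCREEN_BASE + n * PAGE + x
            mem[dst] = mem[src]

mem = bytearray(0x10000)
for i in range(TEXT_ROWS * PAGE):
    mem[SCREEN_BASE + i] = (i * 7 + i // PAGE) & 0xFF   # arbitrary test pattern
before = bytes(mem[SCREEN_BASE:SCREEN_BASE + TEXT_ROWS * PAGE])

scroll_pages(mem)
after = bytes(mem[SCREEN_BASE:SCREEN_BASE + TEXT_ROWS * PAGE])

# Everything moved up one text row; the bottom row is left for the caller to clear.
print(after[:-PAGE] == before[PAGE:], after[-PAGE:] == before[-PAGE:])
```

Because a 64-column text row is exactly one page, a single 8-bit index covers a whole row in every load/store pair. That neat fit is presumably what would be lost with 80 columns.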


Is this for the A8? If so, you can use display list manipulation to scroll the data, rather than copying data around in the framebuffer. Keep a ringbuffer of lines for the display and use LMS instructions to point to lines in the ringbuffer.

Sadly it is not. It's for the Acorn Atom.

I'm making a little library of routines for use with 6847 based machines.

Then I want to create a few programs for that in C.

Nothing spectacular, just something to fill some spare time.


The text rendering code now looks like this
Thanks for your help!

leftnibble:
	lda		#0
	sta		(fscreen),y			; write to the screen

	ldy		#32					;point to next screen byte
	lda		(fscreen),y
	eor		FCol1,X				; EOR with the next byte of the font
	and		#%00001111
	eor		FCol1,X				; EOR with the next byte of the font	
	sta		(fscreen),y

	ldy		#64					;point to next screen byte
	lda		(fscreen),y
	eor		FCol2,X				; EOR with the next byte of the font
	and		#%00001111
	eor		FCol2,X				; EOR with the next byte of the font	
	sta		(fscreen),y
	
	ldy		#96					;point to next screen byte
	lda		(fscreen),y
	eor		FCol3,X				; EOR with the next byte of the font
	and		#%00001111
	eor		FCol3,X				; EOR with the next byte of the font	
	sta		(fscreen),y
	
	ldy		#128				;point to next screen byte
	lda		(fscreen),y
	eor		FCol4,X				; EOR with the next byte of the font
	and		#%00001111
	eor		FCol4,X				; EOR with the next byte of the font	
	sta		(fscreen),y

	ldy		#160				;point to next screen byte
	lda		(fscreen),y
	eor		FCol5,X				; EOR with the next byte of the font
	and		#%00001111
	eor		FCol5,X				; EOR with the next byte of the font	
	sta		(fscreen),y
	
	ldy		#192				;point to next screen byte
	lda		(fscreen),y
	eor		FCol6,X				; EOR with the next byte of the font
	and		#%00001111
	eor		FCol6,X				; EOR with the next byte of the font	
	sta		(fscreen),y
	

	ldy		#224				;point to next screen byte
	lda		(fscreen),y
	eor		FCol7,X				; EOR with the next byte of the font
	and		#%00001111
	eor		FCol7,X				; EOR with the next byte of the font	
	sta		(fscreen),y
	
	rts


It saves 4 clock cycles per line of the character. So 28 per character for a 7 pixel high font like I'm using or 32 per character for an 8 pixel font.

 

For a full text screen of 64x24 that's 6,144 clock cycles for the 7 pixel high font.

Actually it's 43k cycles :) (64*24 chars * 28 cycles per char)

 

And as JAC! already suggested, you can have two fonts with unused nibbles cleared and save another 4 cycles per byte, 28 per char so another 43k cycles.


Actually it's 43k cycles :) (64*24 chars * 28 cycles per char)

Yup, I had that and re-added on the calculator and missed the * 7.
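
The arithmetic behind that correction, spelled out as a quick Python check. It uses only the numbers quoted in the thread: a 64x24 screen, 4 cycles saved per font row, 7 rows per character.

```python
COLS, ROWS = 64, 24
chars = COLS * ROWS                 # characters on a full screen
saved_per_row = 4                   # cycles saved per font row, as reported
font_rows = 7                       # rows drawn per character

per_char = saved_per_row * font_rows
without_rows = chars * saved_per_row    # the figure that left out the * 7
full_screen = chars * per_char          # the corrected figure

print(chars, per_char, without_rows, full_screen)
```

That gives 1,536 characters, 28 cycles per character, 6,144 if the rows are left out, and 43,008 for the full screen.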

 

http://www.stardot.org.uk/forums/viewtopic.php?f=44&t=10149

 

And as JAC! already suggested, you can have two fonts with unused nibbles cleared and save another 4 cycles per byte, 28 per char so another 43k cycles.

If I have 672 bytes to spare for the doubled font then I might do that but I'm not going to for the moment.

 

The code is down to 6 instructions per byte already but that would get it to 4.

I think that actually saves 7 cycles per byte over my current code.
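
For comparison, here is a rough per-row cycle tally of the variants discussed so far, using the standard NMOS 6502 base timings. These are my assumptions, not something stated in the thread: `temp0` is in zero page, no indexed access crosses a page boundary, and the variant labels are mine.

```python
# Base timings for the NMOS 6502 (no page-crossing penalty included).
CYCLES = {
    "lda abs,x": 4, "eor abs,x": 4, "ora abs,x": 4,
    "lda (zp),y": 5, "eor (zp),y": 5, "sta (zp),y": 6,
    "and #": 2, "ldy #": 2, "sta zp": 3, "ora zp": 3,
}

VARIANTS = {
    "and/ora via temp0": ["lda abs,x", "and #", "sta zp", "ldy #",
                          "lda (zp),y", "and #", "ora zp", "sta (zp),y"],
    "eor (screen),y twice": ["lda abs,x", "ldy #", "eor (zp),y", "and #",
                             "eor (zp),y", "sta (zp),y"],
    "eor font,x twice": ["ldy #", "lda (zp),y", "eor abs,x", "and #",
                         "eor abs,x", "sta (zp),y"],
    "ora pre-masked font": ["ldy #", "lda (zp),y", "ora abs,x", "sta (zp),y"],
}

totals = {name: sum(CYCLES[op] for op in ops) for name, ops in VARIANTS.items()}
for name, cycles in totals.items():
    print(f"{name}: {cycles} cycles per font row")
```

Under those assumptions the tally comes out at 27, 24, 23 and 17 cycles per row. The savings quoted in the thread do not line up exactly with that, so treat either set of numbers as an estimate until it has been measured.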

I clear the screen first and last line when I scroll. If I have to print over something I'll need an and though.

	ldy		#224				;point to next screen byte
	lda		(fscreen),y
	ora		FCol7,X				; OR with the next byte of the font
	sta		(fscreen),y
I just created a test version based on that using a clone of the existing font and I'd need to put them side by side to tell the difference. Most of the clock cycles are in the scroll.

This does work with the extra font data but the speed difference amounts to about 1 line extra for an entire screen full once it's scrolling.
During the screen drawing before the scroll it is much faster.

 

 

print_64:
	; register a contains character
	sec
	sbc		#' '				; printable character set data starts at space, ASCII 32
	tax							; save as character table offset

	; point screen to $8000 + row (base screen address + row)
	lda		#$80
	clc
	adc		row					; adding row to MSB = 256 * row
	sta		fscreen+1

	ldy		#0					; top line is always black (1st byte of the font)
								; start at zero offset from screen address
	
	; add the column
	lda		col					; 2 columns / byte
	lsr
	sta		fscreen				; save it

	bcc		leftnibble
	clc
	txa
	adc		#96
	tax
;**************************************************
;* left nibble 
;**************************************************
leftnibble:
	lda		#0
	sta		(fscreen),y			; write to the screen

	ldy		#32					;point to next screen byte
	lda		(fscreen),y
	ora		FCol1,X				; OR with the next byte of the font
	sta		(fscreen),y

	ldy		#64					;point to next screen byte
	lda		(fscreen),y
	ora		FCol2,X				; OR with the next byte of the font
	sta		(fscreen),y

	ldy		#96					;point to next screen byte
	lda		(fscreen),y
	ora		FCol3,X				; OR with the next byte of the font
	sta		(fscreen),y

	ldy		#128				;point to next screen byte
	lda		(fscreen),y
	ora		FCol4,X				; OR with the next byte of the font
	sta		(fscreen),y

	ldy		#160				;point to next screen byte
	lda		(fscreen),y
	ora		FCol5,X				; OR with the next byte of the font
	sta		(fscreen),y

	ldy		#192				;point to next screen byte
	lda		(fscreen),y
	ora		FCol6,X				; OR with the next byte of the font
	sta		(fscreen),y

	ldy		#224				;point to next screen byte
	lda		(fscreen),y
	ora		FCol7,X				; OR with the next byte of the font
	sta		(fscreen),y
	
	rts
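
Here is the address and table-index setup at the top of `print_64`, modelled in Python for anyone following along without an assembler. One thing is inferred from the `adc #96` rather than stated in the thread: each `FColN` table is assumed to hold 96 left-nibble entries followed by 96 right-nibble entries. The function names are mine.

```python
SCREEN_BASE = 0x8000
BYTES_PER_LINE = 32
NUM_CHARS = 96              # entries per nibble in each FColN table (from "adc #96")

def print_64_setup(ch, row, col):
    """Return (screen address of the top scan line, table index X)."""
    x = ord(ch) - ord(' ')                       # sec / sbc #' ' / tax
    addr = SCREEN_BASE + 256 * row + (col >> 1)  # row into MSB, col/2 into LSB
    if col & 1:                                  # lsr set the carry: odd column
        x += NUM_CHARS                           # right-nibble half of the table
    return addr, x

def line_addresses(addr):
    """Addresses touched by ldy #0, #32, ... #224."""
    return [addr + BYTES_PER_LINE * n for n in range(8)]

addr, x = print_64_setup('A', row=2, col=5)
print(hex(addr), x, [hex(a) for a in line_addresses(addr)])
```

The largest index is 95 + 96 = 191, so X never overflows a byte.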

 

 


print_64:
	; register a contains character
	sec
	sbc		#' '				; printable character set data starts at space, ASCII 32
	tax							; save as character table offset

 

 

tax

sbx #' '

 

 

 

	clc
	txa
	adc		#96
	tax

 

 

txa

sbx #$100-96
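
To see why those two replacements are equivalent, here is a small Python model. It assumes the commonly documented behaviour of the undocumented SBX opcode (sometimes listed as AXS) on the NMOS 6502: X = (A AND X) - immediate, with no borrow taken from the carry flag. Flags and the final contents of A are not modelled.

```python
def sbx(a, x, imm):
    """Undocumented SBX: X = (A AND X) - imm, carry is not used as borrow."""
    return ((a & x) - imm) & 0xFF

def documented_sub(a):
    """sec / sbc #' ' / tax"""
    return (a - ord(' ')) & 0xFF

def documented_add96(x):
    """clc / txa / adc #96 / tax"""
    return (x + 96) & 0xFF

# tax ; sbx #' '      -> A and X hold the same value, so A AND X == A
sub_ok = all(sbx(a, a, ord(' ')) == documented_sub(a) for a in range(256))
# txa ; sbx #$100-96  -> subtracting 160 is the same as adding 96 modulo 256
add_ok = all(sbx(x, x, 0x100 - 96) == documented_add96(x) for x in range(256))
print(sub_ok, add_ok)
```

One difference the model does not show: the SBX versions leave A holding the original value rather than the result, which only matters if later code relies on A.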


I'm not big on illegal opcodes but if I need a couple extra clock cycles I might use that.

I'm sure there are a few clock cycles to be had here or there in the rest of the code but I think the only thing I can do to cut any significant number of clock cycles is to unroll the scroll code further, but that means doubling the size of that code or worse. I figure that would save around 1278 cycles. That adds up but I think I'd only use that if I have to.


When two characters line up as left nibble and right nibble of the same byte you can print them at the same time like this:

 

;**************************************************
; write two characters at once
;**************************************************
print_642:
	; (string),y points at the first of the two characters
	lda		(string),y
	sec
	sbc		#' '				; printable character set data starts at space, ASCII 32
	sta		firstchar			; save as character table offset

	iny
	lda		(string),y
	sec
	sbc		#' '
	sta		secondchar
	
	; point screen to $8000 + row (base screen address + row)
	lda		#$80
	clc
	adc		row					; adding row to MSB = 256 * row
	sta		fscreen+1

	ldy		#0					; top line is always black (1st byte of the font)
								; start at zero offset from screen address
	
	; add the column
	lda		col					; 2 columns / byte
	lsr
	sta		fscreen				; save it

twochar:
	lda		BGColor
	sta		(fscreen),y			; write to the screen

	ldy		#32				; point to next screen byte
	ldx		firstchar			; offset to 1st character
	lda		FCol1,X				; load the next byte of the 1st character
	ldx		secondchar			; offset to 2nd character
	eor		FCol21,X			; add the next byte of the 2nd character
	sta		(fscreen),y			; write it to the screen

	ldy		#64				;point to next screen byte
	ldx		firstchar			; offset to 1st character
	lda		FCol2,X				; load the next byte of the 1st character
	ldx		secondchar			; offset to 2nd character
	eor		FCol22,X			; add the next byte of the 2nd character
	sta		(fscreen),y			; write it to the screen
	
	ldy		#96					;point to next screen byte
	ldx		firstchar			; offset to 1st character
	lda		FCol3,X				; load the next byte of the 1st character
	ldx		secondchar			; offset to 2nd character
	eor		FCol23,X			; add the next byte of the 2nd character
	sta		(fscreen),y			; write it to the screen
	
	ldy		#128				;point to next screen byte
	ldx		firstchar			; offset to 1st character
	lda		FCol4,X				; load the next byte of the 1st character
	ldx		secondchar			; offset to 2nd character
	eor		FCol24,X			; add the next byte of the 2nd character
	sta		(fscreen),y			; write it to the screen

	ldy		#160				;point to next screen byte
	ldx		firstchar			; offset to 1st character
	lda		FCol5,X				; load the next byte of the 1st character
	ldx		secondchar			; offset to 2nd character
	eor		FCol25,X			; add the next byte of the 2nd character
	sta		(fscreen),y			; write it to the screen
	
	ldy		#192				;point to next screen byte
	ldx		firstchar			; offset to 1st character
	lda		FCol6,X				; load the next byte of the 1st character
	ldx		secondchar			; offset to 2nd character
	eor		FCol26,X			; add the next byte of the 2nd character
	sta		(fscreen),y			; write it to the screen
	

	ldy		#224				;point to next screen byte
	ldx		firstchar			; offset to 1st character
	lda		FCol7,X				; load the next byte of the 1st character
	ldx		secondchar			; offset to 2nd character
	eor		FCol27,X			; add the next byte of the 2nd character
	sta		(fscreen),y			; write it to the screen
	
	rts
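
A small Python check of why one load plus one EOR is enough in the listing above. It assumes the `FCol2x` tables are the right-nibble (low half) pre-masked copies and the `FColx` tables the left-nibble ones, which is inferred from the listing rather than stated. The sample patterns and the names `left_tbl` and `right_tbl` are made up.

```python
def dup(n):
    """Font byte with the same 4 pixels in both nibbles."""
    return (n << 4) | n

rows = [0x0, 0x6, 0x9, 0xF]                     # hypothetical 4-pixel patterns
left_tbl = [dup(n) & 0xF0 for n in rows]        # stands in for FCol1
right_tbl = [dup(n) & 0x0F for n in rows]       # stands in for FCol21

def two_chars(first, second):
    """lda FCol1,X (first char) / eor FCol21,X (second char)"""
    return left_tbl[first] ^ right_tbl[second]

# The halves never overlap, so EOR, OR and plain addition all agree.
agree = all(two_chars(a, b) == (left_tbl[a] | right_tbl[b]) == left_tbl[a] + right_tbl[b]
            for a in range(len(rows)) for b in range(len(rows)))
print(agree, hex(two_chars(1, 2)))
```

Since the whole byte is rebuilt from the two tables, the screen byte is never read, which is where most of the saving comes from.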
		

 

 


maybe this way:

 

ldy #32 ; point to next screen byte
ldx secondchar ; offset to 2nd character
lda FCol21,X ; add the next byte of the 2nd character
ldx firstchar ; offset to 1st character
eor FCol1,X ; load the next byte of the 1st character
sta (fscreen),y ; write it to the screen
ldy #64 ;point to next screen byte
ldx firstchar ; offset to 1st character
lda FCol2,X ; load the next byte of the 1st character
ldx secondchar ; offset to 2nd character
eor FCol22,X ; add the next byte of the 2nd character
sta (fscreen),y ; write it to the screen
and so on.
and of course:
lax (string),y
sbx #' ' ; printable character set data starts at space, ASCII 32
stx firstchar ; save as character table offset
and
lax (string),y
sbx #' '
stx secondchar

 

That will work nicely.

I'd still like to avoid illegal/undocumented opcodes unless I absolutely have to use them.

Before I started optimizing on the left, after on the right.
The most noticeable optimizations were the revised scroll and printing two characters at a time but the latter did depend a bit on some of the others.

Sorry for the jerky video, I'm doing this on my laptop and sometimes it doesn't keep up.
