Jump to content

Photo

Super fast 6 digit BCD display


10 replies to this topic

#1 GroovyBee OFFLINE  

GroovyBee

    Games Developer

  • 7,964 posts
  • Busy bee!
  • Location:North, England

Posted Sun Sep 12, 2010 2:26 PM

Since the CP1600 has no decimal mode or even any instructions to do BCD arithmetic its difficult to get any numeric information out of the machine without taking up a ton of precious cycles.

The following code is taken from the score display routine in my latest game. It doesn't do anything fancy so there is no leading zero suppression. It will produce a 6 digit display with the least significant 16 bits in r0, the most significant 16 bits in r1. Register r4 is the destination pointer for the buffer and register r2 is the STIC colour information (which is XOR'd in).

;------------------------------------------------------------------------------
; Display6DigitsInBCD
;------------------------------------------------------------------------------
; Display 6 digits in BCD format.
;------------------------------------------------------------------------------
; In :-
; RO - Least Significant Word.
; R1 - Most Significant Word.
; R2 - STIC colour mask (XOR'd in).
; R4 - Destination pointer.
;
; Out :-
; Nothing.
;
; Trashes :-
; r0, r1, r3, r4, r5
;
;------------------------------------------------------------------------------
Display6DigitsInBCD: PROC
        pshr r5                 ; Save return address.

        mvii #1 SHL 3, r5       ; Most used constant in a register for speed.

; Most significant digit.
        mvii #(-1+'0'-' ') SHL 3, r3    ; Prep GROM character.
        xorr r2, r3                     ; Mix in the STIC colours.
BCDLoop1:
        addr r5, r3
        addi #-100000 AND $FFFF, r0             ; Get around the fact that
        adcr r1                                 ; there is no SBCR instruction.
        addi #(-100000 SHR 16) AND $FFFF, r1    ; on the CP1600.
        bc BCDLoop1
        mvo@ r3, r4                     ; Into the output buffer.

        mvii #(10+'0'-' ') SHL 3, r3    ; Prep GROM character.
        xorr r2, r3                     ; Mix in the STIC colours.
BCDLoop2:
        subr r5, r3
        addi #10000, r0
        adcr r1
        bnc BCDLoop2
        mvo@ r3, r4                     ; Into the output buffer.

        mvii #(-1+'0'-' ') SHL 3, r3    ; Prep GROM character.
        xorr r2, r3                     ; Mix in the STIC colours.
BCDLoop3:
        addr r5, r3
        subi #1000, r0
        bc BCDLoop3
        mvo@ r3, r4                     ; Into the output buffer.

        mvii #(10+'0'-' ') SHL 3, r3    ; Prep GROM character.
        xorr r2, r3                     ; Mix in the STIC colours.
BCDLoop4:
        subr r5, r3
        addi #100, r0
        bnc BCDLoop4
        mvo@ r3, r4                     ; Into the output buffer.

        mvii #(-1+'0'-' ') SHL 3, r3    ; Prep GROM character.
        xorr r2, r3                     ; Mix in the STIC colours.
BCDLoop5:
        addr r5, r3
        subi #10, r0
        bc BCDLoop5
        mvo@ r3, r4                     ; Into the output buffer.

; Least significant digit.
        addi #10+'0'-' ', r0            ; Prep GROM character.
        sll r0, 2
        addr r0, r0                     ; x3
        xorr r2, r0                     ; Mix in the STIC colours.
        mvo@ r0, r4                     ; Into the output buffer.

; All done!
        pulr pc
        ENDP

Call the function as follows :-

        mvii #976541 AND $FFFF, r0
        mvii #(976541 SHR 16) AND $FFFF, r1
        mvii #C_RED, r2
        mvii #MEMMAP.backtab, r4
        call Display6DigitsInBCD

You'll need the latest version of jzintv (not beta 3) to assemble it correctly.

#2 intvnut OFFLINE  

intvnut

    Stargunner

  • 1,418 posts
  • Location:@R6 (top of stack)

Posted Sun Sep 12, 2010 3:24 PM

Since the CP1600 has no decimal mode or even any instructions to do BCD arithmetic its difficult to get any numeric information out of the machine without taking up a ton of precious cycles.

The following code is taken from the score display routine in my latest game. It doesn't do anything fancy so there is no leading zero suppression. It will produce a 6 digit display with the least significant 16 bits in r0, the most significant 16 bits in r1. Register r4 is the destination pointer for the buffer and register r2 is the STIC colour information (which is XOR'd in).


:cool:

Nice! I like the alternation between ADDI and SUBI. It saves having to "fix up" an over-subtract. Once you get to BCDloop3, you could store your power-of-10 in R1, which will cost you 8 cycles to generate, but will save you 2 cycles per loop iteration. Since you could loop up to 10 times and you always loop at least once, that'll save you cycles both in the worst case and on average.

I actually went even more aggressively for some code I wrote recently for a Space Patrol update. Instead of looping, I went with a decision-tree approach, since it's guaranteed to never need more than ceil(log2(10)) steps for any digit. I didn't want to crash on all 9s or on 909090. :-)

I also cheated another way: I display a 7 digit score, but I store it as a 16 bit number. I do this by making all the scores a multiple of 50 points. Ignore the very last digit (which remains fixed at 0). The rightmost changing digit is always either 0 or 5 based on the LSB of the score. The remaining 15 bits give me 0 - 32767 for the upper 5 digits of the score.

The resulting decision-tree based code looks like this:

MACRO   digi(x)
    ($87 + (%x%) * <img src='http://www.atariage.com/forums/public/style_emoticons/<#EMO_DIR#>/icon_cool.gif' class='bbc_emoticon' alt='8)' />
ENDM

            ;; Displays a score from 000000 to 327675

            MVI     SCORE,          R0
            MVII    #SCORE.DIG,     R4
            CLRC
            MVII    #digi(5),       R1
            RRC     R0
            BC      @@50
            MVII    #digi(0),       R1
@@50
            MVO     R1,             SCORE.DIG + 5

            ;;  3xxxx, 2xxxx, 1xxxx, or 0xxxx?
            SUBI    #20000,         R0
            BNC     @@1xxxx_or_0xxxx

@@3xxxx_or_2xxxx
            ;; 2xxxx or 3xxxx?  Try subtracting more
            MVII    #digi(3),       R1      ; assume it's 3xxxx
            SUBI    #10000,         R0
            BPL     @@3xxxx                 ; assumed correctly

            ;; 2xxxx:  We subtracted too much
            MVII    #digi(2),       R1      ; 
            ADDI    #10000,         R0      
            B       @@xxxx

@@1xxxx_or_0xxxx:
            ;; 1xxxx or 0xxxx:  We subtracted too much
            MVII    #digi(1),       R1      ; assume it's 1xxxx
            ADDI    #10000,         R0
            BPL     @@1xxxx                 ; assumed correctly!

            ;; 0xxxx:  We still subtracted too much
            MVII    #digi(0),       R1
            ADDI    #10000,         R0      ; add it back
            
@@3xxxx:
@@1xxxx:  
@@xxxx:
            MVO@    R1,             R4      ; store out first digit

            ;;  [9-5]xxx or [4-0]xxx?
            SUBI    #5000,          R0
            BMI     @@4xxx_or_less

            ;;  [9-7]xxx or [6-5]xxx
            SUBI    #2000,          R0
            BMI     @@6xxx_or_5xxx

            ;;  [9-8]xxx or 7xxx
            SUBI    #1000,          R0
            BMI     @@7xxx

            ;;  9xxx or 8xxx
            MVII    #digi(9),       R1      ; assume 9
            SUBI    #1000,          R0
            BPL     @@9xxx

            ;;  8xxx
            MVII    #digi(<img src='http://www.atariage.com/forums/public/style_emoticons/<#EMO_DIR#>/icon_cool.gif' class='bbc_emoticon' alt='8)' />,       R1      ; it's 8
            ADDI    #1000,          R0
            B       @@xxx

@@7xxx:     MVII    #digi(7),       R1      ; it's 7
            ADDI    #1000,          R0
            B       @@xxx

@@6xxx_or_5xxx:
            ;;  6xxx or 5xxx
            MVII    #digi(6),       R1      ; assume 6
            ADDI    #1000,          R0
            BPL     @@xxx

            ;;  5xxx
            MVII    #digi(5),       R1      ; it's 5
            ADDI    #1000,          R0
            B       @@xxx

@@4xxx_or_less

            ;;  [4-2]xxx or [1-0]xxx
            SUBI    #2000,          R0
            BMI     @@1xxx_or_0xxx

            ;;  [4-3]xxx or 2xxx
            SUBI    #1000,          R0
            BMI     @@2xxx

            ;;  4xxx or 3xxx
            MVII    #digi(4),       R1      ; assume 4
            SUBI    #1000,          R0
            BPL     @@4xxx

            ;;  3xxx
            MVII    #digi(3),       R1      ; it's 3
            ADDI    #1000,          R0
            B       @@xxx

@@2xxx:     MVII    #digi(2),       R1      ; it's 2
            ADDI    #1000,          R0
            B       @@xxx

@@1xxx_or_0xxx:
            ;;  1xxx or 0xxx
            MVII    #digi(1),       R1      ; assume 1
            ADDI    #1000,          R0
            BPL     @@xxx

            ;;  0xxx
            MVII    #digi(0),       R1      ; it's 0
            ADDI    #1000,          R0
            B       @@xxx


@@9xxx
@@4xxx
@@xxx
            MVO@    R1,             R4      ; store out second digit

            ;;  [9-5]xx or [4-0]xx?
            SUBI    #500,           R0
            BMI     @@4xx_or_less

            ;;  [9-7]xx or [6-5]xx
            SUBI    #200,           R0
            BMI     @@6xx_or_5xx

            ;;  [9-8]xx or 7xx
            SUBI    #100,           R0
            BMI     @@7xx

            ;;  9xx or 8xx
            MVII    #digi(9),       R1      ; assume 9
            SUBI    #100,           R0
            BPL     @@9xx

            ;;  8xx
            MVII    #digi(<img src='http://www.atariage.com/forums/public/style_emoticons/<#EMO_DIR#>/icon_cool.gif' class='bbc_emoticon' alt='8)' />,       R1      ; it's 8
            ADDI    #100,           R0
            B       @@xx

@@7xx:      MVII    #digi(7),       R1      ; it's 7
            ADDI    #100,           R0
            B       @@xx

@@6xx_or_5xx:
            ;;  6xx or 5xx
            MVII    #digi(6),       R1      ; assume 6
            ADDI    #100,           R0
            BPL     @@xx

            ;;  5xx
            MVII    #digi(5),       R1      ; it's 5
            ADDI    #100,           R0
            B       @@xx

@@4xx_or_less

            ;;  [4-2]xx or [1-0]xx
            SUBI    #200,           R0
            BMI     @@1xx_or_0xx

            ;;  [4-3]xx or 2xx
            SUBI    #100,           R0
            BMI     @@2xx

            ;;  4xx or 3xx
            MVII    #digi(4),       R1      ; assume 4
            SUBI    #100,           R0
            BPL     @@4xx

            ;;  3xx
            MVII    #digi(3),       R1      ; it's 3
            ADDI    #100,           R0
            B       @@xx

@@2xx:      MVII    #digi(2),       R1      ; it's 2
            ADDI    #100,           R0
            B       @@xx

@@1xx_or_0xx:
            ;;  1xx or 0xx
            MVII    #digi(1),       R1      ; assume 1
            ADDI    #100,           R0
            BPL     @@xx

            ;;  0xx
            MVII    #digi(0),       R1      ; it's 0
            ADDI    #100,           R0
            B       @@xx


@@9xx
@@4xx
@@xx
            MVO@    R1,             R4      ; store out third digit

            ;;  [9-5]x or [4-0]x?
            SUBI    #50,            R0
            BMI     @@4x_or_less

            ;;  [9-7]x or [6-5]x
            SUBI    #20,            R0
            BMI     @@6x_or_5x

            ;;  [9-8]x or 7x
            SUBI    #10,            R0
            BMI     @@7x

            ;;  9x or 8x
            MVII    #digi(9),       R1      ; assume 9
            SUBI    #10,            R0
            BPL     @@9x

            ;;  8x
            MVII    #digi(<img src='http://www.atariage.com/forums/public/style_emoticons/<#EMO_DIR#>/icon_cool.gif' class='bbc_emoticon' alt='8)' />,       R1      ; it's 8
            ADDI    #10,            R0
            B       @@x

@@7x:       MVII    #digi(7),       R1      ; it's 7
            ADDI    #10,            R0
            B       @@x

@@6x_or_5x:
            ;;  6x or 5x
            MVII    #digi(6),       R1      ; assume 6
            ADDI    #10,            R0
            BPL     @@x

            ;;  5x
            MVII    #digi(5),       R1      ; it's 5
            ADDI    #10,            R0
            B       @@x

@@4x_or_less

            ;;  [4-2]x or [1-0]x
            SUBI    #20,            R0
            BMI     @@1x_or_0x

            ;;  [4-3]x or 2x
            SUBI    #10,            R0
            BMI     @@2x

            ;;  4x or 3x
            MVII    #digi(4),       R1      ; assume 4
            SUBI    #10,            R0
            BPL     @@4x

            ;;  3x
            MVII    #digi(3),       R1      ; it's 3
            ADDI    #10,            R0
            B       @@x

@@2x:       MVII    #digi(2),       R1      ; it's 2
            ADDI    #10,            R0
            B       @@x

@@1x_or_0x:
            ;;  1x or 0x
            MVII    #digi(1),       R1      ; assume 1
            ADDI    #10,            R0
            BPL     @@x

            ;;  0x
            MVII    #digi(0),       R1      ; it's 0
            ADDI    #10,            R0
            B       @@x

@@9x
@@4x
@@x
            MVO@    R1,             R4      ; store out fourth digit

            ;;  [9-5] or [4-0]?
            SUBI    #5,             R0
            BMI     @@4_or_less

            ;;  [9-7] or [6-5]
            SUBI    #2,             R0
            BMI     @@6_or_5

            ;;  [9-8] or 7
            DECR    R0
            BMI     @@7

            ;;  9 or 8
            MVII    #digi(9),       R1      ; assume 9
            DECR    R0
            BPL     @@9

            ;;  8
            MVII    #digi(<img src='http://www.atariage.com/forums/public/style_emoticons/<#EMO_DIR#>/icon_cool.gif' class='bbc_emoticon' alt='8)' />,       R1      ; it's 8
            INCR    R0
            B       @@last

@@7:        MVII    #digi(7),       R1      ; it's 7
            INCR    R0
            B       @@last

@@6_or_5:
            ;;  6 or 5
            MVII    #digi(6),       R1      ; assume 6
            INCR    R0
            BPL     @@last

            ;;  5
            MVII    #digi(5),       R1      ; it's 5
            INCR    R0
            B       @@last

@@4_or_less

            ;;  [4-2] or [1-0]
            SUBI    #2,             R0
            BMI     @@1_or_0

            ;;  [4-3] or 2
            DECR    R0
            BMI     @@2

            ;;  4 or 3
            MVII    #digi(4),       R1      ; assume 4
            DECR    R0
            BPL     @@4

            ;;  3
            MVII    #digi(3),       R1      ; it's 3
            INCR    R0
            B       @@last

@@2:        MVII    #digi(2),       R1      ; it's 2
            INCR    R0
            B       @@last

@@1_or_0:
            ;;  1 or 0
            MVII    #digi(1),       R1      ; assume 1
            INCR    R0
            BPL     @@last

            ;;  0
            MVII    #digi(0),       R1      ; it's 0
            INCR    R0
            B       @@last

@@9
@@4
@@last
            MVO@    R1,             R4      ; store out fifth digit

Now that's a beast. :-) Your code is decidedly more approachable.

#3 GroovyBee OFFLINE  

GroovyBee

    Games Developer

  • Topic Starter
  • 7,964 posts
  • Busy bee!
  • Location:North, England

Posted Sun Sep 12, 2010 3:37 PM

Once you get to BCDloop3, you could store your power-of-10 in R1, which will cost you 8 cycles to generate, but will save you 2 cycles per loop iteration. Since you could loop up to 10 times and you always loop at least once, that'll save you cycles both in the worst case and on average.


:cool: Thanks for the further optimisation tip. I'd completely forgotten that I'd another free register at that point.

I actually went even more aggressively for some code I wrote recently for a Space Patrol update. Instead of looping, I went with a decision-tree approach, since it's guaranteed to never need more than ceil(log2(10)) steps for any digit. I didn't want to crash on all 9s or on 909090. :-)


Wow! That's a ton of code and an interesting cheat.

#4 GroovyBee OFFLINE  

GroovyBee

    Games Developer

  • Topic Starter
  • 7,964 posts
  • Busy bee!
  • Location:North, England

Posted Mon Sep 13, 2010 3:55 AM

Here's the version with the optimisation :-

;------------------------------------------------------------------------------
; Display6DigitsInBCD
;------------------------------------------------------------------------------
; Display 6 digits in BCD format.
;------------------------------------------------------------------------------
; In :-
; RO - Least Significant Word.
; R1 - Most Significant Word.
; R2 - STIC colour mask (XOR'd in).
; R4 - Destination pointer.
;
; Out :-
; Nothing.
;
; Trashes :-
; R0, R1, R3, R4, R5
;
;------------------------------------------------------------------------------
Display6DigitsInBCD: PROC
        pshr r5                 ; Save return address.

        mvii #1 SHL 3, r5       ; Most used constant in a register for speed.

; Most significant digit.
        mvii #(-1+'0'-' ') SHL 3, r3    ; Prep GROM character.
        xorr r2, r3                     ; Mix in the STIC colours.
BCDLoop1:
        addr r5, r3
        addi #-100000 AND $FFFF, r0             ; Get around the fact that
        adcr r1                                 ; there is no SBCR instruction.
        addi #(-100000 SHR 16) AND $FFFF, r1    ; on the CP1600.
        bc BCDLoop1
        mvo@ r3, r4                     ; Into the output buffer.

        mvii #(10+'0'-' ') SHL 3, r3    ; Prep GROM character.
        xorr r2, r3                     ; Mix in the STIC colours.
BCDLoop2:
        subr r5, r3
        addi #10000, r0
        adcr r1
        bnc BCDLoop2
        mvo@ r3, r4                     ; Into the output buffer.

        mvii #(-1+'0'-' ') SHL 3, r3    ; Prep GROM character.
        xorr r2, r3                     ; Mix in the STIC colours.
        mvii #1000, r1                  ; Power of 10.
BCDLoop3:
        addr r5, r3
        subr r1, r0
        bc BCDLoop3
        mvo@ r3, r4                     ; Into the output buffer.

        mvii #(10+'0'-' ') SHL 3, r3    ; Prep GROM character.
        xorr r2, r3                     ; Mix in the STIC colours.
        mvii #100, r1                   ; Power of 10.
BCDLoop4:
        subr r5, r3
        addr r1, r0
        bnc BCDLoop4
        mvo@ r3, r4                     ; Into the output buffer.

        mvii #(-1+'0'-' ') SHL 3, r3    ; Prep GROM character.
        xorr r2, r3                     ; Mix in the STIC colours.
        mvii #10, r1                    ; Power of 10.
BCDLoop5:
        addr r5, r3
        subr r1, r0
        bc BCDLoop5
        mvo@ r3, r4                     ; Into the output buffer.

; Least significant digit.
        addi #10+'0'-' ', r0            ; Prep GROM character.
        sll r0, 2
        addr r0, r0                     ; x3
        xorr r2, r0                     ; Mix in the STIC colours.
        mvo@ r0, r4                     ; Into the output buffer.

; All done!
        pulr pc
        ENDP


#5 DZ-Jay OFFLINE  

DZ-Jay

    Quadrunner

  • 5,357 posts
  • Ranger Elf: Saviour of Christmas!
  • Location:NC, USA

Posted Thu Dec 9, 2010 2:37 PM

Here's the version with the optimisation :-


I'm sorry if this sounds dense, but would you mind explaining the algorithm a bit? You are obviously making some optimizations but I'm not sure I understand them. In particular, why the alternation between SUBR/ADDR for R3? Also, why use "ADDI #-100000 AND $FFFF" instead of "NEGR" once outside the loop and then just using ADDR with the 1000 in another register? Does NEGR not set the carry bit when an overflow occurs in the negation?

I'm trying to implement my own bin2dec routine to display score, more as a learning exercise, but I can't seem to understand exactly the mechanisms that make yours work.

-dZ.

#6 catsfolly OFFLINE  

catsfolly

    Dragonstomper

  • 513 posts
  • Location:Japan

Posted Thu Dec 9, 2010 8:58 PM


Here's the version with the optimisation :-


I'm sorry if this sounds dense, but would you mind explaining the algorithm a bit? You are obviously making some optimizations but I'm not sure I understand them. In particular, why the alternation between SUBR/ADDR for R3? Also, why use "ADDI #-100000 AND $FFFF" instead of "NEGR" once outside the loop and then just using ADDR with the 1000 in another register? Does NEGR not set the carry bit when an overflow occurs in the negation?

I'm trying to implement my own bin2dec routine to display score, more as a learning exercise, but I can't seem to understand exactly the mechanisms that make yours work.

-dZ.

I haven't looked into the carry bit operation much, but think I can explain the alternating subtracts and adds.

Basically, we are doing a divide by doing subtracts and counting the number of subtracts we do. Imagine we are working on the hundred's digit. So we have a number "num" that is less than 1000. The code in C would look like this:

digit = 0;
while (num >= 100)
{	
        num = num - 100;
        digit  = digit +1;
}
// use digit to set a display char
digit = 0;
while (num >= 10)
{	
        num = num - 10;
        digit  = digit +1;
}
// use digit to set a display char

In assembly, this would look like
		clrr	r3	
		mvii #100, r1                  ; Power of 10.
BCDLoop3:
		cmp  r1, r0				   ; is num lt 100
		blt  done3				   ; yep, done
		subr r1, r0
		incr  r3
		b BCDLoop3
done3:
		; output r3
		clrr	r3
		mvii #10, r1                  ; Power of 10.
BCDLoop4:
		cmp  r1, r0				  ; is num lt 10
		blt  done4
		subr r1, r0
		incr  r3
		b BCDLoop3
done4:

If we study this code, it seems inefficient. The
         cmp	r1, r0
instruction subtracts r1 from r0, sets the flags, and then discards the result of the subtraction. Then the subtract instruction does the same subtract again. To fix this, we could do the subtract every time to set the flags. But this means we would be subtracting 100 one too many times from num, so we have to add back 100 to fix up num.

digit = 0;
while (1)
{	
	num = num - 100;
	if (num <0) break;
         digit  = digit +1;
}
num = num + 100;  // we did one too many subtracts, so fix num
// use digit to set a display char
digit = 0;
while (1)
{	
	num = num - 10;
         if(num<0) break;
         digit  = digit +1;
}
num = num + 10;  // we did one too many subtracts, so fix num
// use digit to set a display char
In assembly, this would look like:

        clrr	r3
        mvii #100, r1                  ; Power of 10.
BCDLoop3:
        subr r1, r0
	blt	done3
	incr  r3
        bc BCDLoop3
done3:
	add  r1, r0				  ; fix up r0
        ; output r3
	clrr	r3
        mvii #10, r1                  ; Power of 10.
BCDLoop4:
        subr r1, r0
	blt     done4
	incr  r3
        bc BCDLoop3
done4:
        add  r1, r0		       ; fix up r0
        ; output r3

As you can see, the loops are smaller. We could optimize this a bit more by initializing r3 to "-1" and then always doing the increment:

        mvii #-1, r3 
        mvii #100, r1                  ; Power of 10.
BCDLoop3:
	incr  r3
        subr r1, r0
	bge	BCDLoop3
done3:
	add  r1, r0				 ; fix up r0
        ; output r3
	   
        mvii #-1, r3 
        mvii #10, r1                  ; Power of 10.
BCDLoop4:
        incr  r3
        subr  r1, r0
	bge   BCDLoop44
done4:
	add  r1, r0				  ; fix up r0
        ; output r3

But, is there some way to avoid doing the fixups? This is where the alternating subtracts and adds comes in. In C

digit = -1;
while (1)
{	
	digit = digit +1;
	num = num - 100;
	if (num <0) break;
}
// num is now negative number (or zero)
// use digit to set a display char
digit = 10;
while (1)
{	
	digit  = digit -1;
	num = num + 10;
        if (num>= 0 ) break;
}
// num is positive
// use digit to output the character

In assembly:

        mvii #-1, r3 
        mvii #100, r1                  ; Power of 10.
BCDLoop3:
	incr  r3
        subr r1, r0
	bge	BCDLoop3
done3:

        ; output r3
	   
        mvii #10, r3
        mvii #10, r1                  ; Power of 10.
BCDLoop4:
        decr   r3
        addr r1, r0
	blt     BCDLoop44
done4:
        ; output r3

Does this make sense?

#7 DZ-Jay OFFLINE  

DZ-Jay

    Quadrunner

  • 5,357 posts
  • Ranger Elf: Saviour of Christmas!
  • Location:NC, USA

Posted Fri Dec 10, 2010 4:07 AM

Does this make sense?


Fantastic! Yes, it makes a lot of sense. I understand it perfectly now. Thank you very much!

-dZ.

#8 DZ-Jay OFFLINE  

DZ-Jay

    Quadrunner

  • 5,357 posts
  • Ranger Elf: Saviour of Christmas!
  • Location:NC, USA

Posted Fri Dec 10, 2010 12:02 PM

I think there may be one more optimization that can be done: According to the data sheet of the CP-1600 CPU, the conditional branching instructions cost 7 cycles when false and 9 when true. Thus, since we're testing on each iteration, it occurred to me that we could shave a few cycles by unrolling each division loop and taking the hit on the best case scenarios of breaking out early.

For instance, instead of:

        mvii #-1, r3 
        mvii #100, r1                  ; Power of 10.
BCDLoop3:
        incr  r3
        subr r1, r0
        bge     BCDLoop3
done3:

        ; output r3

We could do something like:


        mvii #-1, r3 
        mvii #100, r1                  ; Power of 10.
BCDLoop3:
     ; Unroll loop and reverse break test
     REPEAT(9)
        incr  r3
        subr r1, r0
        blt     done3                  ; break
     ENDR
        subr r1, r0                    ; Perform last one outside repeat block
                                       ; since we don''t need the test/branch there.

done3:
        ; output r3

That way, on the worse case scenarios, you save 20 cycles (2 cycles * 10 iterations) more per loop, and on the best case scenario you break even with the previous code.

What do you think?

-dZ.

#9 GroovyBee OFFLINE  

GroovyBee

    Games Developer

  • Topic Starter
  • 7,964 posts
  • Busy bee!
  • Location:North, England

Posted Fri Dec 10, 2010 12:07 PM

That way, on the worse case scenarios, you save 20 cycles (2 cycles * 10 iterations) more per loop, and on the best case scenario you break even with the previous code.


As with any optimisation its always a trade off between size and speed. In my code I only convert the score when its changed thus saving a ton of cycles.

#10 DZ-Jay OFFLINE  

DZ-Jay

    Quadrunner

  • 5,357 posts
  • Ranger Elf: Saviour of Christmas!
  • Location:NC, USA

Posted Fri Dec 10, 2010 12:53 PM

That way, on the worse case scenarios, you save 20 cycles (2 cycles * 10 iterations) more per loop, and on the best case scenario you break even with the previous code.


As with any optimisation its always a trade off between size and speed. In my code I only convert the score when its changed thus saving a ton of cycles.


I was just mentioning it because ROM is cheap nowadays, so the trade off between size and speed is not as significant as it once was.

-dZ.

#11 catsfolly OFFLINE  

catsfolly

    Dragonstomper

  • 513 posts
  • Location:Japan

Posted Fri Dec 10, 2010 4:43 PM

I was just mentioning it because ROM is cheap nowadays, so the trade off between size and speed is not as significant as it once was.

-dZ.

i think GroovyBee's code probably does the conversion with the least amount of code. If you want to minimize the execution time, I think Joe's "divide and conquer" approach will work better, even though it takes more code. For example, we could add one check to our previous example code:


     	subi  #500, r1                 ; check the halfway point
        bpl   5_or_more

4_or_less
     	mvii	  #5,       r3
        mvii #100, r1                  ; Power of 10.
BCDLoop3:
	decr    r3
        addi    r1, r0
	bge	BCDLoop3
        b       done3

5_or_more
     	mvii	 #5,  r3
        mvii     #100, r1                  ; Power of 10.
BCDLoop3:
	incr  r3
        subr r1, r0
	bge	BCDLoop3
done3:


Now, the loops will only run a maximum of 4-5 times (I haven't tried any of this code, this is just to show the concept). Further "divide and conquer" checks can reduce or eliminate the loops entirely (check out Joe's code on this thread for details...).

David




0 user(s) are browsing this forum

0 members, 0 guests, 0 anonymous users