moulinaie

Compare with Zero, ASSEMBLER


Hi,

I'm not sure, but I think there was a way to compare a value (from a register? from memory?) to zero that is faster or shorter than:

register: CI Ri,0 (compare immediate, register i against zero) -> four bytes used

or memory: MOV @adr,Ri (sets the flags according to what's in adr, but overwrites register i)

(I used the 68000 so often that I expect every processor to have a TEST instruction!)

If someone knows...

Guillaume.


There is no explicit TEST instruction for the TMS9900 other than the compare instructions (C, CB, CI).

 

Both of your examples take four bytes. “MOV Ri,Ri”, however, takes only two and does not erase Ri but does take one memory access more than the CI instruction.
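
For illustration, a minimal sketch of the two register tests side by side. The register number and the label iszero are placeholders, not anything from the original question:

 mov  r3,r3     ; 2 bytes: copies r3 onto itself, status compares the result to zero
 jeq  iszero    ; taken when r3 = 0

 ci   r3,0      ; 4 bytes: same EQ result, one memory access fewer
 jeq  iszero    ; taken when r3 = 0

Either form leaves r3 unchanged, so the choice is between code size and memory accesses.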

 

...lee

8 hours ago, moulinaie said:

I'm not sure, but I think there was a way to compare a value (from a register? from memory?) to zero faster or shorter than:

 



There is another alternative: the CZC instruction. It tests for zero bits using a mask. It's more powerful but slower.

 

Summary: with registers in PAD >8300, MOV and CZC are equal, and can be a little faster than CI. CZC has more uses than comparing to 0.
With registers in external memory, CI and CZC can be equal, and a bit faster than MOV.

 

ones data >ffff     ; source word all ones
czc  @ones,r6

seto r7             ; all ones, if you can keep a register constant around
czc  r7,r6


Per 9900 Family Systems Design, p. 8-23  (I seem to recall there is an errata to this table somewhere?)

mov  r6,r6     C=14 M=4
ci   r6,0      C=14 M=3 
czc  @ones,r6  C=14 M=3 A: C+=8 M+=1
czc  r7,r6     C=14 M=3
seto r7        C=10 M=3


C is the number of cycles
M is the number of memory accesses
A is additional cycles and memory access for operands other than registers.
W is the number of wait states imposed for external memory


If W=4 (2 wait states per byte on the 4A) 

Formula:       C  + W*M      (with any A adjustments already added to C and M)


mov  r6,r6     14 + 4*4 = 30
ci   r6,0      14 + 4*3 = 26
czc  @ones,r6  22 + 4*4 = 38  

seto r7        10 + 4*3 = 22
czc  r7,r6     14 + 4*3 = 26
=total         24 + 4*6 = 48


If you prepare R7 ahead of time, CZC can be faster than MOV but the same as CI.


If you put registers in PAD (LWPI >8300), then register memory accesses (R) don't have wait states. I think this reduces to:

Formula:       C  + 0*R + W*(M-R)


mov  r6,r6     14 + 0*3 + 4*1 = 18
ci   r6,0      14 + 0*1 + 4*2 = 22
czc  @ones,r6  22 + 0*1 + 4*3 = 34
czc  r7,r6     14 + 0*2 + 4*1 = 18
seto r7        10 + 0*2 + 4*1 = 14


Summary: in the best conditions, MOV and CZC are equal, and can be a little faster than CI. CZC has more uses than comparing to 0.
With registers in external memory, CI and CZC can be equal, and a bit faster than MOV.


Note: With a 3 MHz clock, 18 cycles is 6 microseconds.


Further


I use CZC to test loop conditions where I want something to happen each time the loop counter crosses a multiple of a power of two.


For instance, writing R2 bytes down a column of a bitmap screen, and testing the memory address for a multiple of 8 in each loop.

 li   r7,>7   ; mask: low three bits, for the multiple-of-8 test
 li   r0,>4000+PATTBL+>03b5   ; arbitrary starting address in pattern table
 li   r1,>1f00  ; a pattern byte to write all down the column (MOVB sends the high byte)
 li   r2,42   ; weird number of bytes to write down the column
 bl   @setva  ; set the initial VDP write address (assuming setva takes it from r0)
loop:
 movb r1,@VDPWD  ; write a byte. note: optimize: store VDPWD in a register
 inc  r0
 czc  r7,r0   ; where r0 is the vdp address
 jne  next    ; not a multiple of 8
 ai   r0,>F8  ; calculate address of next row (hey: why do we do bitmap mode this way? why not 8 consecutive chars down instead of 32 across?)
 bl   @setva  ; update VDPWA
next:
 dec  r2
 jne  loop


This might not be optimal compared to using two loop counters, but it's a pattern worth considering.
 

On 10/24/2019 at 9:11 PM, FarmerPotato said:

This might not be optimal compared to using two loop counters, but it's a pattern worth considering.

I know this is not about optimizing drawing, but what I would do is to draw the bottom lines of the top character first (if any), then draw the middle characters in an unrolled loop with 8 movb, and finally draw the top lines of the bottom character (if any). A lot of work to code, but you dramatically reduce the average number of instructions it takes to write a byte to the screen, especially if you draw many lines.  ;)
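
A rough sketch of the middle part only, reusing VDPWD and the setva routine from the example above (setva is assumed to set the VDP write address from r0). The register choices are assumptions: r0 points at the top line of the first full character, r1 holds the pattern byte in its high byte, and r3 holds a nonzero count of full characters.

mid:
 bl   @setva      ; set the VDP write address from r0 (assumed helper)
 movb r1,@VDPWD   ; 8 writes fill one character's worth of the column;
 movb r1,@VDPWD   ; the VDP advances its own address after each write
 movb r1,@VDPWD
 movb r1,@VDPWD
 movb r1,@VDPWD
 movb r1,@VDPWD
 movb r1,@VDPWD
 movb r1,@VDPWD
 ai   r0,>100     ; same column, one character row down (32 chars * 8 bytes)
 dec  r3          ; one full character done
 jne  mid

The partial characters at the top and bottom would need their own shorter runs of movb before and after this loop.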

Edited by Asmusr

13 hours ago, moulinaie said:

I'm not sure, but I think there was a way to compare a value (from a register? from memory?) to zero faster or shorter than:

Aside from the instructions already mentioned, the CPU will compare-to-zero after certain instructions.  Organizing your code and loops in such a way to take advantage of the auto-compare is going to be the fastest.  I'm pretty sure most people here are aware of this, but I did not see it mentioned.
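
A minimal sketch of what this looks like in a counted loop, assuming r2 holds a nonzero count and work is a placeholder routine that leaves r2 alone:

 li   r2,10     ; loop count (arbitrary)
again:
 bl   @work     ; hypothetical routine doing the real work
 dec  r2        ; DEC compares its result to zero as a side effect
 jne  again     ; so no CI or MOV is needed before the jump

The same applies to MOV, INC, A, S and most other arithmetic and logical instructions, which is why counting down to zero tends to be cheaper than counting up to a limit.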

12 hours ago, FarmerPotato said:


There is another alternative: the CZC instruction. It tests for zero bits using a mask. It's more powerful but slower.

 

Summary: with registers in PAD >8300, MOV and CZC are equal, and can be a little faster than CI. CZC has more uses than comparing to 0.
With registers in external memory, CI and CZC can be equal, and a bit faster than MOV.

 

 

Hi!

This is very interesting. Those instructions, such as CZC and friends, look very powerful and are certainly not used as much as they should be.

I like your example with the "multiple of 8", that's clever!

When I am working with MLC, I do not touch the 256-byte fast RAM block, as most of it is reserved for XB use.

Guillaume.

7 hours ago, matthew180 said:

Aside from the instructions already mentioned, the CPU will compare-to-zero after certain instructions.  Organizing your code and loops in such a way to take advantage of the auto-compare is going to be the fastest.  I'm pretty sure most people here are aware of this, but I did not see it mentioned.

Yes, that's what I try to do most of the time.

But in my current program, a flag may get cleared to zero inside a loop, and I have to test it after the loop has finished.

Guillaume.

21 hours ago, Asmusr said:

I know this is not about optimizing drawing, but what I would do is to draw the bottom lines of the top character first (if any), then draw the middle characters in an unrolled loop with 8 movb, and finally draw the top lines of the bottom character (if any). A lot of work to code, but you dramatically reduce the average number of instructions it takes to write a byte to the screen, especially if you draw many lines.  ;)

I think you're right.

 

I wrote a fast rectangle fill not long ago, which relied on SZCB and CZC instructions. However, they were just to manipulate the coordinates into  counts for the top, middle, bottom chunks on 8-pixel boundaries, not to test the count inside the loop. Then an unrolled loop in PAD blasted them out :) I used the top or bottom count when  jumping into the right place in the unrolled loop.
 


Depending on the value range you can also use ABS.  It is a handy, fast method to check for zero/non zero though it isn't always the best test for iterative loops since you probably need to use MOV, DEC, S or some other instruction as part of the loop. For flags it is very handy/quick to use CLR / SETO along with ABS to test for zero/not zero.
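
A minimal sketch of the flag idea, assuming the flag lives in r5 (any free register would do) and done is a placeholder label:

 seto r5        ; flag = >FFFF before the loop
 ; ... loop body, which may do: clr r5
 abs  r5        ; status comes from comparing the old value of r5 to zero
 jeq  done      ; taken only if the flag was cleared

Note that ABS turns >FFFF into >0001, so the flag stays nonzero afterwards but no longer reads as all ones.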

12 hours ago, InsaneMultitasker said:

Depending on the value range you can also use ABS.  It is a handy, fast method to check for zero/non zero though it isn't always the best test for iterative loops since you probably need to use MOV, DEC, S or some other instruction as part of the loop. For flags it is very handy/quick to use CLR / SETO along with ABS to test for zero/not zero.

Thanks a lot, I like this solution for my flag.

Gonna modify the source...

 

Guillaume.

