philosophic assembler - use cases - programming tips


HackMac

Recently I have been wondering about a frequently used TMS9900 assembly instruction: load immediate, a.k.a. LI.

What puzzles me is that this instruction alters the status flags L>, A> and EQ. What is that good for? When I load an immediate constant into a register, I already know whether the value is equal to zero, so I don't have to check it.

So this code

LI   R0,>1234
JNE  ELSEWHERE

doesn't make much sense.

I have never seen or written code that does something like this.

Is there a serious use case? Where is the benefit?

Perhaps TI implemented the instruction this way to save resources in the CPU, so that the microcode could be smaller. But I think the instruction could run faster if it skipped the useless update of the status flags.

 

What are your opinions?


I believe there are many situations where the manufacturer, here TI, kept a feature for the sake of consistency rather than because it is really useful. The immediate operations usually set the status flags according to the result of the operation. In this case LI does not do anything exciting, but the question is whether we would be more surprised to see the status flags left unset, given that they are set by AI, ANDI, ORI etc.

 

Another thing that almost qualifies for a facepalm is the implementation of MOV. If you look at the 9900 Family System Design Book, you will find that MOV reads the destination value first, although it will be overwritten in the following cycles. That is understandable for MOVB (read-before-write, since only one byte of the word changes), but for MOV, where nothing of the original destination value is kept, it is somewhat surprising. (This was actually fixed in the 9995, as the 9995 does not require read-before-write.)


OK, you look at the immediate instructions in terms of their common opcode format and group them into an immediate instruction family.

I see LI as a specific form of a general load or set instruction, like LIMI or LWPI. CLR and SETO (these two are special LIs with a built-in constant), SBO and SBZ are also forms of a load instruction. All of those instructions, except LI, have no effect on the status flags. So why should LI be an exception?
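
To make the contrast concrete, here is a small sketch. The register choice and the label NOTEQ are only for illustration; it relies on the documented behavior that CLR and SETO leave the status bits untouched, while LI compares the loaded value with zero:

```asm
       C    R1,R1           * Equal operands, so EQ = 1
       CLR  R0              * R0 = >0000, status bits unchanged: EQ still 1
       SETO R0              * R0 = >FFFF, status bits unchanged: EQ still 1
       LI   R0,>FFFF        * R0 = >FFFF, now L> = 1, A> = 0, EQ = 0
       JNE  NOTEQ           * Taken, because the LI cleared EQ
```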

Of course, you could regard the load/set instructions as a special form of a memory move like MOV, but with a very special addressing mode. Or, better, the other way round: MOV is a special kind of load instruction which has various addressing modes. Therefore the MOV instruction also has weird behavior.

 

I believe LI and MOV are the most frequently used instructions and make up 70% of program code. Because of the unnecessary cycles spent setting status flags, they slow down execution. In my opinion, and in addition to what Michael points out (the odd behavior of MOV and MOVB), the CPU is really not well designed.

But OK, we have to live with it. So I ask myself how we can use these behaviors for sophisticated programming techniques. Are there use cases in real life?


As JamesD said above, the functionality is good for self-modifying code. An instruction might be LI R0,>1234 now, but code can over-write the value >1234. You might do this in a loop perhaps, incrementing what was the value >1234, and with a JEQ to break out of the loop when the value rolls over to zero.
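
A minimal sketch of such a loop, assuming the code runs from RAM. The labels LOOP and DONE and the start value >FFF0 are made up for illustration. The immediate operand is the second word of the LI instruction, i.e. the word at LOOP+2:

```asm
LOOP   LI   R0,>FFF0        * Word at LOOP+2 is the operand that gets patched
       JEQ  DONE            * EQ comes from the LI: operand has rolled over to 0
*      ... use R0 here ...
       INC  @LOOP+2         * Self-modify: bump the immediate operand
       JMP  LOOP
DONE   B    *R11            * Hypothetical exit
```

Note that the operand stays at zero afterwards, so it would have to be re-initialised before the loop can be used again.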


Yes, I understand you, and I also understood JamesD.

Your use case is a kind of parameter passing (self-modifying code is not a use case, it's a programming technique). And where is the benefit? It is not very useful for parameter passing. There are better techniques.


Clever programming with self-modifying code can certainly benefit from this behavior, as @JamesD and @Stuart have said; but, it is the second point @JamesD made that intrigues me. The fact that not one jump or branch instruction sets the status bits would allow the LI instruction to be used to load a register with a flag or index in several subroutines, or in several places in the same subroutine, to be used as a switch upon return.

 

...lee


Do you have some examples ready? Is there any known implementation where the operand of an LI is set dynamically? I'm not creative enough to think up a scenario where LI is used to set/reset status bits. When I want to manipulate single status bits, I would use a BLWP subprogram in which I manipulate R15, so that it takes effect on the corresponding RTWP.

Furthermore, writing self-modifying programs is an ugly programming style and unusable in ROMs. Usually you define a byte which is set/reset/tested.

By the way, TI forgot an LDST to correspond to STST, the way LWPI corresponds to STWP ;-)

Edited by HackMac

HackMac said: Do you have some examples ready?

Not really, unless you would consider some of the following as such.

HackMac said: Is there any known implementation where the operand of an LI is set dynamically?

The destination operand of the LI instruction is always “dynamically set”. Dynamically setting the immediate operand (self-modifying code) was not the point of my comment.

HackMac said: I'm not creative enough to think up a scenario where LI is used to set/reset status bits.


LI R5,0         L = 0; A = 0; EQ = 1
LI R5,1         L = 1; A = 1; EQ = 0
LI R5,>8000     L = 1; A = 0; EQ = 0
LI R5,SRADDR    L = ?; A = ?; EQ = ?

 

The first three could be used for tri-state decisions.
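
A sketch of how such a tri-state result might look, assuming a BL subroutine that returns with B *R11 (neither BL nor B changes the status bits). CHECK is a hypothetical entry point, and the exit and target labels are made up as well:

```asm
*      Three possible exits of the hypothetical subroutine CHECK
RET0   LI   R5,0            * EQ = 1
       B    *R11
RETPOS LI   R5,1            * A> = 1, EQ = 0
       B    *R11
RETNEG LI   R5,>8000        * A> = 0, EQ = 0
       B    *R11

*      Caller: no compare is needed after the return
       BL   @CHECK
       JEQ  ISZERO          * R5 = 0
       JGT  ISPOS           * R5 = 1
       JMP  ISNEG           * R5 = >8000
```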

 

Not sure how useful the following would be: The fourth could be used to detect whether a subroutine’s address (SRADDR) is in high RAM before actually taking the branch.

There are surely other useful scenarios that would benefit from the speed of LI over MOV.

HackMac said: When I want to manipulate single status bits, I would use a BLWP subprogram in which I manipulate R15, so that it takes effect on the corresponding RTWP.

Very much slower, if manipulating the L, A and EQ bits is all that is necessary.

HackMac said: Furthermore, writing self-modifying programs is an ugly programming style and unusable in ROMs.

Again, not my point.

HackMac said: Usually you define a byte which is set/reset/tested.

???

HackMac said: By the way, TI forgot an LDST to correspond to STST, the way LWPI corresponds to STWP ;-)

I respectfully disagree. To me, the point of STST is to preserve a status for later consideration before a subsequent instruction mucks it up.

 

...lee

Edited by Lee Stewart

Yes. The reason is that for instructions which support various addressing modes (CLR R0, CLR *R0, CLR *R0+ ...), the general "data derivation sequence" is executed, which requires loading a word at its start.

 

That is, for Workspace Register addressing, the value of the register is always loaded first. Then, the value is changed to 0000 internally, and finally, the write operation takes place.

 

In the 9995, the "data derivation sequence" was replaced by the "address derivation sequence". The latter does not necessarily entail a memory operation.


HackMac said: My English must be too bad; no one understood me the way I intended. I'm done. Lee is a bit too serious/rational for me. ;-)

Look at what you asked.

 


You say it doesn't make much sense and you ask where is the benefit.

 

To put the answer another way... it makes perfect sense and we have already stated the benefits.

Assembly is a general purpose language meant to be able to implement pretty much anything.

An interpreter, compiler, game, word processor, database manager, recipe file program... whatever.

The way instructions are decoded in hardware is designed to reuse things like setting status bits.

Every time you load a register, no matter which form of the instruction (LI, MOV), you set the status bits.

The load instructions are consistent in their behavior.

It simplifies decoding the instruction in hardware whether you use the status bits or not, and programmers can always depend on the status bits being set after a load.

Do you know how much more difficult it would be to learn assembly if behavior of instructions varied just based on whether or not someone thought something made sense?

Have a look at all the special cases for which instructions can be used on which registers on the Z80 and you will understand what I'm getting at.

 

 

We already told you some other benefits. Just because you don't like self modifying code does not mean it's not a benefit.

You keep looking at it through your narrow snippet of code and don't seem to understand what we are saying.

You probably haven't seen code like that because you only want to look at it through that exact sequence of instructions.

Make this small change to the code:

 

LI   R0,>1234
...
Subroutine:
...
JNE  ELSEWHERE

The LI is now setting a parameter for the subroutine and falling through into it.

The "..." indicates some instructions that do not modify the status flags or may leave them alone under certain circumstances.

The subroutine can also be called from anywhere else and the JNE instruction has no idea what set the status flags so you cannot presume to know if the value in R0 is zero, positive, negative or otherwise.

The status bits are set for consistency in the instruction set.

Self modifying code could take advantage of it.

And you cannot assume that a sequence of instructions knows where the status bits were set, so you cannot assume to know what they are.

Just because your code snippet doesn't make sense to you does not mean an instruction's functionality doesn't make any sense.

Would this make sense?

 

LI R0,>1234
LI R0,>4321

Does that mean the instruction used doesn't make sense?

Or does it mean the programmer's logic doesn't make sense?

Edited by JamesD

HackMac said: My English must be too bad; no one understood me the way I intended. I'm done. Lee is a bit too serious/rational for me. ;-)

 

Your English is fine. Misunderstanding will happen among native speakers. That is precisely why I segmented your quote. I wanted my responses to be as clear as I could make them. Rest assured, I am not always right. I have been disabused of conclusions I have drawn by some of the folks in this very thread. Perhaps my use of bold type was a little over the top. That is why I attempted to soften it with a blue color. My intention was to make a good separation between quoting you and my comments. I viewed it with the default page theme for this forum and my comments do look a little more like I am shouting than they do in the “Gravity” theme I have been using. I likely appeared a bit more of a jerk than I normally do. |:) I dialed back the bold in my last post.

 

And—don't give up so easily. You have been on this forum long enough to observe that, at times, discussions get passionate; but, progress usually gets made. :)

 

...lee


I definitely agree there--the discussion has made clear some of the quirks of programming in Assembly that can be used to the programmer's advantage. Thanks for starting the discussion, HackMac! :) No problems with your English at all--and although I could hack my points across in French--your English is definitely better than my French (I would do better in German or Turkish, but I'm sure I'd make weird mistakes in both of those too). :)


Exactly.

 

There are any number of ways to implement an algorithm in assembly language.

You can make your code easy to follow, optimized for space, optimized for speed, or any combination of those.

I've seen some really strange code and you should never assume code should obviously make sense.

If we were talking about the 6502, you even have illegal instructions that do some odd things that have been used to save a few more clock cycles in some code.

 

*edit*

And don't forget, assembly code can do some really weird things and still work, even if just by accident.

Edited by JamesD


Self-modifying code in any language saves space, and when programs are smaller they are better.

Several sections of self-modifying code can actually run faster than huge jumps and sections that load values over and over.

In Objective C and other computer languages, too, it saves space, runs faster and, if done properly, is much more efficient.


HackMac said: Perhaps TI implemented the instruction this way to save resources in the CPU, so that the microcode could be smaller. But I think the instruction could run faster if it skipped the useless update of the status flags.

 

What are your opinions?

You answered your own question. However, it seems you are thinking of the hardware from a software perspective. In order to write a value to a workspace register the value has to come from the ALU output (that's just the way the 9900 is designed), and setting the status flags is something the ALU does automatically. So you get the status update for free and, as Tursi mentioned, updating the status flags probably happens in parallel to other CPU activity. If there was a discrete machine state dedicated to latching the status flip-flops, it would be 333ns long, so hardly a slow-down.

 

The LI is not lumped with LIMI or LWPI because those instructions deal with updating true *internal* registers of the 9900 and have dedicated opcodes. The LI instruction works with the general-purpose registers, which are not part of the CPU itself, and require external memory access. LI is properly grouped with the other Immediate instructions based on how it functions internally.

 

Just because the LI affects the flags does not mean it necessarily makes sense to act on them after the instruction. It would have taken more hardware or a special case to suppress the status flags update for LI, and in hardware every transistor matters. Reducing cost, die transistors, and circuit complexity weigh more heavily than avoiding nuances like LI updating the status flags.

 

If you look hard enough you will certainly find other instructions that have similar kinds of side effects. You will also find these kinds of nuances in every CPU. The key as a programmer is knowing that this is going to happen and writing your code accordingly.

Edited by matthew180


 

HackMac said: By the way, TI forgot an LDST to correspond to STST, the way LWPI corresponds to STWP ;-)

 

I don't know that there is any benefit to restore the status register as you describe. Often the intent (and use) of STST is to allow for subsequent (and potentially consecutive) tests on the saved status value. You can also push the saved status back to a calling (or different) workspace prior to a RTWP to perform some interesting magic.
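
For illustration, a rough sketch of both ideas. It assumes the usual status register layout with L> in bit 0 (mask >8000), A> in bit 1 (>4000) and EQ in bit 2 (>2000); the label WASEQ and the register choices are hypothetical:

```asm
*      Save the status now, test it later
       C    R1,R2           * Sets L>, A> and EQ
       STST R8              * Copy the status register into R8
*      ... other instructions that change the status bits ...
       MOV  R8,R0           * Work on a copy so R8 can be tested again
       ANDI R0,>2000        * Isolate the saved EQ bit
       JNE  WASEQ           * Non-zero: the compare had found R1 = R2

*      Inside a BLWP routine: hand a flag back to the caller
       ORI  R15,>2000       * Set EQ in the caller's saved status (R15)
       RTWP                 * Caller's status, now with EQ = 1, is restored
```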

 

Somewhere in the archives there are examples of self-modifying code that are very handy. One of my common self-modifying code pairs is to load a subroutine's return address into a "B @>xxxx". A stack is not required, the branch address is updated at the start of the subroutine, and the branch substitutes as return with no register use required. Example:

 

 

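SUB1   MOV  R11,@RET+2      * Store the return address in the operand of the B below
       ..
       ..
RET    B    @>0000          * Operand (second word) is overwritten on entry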

 

I would argue that self-modifying code can be quite helpful and certainly not ugly, IMHO. As for ROM, I never consider that an issue as all of my programs are intended to run within RAM. ;)


I usually avoid self modifying code, but I used it recently in a soft sprites routine, to set up fixed shift instructions instead of having to load R0 with the shift value - saved a few clock cycles.

*********************************************************************
*
* Display a soft sprite at any position
* The pattern must be provided as a 'natural' bitmap of R3 lines of 
* 8 * R2 pixels where a line extends all the way across the sprite.
*
* R0 x coordinate
* R1 y coordinate
* R2 Width in bytes (of 8 pixels)
* R3 Height in pixels
* R4 Pattern address
* R5 Mask address
*
SFTSPR MOV    R11,*R10+            * Push return address onto the stack
       BL     @SCRADR              * R6 = byte offset, R7 = bit offset
       AI     R6,SCRBUF            * Add screen buffer offset
*      Setup shift instructions
       LI     R1,8                 * 8
       S      R7,R1                * 8 - bit offset
       SLA    R1,4                 * Move into place for SLA instruction
       MOV    @SLAX,R0             * Get op code for SLA Rx,0
       SOC    R1,R0                * Set shift
       MOV    R0,@SLAX1            * Write into program
       MOV    @SLAY,R0             * Get op code for SLA Ry,0
       SOC    R1,R0                * Set shift
       MOV    R0,@SLAY1            * Write into program
*      2nd group of shifts
       MOV    R7,R1                * Bit offset
       JEQ    SFTSP1               * Check if shift would be zero
       SLA    R1,4                 * Move into place for SLA instruction
       MOV    @SLAX,R0             * Get op code for SLA Rx,0
       SOC    R1,R0                * Set shift
       MOV    R0,@SLAX2            * Write into program
       MOV    @SLAY,R0             * Get op code for SLA Ry,0
       SOC    R1,R0                * Set shift
       MOV    R0,@SLAY2            * Write into program
       JMP    SFTSP2
*      Replace shifts with nops
SFTSP1 MOV    @NOOP,R0
       MOV    R0,@SLAX2
       MOV    R0,@SLAY2       
*      Line loop
SFTSP2 MOV    R2,R12               * Save the width
SFTSP3 MOV    R6,R7                * Save the destination address
       CLR    R8                   * Pattern register
       CLR    R9                   * Mask register
*      Byte loop
       MOV    R12,R2               * Restore width
SFTSP4 MOVB   *R4+,@R8LB           * Get pattern byte in LSB       
       MOVB   *R5+,@R9LB           * Get mask byte in LSB
SLAX1  SLA    R8,0                 * Shift (8 - bit offset) bits into MSB
SLAY1  SLA    R9,0                 * Actual shift values will be inserted
       MOVB   *R6,R1               * Get existing screen buffer byte
       SZCB   R9,R1                * Remove bits not set in mask
       SOCB   R8,R1                * Set pattern bits
       MOVB   R1,*R6               * Write byte back
       AI     R6,8                 * Next byte in line
       DEC    R2
       JEQ    SFTSP5
SLAX2  SLA    R8,0                 * Shift (bit offset) bits into MSB
SLAY2  SLA    R9,0                 * Actual shift values will be inserted
       JMP    SFTSP4
*      Final byte
SFTSP5 SWPB   R8
       SWPB   R9
       MOVB   *R6,R1               * Get existing screen buffer byte
       SZCB   R9,R1                * Remove bits not set in mask
       SOCB   R8,R1                * Set pattern bits
       MOVB   R1,*R6               * Write byte back
*      Next line
       MOV    R7,R6                * Restore destination address
       COC    @SEVEN,R6            * Check for last character row
       JEQ    SFTSP6
       INC    R6                   * Next row within character
       JMP    SFTSP7
SFTSP6 AI     R6,256-7             * First row of next character
SFTSP7 DEC    R3
       JNE    SFTSP3
*      Return
       DECT   R10                  * Pop return address off the stack
       MOV    *R10,R11
       B      *R11
*      Instructions
SLAX   SLA    R8,0       
SLAY   SLA    R9,0
NOOP   NOP
*// SFTSPR


 

One of my common self-modifying code pairs is to load a subroutine's return address into a "B @>xxxx". A stack is not required, the branch address is updated at the start of the subroutine, and the branch substitutes as return with no register use required. Example:

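SUB1   MOV  R11,@RET+2      * Store the return address in the operand of the B below
       ..
       ..
RET    B    @>0000          * Operand (second word) is overwritten on entry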

 

Yep, I did put that one into my notes! You mentioned it in Matthew's Assembly thread. I haven't used it yet, but it's too nice to ignore :!:


I usually avoid self modifying code, but I used it recently in a soft sprites routine, to set up fixed shift instructions instead of having to load R0 with the shift value - saved a few clock cycles.

Remember too, if you can spare the register (or want to use a variable in RAM) that X can also be used to replace a single instruction. It doesn't save as many cycles, but it works from ROM. :) I've generally tried to use that since we have it, though I'm not too much against self-modifying code. (I've mostly used it to save extra comparisons in loops ;) ).
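
A small sketch of that idea applied to a variable shift, assuming R7 holds a shift count of 1 to 15 and R1 is free to hold the instruction. SLAX is a template instruction like the one in the sprite routine; nothing in the program itself is rewritten, so this should also work from ROM:

```asm
       MOV  R7,R1           * Shift count, assumed to be 1..15
       SLA  R1,4            * Move it into the count field of the opcode
       SOC  @SLAX,R1        * Merge in the opcode for SLA R8,0
       X    R1              * Execute the instruction now held in R1
*      ...
SLAX   SLA  R8,0            * Template only, never executed in place
```

A count of zero is excluded here because a shift instruction with a zero count field takes its count from R0 instead.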



X is on my list of weird stuff I will try some day along with BLWP and the ISR :).
