BTW, in case I wasn't clear: I'm documenting what's already shipped and is in all 600-ish LTO Flash units today, and has been in jzIntv for over 3 years now. If I change anything, it will be to add to the ISA in a backward compatible way.
Yes, there aren't any programs out there that use the existing ISA. But, the ISA has already shipped, and is actually verified.
There's no real indication of whether you're in an interrupt context. PV is neither banked nor preserved. PV is primarily intended to be ephemeral and should be consumed immediately after it's generated. The instructions that set PV are also non-interruptible. Thus, the intended, safe and idiomatic use of PV is:
    EXT3OP X0, X1, X2
    MVI    PV, R0
I should make this clearer in the documentation. It's not meant for any other use. From this, you can implement a number of interesting idioms, but you really have to bolt an MVI, ADD, or similar instruction directly after it to consume the value immediately.
In general, using the extended register set (X0 - XF) from an interrupt handler needs to be done with care, as the standard ISR save/restore won't save and restore it. I'd wanted PSHM/PULM to make that more efficient. For now, it's maybe easier to say "don't use these from ISRs," but I need to include context switch concerns in the Programmer's Guide as well. Either that, or partition the Xregs so that some are for interrupt context and the rest are for foreground context. There are 16 of them.
If I do an "upgrade" to the ISA, it might be worth considering register banks as well, similar to an 8051. I don't want to try to detect "return from interrupt," though. I'd make bank selection manual. But, it'd be one instruction as opposed to many.
At the start, I really wanted to avoid having an external status register, in particular because I hadn't worked out how to do branches efficiently. As it stands, I have a limited ability to encode new conditional branches efficiently, and I may have wasted my one opportunity by using the MVI xx, R7 encoding for TSTBNZ.
I encoded the CMPxx the way I did, to better reflect how IntyBASIC performs comparisons. The CMPxx& construct allows you to build compound comparisons quickly, or to conditionally zero out a value (think "break statement").
An XSWD becomes part of the interrupt context in a strong way. PV seemed safe to me as long as you stick to the idiomatic use. XSWD is a little more of a problem because the instruction that generates it is not necessarily immediately before the instruction that consumes it. Reading it and PSHR'ing it onto the stack is no fun either.
And then there's the simple concern that it's costly for me to compute. Some of the ISA implementations pushed the limit of what I could do on a 40MHz PIC. The I2BCD, BCD2I, ABCD/SBCD, ATAN2, and ISQRTFX were tight, as I recall. If you gave me 20% more cycles, I could compute the flags. I don't have those cycles.
The lack of flags forces you into a "branch-free algorithm" mindset. Given that branches are costly anyway, it's not necessarily a bad place to be. What I'm missing is a good conditional-move instruction to round it out. "Based on src1, conditionally move src2 into dst."
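To illustrate the "based on src1, conditionally move src2 into dst" idea, here is a minimal sketch in C rather than CP-1600X assembly. It is only a model of the desired behavior, not an existing instruction: the name cmov_nz and the choice of "src1 is non-zero" as the condition are assumptions made for the example.

```c
#include <stdint.h>

/* Hypothetical model of a conditional move (not a shipped CP-1600X
   instruction): returns src2 when src1 is non-zero, otherwise returns
   the previous value of dst. No branch is taken either way. */
static uint16_t cmov_nz(uint16_t dst, uint16_t src1, uint16_t src2)
{
    /* mask is 0xFFFF when src1 != 0 and 0x0000 when src1 == 0 */
    uint16_t mask = (uint16_t)(0u - (unsigned)(src1 != 0));

    /* keep dst where mask is clear, take src2 where mask is set */
    return (uint16_t)((dst & (uint16_t)~mask) | (src2 & mask));
}
```

The same mask trick covers the "conditionally zero out a value" case: AND a value with the mask and it either survives intact or becomes zero.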
I like them, but I couldn't figure out how to arrange the operands to make it efficient. It really wants to be a 4 operand instruction, with the extracted-from entity separate from the extracted-to entity. In the end, I settled for a two-instruction sequence of shift (or multiply) followed by a rotate.
Perhaps, in a V2 of the extensions. As you note, this starts looking more like an anti-goal. I'd need to consider whether to prioritize GRAM loading efficiency or obviousness. (e.g. use the more obvious encoding of 8 LSBs in each of 8 registers, or the more efficient encoding of packed 8-bit pairs in 4 registers.) And, without PSHM/PULM to quickly block transfer data into and out of the Xregs, the memory bottleneck quickly overwhelms whatever you might save.
So, I decided to table those ideas until I had an efficient way to get data into and out of Xregs.
I didn't bother trying to solve that one, as JLP stores 10-bit ROM in 12-bit pages rather than 16-bit pages, and already gets most of the benefit for me. (The 16-bit vs. 12-bit decision works on 4K page boundaries. Any game with significant voice can pack it in dedicated 4K pages and get the benefit.)
That said, if you set aside an 8-word buffer of RAM, it wouldn't be hard to write an efficient 5-to-8 decoder that takes a block of five 16-bit words and outputs eight 10-bit words using the existing shifts and bit operations. It just didn't seem like the more pressing concern. A better use of my time would be to work on a tighter encoding for Intellivoice data, as the current data is not at all compressed.
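As a rough sketch of what such a 5-to-8 decoder does, here is a C model (not CP-1600X code). The packing order is an assumption made for the example: the 80 bits are treated as one MSB-first stream, so the first output word is the top 10 bits of the first input word. The function name unpack_5_to_8 is likewise hypothetical.

```c
#include <stdint.h>

/* Unpack five 16-bit words (80 bits) into eight 10-bit words.
   Assumed layout: one MSB-first bit stream, in[0] first. */
static void unpack_5_to_8(const uint16_t in[5], uint16_t out[8])
{
    uint32_t acc = 0;   /* bit accumulator; newest bits at the bottom */
    int bits = 0;       /* number of not-yet-consumed bits in acc     */
    int i = 0, o = 0;

    while (o < 8) {
        if (bits < 10) {
            /* refill: shift in the next 16-bit input word */
            acc = (acc << 16) | in[i++];
            bits += 16;
        }
        bits -= 10;
        /* extract the oldest 10 unconsumed bits */
        out[o++] = (uint16_t)((acc >> bits) & 0x3FF);
    }
}
```

Exactly five refills happen over the eight outputs, so the input block is consumed with no leftover bits. A real implementation would pick whichever bit order makes the shifts cheapest on the target.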
Yes, I used both meanings from C here: Address-of for the effective-address addressing modes, and bitwise-AND for the comparison instructions.
I figured there's enough variation in heading update logic that having the inverse didn't make sense. You can easily convert ATAN2 into sine/cosine values of the desired precision with something like:
    ATAN2  R0, X0, X1
    MVI    @X1(sintbl), R1      ; sine value
    MVI    @X1(sintbl+4), R2    ; cosine value
    MPY16  R1, X2, X3           ; assume X2 is velocity
    MPY16  R2, X2, X4
If you instead wanted to do that in fixed point, just replace MPY16 with MPYFXS. Or, if you want to model rotational inertia and use a finer-grain sine table, you could do that, and so on.
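The table lookup in that sequence can be modeled in C as follows. This is only an illustrative sketch: it assumes 16 headings per circle (which is what the sintbl+4 quarter-turn offset implies), a sine table scaled to +/-127, and four extra entries at the end so the cosine lookup never wraps. The table contents and the name heading_to_velocity are made up for the example.

```c
#include <stdint.h>

/* Assumed table: sin(k * 22.5 degrees) * 127, rounded, for k = 0..15,
   plus 4 repeated entries so that index + 4 stays in range. */
static const int16_t sintbl[20] = {
       0,   49,   90,  117,  127,  117,   90,   49,
       0,  -49,  -90, -117, -127, -117,  -90,  -49,
       0,   49,   90,  117
};

/* Scale a speed by the sine and cosine of a 16-step heading. */
static void heading_to_velocity(unsigned heading, int16_t speed,
                                int32_t *v_sin, int32_t *v_cos)
{
    heading &= 15;                                   /* keep to 16 directions */
    *v_sin = (int32_t)sintbl[heading]     * speed;   /* sine component   */
    *v_cos = (int32_t)sintbl[heading + 4] * speed;   /* cosine component */
}
```

Which component maps to X and which to Y depends on the game's coordinate convention, so the outputs are named for the table entry they come from rather than for an axis.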
Ah, remember the bad old days of how to detect 8088 vs. 8086 vs. 80286, before they added CPUID? I could even return a short manufacturer string like x86 does ("GenuineIntel" / "AuthenticAMD"). "LegitLTO"?
I need to check whether RXSER / TXSER modify the register when branching to the error path. In any case, JLP and Locutus both put their serial port at the same address, so if / when these instructions come to JLP, the address for serial status will be the same.
If someone figures out how to backport this to a CC3, you could always add a CP-1600X compatible serial port window at the alternate address.
That's an out-of-place accumulation, as it required storing the intermediate result in X8 - XB. An in-place accumulation would have stored the accumulated value directly in X4 - X7 without disturbing other registers (except perhaps one for the carry).
If I had instead defined the extended precision instructions as adding the carry/borrow to dst_hi, then I could have gotten away with fewer new instructions (just 2 rather than 4). This is a case where the rapid speed with which I spec'd the ISA caused me to miss an opportunity.