99110 ROM disassembly


26 replies to this topic

#26 speccery OFFLINE  

speccery

    Moonsweeper

  • 287 posts

Posted Mon Feb 12, 2018 4:36 AM

Thanks, pnr, for a very interesting and in-depth analysis!



#27 pnr OFFLINE  

pnr

    Star Raider

  • Topic Starter
  • 71 posts

Posted Wed Feb 14, 2018 6:08 PM

Last up is an analysis of AR, SR, and CR. All three share most of their code.

 

Although addition is perhaps conceptually the easiest operation, the code is surprisingly long and involved, as there are many cases to consider. As a result, floating point addition is not much faster than multiplication or division.

 

The main issue is that the mantissas of two floating point numbers can only be added together if their exponents are equal. If they are not equal, the smaller number must be denormalized to make the exponents equal:

0.1234 x 16^4 + 0.12 x 16^2 = 0.1234 x 16^4 + 0.0012 x 16^4 = 0.1246 x 16^4

If the exponents differ by 6 or more, all six hex digits of the smaller mantissa are shifted out, so the smaller number becomes insignificant and effectively equals zero.
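To make the alignment step concrete, here is a minimal Python sketch of the arithmetic only (it is not how the ROM does it). The function name `align` and the convention of holding a mantissa's hex digits in a plain integer are my own assumptions for illustration.

```python
def align(mant_a, exp_a, mant_b, exp_b):
    """Bring two hex mantissas to the larger of the two exponents.

    Mantissas are plain integers holding hex digits (0x1234 stands for
    0.1234). The number with the smaller exponent is shifted right one
    hex digit (4 bits) per step of exponent difference; digits shifted
    out are simply dropped in this sketch.
    """
    if exp_a >= exp_b:
        return mant_a, mant_b >> (4 * (exp_a - exp_b)), exp_a
    return mant_a >> (4 * (exp_b - exp_a)), mant_b, exp_b


# The example from the text: 0.1234 x 16^4 + 0.12 x 16^2
a, b, exp = align(0x1234, 4, 0x1200, 2)
print(f"0.{a + b:04X} x 16^{exp}")   # 0.1246 x 16^4
```

With a six-digit mantissa and an exponent difference of 6, for example `align(0x123456, 10, 0x123456, 4)`, nothing is left of the smaller mantissa.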

 

The entry code for SR is as follows:

; entry point for SR
;
0814 C138   MOV  *R8+, R4       ; fetch 1st word of S
0816 136D   JEQ  >08F2          ; if S is zero, nothing to do
0818 0224   AI   R4, >8000      ; flip sign bit
081A 8000
081C 1002   JMP  >0822          ; now handle as AR

It checks the operand for being zero, and if so the accumulator already has the right result. If not zero, it flips the sign bit and handles FPAC-S as FPAC+(-S).
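As a small illustration of why `AI R4, >8000` flips the sign, here is a Python sketch. The helper name `flip_sign` is hypothetical; the only assumption is that the sign lives in bit 15 of the first word, as in the IBM 360 format used here.

```python
def flip_sign(word):
    """Model `AI R4, >8000` on a 16-bit word.

    Adding >8000 modulo 2^16 toggles bit 15 and leaves the other
    fifteen bits alone, so it is the same as XOR with >8000.
    """
    return (word + 0x8000) & 0xFFFF


print(f"{flip_sign(0x4110):04X}")   # 4110 (+1.0) becomes C110 (-1.0)
print(f"{flip_sign(0xC110):04X}")   # and back again: 4110
```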

 

Next is the entry code for AR:

; entry point for AR
;
081E C138   MOV  *R8+, R4       ; fetch 1st word of S
0820 1368   JEQ  >08F2          ; if S is zero, nothing to do

It only checks whether the operand is zero, in which case FPAC already contains the result.

 

From here on, AR and SR have an identical code path.

0822 C000   MOV  R0, R0         ; if FPAC is zero, S is the result
0824 1603   JNE  >082C
0826 C004   MOV  R4, R0         ; move S to local FPAC
0828 C058   MOV  *R8, R1
082A 1063   JMP  >08F2          ; store FPAC & set status bits
;
082C 04C6   CLR  R6             ; clear flag (= store result)
082E C158   MOV  *R8, R5        ; fetch 2nd word of S

The code first checks for another special case: if the accumulator is zero, the result is equal to the operand. If not, it enters the full calculation. It clears the CR flag (R6): at >0830 the code path for CR merges in (see entry code for CR discussed earlier), and the CR code path will separate towards the end of the algorithm.

 

Note that the CR code path does not have checks for either the accumulator or the operand being zero. Effectively a zero here is handled as meaning "+0.0 x 16^-64" and this will not lead to issues in the CR code path.

; CR jumps here (with R6 all ones = set status flags only)
;
0830 04C2   CLR  R2             ; clear extra mantissa bits
0832 C0C0   MOV  R0, R3         ; save exponents
0834 C1C4   MOV  R4, R7
0836 7000   SB   R0, R0         ; remove exponents from mantissas
0838 7104   SB   R4, R4

As usual the code starts out separating the sign and exponent from the mantissa. R2 is prepared to hold an extra 'guard' digit of precision.
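For readers who want to follow along with actual values, the separation of the fields can be sketched in Python. This assumes the standard IBM 360 short format (sign bit, 7-bit excess-64 exponent, six hex digits of mantissa) spread over two 16-bit words; the name `unpack` is mine, not something from the ROM.

```python
def unpack(w0, w1):
    """Split a two-word IBM 360 style short float into its fields.

    w0: sign (bit 15), 7-bit excess-64 exponent, top two mantissa digits
    w1: the lower four mantissa hex digits
    """
    sign = w0 >> 15
    exponent = (w0 >> 8) & 0x7F
    mantissa = ((w0 & 0x00FF) << 16) | w1
    return sign, exponent, mantissa


sign, exponent, mantissa = unpack(0xC110, 0x0000)   # -1.0
print(sign, hex(exponent), f"{mantissa:06X}")       # 1 0x41 100000
```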

 

Next the sign bit and exponent are separated for the accumulator:

083A 0A13   SLA  R3, 1          ; is FPAC negative?
083C 1702   JNC  >0842
083E 06A0   BL   @>0AE6         ; yes: negate extended FPAC mantissa
0840 0AE6
0842 0993   SRL  R3, 9          ; FPAC exponent in R3

If FPAC is negative, the mantissa is negated. There is a subroutine for this, as the negation has to happen again when the result is converted back to standard IBM360 format. The subroutine is:

; subroutine to negate extended FPAC mantissa
;
0AE6 0540   INV  R0
0AE8 0541   INV  R1
0AEA 0502   NEG  R2
0AEC 1703   JNC  >0AF4
0AEE 0581   INC  R1
0AF0 1701   JNC  >0AF4
0AF2 0580   INC  R0
0AF4 045B   RT

Including R2 in the negation is superfluous at this point. Also note that with the sign/exponent removed, the mantissa has two extra hex digits on the left, and hence does not need to consider the >800000 overflow condition when negating.
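The three-word negation can be modelled in Python as below. This is a sketch under one assumption: that NEG sets the carry bit only when its operand was zero, which is what the JNC after `NEG R2` in the listing appears to rely on. The name `negate3` is hypothetical.

```python
def negate3(r0, r1, r2):
    """Two's-complement negate the 48-bit value r0:r1:r2 (16-bit words)."""
    r0 = ~r0 & 0xFFFF               # INV R0
    r1 = ~r1 & 0xFFFF               # INV R1
    r2 = -r2 & 0xFFFF               # NEG R2
    if r2 == 0:                     # assumed: carry set only if R2 was zero
        r1 = (r1 + 1) & 0xFFFF      # INC R1
        if r1 == 0:                 # carry out of R1
            r0 = (r0 + 1) & 0xFFFF  # INC R0
    return r0, r1, r2


# Mantissa 0.123456 with an empty guard word
print(" ".join(f"{w:04X}" for w in negate3(0x0012, 0x3456, 0x0000)))
# FFED CBAA 0000
```

Inverting the upper words and negating only the lowest one works because the "+1" of a two's-complement negation only ripples upward while the lower words are zero.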

 

Next, the sign and exponent of the operand are separated:

0844 0A17   SLA  R7, 1          ; is S negative?
0846 1704   JNC  >0850
0848 0544   INV  R4             ; yes: negate S mantissa
084A 0505   NEG  R5
084C 1701   JNC  >0850
084E 0584   INC  R4
0850 0997   SRL  R7, 9          ; S exponent in R7

Here too the mantissa is negated if the operand is negative, but this time it happens in line because it does not need to be reversed later.

 

With both mantissas now held as signed values, the code considers the exponents and the relative size of the accumulator and the operand:

0852 C247   MOV  R7, R9         ; compare exponents
0854 6243   S    R3, R9
0856 1319   JEQ  >088A          ; if equal, directly add the mantissas

0858 0289   CI   R9, >0006      ; S much larger than FPAC?
085A 0006
085C 110F   JLT  >087C
085E C0C7   MOV  R7, R3         ; if FPAC is insignificant, result is S
0860 04C0   CLR  R0
0862 04C1   CLR  R1
0864 1012   JMP  >088A

...

087C 0509   NEG  R9             ; FPAC much larger than S?
087E 0289   CI   R9, >0006
0880 0006
0882 11F6   JLT  >0870
0884 C1C3   MOV  R3, R7         ; if S is insignificant, result is FPAC
0886 04C4   CLR  R4 
0888 04C5   CLR  R5

The code first handles the three easy cases: exponents equal, S dominates and FPAC dominates. If the exponents are equal, no alignment is needed. If the exponent of S is 6 or more above that of FPAC, FPAC is effectively zero and the exponent of S becomes the exponent of the result. If the exponent of FPAC is 6 or more above that of S, S is effectively zero and the exponent of FPAC becomes the exponent of the result.
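The decision tree can be summarised in a few lines of Python. This is only a sketch: the function name and the returned labels are mine, and the threshold is read off the two `CI R9, >0006` / `JLT` pairs in the listing, which branch into the alignment loop when the difference is below 6.

```python
def classify(exp_fpac, exp_s):
    """Decide how the two operands must be treated before adding."""
    diff = exp_s - exp_fpac          # R9 = R7 - R3
    if diff == 0:
        return "add directly"
    if diff >= 6:                    # first CI/JLT pair falls through
        return "FPAC insignificant"
    if -diff >= 6:                   # second CI/JLT pair falls through
        return "S insignificant"
    return "align in loop"


for pair in ((0x41, 0x41), (0x40, 0x46), (0x46, 0x40), (0x41, 0x43)):
    print(pair, classify(*pair))
```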

 

The complex case is handled by a clever loop that shifts either FPAC or S into place. The loop code is entered in the middle (>0870):

; denormalize & align smallest mantissa
;
0866 C085   MOV  R5, R2         ; shift S one nibble right
0868 0AC2   SLA  R2, 12
086A 001C   SRAM R4, 4
086C 4104
086E 0587   INC  R7             ; and adjust exponent

0870 81C3   C    R3, R7         ; exponents equal?
0872 130B   JEQ  >088A          ; yes: add mantissas
0874 15F8   JGT  >0866          ; exp FPAC > exp S?

0876 06A0   BL   @>0AD4         ; no: shift FPAC one nibble right
0878 0AD4                       ;     and adjust exponent
087A 10FA   JMP  >0870

The loop compares the exponents, and once they are equal (which takes at most 5 shifts, as larger differences were handled above) the work is done and we proceed with the actual addition at >088A. If S is the smaller number, the loop runs from >0866 to >0874 and shifts the operand into place (keeping one guard digit in R2). If FPAC is the smaller number, the loop runs from >0870 to >087A and shifts the accumulator into place (again keeping one guard digit in R2).
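Here is a Python model of the alignment loop, as a sketch rather than a faithful emulation. Mantissas are signed Python integers (so `>>` behaves like the arithmetic shift SRAM), and the guard digit is kept as a separate value instead of in the top nibble of R2. The name `align_loop` is hypothetical.

```python
def align_loop(m_fpac, e_fpac, m_s, e_s):
    """Shift the number with the smaller exponent right until both match.

    Returns (m_fpac, m_s, exponent, guard), where guard is the last hex
    digit that was shifted out.
    """
    guard = 0
    while e_fpac != e_s:
        if e_fpac > e_s:             # S is smaller: shift S right
            guard = m_s & 0xF
            m_s >>= 4
            e_s += 1
        else:                        # FPAC is smaller: shift FPAC right
            guard = m_fpac & 0xF
            m_fpac >>= 4
            e_fpac += 1
    return m_fpac, m_s, e_fpac, guard


# 0.100001 x 16^0 and 0.123400 x 16^-5 (excess-64 exponents >40 and >3B)
m_fpac, m_s, exp, guard = align_loop(0x100001, 0x40, 0x123400, 0x3B)
print(f"{m_s:06X} guard {guard:X}")   # 000001 guard 2
```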

 

The accumulator shift is used again later in the algorithm and is therefore placed in a subroutine:

; subroutine to (de)normalize FPAC mantissa
; to the right one hex digit (nibble)
;
0AD4 C081   MOV  R1, R2        ; shift extended mantissa one nibble
0AD6 0AC2   SLA  R2, 12
0AD8 001C   SRAM R0, 4
0ADA 4100
0ADC 0583   INC  R3            ; adjust exponent
0ADE 24E0   CZC  @>0BD6, R3    ; exponent in range?
0AE0 0BD6
0AE2 139F   JEQ  >0A22         ; no: overflow
0AE4 045B   RT

At this point, the range check on the exponent is superfluous: the exponent of FPAC can never rise above that of S, and S is in range.

 

With both numbers properly aligned, we can do the actual addition. At this point, the code for CR takes its own path again:

088A C186   MOV  R6, R6         ; was opcode CR, or AR/SR?
088C 1307   JEQ  >089C
088E 002A   AM   R4, R0         ; CR: add mantissas & return status bits
0890 4004
0892 02CA   STST R10
0894 024A   ANDI R10, >E000     ; keep only the L>, A>, EQ status bits
0896 E000
0898 E3CA   SOC  R10, R15
089A 0380   RTWP                ; macro processing complete

For the CR instruction, we add the mantissas and only look at the status bits (L>, A> and EQ) and return those to the user routine. No result is stored back to the user accumulator.
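The status handling at the end of the CR path amounts to a mask and an OR. A tiny Python sketch follows; it assumes that L>, A> and EQ are the top three bits of the status word (which is what the >E000 mask suggests), and the name `merge_status` is my own.

```python
def merge_status(st, saved_st):
    """Model STST / ANDI >E000 / SOC: copy set L>, A>, EQ bits into R15."""
    return saved_st | (st & 0xE000)


print(f"{merge_status(0x2000, 0x000F):04X}")   # 200F
```

Note that SOC can only set bits, so this step relies on those three bits in the saved status word being clear beforehand.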

 

For AR and SR, there is more work to do:

089C 002A   AM   R4, R0         ; add mantissas
089E 4004
08A0 1325   JEQ  >08EC          ; if zero, clear FPAC & finish
08A2 1504   JGT  >08AC          ; if negative,
08A4 06A0   BL   @>0AE6         ;   negate extended mantissa
08A6 0AE6
08A8 0263   ORI  R3, >0080      ;   and flip sign bit
08AA 0080
08AC D000   MOVB R0, R0         ; if mantissa too large
08AE 1302   JEQ  >08B4
08B0 06A0   BL   @>0AD4         ; normalize it rightward one nibble
08B2 0AD4
            [JMP to >08CC seems missing]

Again, the two mantissas are added. If the result is zero, FPAC is cleared (the normalized version of zero) and the status bits are set accordingly.

 

If the result is negative, the result mantissa is negated back to positive (note this time negating the guard digit as well is not superfluous) and the sign bit is set accordingly.

 

If the result has one more hex digit (i.e. something like 0.800000 + 0.A00000 = 1.200000), the mantissa is normalized one hex digit to the right (note that in this case, the range check is not superfluous). As the mantissa must now be in normalized form, the code could proceed to merging in the sign/exponent byte. However, it drops into the code for another check.
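The post-processing of the sum described above can be sketched in Python as follows. Several things are simplified and are my assumptions: the guard digit is left out, the mantissas are plain signed integers, the overflow is reported as a Python exception, and the exponent limit of >7F is assumed from the 7-bit exponent field (the mask word at >0BD6 is not shown in the listing). The name `fix_sum` is hypothetical.

```python
def fix_sum(total, exp):
    """Post-process a signed mantissa sum; returns (sign, mantissa, exp)."""
    if total == 0:
        return 0, 0, 0                # true zero: all fields cleared
    sign = 0
    if total < 0:                     # negative: make positive, set sign
        sign, total = 1, -total
    if total > 0xFFFFFF:              # a seventh hex digit appeared
        total >>= 4                   # normalize one nibble rightward
        exp += 1
        if exp > 0x7F:                # assumed 7-bit exponent range
            raise OverflowError("exponent out of range")
    return sign, total, exp


# 0.800000 + 0.A00000 = 1.200000, stored as 0.120000 with exponent + 1
sign, mant, exp = fix_sum(0x800000 + 0xA00000, 0x40)
print(sign, f"{mant:06X}", hex(exp))   # 0 120000 0x41
```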

 

When adding, several leading hex digits may cancel out, leaving a lot of leading zeroes in the result mantissa. An example would be:

0.123456 - 0.123400 = 0.000056

This must be normalized to 0.56x16^-4. In this case no precision is lost. However, with a denormalized number one guard digit is required:

0.100001 - 0.123400x16^-5 = 0.100001 - 0.000001(2) = 0.0FFFFF(E)

This must be normalized to 0.FFFFFEx16^-1 and in this case we need the guard digit shifted in. If I'm not mistaken only one guard digit can possibly shift in, and hence that is all we have in R2.

 

In code, this leads to the following:

; normalize FPAC mantissa (leftward)
;
08B4 0280   CI   R0, >000F      ; is the highest nibble 0?
08B6 000F
08B8 1509   JGT  >08CC          ; no: mantissa is normalized
08BA 24E0   CZC  @>0BD6, R3     ; exponent already 0?
08BC 0BD6
08BE 1378   JEQ  >09B0          ; yes: underflow
08C0 0603   DEC  R3             ; reduce exponent & shift mantissa one nibble
08C2 001D   SLAM R0, 4
08C4 4100
08C6 09C2   SRL  R2, 12         ; shift in guard digit
08C8 A042   A    R2, R1
08CA 10F4   JMP  >08B4
;
08CC 06C3   SWPB R3             ; merge exponent back in
08CE D003   MOVB R3, R0
08D0 1071   JMP  >09B4          ; store FPAC & set status bits

This code was discussed before in the post on multiplication: this tail is shared between AR, SR and MR.
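The leftward normalization and the final merge of the sign/exponent byte can be modelled in Python as below. This is a sketch under stated assumptions: the guard digit is passed as a separate nibble, underflow is reported as a Python exception, and the names `normalize_left` and `pack` are mine. It reproduces the two worked examples given earlier in this post.

```python
def normalize_left(mant, exp, guard):
    """Shift a non-zero 24-bit mantissa left until its top digit is set.

    The guard digit is shifted in on the first step only; after that,
    zeroes come in from the right.
    """
    while mant <= 0x0FFFFF:               # highest nibble still zero
        if exp == 0:
            raise ArithmeticError("exponent underflow")
        exp -= 1
        mant = (mant << 4) | guard
        guard = 0
    return mant, exp


def pack(sign, exp, mant):
    """Merge sign and exponent back in; returns the two result words."""
    return (sign << 15) | (exp << 8) | (mant >> 16), mant & 0xFFFF


mant, exp = normalize_left(0x0FFFFF, 0x40, 0xE)
print(f"{mant:06X}", hex(exp))                               # FFFFFE 0x3f
print(" ".join(f"{w:04X}" for w in pack(0, exp, mant)))      # 3FFF FFFE
```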

 

I wonder if the AR/SR code is the shortest possible. It would seem that the checks for zero accumulator and operand are for performance only, as the rest of the algorithm would seem to work for AR/SR just as it does for CR. Also, maybe it is faster to operate on mantissas shifted one hex digit to the left; this still leaves one "overflow digit" to the left, but makes room to include the guard digit on the right. Finally, the range check could be taken out of the "shift right" subroutine and moved to immediately after the second subroutine call.

 

That completes our tour of the 99110 macro ROM: there is no other code left to discuss.

 

 





