Posts posted by BillG


  1. This is a bit of a tangent, but...

     

    I have been working on an 8080 simulator to allow running CP/M programs on the 680x and 65[x]02.  One of the posters on the forum at 6502.org said he is interested in what I am doing because he was "collecting languages" on his Apple II.  Turbo Pascal was one of the compilers he mentioned; he did not say he had a Z80 card.

     

    That reminded me of my discovery long ago that Turbo Pascal for CP/M required a Z80 processor, both to run the compiler and to run the programs it generated.  I went back to MT+ or assembly language for things needing to run on an 8080 or 8085.  But I also briefly considered trying to modify Turbo Pascal to generate 8080 programs; that effort did not go very far as programming on the then-new IBM PC took up more of my time.

     

    I took a closer look at the CP/M version of Turbo Pascal after that discussion.  I have since disassembled most of the run-time library and commented much of it.  I still believe that creating a version of the compiler to run on an 8080 system is not feasible, but it now seems very doable to:

     

    * write an 8080 version of the run-time library

    * write a post-compiler to translate Z80 code generated by the compiler to 8080 assembly language

    * write a tool to stitch the two together

     

    If that can be done, it is not a huge stretch to:

     

    * write enough of a Z80 simulator to run the Turbo Pascal compiler on processor X

    * write a version of the run-time library for processor X

    * write a version of a converter to translate Z80 code to processor X

     

    I believe this can be done with substantially less effort than writing a new Turbo-compatible compiler to run natively on processor X.


  2. On 7/29/2020 at 2:56 PM, Mehridian Sanders said:

    On top of that .. does anyone feel like taking a swing at a "Double Wide Speech Synth Case" for modelling? (To accommodate the TIPI/32k and Speech Synth?) I really don't have the time to learn 3d modelling as fast as I would like to.

    Tinkercad is a very approachable way to get started in 3D design: https://www.tinkercad.com/

     

    You add or subtract geometric shapes.  Definitely give it a try.

     

    Somewhat more advanced is OpenSCAD, a parametric design tool.  It is somewhat like programming.  https://www.openscad.org/

     

    Both of these are free to use.

     

    At the other end of the spectrum, more powerful and difficult to learn, are the full-featured "professional" CAD programs.


  3. My head is spinning and this is how I plan to make it stop.

     

    1. I choose this as the authoritative document.  Appendix A of:  

    https://ia801205.us.archive.org/14/items/bitsavers_tiTMS9900MstemDevelopmentManual1977_4482262/MP702_TMS9900_Family_System_Development_Manual_1977.pdf

     

    2. It speaks in terms of machine cycles.  A machine cycle consists of two clock cycles.  This was a source of much of my confusion as some numbers appeared to be off by half or double.  They were.

     

    3. My assembler will distinguish only between workspace accesses and all other memory accesses (including instruction opcode and immediate operand fetches) and allow specifying whether each is in fast or slow memory.  It will assume a 99/4A platform in which all slow memory accesses incur a penalty of four additional clock cycles.  The default is fast workspace and slow other memory.
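
    A minimal sketch of that counting model in Python, purely to illustrate the arithmetic.  The record layout and names are made up for this example, and the sample instruction figures are not taken from the manual.

```python
from dataclasses import dataclass

CLOCKS_PER_MACHINE_CYCLE = 2   # per the manual: one machine cycle = two clock cycles
SLOW_ACCESS_PENALTY = 4        # assumed 99/4A penalty per slow access, in clock cycles


@dataclass
class Timing:
    """Hypothetical per-instruction record; the field names are illustrative."""
    machine_cycles: int      # figure expressed in machine cycles
    workspace_accesses: int  # accesses to workspace registers
    other_accesses: int      # opcode fetch, immediate operands, all other memory


def clock_cycles(t, workspace_slow=False, other_slow=True):
    """Total clock cycles under the two-class (workspace / other) memory model."""
    total = t.machine_cycles * CLOCKS_PER_MACHINE_CYCLE
    if workspace_slow:
        total += t.workspace_accesses * SLOW_ACCESS_PENALTY
    if other_slow:
        total += t.other_accesses * SLOW_ACCESS_PENALTY
    return total


# Made-up instruction: 5 machine cycles, 2 workspace accesses, 1 opcode fetch.
example = Timing(machine_cycles=5, workspace_accesses=2, other_accesses=1)
print(clock_cycles(example))                       # default: fast workspace, slow other
print(clock_cycles(example, workspace_slow=True))  # everything in slow memory
```

    The two keyword arguments correspond to the two settings the assembler would accept, with the defaults matching the stated default of fast workspace and slow other memory.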


  4. 6 hours ago, FarmerPotato said:

    The J1 is a stack-based machine, similar to the Novix NC4016, with an instruction set designed to run FORTH. Read about it here: https://www.excamera.com/files/j1.pdf   or https://excamera.com/files/svfig-2015-aug.pdf

    Do you think this will be effective for writing emulators for other processors?

     

    My understanding is that the fastest non-JIT approach is some form of threaded code instead of a central decoder/dispatcher.
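
    To show the structural difference, here is a rough Python sketch using a made-up two-opcode accumulator machine.  It is only an analogy: Python cannot jump directly from one handler to the next, so the second form is closer to call threading over pre-decoded handlers, and it says nothing about real speed.

```python
def run_dispatch(program):
    """Central decoder: every opcode is examined again each time it executes."""
    acc = 0
    for op, arg in program:
        if op == "add":
            acc += arg
        elif op == "mul":
            acc *= arg
        else:
            raise ValueError(op)
    return acc


def op_add(acc, arg):
    return acc + arg


def op_mul(acc, arg):
    return acc * arg


HANDLERS = {"add": op_add, "mul": op_mul}


def thread(program):
    """Translate once: replace each opcode with a direct reference to its handler."""
    return [(HANDLERS[op], arg) for op, arg in program]


def run_threaded(threaded):
    """Execute pre-resolved handlers; no opcode decoding happens in this loop."""
    acc = 0
    for handler, arg in threaded:
        acc = handler(acc, arg)
    return acc


program = [("add", 3), ("mul", 4), ("add", 1)]
print(run_dispatch(program), run_threaded(thread(program)))
```

    The point is that the decoding cost is paid once in thread() rather than on every executed instruction.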


  5. I have been playing with code generation on different processors.

     

    This line of code

     

        W0 := S1;
    

     

    sign-extends a byte into a two-byte word.
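
    For reference, a minimal Python sketch of the three ways the listings in this post build the high byte: branching on the sign, subtracting with borrow after shifting the sign into carry, and an arithmetic shift.  The function names are mine and purely illustrative; the byte is taken as an unsigned value from 0 to 255.

```python
def sign_extend_branch(b):
    """Test the sign and pick the high byte, as the 6502 and 6800 sequences do."""
    high = 0xFF if b & 0x80 else 0x00
    return (high << 8) | b


def sign_extend_borrow(b):
    """Move the sign bit into carry, then compute 0 - carry (8080 and AVR style)."""
    carry = (b >> 7) & 1
    high = (0 - carry) & 0xFF    # sbb A / sbc R23,R23 leaves 0x00 or 0xFF
    return (high << 8) | b


def sign_extend_shift(b):
    """Put the byte in the high half and shift right arithmetically (9900 style)."""
    word = b << 8
    if word & 0x8000:
        word -= 0x10000          # reinterpret as a signed 16-bit value
    return (word >> 8) & 0xFFFF  # >> on a negative Python int is arithmetic


for value in (0x00, 0x7F, 0x80, 0xFF):
    print(hex(value), hex(sign_extend_shift(value)))
```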

     

    For the 6502:

     

                              00037 ;  1 L v S1
     0029 A0 00           [2] 00038          ldy    #0
     002B A6 16           [3] 00039          ldx    S1
     002D 10 01 (0030)  [2/3] 00040          bpl    2f
     002F 88              [2] 00041          dey
     0030                     00042 2:
     0030 98              [2] 00043          tya
                              00044 ;  0 := v W0 -> 1
     0031 86 0D           [3] 00045          stx    W0
     0033 85 0E           [3] 00046          sta    W0+1
    

     

    For the 6800:

     

                              00037 *  1 L v S1
     0029 4F              [2] 00038          clra
     002A D6 16           [3] 00039          ldab   S1
     002C 2A 01 (002F)    [4] 00040          bpl    2f
     002E 4A              [2] 00041          deca
     002F                     00042 2:
                              00043 *  0 := v W0 -> 1
     002F 97 0D           [4] 00044          staa   W0
     0031 D7 0E           [4] 00045          stab   W0+1
    

     

    For the 8080:

     

                              00037 ;  1 L v S0
     0100 3A 0015        [13] 00038         lda     S0
     0103 6F              [5] 00039         mov     L,A
     0104 17              [4] 00040         ral
     0105 9F              [4] 00041         sbb     A
     0106 67              [5] 00042         mov     H,A
                              00043 ;  0 := v W0 -> 1
     0107 22 000D        [16] 00044         shld    W0
    

     

    For the 9900:

     

                              00043 *  1 L v S1
     0052 D020 0037           00044         movb    @S1,R0
     0056 0880                00045         sra     R0,8
                              00046 *  0 := v W0 -> 1
     0058 C800 002E           00047         mov     R0,@W0
    

     

    For the 68000:

     

                                      00009 ;  1 L v S1
     00000400 1038 0421               00010         move.b  S1,D0
     00000404 4880                    00011         ext.w   D0
                                      00012 ;  0 := v W0 -> 1
     00000406 31C0 0418               00013         move.w  D0,W0
    

     

    That almost feels like cheating.

     

    It is interesting that the last four examples have each been ten bytes long.

     

    Real cheating would be the 80386:

     

        movsx   AX,[S1]
        mov     [W0],AX
    

     

    For the AVR:

     

                              00011 ;  1 L v S1
     000060 9160 0116     [2] 00012         lds     R22,S1
     000062 2F76          [1] 00013         mov     R23,R22
     000063 0F77          [1] 00014         lsl     R23
     000064 0B77          [1] 00015         sbc     R23,R23
                              00016 ;  0 := v W0 -> 1
     000065 9360 010D     [2] 00017         sts     W0,R22
     000067 9370 010E     [2] 00018         sts     W0+1,R23
    

     

    And finally, for the 6809:

     

                                      00037 *  1 L v S1
     0029 D6 16                   [4] 00038          ldb    S1
     002B 1D                      [2] 00039          sex
                                      00040 *  0 := v W0 -> 1
     002C DD 0D                   [5] 00041          std    W0
    

     

    which brings up one of my favorite programming jokes:

     

    The Motorola 6809, where SEX is sometimes followed by STD.

     

    Thank you very much.  Drive safely...


  6. This shows just how un-RISC the 9900 is.

     

    Reminds me of programming the 8088 - the specialized "string" instructions were by far the fastest way to do some things.  That advantage eroded away as all of the instructions were made more efficient with each new generation of the architecture.  By the time of the 486, a sequence of simple instructions could beat most of the string instructions.


  7. Consider the case of

        inc     @I
    
    

     

    The documentation says 10 base cycles and 3 memory accesses, and Table A says 8 additional cycles and one additional access for symbolic mode.  The operand is listed as the source instead of the destination.  None of the memory accesses is to the workspace.


  8. If the size of the block is large enough, it may be worthwhile to try to copy a word at a time.

                              00001 *
                              00002 * R0 = source address
                              00003 * R1 = destination address
                              00004 * R2 = count
                              00005 *
     0000                     00006 Move
     0000 0282 0006           00007         ci      R2,6            ; Enough to get clever?
     0004 1404 (000E)         00008         jhe     TryWord
                              00009
     0006                     00010 ByteLoop
     0006 DC70                00011         movb    *R0+,*R1+       ; Copy a byte at a time
     0008 0602                00012         dec     R2              ; More?
     000A 16FD (0006)         00013         jne     ByteLoop
                              00014
     000C                     00015 MoveDone
     000C 045B                00016         b       *R11            ; Return
                              00017
     000E                     00018 TryWord
     000E 0203 0001           00019         li      R3,1            ; Load mask for LSB
                              00020
     0012 C100                00021         mov     R0,R4           ; Are they aligned?
     0014 2901                00022         xor     R1,R4
     0016 2503                00023         czc     R3,R4
     0018 16F6 (0006)         00024         jne     ByteLoop        ; No, do a byte at a time
                              00025
     001A 24C0                00026         czc     R0,R3           ; Word aligned?
     001C 1302 (0022)         00027         jeq     Aligned
                              00028
     001E DC70                00029         movb    *R0+,*R1+       ; Copy the first byte
     0020 0602                00030         dec     R2
                              00031
     0022                     00032 Aligned
     0022 C102                00033         mov     R2,R4           ; Save count
                              00034
     0024 0912                00035         srl     R2,1            ; Convert to number of words
                              00036
     0026                     00037 WordLoop
     0026 CC70                00038         mov     *R0+,*R1+       ; Copy a word
     0028 0602                00039         dec     R2
     002A 16FD (0026)         00040         jne     WordLoop
                              00041
     002C 24C4                00042         czc     R4,R3           ; One byte left over?
     002E 13EE (000C)         00043         jeq     MoveDone        ; No
                              00044
     0030 DC70                00045         movb    *R0+,*R1+       ; Copy the last byte
                              00046
     0032 045B                00047         b       *R11            ; Return
                              00048
                              00049 *
                              00050 * Just to see the overhead of combining bytes into words
                              00051 *
     0034                     00052 NotAligned
     0034 24C0                00053         czc     R0,R3           ; Is source aligned?
     0036 160F (0056)         00054         jne     DestAligned
                              00055
     0038 C102                00056         mov     R2,R4           ; Save count
                              00057
     003A 0912                00058         srl     R2,1            ; Convert to number of words
                              00059
     003C D170                00060         movb    *R0+,R5         ; Get unaligned byte
                              00061
     003E                     00062 SrcLoop
     003E C1B0                00063         mov     *R0+,R6         ; Get next word
                              00064
     0040 06C6                00065         swpb    R6              ; Assemble dest word
     0042 C1C6                00066         mov     R6,R7
     0044 D185                00067         movb    R5,R6
                              00068
     0046 CC46                00069         mov     R6,*R1+         ; Store word
                              00070
     0048 D147                00071         movb    R7,R5           ; Ready for next
                              00072
     004A 0602                00073         dec     R2
     004C 16F8 (003E)         00074         jne     SrcLoop
                              00075
     004E 24C4                00076         czc     R4,R3           ; One byte left over?
     0050 13DD (000C)         00077         jeq     MoveDone        ; No
                              00078
     0052 DC45                00079         movb    R5,*R1+         ; Store the last byte
                              00080
     0054 045B                00081         b       *R11            ; Return
                              00082
     0056                     00083 DestAligned
                              00084 *
                              00085 * Why bother?  The overhead is too bad.
                              00086 *
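
    The strategy in the listing above, restated as a Python sketch over a bytearray so the control flow is easier to follow.  This is an illustration rather than a translation: it assumes non-overlapping blocks and treats a count of zero as a no-op, which is my choice for the sketch.

```python
def move(mem, src, dst, count):
    """Copy count bytes within mem (a bytearray), a word at a time when possible."""
    if count >= 6 and (src ^ dst) & 1 == 0:   # same threshold and parity test as the listing
        if src & 1:                           # odd start: copy one byte to reach a word boundary
            mem[dst] = mem[src]
            src, dst, count = src + 1, dst + 1, count - 1
        for _ in range(count >> 1):           # copy whole words
            mem[dst:dst + 2] = mem[src:src + 2]
            src, dst = src + 2, dst + 2
        if count & 1:                         # one byte left over
            mem[dst] = mem[src]
    else:
        for _ in range(count):                # too short or misaligned: byte at a time
            mem[dst] = mem[src]
            src, dst = src + 1, dst + 1


mem = bytearray(range(64))
move(mem, 1, 33, 9)        # both addresses odd: 1 byte, then 4 words
print(list(mem[33:42]))
```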
    


     

  9. If my calculations are correct, shifting is faster (assuming the workspace is in fast memory and everything else is in slow memory).

     

     0052 04C0                00049         clr     R0      ; 14 : 10 + 4 (fetch)
     0054 D020 0037           00050         movb    @S1,R0  ; 30 : 14 + 4 (fetch) + 8 + 4
     0058 1502 (005E)         00051         jgt     2f      ; 12/14 : 8/10 + 4 (fetch)
     005A 0260 00FF           00052         ori     R0,>FF  ; 22 : 14 + 2 * 4 (fetch)
     005E                     00053 2
     005E 06C0                00054         swpb    R0      ; 14 : 10 + 4 (fetch)
    

     

    vs

     

     0052 D020 0037           00049         movb    @S1,R0  ; 30 : 14 + 4 (fetch) + 8 + 4
     0056 0880                00050         sra     R0,8    ; 32 : 12 + 4 (fetch) + 16
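
    Adding up the per-instruction figures from the comments in the two listings gives the totals.  This Python snippet only does the sums; it assumes the larger jgt figure is the taken case, which is my reading of the 12/14 notation.

```python
# Clock counts copied from the listing comments above.
CLR, MOVB, ORI, SWPB, SRA = 14, 30, 22, 14, 32
JGT_NOT_TAKEN, JGT_TAKEN = 12, 14   # assumption: the second figure is the taken branch

# Branch version when the jump is taken and the ori is skipped.
branch_jump_taken = CLR + MOVB + JGT_TAKEN + SWPB

# Branch version when the jump falls through and the ori executes.
branch_fall_through = CLR + MOVB + JGT_NOT_TAKEN + ORI + SWPB

# Shift version: movb followed by sra.
shift_version = MOVB + SRA

print(branch_jump_taken, branch_fall_through, shift_version)
```

    On these figures the shift version is ahead on either path through the branch version.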
    


  10. 3 hours ago, apersson850 said:

    The TMS 9900 does a memory access in two cycles. But if it accesses the memory expansion, or rather anything outside the 256 bytes RAM or the 8 K monitor ROM in the console, it accesses that memory byte by byte, adding a wait state for each byte. So instead of two cycles, a memory access is six cycles.

    Thus an assembler can't really know the cycle count, since it doesn't know where the code and the workspace is located. The same software will also behave differently, depending on the machine. My main console has 16-bit wide (two cycle per access) memory for the memory expansion, so it runs faster than a standard TI 99/4A.

    There are also different additions depending on the addressing mode used, but they can be calculated, as they are consistent.

    A simpler way is to assume the workspace is in fast memory and everything else is in slow memory.

     

    Unfortunately, the tables in the manual do not separate workspace accesses from everything else, so I will have to figure those out.


  11. As mentioned in another thread, the following code to add a small constant to a variable:

     

                              00037 * W0 := W0 + 2;
                              00038
                              00039 *   *  0 := v W0 -> 1
                              00040 *   *  1 L r 2
                              00041
                              00042 *      *  2 L v W0 -> 3
                              00043 *      *  3 + c 2
                              00044
                              00045
                              00046 *  1 L r 2
                              00047 *  2 L v W0 -> 3
                              00048 *  3 + c 2
     0052 C020 002E           00049         mov     @W0,R0
     0056 05C0                00050         inct    R0
                              00051 *  0 := v W0 -> 1
     0058 C800 002E           00052         mov     R0,@W0
    

     

    can be optimized to this:

     

                              00037 * W0 := W0 + 2;
                              00038
                              00039 *   *  0 := v W0 -> 1
                              00040 *   *  1 L r 2
                              00041
                              00042 *      *  2 L v W0 -> 3
                              00043 *      *  3 + c 2
                              00044
                              00045
                              00046 *  1 L r 2
                              00047 *  2 L v W0 -> 3
                              00048 *  3 + c 2
     0052 05E0 002E           00049         inct    @W0
                              00050 *  0 := v W0 -> 1
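
    One way a rewrite like this could be automated is a peephole pass over the generated instructions.  The Python sketch below is hypothetical and is not a description of how my compiler does it; the tuple representation is invented for the example, and it assumes the register is not needed afterwards, which a real pass would have to verify.

```python
# Single-operand 9900 instructions that can work directly on a memory operand.
FOLDABLE = ("inc", "inct", "dec", "dect")


def peephole(instrs):
    """Fold 'mov @V,Rn / op Rn / mov Rn,@V' into 'op @V'.

    Each instruction is a tuple of strings such as ("mov", "@W0", "R0").
    """
    out = []
    i = 0
    while i < len(instrs):
        if i + 2 < len(instrs):
            load, bump, store = instrs[i:i + 3]
            if (len(load) == 3 and len(bump) == 2
                    and load[0] == "mov"
                    and load[1].startswith("@") and load[2].startswith("R")
                    and bump[0] in FOLDABLE and bump[1] == load[2]
                    and store == ("mov", load[2], load[1])):
                out.append((bump[0], load[1]))   # operate on the variable in place
                i += 3
                continue
        out.append(instrs[i])
        i += 1
    return out


code = [("mov", "@W0", "R0"), ("inct", "R0"), ("mov", "R0", "@W0")]
print(peephole(code))
```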
    

     

    This is the code to add two signed bytes, giving a 16-bit result.  Is there a better way to do sign extension?

     

                              00037 * W0 := S1 + S2;
                              00038
                              00039 *   *  0 := v W0 -> 1
                              00040 *   *  1 L r 2
                              00041
                              00042 *      *  2 L v S1 -> 3
                              00043 *      *  3 + v S2
                              00044
                              00045
                              00046 *  1 L r 2
                              00047 *  2 L v S1 -> 3
                              00048 *  3 + v S2
     0052 04C0                00049         clr     R0
     0054 D020 0037           00050         movb    @S1,R0
     0058 1502 (005E)         00051         jgt     2f
     005A 0260 00FF           00052         ori     R0,>FF
     005E                     00053 2
     005E 06C0                00054         swpb    R0
     0060 04C1                00055         clr     R1
     0062 D060 0038           00056         movb    @S2,R1
     0066 1502 (006C)         00057         jgt     2f
     0068 0261 00FF           00058         ori     R1,>FF
     006C                     00059 2
     006C 06C1                00060         swpb    R1
     006E A001                00061         a       R1,R0
                              00062 *  0 := v W0 -> 1
     0070 C800 002E           00063         mov     R0,@W0
    


  12. 1 hour ago, TheBF said:

    Ya totally sucks doesn't it.  If you are using the TI-99 "O/S" as it were you are best to treat the 256 bytes of 16 bit RAM with kid gloves.  There are about 120 ish bytes at the top that you can use.

    For example in most Forth systems the workspace is at >8300. So at least the primary registers are in fast ram.

    My cross assemblers for most other processors offer an option to display the number of machine cycles each instruction uses.

     

    My 9900 assembler does not attempt this because of the complexity of the calculation, and that was before even taking the 8-bit delays into account.  The feature is so helpful that I may still attempt it.
