My Final Four Wants for the TI-99/4A

Tursi · September 21, 2019

It's pretty tight, but how about this? If you use word access to your table to process two bytes at a time, you can go from about 248 cycles for two bytes to 178 cycles. Don't even have to change your counts, since we have a DECT.

upload_map_1:
    mov *r1+,r3         ; get two chars from row 1: xxxx11xxxx11    14+4+8  =26
    sla r3,2            ; shift up                : xx1100xx1100    12+4    =16
    soc *r0+,r3         ; merge in chars from row2: xx1122xx1122    14+4+8  =26
    ab r5,r3            ; add scroll offset byte                    14      =14        
    movb r3,*r4         ; send to VDP                               14+4+4+4=26
    swpb r3             ; other byte                                10      =10
    ab r5,r3            ; add scroll offset byte                    14      =14        
    movb r3,*r4         ; send to VDP                               14+4+4+4=26
    dect r2             ; counter                                   10      =10
    jne upload_map_1    ; loop                                      10      =10
                                                                            =178 (for 2)

If R5 can contain the scroll offset in both bytes instead of just one (easy if you are using an add instead of an inc, otherwise probably not worth it), then you can just use a single A R5,R3 instead of the two ABs, and save another 14 cycles per two characters.
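A minimal Python model of the data path in the loop above may help show why one word operation can stand in for two byte operations, and where the single A R5,R3 shortcut needs care. It assumes each map byte holds a small value (below 64, so the 2-bit shift cannot push bits from the low byte into the high byte), and that R5 holds the offset duplicated into both bytes. The function names are hypothetical.

```python
MASK16 = 0xFFFF


def merge_rows(row1_word: int, row2_word: int) -> int:
    """MOV *R1+,R3 / SLA R3,2 / SOC *R0+,R3 applied to one 16-bit word."""
    r3 = (row1_word << 2) & MASK16  # SLA shifts the whole word left by 2
    return r3 | row2_word           # SOC is a bitwise OR


def with_two_ab(row1_word: int, row2_word: int, offset: int) -> tuple[int, int]:
    """Two ABs: the offset is added to one byte at a time, so no carry crosses bytes."""
    r3 = merge_rows(row1_word, row2_word)
    high = ((r3 >> 8) + offset) & 0xFF
    low = ((r3 & 0xFF) + offset) & 0xFF
    return high, low


def with_single_a(row1_word: int, row2_word: int, offset: int) -> tuple[int, int]:
    """Single A R5,R3 with the offset in both bytes of R5: one 16-bit add."""
    r3 = merge_rows(row1_word, row2_word)
    r5 = (offset << 8) | offset
    r3 = (r3 + r5) & MASK16  # a carry out of the low byte spills into the high byte
    return r3 >> 8, r3 & 0xFF


print(with_two_ab(0x0102, 0x0300, 0x10), with_single_a(0x0102, 0x0300, 0x10))
print(with_two_ab(0x0102, 0x0300, 0xF8), with_single_a(0x0102, 0x0300, 0xF8))
```

With a small offset both versions produce the same two bytes. With a large one, the low byte wraps and the word add also carries into the high byte, so the single A is only equivalent when the low byte plus the offset can never pass 255.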

Asmusr · September 21, 2019

Thanks, that's fantastic. The time to upload a screen has dropped from 92902 cycles to 59894 (running from scratch pad). Now it will be possible to run the game at 30 FPS instead of 20, which is just what I needed. ;-)
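Assuming these figures are clock cycles of the TI-99/4A's 3 MHz TMS9900, a rough conversion shows why the saving matters: at 30 FPS each update gets 1/30 s, about 100,000 cycles, and the old upload alone would have used most of that. The helper names below are hypothetical, and the figures ignore everything else the game does per update.

```python
CLOCK_HZ = 3_000_000  # TMS9900 clock in the TI-99/4A


def cycles_to_ms(cycles: int) -> float:
    """Convert a cycle count to milliseconds at the assumed clock rate."""
    return cycles * 1000 / CLOCK_HZ


def budget_cycles(fps: int) -> int:
    """Cycles available per game update at a given frame rate."""
    return CLOCK_HZ // fps


def share_of_budget(cycles: int, fps: int) -> float:
    """Fraction of one update's cycle budget consumed by `cycles`."""
    return cycles / budget_cycles(fps)


for label, cycles in (("before", 92_902), ("after", 59_894)):
    print(f"{label}: {cycles_to_ms(cycles):.1f} ms, "
          f"{share_of_budget(cycles, 30):.0%} of a 30 FPS update")
```

Under these assumptions the old upload takes roughly 93% of a 30 FPS update, while the new one takes about 60%, leaving room for the rest of the game loop.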

Asmusr · September 21, 2019

This is the code I ended up with:

upload_map_1:
       mov  *r1+,r3                    ; Get 2 bytes from row 1
       sla  r3,2                       ; Shift up
       soc  *r0+,r3                    ; Combine with two bytes from row 2
       a    r5,r3                      ; Add scroll offset
       movb r3,*r4                     ; Send one byte to VDP
       movb @r3lb,*r4                  ; Send the other byte to VDP (r3lb = workspace address + 7)
       dect r2                         ; counter
       jne  upload_map_1               ; Loop
       rt

It's faster, and one instruction shorter, to access the R3 low byte as a memory address (r3lb) rather than doing a SWPB.
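Why @r3lb works: TMS9900 registers live in RAM starting at the workspace pointer (WP), two bytes each with the high byte first, so R3's low byte sits at WP+7. The Python model below lays out a workspace that way and checks that sending the byte at R3 and then the byte at WP+7 gives the VDP the same byte order as the MOVB/SWPB/MOVB sequence. The workspace address >8300 is an assumed example; r3lb has to be defined relative to whatever WP the program actually uses.

```python
WP = 0x8300  # assumed workspace address in scratchpad RAM


def reg_addr(n: int, wp: int = WP) -> int:
    """Address of register Rn's high byte: registers are 2 bytes each from WP."""
    return wp + 2 * n


R3LB = reg_addr(3) + 1  # low byte of R3, i.e. WP + 7


def store_word(mem: bytearray, addr: int, value: int) -> None:
    """Big-endian store: high byte at the even address, low byte after it."""
    mem[addr] = value >> 8
    mem[addr + 1] = value & 0xFF


def send_with_r3lb(mem: bytearray) -> list[int]:
    """MOVB R3,*R4 then MOVB @r3lb,*R4: two byte reads, no SWPB."""
    return [mem[reg_addr(3)], mem[R3LB]]


def send_with_swpb(r3: int) -> list[int]:
    """MOVB R3,*R4; SWPB R3; MOVB R3,*R4 (MOVB from a register sends its high byte)."""
    first = r3 >> 8
    r3 = ((r3 & 0xFF) << 8) | (r3 >> 8)  # SWPB exchanges the two bytes
    second = r3 >> 8
    return [first, second]


mem = bytearray(0x10000)          # model of the 64K address space
store_word(mem, reg_addr(3), 0x1718)
print(hex(R3LB), send_with_r3lb(mem), send_with_swpb(0x1718))
```

Both paths emit the high byte first and the low byte second, so the swap is only needed when the low byte has to pass through a register's high byte.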

Tursi · September 22, 2019

Nice! There are so few exceptions on the TI to the "fewest instructions is fastest code", even when you have to pull in a more complex instruction.

wierd_w · September 22, 2019

Hmm... printing.

Could you not set up some named pipes on the Pi, then shoot your "RS232" (ahem) data there?

https://www.guru99.com/linux-redirection.html

E.g., you could use cat to read from the pipe the TI side writes into, and redirect its output to the serial device of a USB dongle (usually /dev/ttyUSB0; /dev/tty0 is the virtual console, not a serial port), so it goes out through a real serial port.

It would be stuff you have to configure on the Pi, but should be doable. I think they are doing something similar with the functions to get text/data back from web hosts over that bridge.

Virtualized proxy IO should be doable.
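The pipe idea could look something like this sketch: a small Python relay that creates a FIFO, reads whatever the emulated RS232 side writes into it, and copies it to a serial device. The paths in the example comment (/tmp/ti_rs232, /dev/ttyUSB0) are hypothetical, and the sketch does not set baud rate or framing; on Linux that would be configured separately, for example with stty.

```python
import os
import stat


def ensure_fifo(path: str) -> None:
    """Create the named pipe if it does not exist yet."""
    if not os.path.exists(path):
        os.mkfifo(path)
    elif not stat.S_ISFIFO(os.stat(path).st_mode):
        raise RuntimeError(f"{path} exists and is not a FIFO")


def relay(fifo_path: str, out_path: str, chunk: int = 256) -> int:
    """Copy everything written into the FIFO to out_path until the writer closes."""
    ensure_fifo(fifo_path)
    total = 0
    # Opening a FIFO for reading blocks until some process opens it for writing.
    # Unbuffered reads return whatever is available instead of waiting for `chunk` bytes.
    with open(fifo_path, "rb", buffering=0) as src, open(out_path, "wb") as dst:
        while True:
            data = src.read(chunk)
            if not data:  # EOF: the writer closed its end of the pipe
                break
            dst.write(data)
            dst.flush()   # pass data on promptly, e.g. to a printer
            total += len(data)
    return total


# Example (hypothetical paths): relay("/tmp/ti_rs232", "/dev/ttyUSB0")
```

Whatever process plays the TI's RS232 port would simply open /tmp/ti_rs232 for writing; the relay returns once that writer closes the pipe, so in practice it would be run in a loop or as a service.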

Asmusr · September 22, 2019

2 hours ago, Tursi said:

Nice! There are so few exceptions on the TI to the "fewest instructions is fastest code", even when you have to pull in a more complex instruction.

It's even faster if we load the address r3lb into another register (R6) before the loop and replace that instruction with movb *r6,*r4. That's 26 cycles instead of 30.
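Tallying the loop with the per-instruction counts from Tursi's listing shows how the steps in this thread add up. The costs of the two newer instructions are taken from the thread as assumptions: 30 cycles for movb @r3lb,*r4 and 26 for movb *r6,*r4. The variant labels are descriptive only.

```python
# Cycle counts per instruction, as annotated in Tursi's listing (with wait states).
COST = {
    "mov *r1+,r3": 26,
    "sla r3,2": 16,
    "soc *r0+,r3": 26,
    "ab r5,r3": 14,
    "a r5,r3": 14,
    "movb r3,*r4": 26,
    "swpb r3": 10,
    "movb @r3lb,*r4": 30,  # assumed, from the "26 instead of 30" remark
    "movb *r6,*r4": 26,    # assumed, from the same remark
    "dect r2": 10,
    "jne upload_map_1": 10,
}

HEAD = ["mov *r1+,r3", "sla r3,2", "soc *r0+,r3"]
TAIL = ["dect r2", "jne upload_map_1"]

VARIANTS = {
    "two AB + SWPB": HEAD + ["ab r5,r3", "movb r3,*r4", "swpb r3", "ab r5,r3", "movb r3,*r4"] + TAIL,
    "single A + SWPB": HEAD + ["a r5,r3", "movb r3,*r4", "swpb r3", "movb r3,*r4"] + TAIL,
    "single A + @r3lb": HEAD + ["a r5,r3", "movb r3,*r4", "movb @r3lb,*r4"] + TAIL,
    "single A + *r6": HEAD + ["a r5,r3", "movb r3,*r4", "movb *r6,*r4"] + TAIL,
}


def cycles_per_two_bytes(instrs: list[str]) -> int:
    """Sum the cycle cost of one loop iteration (two map bytes sent)."""
    return sum(COST[i] for i in instrs)


for name, instrs in VARIANTS.items():
    print(f"{name}: {cycles_per_two_bytes(instrs)} cycles per 2 bytes")
```

Under these counts the loop goes 178, 164, 158, 154 cycles per two bytes. The one-time instruction that loads R6 before the loop is not included.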
