Assembly on the 99/4A


matthew180

The second listing actually works. It is code I just borrowed to piece it together to try out. But I haven't gotten to vsbr yet or charset.

 

Haven't looked at the subroutines very closely, but on the third of your screenshots, I think you need to move the STOP line from the end of your listing to before the VSBW label. After the block of START code the program continues straight into the VSBW code and the B *R11 at the end of that is sending the program off to somewhere unintended in memory.

Edited by Stuart

Yes, that is correct. I do need to move it, and I think I did in the later code, since Matt mentioned putting most definitions towards the bottom, out of the way. My STOP is supposed to sit there with a LIMI 2 so I can break out via the keyboard. That was my intention.

Edited by GDMike

I found out what my issue was, and I'm so excited to have found it. I took Matt's VWTR code this time and tried it. It was supposed to change my border color: I used LI R0,>0758 and then the BL to VWTR. It failed. But when I placed his VWTR code inline right after my LI R0,>0758, it worked! So what is the difference? The B *R11? Hmm. So I tried again, this time with a BL @VWTR but ending the routine with an RT instead, and it worked. What's wrong with my workspace? I'm using WRKSP EQU >8300 and R0LB equ > WRKSP+1. Anyway, I tested Matt's VSBR, and with the RT at the end I'm very happy that all these routines work well; I even changed the screen color with Matt's VWTR. So happy!! Next I'll try to get the character set loaded. I'm so grateful you guys made this easier by explaining the architecture a little. It helped me a lot. I even remember Matt saying that the 9918 is using the lower pins of the 9900, so watch out for the data coming back to the CPU, as it's not ending up on MSB.

Edited by GDMike

Working on real metal can be fun for a little while for nostalgia and all, but you might find it easier, faster, and less error prone to code on a PC and use an emulator for testing. Once things work how you expect, then try things out on the real metal. When you couple learning assembly with "doing it for real", it might get frustrating. Working on a PC would also save you from having to take a lot of photos to post your code.

 

In listing 1 (the first 2 or 3 images) you have this:

LI R0,0
LI R1,>2000
LI R2,768
BL @VMBW

It looks like you want to clear the screen, but VMBW is not the routine you wanted. To help keep things straight I like to say the function's full name, in this case "VDP Multiple Byte Write". If you are familiar with C or other similar programming languages, think of VMBW like a memcpy() between CPU and VDP RAM with a function signature that looks like this:

 

memcpy(byte *vram_dst, byte *ram_src, len)

 

The destination is always a location (memory address, pointer, or whatever you prefer to call it) in VRAM, and the source is always a location in CPU RAM. In your code above, you were essentially copying 768 bytes into VRAM at address >0000 (probably the name table) from whatever data is in CPU RAM starting at address >2000 (which is the low 8K of the 32K RAM expansion and probably contains the E/A linker/loader). That would have filled the screen with garbage, which I think was the result you got.
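To make the memcpy() comparison concrete, here is a minimal sketch of what a BL-callable VMBW could look like. It assumes the VDP port addresses and the R0LB workspace equate shown in the block, and it is only an illustration of the idea, not the Editor/Assembler cartridge's own VMBW (which is called with BLWP and is implemented differently).

```asm
VDPWD  EQU  >8C00           // VDP write data port
VDPWA  EQU  >8C02           // VDP address port
WRKSP  EQU  >8300           // Workspace location (assumed)
R0LB   EQU  WRKSP+1         // LSB of R0

* VMBW sketch
* R0 = VRAM destination, R1 = CPU RAM source, R2 = byte count
VMBW   MOVB @R0LB,@VDPWA    // Send low byte of VRAM address
       ORI  R0,>4000        // Flag the address as a write address
       MOVB R0,@VDPWA       // Send high byte of VRAM address
VMBWLP MOVB *R1+,@VDPWD     // Copy one byte from CPU RAM, advance source
       DEC  R2              // One less byte to copy
       JNE  VMBWLP          // Loop until R2 reaches zero
       ANDI R0,>3FFF        // Restore R0 to a plain 14-bit address
       B    *R11            // Return to caller
```

Note that R1 is used as a pointer here, which is why loading it with >2000 copies whatever happens to live at CPU address >2000 rather than writing spaces.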

 

The VSMW (VDP Single Byte Multiple Write) is the routine you probably wanted. This is a routine that takes a single byte passed in the MSB of R1, and writes it starting at the VRAM address specified in R0, for the number of times specified in R2. This routine is a cross between VMBW and VSBW, and takes advantage of the auto-incrementing address pointer in the VDP.

 

The next bit of code in listing 1 is:

LI R0,33
LI R1,42
LI R2,50
BL @VMBW

I'm not exactly sure what this is trying to do, but I suspect you wanted to write 50 asterisks starting on the second line of the screen? This is close, but again, using VMBW means you were copying 50 bytes to the screen from CPU RAM starting at address 42 (which is ROM in the 99/4A). This would have resulted in more garbage. Again, I think you were looking for the functionality provided by VSMW; however, LI R1,42 puts 42 in the LSB of R1 and leaves >00 in the MSB, so you would not have seen the asterisk. Your loading of R1 needed to be one of these equivalent forms:

LI R1,42*256  * Move 42 to the MSB of R1, let the assembler do the math
LI R1,10752   * Move 42 to the MSB of R1, already did the math
LI R1,>2A00   * Move 42 (>2A hex) to the MSB of R1

Remember, the registers are 16-bit, and when reading or writing to the VDP the MSB is the byte that is always transferred (the VDP is a byte-oriented device and the 9900 is a 16-bit CPU).
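Putting that together, the asterisk write might look like the following fragment. It is a sketch that assumes a VSMW routine with the R0/R1/R2 convention described above is part of your program, and that the name table sits at VRAM >0000.

```asm
       LI   R0,33           // VRAM address: second screen line, second column
       LI   R1,>2A00        // '*' (42, >2A) in the MSB of R1
       LI   R2,50           // Number of asterisks to write
       BL   @VSMW           // Single byte, multiple write (assumed routine)
```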

 

The final code in listing 1:

LI  R0,33
CLR R1
BL  @VSBR
LI  R0,300
BL  @VSBW

This is reading a byte from VRAM address 33 (one of the asterisks you expected to have written previously) and writing it to screen location 300 (column 12 of the tenth screen line, which is row 9 counting from zero). This code would have worked, but with the garbage on the screen it was probably not obvious. Also, the "CLR R1" is unnecessary in this specific example. The VSBR will overwrite whatever is in the MSB of R1, and the VSBW writes the MSB of R1.
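For reference, single-byte read and write routines in the same style could be sketched as below. The port addresses and the R0LB equate are assumptions stated in the block, and the NOP before the read is a cautious delay I have added, not something taken from the listings in this thread.

```asm
VDPRD  EQU  >8800           // VDP read data port
VDPWD  EQU  >8C00           // VDP write data port
VDPWA  EQU  >8C02           // VDP address port
WRKSP  EQU  >8300           // Workspace location (assumed)
R0LB   EQU  WRKSP+1         // LSB of R0

* VSBR sketch: read the byte at VRAM address R0 into the MSB of R1
VSBR   MOVB @R0LB,@VDPWA    // Send low byte of VRAM address
       ANDI R0,>3FFF        // Top two bits 00 = read address
       MOVB R0,@VDPWA       // Send high byte of VRAM address
       NOP                  // Short pause before reading (assumed safe)
       MOVB @VDPRD,R1       // Byte lands in the MSB, LSB of R1 untouched
       B    *R11

* VSBW sketch: write the MSB of R1 to VRAM address R0
VSBW   MOVB @R0LB,@VDPWA    // Send low byte of VRAM address
       ORI  R0,>4000        // Top two bits 01 = write address
       MOVB R0,@VDPWA       // Send high byte of VRAM address
       MOVB R1,@VDPWD       // Write the MSB of R1
       ANDI R0,>3FFF        // Restore R0 to a plain 14-bit address
       B    *R11
```

Because MOVB only replaces the MSB of R1, the byte read by VSBR is already where VSBW expects it, which is why clearing R1 first buys nothing.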

 

After that the program would just continue on executing the subroutines, since the "STOP" code was in the wrong place (this was already mentioned).

 

Listing 2:

    LI  R0,0
    LI  R1,>2000
    LI  R2,768
CLS BL  @VSBW
    INC R0
    CI  R0,768
    JLT CLS

The first three lines set up to clear the screen as if for VSMW, but then the fourth only writes a single byte with VSBW. Using VSMW would have given the expected results. After that the code looks like an adaptation from something in the Lottrup book. ;-) You have basically implemented your own screen clearing routine the long, slow way. However, I suspect this produced the expected results, albeit with more code and effort than necessary (learning is about making mistakes, I am only explaining what I'm seeing, I am not trying to criticize, please don't take all of this the wrong way).

 

It looks like the commented out code is again trying to write some asterisks to the screen. Similar to the screen clearing code above, you have set up to use VSMW but then implemented your own discrete loop, which is not necessary to achieve what you are trying to do.

 

The last bit of code in listing 2:

LI R0,36
LI R1,T1
LI R2,25
BL @VMBW

STOP LIMI 2
     LIMI 0
     JMP  STOP

T1 TEXT 'THIS IS A TEST FOR VMBWS'

I would expect this to work, and this is the correct use of VMBW. The only problem is the text is 24 bytes, not 25. I'm not sure if you were accounting for a terminator perhaps. If so, don't conflate "TEXT" with "strings" in languages like C/C++; there is no "null terminator" here (although you can implement such a thing if you want).
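One way to avoid counting characters by hand is to let the assembler compute the length from two labels. This is a sketch under the assumption that your assembler accepts label arithmetic and the `$` location-counter symbol in this position; T1END is a name introduced here for illustration, and VMBW is assumed to be your own BL-callable routine.

```asm
       LI   R0,36           // VRAM destination
       LI   R1,T1           // CPU RAM source: address of the text
       LI   R2,T1END-T1     // Length worked out by the assembler (24 here)
       BL   @VMBW           // Assumed BL-callable VMBW routine

STOP   LIMI 2
       LIMI 0
       JMP  STOP

T1     TEXT 'THIS IS A TEST FOR VMBWS'
T1END  EQU  $               // Address just past the last character
       EVEN                 // Realign to a word boundary after byte data
```

If the message changes later, the length follows automatically.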


Maybe this will help clarify using the VDP subroutines versus writing your own loops. Here are two minimal self-contained examples that clear the screen. The first one is brute force the whole way, and does not use any subroutines. It sets up the VDP and writes the bytes directly using a loop. The second example is also complete, but writing to the VDP has been moved to a more general purpose subroutine.

 

Listing 1, no subroutines:

       DEF  MAIN            // Entry point

VDPWD  EQU  >8C00           // VDP write data
VDPWA  EQU  >8C02           // VDP set read/write address

WRKSP  EQU  >8300           // Use fast 16-bit RAM for registers
R0LB   EQU  WRKSP+1         // Convenience reference to the LSB of R0

MAIN   LIMI 0               // Turn off interrupts
       LWPI WRKSP           // Set the register workspace

       LI   R0,0            // Set to write to VDP address >0000
       LI   R1,>2000        // Set to write >20 (char 32) to VDP RAM
       LI   R2,768          // Going to count 768 writes

*      Set the VDPs internal address register to point to the value
*      loaded in R0.
       MOVB @R0LB,@VDPWA    // Send low byte of VDP RAM write address
       ORI  R0,>4000        // Set bits to tell VDP to set a write address
       MOVB R0,@VDPWA       // Send high byte of VDP RAM write address

*      Now write the MSB of R1 to the VDP (VRAM).  Since the VDP will
*      increment its internal address every time it is written to, just
*      keep writing the same byte until 768 bytes have been sent.
CLS
       MOVB R1,@VDPWD       // Write MSB in R1 to VDP RAM
       DEC  R2              // Dec counter, CPU automatically compares to 0
       JNE  CLS             // If R2 is not zero, loop

       LIMI 2               // Enable interrupts
STOP   JMP  STOP

Listing 2, writing to the VDP moved to a subroutine:

       DEF  MAIN            // Entry point

VDPWD  EQU  >8C00           // VDP write data
VDPWA  EQU  >8C02           // VDP set read/write address

WRKSP  EQU  >8300           // Use fast 16-bit RAM for registers
R0LB   EQU  WRKSP+1         // Convenience reference to the LSB of R0

MAIN   LIMI 0               // Turn off interrupts
       LWPI WRKSP           // Set the register workspace

       LI   R0,0            // Set to write to VDP address >0000
       LI   R1,>2000        // Set to write >20 (char 32) to VDP RAM
       LI   R2,768          // Going to count 768 writes
       BL   @VSMW           // Clear the screen

       LIMI 2               // Enable interrupts
STOP   JMP  STOP

*********************************************************************
*
* VDP Single Byte Multiple Write
*
* R0   dst: write-to address in VDP RAM
* R1   src: MSB of R1 will be written to VDP RAM
* R2   len: number of times to write the MSB byte of R1 to VDP RAM
*
* R0 is modified to a 14-bit value
* R2 is changed to 0
*
VSMW
       MOVB @R0LB,@VDPWA    // Send low byte of VDP RAM write address
       ORI  R0,>4000        // Set read/write bits 14 and 15 to write (01)
       MOVB R0,@VDPWA       // Send high byte of VDP RAM write address
VSMWLP
       MOVB R1,@VDPWD       // Write byte to VDP RAM
       DEC  R2              // Byte counter
       JNE  VSMWLP          // Check if done
       ANDI R0,>3FFF        // Clear R0 top two MSbits
       B    *R11
*// VSMW

The write/read code listed above (I guess that's what I'll call it) is fantastic!! I also borrowed someone's keyboard routine from post 3 or 4 that I found very useful, and I am currently making some kind of timer for key auto-repeat. As a test I wrote the byte from the keyscan to the screen, and right now it fills the screen with all A's if I hit A, for example. But I think I can handle that, maybe! Lol. That keyscan places >FF in register 1 if no key is pressed, or either >80, >82, or >84, depending on whether a Ctrl key or Shift key was pressed, along with the key. And I have a question: how can I tell how much space I have left in RAM for my program?

Edited by GDMike

Typically you produce a program listing when you assemble, and check the last address in it to determine how much space is left.

 

With the 32k expansion, programs usually load into the 24k block first. If you are writing without relying on REF and rolling your own utilities, you can either use AORG to locate part of your code in the lower 8k block, or use the lower 8k as data and variable storage.
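As a rough illustration of the AORG approach, the sketch below puts the main program in the upper 24K (>A000 to >FFFF) and a utility in the lower 8K (>2000 to >3FFF). DELAY is a hypothetical routine used only to have something to place there, and the sketch assumes nothing else needs the start of the lower 8K; when loading with E/A Option 3, the loader and its utilities already occupy part of that block.

```asm
       DEF  MAIN            // Entry point

       AORG >A000           // Main program at the start of the upper 24K
MAIN   LIMI 0               // Interrupts off
       LWPI >8300           // Workspace in fast RAM
       BL   @DELAY          // Call the utility living in the lower 8K
STOP   JMP  STOP

       AORG >2000           // Utilities in the lower 8K
DELAY  LI   R0,>1000        // Hypothetical busy-wait utility
DLOOP  DEC  R0
       JNE  DLOOP
       B    *R11            // Return to caller
       END
```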


>> Typically you do a program listing when you compile and check the address to determine space left. <<

You’re talking about doing this with the EA cart, option 2, I believe?

 

“With the 32k expansion, programs usually load into the 24k block first. If you are writing without relying on REF and rolling your own utilities, you can either use AORG to relocate part of your code into the lower 8k block. Or use the lower 8k as data and variable storage.”

 

In this case I’ve been doing everything without REF, rolling my own. Will need to research AORG which I’ve only used with mini-memory.

 

 

 

 


Edited by Airshack
Yes, most 9900 assemblers can produce a listing when you compile.

 

With E/A Option #5 binaries, your code is always loaded to fixed points in memory. So you can use AORG to locate code into the lower 8K, such as utilities for example.

 

I often find the best thing to do is use the lower 8K as data-space; then you don't need to AORG anything and you can just use EQU labels for your variables in the main program.
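A minimal sketch of the data-space approach follows. The names MAPBUF, SCORE, and LIVES and their sizes are made up for illustration, and it assumes nothing else is using the start of the lower 8K.

```asm
LOWMEM EQU  >2000           // Start of the lower 8K (>2000 to >3FFF)
MAPBUF EQU  LOWMEM          // 768-byte map buffer (hypothetical)
SCORE  EQU  MAPBUF+768      // One-word variable (hypothetical)
LIVES  EQU  SCORE+2         // One-word variable (hypothetical)

* EQU only names an address, it does not load anything there,
* so the program initializes its variables at startup.
       CLR  @SCORE          // Score starts at zero
       LI   R0,3
       MOV  R0,@LIVES       // Three lives to begin with
```

Since these are plain equates, no bytes are emitted for the data area and the program file stays the same size.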

 

With SAMS programming, things get more complicated. At some point after my CRPG is complete I'll be sharing the GitHub repository for the source code and publishing some blog posts on how the architecture is both loaded and set up.


 

>> So you can use AORG to locate code into the lower 8K, such as utilities for example. <<

>> I often find the best thing to do is use the lower 8K as data-space; then you don't need to AORG anything and you can just use EQU labels for your variables in the main program. <<

So it’s advisable to store my game maps, character and sprite data, all of that into the lower 8K? It’s not immediately obvious why this is a preferred method. More research needed...

 

I need to get a better handle on the whole EA3 vs EA5 thing as well.

 

 

EA3 and EA5 are important things to understand.

 

EA3 means "code is relocatable anywhere in memory". It maintains a relative address structure so that you could load it here or there easily. So you sacrifice time to load in return for flexibility. Also, the EA3 loader includes most of the EA utilities like VMBW, DSRLNK, and so forth into the lower 8K RAM automatically so they can be REF'ed at any time. Many of my tools for my CRPG like map editors are written and run in this mode for quick use.

 

EA5 is non-relocatable code; it has been assembled so that every address is specific. It's like a memory dump, honestly. It splits your program into 8K segments, each with a six-byte header indicating where in memory to load, whether any segments are remaining, and how big the segment is. If you used AORG to create a separate lower 8K part, for example, it would create a separate file just to load it. EA5 code loads MUCH faster since it's not trying to determine relative addresses.
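The six-byte header can be pictured as three words. The sketch below expresses it as DATA statements purely for illustration; the field order shown is the commonly documented one as I understand it, and the values are example numbers, not taken from any real file.

```asm
* Header at the start of each E/A 5 image file (illustrative values)
       DATA >FFFF           // >FFFF = another file follows, >0000 = last file
       DATA >2000           // Size of this segment in bytes (example)
       DATA >A000           // CPU RAM address where the segment is loaded
```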

 

When I say "use the lower 8K for data" I mean something like buffers. For example, if you were loading a game map from disk into memory, having a buffer in the lower 8K makes sense if most of your code is in the upper 24k. Static data like graphics that's part of the game doesn't necessarily need to be there. Unlike the IBM-PC module design, the TI can do data or code anywhere, it's pretty much on the programmer to decide where he wants it.

 

On another note, data like character graphics that is used only once is best offloaded to a disk file and then loaded by your main game program.

Best explanation of EA3 versus EA5 I have heard :thumbsup:


Adding a note that with EA3 you can, if you wish, use the AORG directive in your code so that it loads at a specific address, rather than your code being relocatable and letting the EA3 loader decide where to load it.

 

Correct! That being said, if you're using AORG in relocatable code, the question arises "Why not just make it EA5 then?"

 

Relocatable code was the future, of course, with modern computers everything is relocatable anywhere. It's an interesting design that I haven't seen on other 8-bit computers at the time, having both fixed and relocatable code options in assembly. Possibly a reflection of TI's adaptation of mainframe style design? The fact the assembler has support for segments which to my knowledge no one has ever used is a sign of that as well.


>> Correct! That being said, if you're using AORG in relocatable code, the question arises "Why not just make it EA5 then?" <<

I do go as far as using compressed EA3 sometimes. ;-) ;-)

>> Relocatable code was the future, of course, with modern computers everything is relocatable anywhere. It's an interesting design that I haven't seen on other 8-bit computers at the time, having both fixed and relocatable code options in assembly. Possibly a reflection of TI's adaptation of mainframe style design? The fact the assembler has support for segments which to my knowledge no one has ever used is a sign of that as well. <<

The TI-99 assembler is a direct descendant of the assemblers used on their Model 990 minicomputers. So it is pretty full featured.

I love how the assembler even has 990 instructions that don't exist on the 9900 like LDS and LIIM. (search in this forum for their use on 99110.)

 

I am continually amazed that this "hobby" continues to yield up more knowledge. (In my case, 80s hobby became career.)


Let’s say I’d like my current project to run off of a system with Matt’s 32k and a FinalGROM as the minimum system configuration — no disk system needed.

 

How would this impact which way to go with EA3 v EA5?

 

Currently my project seems to load quite quickly (compared to previous BASIC projects) as well as run fast.

 

I get that EA5 loads quicker than EA3, and I understand why now — thanks!

 

Once loaded...is there any performance advantage with one vs the other?

 

Note: My only previous experience with getting code to run off of FinalGROM was using Compiled XB256, then creating a binary with Fred’s Module Creator Tool.

Edited by Airshack

Once the program is in memory, there's no difference.

The main advantage of the linking loader is that it allows the declaration of external DEFinitions and REFerences. You can for example build a software library with functions you need frequently. Just like TI have done, when they provide REFerences to VMBW and DSRLNK, you can make your own such routines.

 

It doesn't matter what you need. You can make file access routines, screen formatting, array sorting or other stuff.

You can then let the EA3 option first load your file FILEUTILS, then your own application. In your program, you use REF FLOPEN, REF FLCLOS, REF FLWRIT, REF FLREAD and whatever you have DEFined in the FILEUTILS library. FILEUTILS can be used by different programs, even different programs that are loaded at the same time, if that makes sense. Since they are relocatable, the linking loader will stuff them where there's room, and will link all REFerences with the DEFined locations.
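The library idea might be sketched as two separately assembled source files, shown together here. The routine bodies are placeholders that only return a zero status in R0; real file routines would go in their place, and the workspace address is an assumption.

```asm
* --- File 1: FILEUTILS library source (placeholder bodies) ---
       DEF  FLOPEN,FLCLOS   // Export these entry points to the loader

FLOPEN CLR  R0              // Placeholder: report "no error" in R0
       B    *R11
FLCLOS CLR  R0              // Placeholder: report "no error" in R0
       B    *R11
       END

* --- File 2: application source ---
       DEF  MAIN            // Entry point
       REF  FLOPEN,FLCLOS   // Resolved by the linking loader at load time

MAIN   LIMI 0               // Interrupts off
       LWPI >8300           // Workspace in fast RAM (assumed)
       BL   @FLOPEN         // Call into the library
       BL   @FLCLOS
STOP   JMP  STOP
       END
```

Load the FILEUTILS object file first and the application second, and the loader patches each REF with the address of the matching DEF.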

 

This provides the foundation for a system of assembly language utilities and programs, where you reduce the work with writing code, since you can re-use what you've done multiple times. Today, nobody does this, but at the time when TI developed their TI 990 minicomputer series, it was a viable concept, to get the required performance.

 

The memory image loader, with its creating utility, is instead used to create and load one-of-a-kind programs. They aren't intended to be part of a system of cooperating code segments, but instead solve one single task, and do that quickly.

Exactly right. Which is why most games are memory images, you want them to load quickly.

 

The truth is, there's almost no reason NOT to convert an E/A 3 program to E/A 5 unless it's part of some ongoing project where you may be adding or removing modules frequently, such as a living application for a business. For a finished product burned to a diskette, it just doesn't make any sense to keep it in that form.

 

Now that's for the standard 32K TI-99/4A system, mind you. SAMS has considerably more memory, and could use a relocatable approach, especially if it was developed on a platform. That said, I wince at the thought of how long it would take to load any programs; the basic DSR routines on the TI are pretty good, but their requirement of VDP memory as a file buffer is a very limiting factor for time. My own CRPG loads 8K segments to get the entire program into memory and takes about a minute on hardware.

 

I love the fact that the Miller EEPROMs for the CorComp disk controller give you the ability to load directly to CPU memory, but unfortunately it's not a common standard available on TIs. So I can't use it. :/ Not even classic99 has it available.

RXB 2019 has a built in EA5 Program Image loader, that loads ANY SAMS PAGE at any memory address in 32K.

 

Of course older floppy drives would limit the amount you could load and the speed.

 

But Hard Drives, RAMDISK and something like Classic99 would be super fast.

 

One limit on the speed is that RXB 2019 only loads 4K pages of Program Image files.

Edited by RXB
You don't have to decide to go with one or the other. Initially you might want to use EA3 because converting to EA5 is often an extra step (depending on the assembler). If you want to end up with a cartridge you need to convert to EA5 at some point anyway, but as long as loading time is not an issue EA3 is fine for development.
