GDMike Posted September 12, 2020

I'm just sharing something that I'm studying, as I think I can find my issue by reading..

----------------------------------------
 AAAA    M   M   SSSSS
A    A   MM MM   S
A    A   M MM M  S
AAAAAA   M   M    SSSS
A    A   M   M        S
A    A   M   M        S
A    A   M   M   SSSSS
----------------------------------------
Programmer's Documentation
Documentation: 1/19/93
Joe Delekto

Note: This documentation covers the 128k AMS system only. After the AEMS is released, documentation will be available.
----------------------------------------

The AMS expanded memory card is a unique piece of hardware, in that mapping is simple and lends itself well to overlay-structured programming. Because of the AMS design, no memory manager is necessary.

The card itself uses a 17-bit address bus (18 for 512k) in order to access the SRAM on the card. The upper 4 bits of a standard 16-bit address are used to select one of 16 mapper registers. The remaining 12 bits from the address bus are combined with 12 bits taken from the mapper register, in order to give a maximum address bus 24 bits in width. This will allow for a maximum of 16MB to be accessed. (AEMS) On the 128k card, only the 5 least significant bits (6 for 512k) are used from the mapper register. The other pins from the mapper output are unconnected.

"Mapping" is accomplished by changing the value in the mapper register to point to one of 32 pages (64 on 512k). No reading/writing or any transfer of memory is involved with mapping. All that is changed is the pointer to the RAM chip on the address bus. Because of this, mapping can be done in a few clock cycles, using only a couple of instructions. Programs which take advantage of the AMS can be extremely large, with almost no change in execution time.

As with any memory expansion, there are limitations on overlay sizes. It is recommended for the AMS system that the root segment of your program be placed in low memory.
(>2000 - >3FFF) Code overlays, from 4k to 24k in size (in 4k increments), can be mapped into the upper 24k of memory. This means that an 8k root segment can call as many (up to 24k) overlays as necessary. The result is a HUGE program, with structure and modularity. (Most desired in the programming field!)

This document will describe how mapping works, as well as the AMS resident library created by Art Green and myself. I will also go into some detail as to how you can use Charles Earl's Hot Bug debugger to debug AMS code.

Before I get to the meat of this document, I would like to explain why we chose our method of memory expansion. We chose 4k pages for two reasons. First, it made the hardware design simple, and made utility routines short and fast. Second, since our system uses overlay methods, many 4k overlays can reside in memory at once. The overlays can be any size from 4k to 24k in length, falling on 4k page boundaries. The larger the overlay is, the fewer overlays you can have. You can have 4k, 8k, 12k, 16k, or 24k overlays, and overlay size determines how many overlays you can have. (i.e. six 4k, three 8k, etc.) You are NOT required to use only one size of overlay. For instance, you could have one 12k overlay, an 8k overlay, and a 4k overlay (24k total). Keep in mind that most subroutines fall well under 4k of space! This means that MANY subroutines can be placed within just one 4k overlay!

We believe you will find this to be one of the most flexible memory expansion systems ever designed for the TI-99/4A. Many interesting applications, besides large programs, can be developed. We are making all information on the use of the memory available, so that programmers can make full use of its abilities.
----------------------------------------
PART ONE: Map Modes and Registers
==================================

Because the AMS has a 17-bit address bus (18 for 512k), and the TI-99/4A only has a 16-bit address bus, the extra bit(s) need to come from somewhere. These extra bits are taken from the mapper registers. 4 bits are taken from the memory address on the 9900 bus, and used to select one of 16 mapper registers. The remaining 12 bits from the 9900 bus are combined with the 5 bits from the mapper register, to make a new bus with 17 bits. The actual paging process is done by changing the values in the mapper registers, to point to new pages in memory. Here is the address diagram:

* From the 9900 address bus: >A000
  - the 4 MSB select the mapper register
  - the 12 LSB become the 12 LSB of the new address
* From the mapper: the 5-bit output forms the 5 MSB of the new address

  A15 - Mapper Register Select
  A14 -   "      "        "
  A13 -   "      "        "
  A12 -   "      "        "

        Mapper Address Bus
  A11 - MA11      A05 - MA05
  A10 - MA10      A04 - MA04
  A09 - MA09      A03 - MA03
  A08 - MA08      A02 - MA02
  A07 - MA07      A01 - MA01
  A06 - MA06      A00 - MA00

* On the mapper address bus:

        +---+
  A15 - | M | - MA16
  A14 - | a | - MA15
  A13 - | p | - MA14
  A12 - | p | - MA13
        | e | - MA12
        | r |
        +---+

Since only the >A000 - >FFFF range of memory is mapped, registers >A through >F are used in the mapper. Note the mapper register number corresponds with the 4 MSB of the address being accessed. Once the mapper register is loaded with a page number (the 5-bit value which forms the 5 MSB on the AMS bus), any read or write to the 4k block it corresponds to will access that 4k memory page.

For example: I load mapper register 10 (>A) with page number >15. Any time I read/write or access the >A000 block, I will be accessing the 4k page >15. If I were to load mapper register 10 with >15, then perform a CLR @>A02E, I would actually be writing (from the AMS address bus) to the address: >0001502E.
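The worked example above (mapper register >A loaded with page >15, so CLR @>A02E lands on AMS address >1502E) can be checked with a small Python model of the mapper. This is an illustration only, with hypothetical helper names, not TI code:

```python
def ams_address(cpu_addr, mapper_regs):
    """Model of the 128k AMS mapper: a 16-bit CPU address plus the
    selected mapper register yields a 17-bit AMS address."""
    reg_select = (cpu_addr >> 12) & 0xF      # upper 4 bits pick the register
    offset = cpu_addr & 0x0FFF               # lower 12 bits pass through
    page = mapper_regs[reg_select] & 0x1F    # only 5 bits wired on the 128k card
    return (page << 12) | offset

regs = [0] * 16
regs[0xA] = 0x15                             # mapper register 10 holds page >15
assert ams_address(0xA02E, regs) == 0x1502E  # the CLR @>A02E example
```

The >A in >A02E contributes nothing to the result except selecting the register; all 17 output bits come from the 5-bit page and the 12-bit offset.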
Notice how the page/offset are combined on the AMS bus to get a 17-bit address (shown here as 32 bits for clarity). It is worthwhile to note that even though the address was >A000, the >A had no influence whatsoever on the new address. The primary purpose of the >A was to select which mapper register the 5 bits would come from. The same holds true for the entire >A000 - >FFFF range. Because >A - >F are used to select the mapper registers, we have 6 registers to use, and 6 4k boundaries. To load consecutive addresses, just load consecutive page numbers into the mapper registers.

The AMS system works in two modes: map mode and pass mode. Power-up places AMS into pass mode. In pass mode, AMS acts as a plain 32k card, and the mapper passes the actual addresses used to the AMS bus (i.e. a CLR @>A000 will clear the memory location at >A000). There is no difference between pass mode and standard 32k mode. The second mode is map mode. Map mode uses the MSB of an address to select the mapper register, then dumps the register's contents to the AMS bus. Note: when map mode is enabled, it would be a good idea to initialize the mapper registers to known values!

The CRU address for AMS is >1E00. In order to use mapping, some CRU instructions are required to: 1) switch AMS between map/pass modes, and 2) enable/disable register read/writes. Below is the code which changes map modes for AMS.

       LI   R12,>1E00   * AMS CRU Address
       SBO  1           * Enable Mapping
       SBZ  1           * Disable Mapping

This is the only code required to switch between modes. Only 2 instructions are necessary. In order to access the mapper registers, CRU bit 0 must be set. When it is, DSR space is temporarily disabled, so that writing in the >4000 space will set a map register. When CRU bit 0 is set to zero, the original DSR space is recovered, with no side effects. It is recommended that you enable the mapper registers, write their values, and then disable them immediately.
The reason being that forgetting to disable the registers will keep you from accessing any DSR routines. To enable and disable registers, use the following code:

       LI   R12,>1E00   * AMS CRU address
       SBO  0           * Enable Registers
       .
       .                * Set registers here
       .
       SBZ  0           * Disable Registers

In order to load a mapper register with a page number, all you need to do is write to the >4000 block. To determine which mapper register you wish to change, use the following calculation:

       MRAD = 2 * Register# + >4000

So to clear mapper register 10, you would use: 2 * >A + >4000 = >4014

       CLR  @>4014      * Clear Register 10

Note, you can also read from a mapper register, for the purpose of saving previous page values:

SPAGE  BSS  2             * Hold Page #
EXMPL  MOV  @>4014,@SPAGE * Get MR10
       CLR  @>4014        * Clear MR10
       .
       .
       .
       MOV  @SPAGE,@>4014 * Restore Page
       RT

Because writing to the mapper registers is just writing to an address, indirect addressing can be used as well. For example, consider setting up the mapper so that when in map mode, addresses are the same as in pass mode. It is always a good idea to first set up the mapper registers, and then go into map mode. While in map mode, registers can be changed at will to point elsewhere. If your code to do mapping resides in upper memory, take care NOT to change the register where your code is executing. Pointing it to some other place in memory will continue execution on the new page, causing undesired or unknown results. It is possible, though, to point to a new page where code is executing, provided that valid code exists at the current offset on the new page. Below is an example which sets up the mapper registers as normal 32k pass mode, yet places the mapper into map mode. The registers can be changed later to access other pages of memory.
PAGES  DATA >0A00,>0B00,>0C00
       DATA >0D00,>0E00,>0F00
START  LI   R12,>1E00   * AMS CRU
       LI   R1,PAGES    * Page Table
       LI   R2,>4014    * Start at MR >A
       LI   R3,6        * 6 Pages to set
       SBO  0           * Enable MR's
RSET   MOV  *R1+,*R2+   * Write to MR
       DEC  R3          * Dec counter
       JNE  RSET        * Continue
       SBZ  0           * Disable MR's
       SBO  1           * Enable map mode
       END

IMPORTANT: Note that the 5-bit page value is placed in the most significant byte of the mapper register. Because only 5 bits are used in a map register, and because the 2-cycle read/write on the data bus loads the most significant byte last, the mapper is loaded with this value. Therefore, page >18 would be >1800, page >05 would be >0500, etc. It is worth noting that the AEMS addresses page numbers normally, since 12 bits are used instead.

----------------------------------------
PART TWO: Overlay Techniques
============================

Because the AMS system is able to map overlays up to 24k in length, on 4k boundaries, it lends itself well to program development using overlays. First, a root segment is established, which will contain the code to call an overlay. The root segment must remain in memory (without using tricky code to map it out), and will contain the routine used to call an overlay subprogram. We recommend the following:

1) Place the root segment into low memory (8k).
2) Make all overlay calls BLWP routines.

Below is the stub code for the root segment overlay manager, which is used to handle the simulation of a BLWP vector in a mapped environment.
****************************************
* Overlay Manager
* Version 1.0
* R.A.Green
OVMGR  SBO  0           * Enable map regs
       MOV  *R11+,R10   * Get N # pages
       MOV  *R11+,R9    * Get 1st map reg
       MOV  *R11+,R7    * Get 1st page #
OMGR2  MOV  R7,*R9+     * Set mapper reg
       AI   R7,>0100    * Add 1 to page #
       DEC  R10         * Loop for N pages
       JGT  OMGR2       * Finish loop
       SBZ  0           * Disable map regs
       MOV  *R11,*R11   * Get real BLWP vec
       MOV  *R11+,R7    * Get WSP
       MOV  *R11,R9     * Get sub address
       MOV  R13,@26(R7) * Simulate BLWP
       MOV  R14,@28(R7) *
       MOV  R15,@30(R7) *
OMGRW  EQU  $-12        * OVMGR workspace
       LWPI 0           * R6,R7 user wrkspc
       B    @0          * R8,R9 call sub
       BSS  2           * R10
       BSS  2           * R11
       DATA >1E00       * R12 AMS CRU addr
       BSS  6           * R13 - R15
****************************************

Below is the code which replaces the original BLWP call in the root segment. This is done for every subroutine that generates an overlay.

* Overlay Call
* Version 1.0
* R.A.Green
*      BLWP @OSUB
OSUB   DATA OMGRW       * Manager WSP
       DATA $+2         *
       BL   @OVMGR      * Use overlay manager
       DATA N           * # Pages in overlay
       DATA >40xx       * 1st mapper reg addr
       DATA n           * 1st page number
       DATA sub         * REAL BLWP vector
****************************************

In order to generate a call for an overlayed subroutine, the real BLWP call must be replaced by the OSUB information. The overlay generator needs to know:

1) How many pages the overlay is. Remember that it can be 4k-24k in length, broken into 4k pages.
2) Which mapper register to start mapping the 'N' pages in at.
3) The first page number the overlay resides on.
4) The ACTUAL BLWP vector address for the overlayed routine.

To illustrate this, let us say our root segment is in the >2000 - >3FFF block. We have created an overlay, and inside the overlay is a routine called INPUT, for which the BLWP vector starts at >C2E0. The overlay is in page >18 of memory, and is only 4k, or 1 page, in length.
To call the overlay, we would use the following code:

       BLWP @OSUB1      * Call overlay stub
       DATA >0001       * 1 page long (4k block)
       DATA >4018       * >C000 block
       DATA >1800       * 1st page # (only one)
       DATA >C2E0       * BLWP Vector

Note, it would be very useful to have a program loader to load segments of code into different pages. Although such a loader exists for AMS, it is only used for AMS files with special headers for overlay and root segments. A similar loader can be constructed, which loads the overlays into their corresponding pages. The overlay code examples above are the code segments installed (automatically) by the linker. That eliminates the need for passing the arguments to the overlay generator, and keeping track of relative page addresses. You may, however, choose your own method of overlaying. We made the system very flexible, so you can customize your software and choose how you want to map. Keep in mind that other programs may be resident in AMS, and using the linker/loader will ensure that AMS programs are page relocatable, and won't overwrite memory-resident code.

----------------------------------------
PART THREE: Using Hot Bug with AMS
==================================

Most often, overlay programs are tedious to debug. If you have access to Charles Earl's Hot Bug debugger, I recommend you learn how to use it. It is by far one of the best debugging utilities available, and can certainly work well for debugging AMS programs. Since Hot Bug will also load program files, you can use the debugger to change the page map, and load in your overlay code!

Hot Bug Command Summary

  ER  - Edit Register
  EW  - Edit Word
  DM  - Display Memory
  SPC - Set Program Counter
  G   - Go (# of steps)

In order to access the registers, and check the code/data within pages, we need to enable both the mapper and the registers.
Choose a word of memory that does not contain code, to use with the following commands. (For this example, I use >3FF0.)

  1: ER 12 1E00
  2: EW 3FF0 1D00
  3: EW 3FF2 1D01
  4: SPC 3FF0
  5: G 2

  (1: Load Register 12 with CRU >1E00)
  (2: Put SBO 0 at 3FF0  Enable REGS )
  (3: Put SBO 1 at 3FF2  MAP Mode    )
  (4: Set program counter to >3FF0   )
  (5: Execute 2 instructions         )

NOTE: If the mapper is in an unknown state (register values unknown), you will want to set the registers before actually placing it into map mode. Just use G 1 instead, edit the registers (see below), and then G 1 again to get into map mode.

To read/write the mapper registers, use the DM command to look at the >4000 block. Only addresses >4000 - >401E are of interest to us. (Mapper regs 0 - 15.)

NOTE: Even though the upper 24k is mapped using mapper registers 10 - 15, the other mapper registers can be used for temporary storage.

  1: DM 4000

  (1: Display Memory at >4000)

In order to change a register value, just use the EW command. For example, to load mapper register 11 (>B000 block) with page >15, use the following:

  1: EW 4016 1500

  (1: Load mapper register 11 with >15)

Now let's try an experiment. What we will do is write the same page to 2 different mapper registers, and observe what happens. Use the following commands:

  1: EW 4014 1500
  2: EW 4016 1500
  3: DM A000

  (1: Load mapper register 10 with >15)
  (2: Load mapper register 11 with >15)
  (3: Display memory at >A000        )

Note what the data in memory is at >A000. Now, if you use DM B000, you should see the same data you saw before. Let's try something interesting. Use the following commands:

  1: EW A000 FACE
  2: DM B000

  (1: Put value >FACE into >A000)
  (2: Display memory at >B000   )

When you use DM B000, you should get a surprise. When you wrote to A000, you actually changed the word at B000 as well as A000. Why? Because both 4k blocks point to the same page! Perhaps now you can envision some of the interesting tricks you can accomplish with the AMS system.
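The aliasing experiment just described can be reproduced in a few lines of Python. This is a toy model of the card, not TI code: with registers >A and >B pointing at the same page, a write through the >A000 window shows up through the >B000 window.

```python
# Model: 32 sparse 4 KiB pages (a 128k card) and 16 mapper registers.
ram = [dict() for _ in range(32)]
regs = [0] * 16

def write_word(addr, value):
    page = regs[(addr >> 12) & 0xF] & 0x1F
    ram[page][addr & 0x0FFF] = value

def read_word(addr):
    page = regs[(addr >> 12) & 0xF] & 0x1F
    return ram[page].get(addr & 0x0FFF, 0)

# EW 4014 1500 / EW 4016 1500: point registers >A and >B at page >15.
regs[0xA] = 0x15
regs[0xB] = 0x15

# EW A000 FACE: writing through the >A000 window...
write_word(0xA000, 0xFACE)

# ...changes what DM B000 shows, because both windows map page >15.
assert read_word(0xB000) == 0xFACE
```

The surprise in the experiment is exactly this: there is only one physical page, reached through two windows.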
One such application is the arbitrary locating of data buffers! It is also possible to load a memory image file on non-consecutive pages, and yet still load the mapper registers such that the program is contiguous in the upper 24k! If that's so, then it means we can load E/A option 5 program files anywhere inside AMS, and then just map in their pages to the proper blocks in high memory! In this manner, even code with absolute origins becomes page relocatable, at least for paging purposes.

By placing page numbers into the registers, and using Hot Bug's LOAD command, you can load overlay image files. Keep track of the address of the BLWP vector in the overlay, as well as the page you LOAD it into, and how many pages it takes up. This is the information you will need to pass to the overlay generator in your program. See? Loading, debugging, and running overlay code on the AMS system is very feasible, and not difficult.

----------------------------------------

This concludes this section of programmer's documentation. The next document will focus on the memory resident utility routines, which AMS programmers can use in their software. Memory allocation, exit code, memory moves, and far VDP read/write routines are available. Also, AMS programs have access to the E/A option 5 and AMS overlay program file loader. The loader will load either type of file. The exit routines for AMS have the option of keeping the programmer's code resident for instant execution when desired.

We have worked very hard for the past couple of years to make this memory expansion as user friendly as possible. We will continue to supply support for the AMS card. Without the software support to use AMS, it would just be an expensive paperweight.
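A few of the calculations from Parts One and Two can be sanity-checked with a small Python sketch. This is a model only; the helper names are mine, not part of the AMS library. It covers the register address formula MRAD = 2 * Register# + >4000, the page-number-in-MSByte convention, and the DATA words of an overlay call.

```python
def mapper_reg_address(reg):
    """MRAD = 2 * Register# + >4000 (Part One)."""
    return 0x4000 + 2 * reg

def page_to_reg_word(page):
    """The page number sits in the most significant byte of the word
    written to a mapper register, so page >18 is written as >1800."""
    return (page & 0x1F) << 8       # 5 bits wired on the 128k card

def overlay_call_data(n_pages, first_reg, first_page, blwp_vector):
    """The four DATA words of the overlay call in Part Two: page count,
    first mapper register address, first page, and the real vector."""
    return [n_pages, mapper_reg_address(first_reg),
            page_to_reg_word(first_page), blwp_vector]

# The CLR @>4014 example: mapper register 10 lives at >4014.
assert mapper_reg_address(0xA) == 0x4014

# The INPUT overlay example: 1 page, the >C000 block (register >C),
# page >18, BLWP vector at >C2E0.
assert overlay_call_data(1, 0xC, 0x18, 0xC2E0) == [0x0001, 0x4018, 0x1800, 0xC2E0]
```

The last assertion reproduces the DATA >0001, >4018, >1800, >C2E0 sequence from the call example above.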
Asmusr Posted September 12, 2020

4 hours ago, Lee Stewart said:

I just had an epiphany! What @FALCOR4 said earlier about reading SAMS registers and a discussion among [member-'RXB'], [member-'tursi'] and me over in the Classic99 Updates thread led me to the conclusion that my method in fbForth 2.0 of determining the amount of SAMS memory available is flawed! See post #1870 for my thinking. What occurred to me is that checking any putative "highest" bank of an actual SAMS card will succeed, even for the lowest SAMS (128 KiB, highest page = >001F) expected, leading my code to conclude there is 32 MiB of SAMS available. FYI, my method writes a test value to the mapping window (>E000) and starts paging in "highest" pages of SAMS, beginning with 32 MiB's highest page, >1FFF, to see whether the test value is still there. If it fails, the code checks for SAMS at half that value, until it succeeds or fails at 128 KiB. My point is that mapping >1FFF will always map a writable SAMS page for any working SAMS, 128 KiB or higher. If the card only has 128 KiB, mapping page >1FFF will actually map page >001F, because the unattached bits are ignored. I will need to get more clever, so that I am mapping only pages that I expect, or use a clever pattern that will lead me to the correct conclusion in the shortest amount of code—aye, there's the rub! ( sorry, Will ) ...lee

If you write the page number to all the pages you want to test first, e.g. 255, 127, 63, 31, and then check that the values are still there in a second loop, it should work.
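Asmusr's two-pass probe can be sketched in Python against a hypothetical card model in which unwired address bits are ignored (so page p aliases onto p modulo the number of fitted pages). The candidates are written largest first, so on a small card the aliased slot ends up holding the smallest marker, which the verify pass then finds:

```python
def detect_highest_page(mapped_read, mapped_write,
                        candidates=(255, 127, 63, 31)):
    """Two-pass probe: write each candidate page number into that page
    first, then verify in a second loop.  Returns the highest candidate
    whose marker survived, or None."""
    for page in candidates:          # pass 1: write all markers
        mapped_write(page, page)
    for page in candidates:          # pass 2: verify
        if mapped_read(page) == page:
            return page
    return None

def make_card(n_pages):
    """Hypothetical card: address bits beyond the fitted RAM are
    ignored, so page p aliases onto p % n_pages."""
    store = [0] * n_pages
    def write(page, value): store[page % n_pages] = value
    def read(page): return store[page % n_pages]
    return read, write

read, write = make_card(32)           # a 128 KiB card (32 pages)
assert detect_highest_page(read, write) == 31

read, write = make_card(256)          # a 1 MiB card (256 pages)
assert detect_highest_page(read, write) == 255
```

On the 32-page card, pages 255, 127, 63, and 31 all alias onto physical page 31; the last marker written there is 31, so only the "31" probe verifies.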
+TheBF Posted September 12, 2020

7 hours ago, Asmusr said:

If you write the page number to all the pages you want to test first, e.g. 255, 127, 63, 31, and then check that the values are still there in a second loop, it should work.

Could you write then read inside one loop? These new huge cards are getting pretty big for our old 9900. Even now, erasing 1 Mbyte takes significant time.
+TheBF Posted September 12, 2020

I should have read this earlier myself, GDMike. I ended up re-inventing this after quite a long road of trial and error.

PAGES  DATA >0A00,>0B00,>0C00
       DATA >0D00,>0E00,>0F00
START  LI   R12,>1E00   * AMS CRU
       LI   R1,PAGES    * Page Table
       LI   R2,>4014    * Start at MR >A
       LI   R3,6        * 6 Pages to set
       SBO  0           * Enable MR's
RSET   MOV  *R1+,*R2+   * Write to MR
       DEC  R3          * Dec counter
       JNE  RSET        * Continue
       SBZ  0           * Disable MR's
       SBO  1           * Enable map mode
       END
GDMike Posted September 12, 2020 (edited)

My problem isn't actually how to create a mapped area; adding banks to that mapped area is what is giving me fits. I'm not sure the problem isn't my card, as it's worked in the past for me, then stopped during testing, then I asked for you all to chime in, then it worked until I powered the PEB off and on? So I'm really needing source that gives the 240 banks at address >3000 - >3FFF (>4006 SAMS, SMR3) that I can test with. It seems now I can create page >10, but if I make page >11 and read it, it is actually a copy of (or IS) page >10, as anything I map and read back just reads as my page 16 (>10) data. So I'm back to looking at the docs, as I must be missing something?? And I'm having to test and turn off the PEB at each test. Today, even though the address can't be changed, I'll be moving the card to a different port in the PEB; or just reseating may do something too, not sure, but I can rule that out. Thx for that routine, definitely something I can use for initializing the card.

Edited September 12, 2020 by GDMike
+FALCOR4 Posted September 12, 2020

17 hours ago, FALCOR4 said:

Correct. The SAMS circuitry is simplistic, and I don't mean to say that is a bad thing. It's not; it just means that not all possible functionality is implemented, which would require more ICs and board space. When you do a register read, it will put the same page number (repeated) for a 1M segment (>00 to >FF) in both the LSByte and the MSByte from the LS612. The LSByte that is latched (which gives you banks beyond the first 1M) is not connected in such a way that it can be read back. So, you'll only be able to see page numbers for any one particular 1M bank; you won't be able to read back what bank you're in, which would be in the LSByte if it were implemented. If needed, the software will just have to keep track of banks. I just put together another 4M board and am doing a burn-in right now that should run through the night. When it's done, I'll play with it to verify whether what I'm telling you is true or not. I'll report back with what I find.

Finished burn-in on the second 4M board and it passed with flying colors <yipee>. I did a double check on reading back the register values, and it indeed only reads back 1M of pages, and not the bank you may be in. Example:

       LI   R12,>1E00   CRU ADDRESS
       LI   R0,>0A01    SET REGISTER VALUE WITH PAGE >0A AND BANK >01
       SBO  0           ENABLE WRITING TO REGISTERS
       MOV  R0,@>4014   LOAD REGISTER FOR MEM LOCATION >A000
       MOV  @>4014,R1   READ BACK REGISTER: WHAT YOU GET IS NOT >0A01
*                       BUT RATHER >0A0A. THE BANK VALUE >01 DOES NOT
*                       READ BACK.
       SBZ  0           TURN OFF ACCESS TO REGISTERS
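FALCOR4's observation reduces to a one-line model. This mirrors the behaviour he reports on his 4M board, not anything taken from a datasheet:

```python
def sams_4m_readback(written):
    """Model of the observed 4M-board behaviour: the page number
    (MSByte) reads back duplicated into both bytes; the latched bank
    byte is not readable."""
    page = (written >> 8) & 0xFF
    return (page << 8) | page

# Writing >0A01 (page >0A, bank >01) reads back as >0A0A.
assert sams_4m_readback(0x0A01) == 0x0A0A
```

So, as he says, software has to keep track of which bank it selected; the hardware won't tell you.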
+retroclouds Posted September 12, 2020

2 hours ago, TheBF said:

Could you write then read inside one loop. These new huge cards are getting pretty big for our old 9900. Even now erasing 1Mbyte takes significant time.

I didn't even bother to try to erase SAMS before use. As long as your pointers and memory structures are ok, it doesn't matter what is next to what you have in use. I don't see a benefit in initializing memory up front, or am I missing something?
+Lee Stewart Posted September 12, 2020 (edited)

On 9/12/2020 at 2:28 AM, Asmusr said:

If you write the page number to all the pages you want to test first, e.g. 255, 127, 63, 31, and then check that the values are still there in a second loop, it should work.

If you mean writing 255 to a spot in page 255, 127 to the same spot in page 127, ..., I agree that it should work. What I have finally contrived is a little faster and less code, I think, but I will set up both ways to be sure. My way (currently):

1. Initialize mapping with pages >2:>2000, >3:>3000, >A:>A000, >B:>B000, >C:>C000, >D:>D000, >E:>E000, >F:>F000.
2. Write test word to >E000.
3. Start at the page >000E higher than half of the highest SAMS page size (32 MiB): >2000 / 2 + >000E = >100E.
4. Map current SAMS page >xxxE into >E000. If we get past page >001E, no SAMS. Quit with SAMS flag = 0.
5. Check for test word.
6. If equal, shift the left-most set bit right one bit (2nd time, >080E, ...) and go to 4.
7. If not equal, we know the SAMS size. Quit with SAMS flag = highest SAMS page (32 MiB: >1FFF, ..., 128 KiB: >001F).

SAMS flag = highest SAMS page available. 0 indicates no SAMS.

...lee

Edited September 13, 2020 by Lee Stewart
correction
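Lee's halving probe can be simulated in Python (a hypothetical card model where unwired address bits are ignored, so page p aliases onto p AND the card's page mask). Here `map_page_read(p)` stands in for the map-and-check step: map page p into >E000 and report whether the test word written to physical page >E is still visible.

```python
def probe_sams(map_page_read):
    """Sketch of the halving probe above.  Returns the highest SAMS
    page, or 0 for no SAMS."""
    bit = 0x1000                  # start at >100E (half of 32 MiB's >2000)
    while bit >= 0x0010:
        if not map_page_read(bit | 0x000E):
            return 2 * bit - 1    # this address bit is wired: size known
        bit >>= 1
    return 0                      # got past >001E: no SAMS

def make_card(mask):
    """Hypothetical card: page p aliases onto p & mask, so the test
    word in page >E is seen whenever (p & mask) == >E."""
    return lambda page: (page & mask) == 0x000E

assert probe_sams(make_card(0x001F)) == 0x001F   # 128 KiB
assert probe_sams(make_card(0x00FF)) == 0x00FF   # 1 MiB
assert probe_sams(make_card(0x1FFF)) == 0x1FFF   # 32 MiB
assert probe_sams(lambda page: True) == 0        # no SAMS at all
```

On a 128 KiB card, every probe from >100E down to >002E aliases back onto page >E and finds the test word; the first probe that does not (>001E) pins the size, which is the "not equal" exit in step 7.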
+TheBF Posted September 12, 2020

1 hour ago, retroclouds said:

I didn't even bother to try to erase SAMS before use. As long as your pointers and memory structures are ok, it doesn't matter what is next to what you have in use. Don't see a benefit in initializing memory up front or am I missing something?

In ED99 I used a naïve record-based data structure for display speed. This is trouble under 2 conditions:

1. When I list the visible page of records, if they are filled with random stuff (like when you first start the machine) it is pretty ugly.
2. If the previous file content is still in memory and you load a new file, I don't have an end-of-line marker. I just write the contents of each record to the screen, so you could see the old contents if the new line is shorter.

So I fill the space with spaces (purge) before loading a new file. I could add an end-of-line marker, but then there are more things to manage when inserting and deleting, and displaying is slower because you have to read the contents looking for the magic character. Gains and losses...
GDMike Posted September 12, 2020 (edited)

OK, we got it fixed. Mr. FALCOR4 solved my issue. I also found that my R1, which contains the address I pass, wasn't being pulled across to my subs correctly..

Edited September 12, 2020 by GDMike
GDMike Posted September 12, 2020

On 9/10/2020 at 6:44 PM, GDMike said:

*** MAP A BANK
*++ Bank# must be in R1 before calling this routine.
*++ Trashes R3.
MAPPG  LI   R12,>1E00
       SBO  0
       LI   R3,>4006
       MOV  R1,*R3
       SBZ  0
       RT

Lee, I did get this working with a little help from @Falcor4; it seems my R1 was getting lost. I haven't dug that deep into that part yet, just happy my card is good. Thank you for this code! It's spot on for what I need.
apersson850 Posted September 14, 2020

Why

       LI   R3,>4006
       MOV  R1,*R3

instead of

       MOV  R1,@>4006

?
GDMike Posted September 14, 2020 (edited)

I was doing a MOV @>4006 in my early stages. Actually, I had SP1 EQU >4006, and I was doing a MOV R#,@SP1. But somehow, after I submitted code, we started using R1 for the future bank number, whereas I had been using R1 to push >4142 to the screen at location R0. Then, just for clarification, we started making sure R1 was our address and used it to pass on to >4006. So I think it's just for clarity between registers, to make sure I knew the difference.

Edited September 14, 2020 by GDMike
apersson850 Posted September 15, 2020

It's not R1 I'm confused about, but the detour via R3, instead of direct access.
GDMike Posted September 15, 2020 (edited)

Again, I think we were just taking existing code and adjusting what was needed to make it work as is, but I'll write it the way you suggest and make sure it still works. Thx, it'll save a byte or so, and just look better. Yesterday, I was able to finish my init of 240 banks with >2020, and it took about 30 seconds on real steel (RS).

Edited September 15, 2020 by GDMike
apersson850 Posted September 16, 2020

You are of course free to do exactly as you want to. I'm asking to find out if there was a specific reason, or if it was just something that slipped by. I've quite frequently found that people who are experienced with some simpler 8-bit processors write bad code for the TMS 9900, not utilizing its strengths, but just being punished by its weak spots.

Fewer instructions take less space, as you write. They also execute faster. If you look at the add-on timing for more advanced addressing modes, you'll find that the same instruction with more complex addressing is at least similar, but frequently better, than writing more instructions. This is due to the fact that instruction fetch and decode isn't pipelined in the TMS 9900, as it is in the TMS 9995, so there's always a penalty for fetching yet another instruction to accomplish something. And even if it's just similar, it reduces clutter in the program.

; This is easier to read at a glance
       A    @VAL1,@VAL2

; This does the same, but isn't necessary for the TMS 9900
; (as it is for some simpler processors)
       MOV  @VAL1,R1
       MOV  @VAL2,R2
       A    R1,R2
       MOV  R2,@VAL2
GDMike Posted September 16, 2020

What I'd enjoy would be a "9900 assembly best practices" video series. That's probably something a lot of people would be interested in. How cool
+TheBF Posted September 16, 2020

6 hours ago, apersson850 said:

You are of course free to do exactly as you want to. I'm asking to find out if there was a specific reason, or if it was just something that slipped by. I've quite frequently found that people who are experienced with some simpler 8-bit processors write bad code for the TMS 9900, not utilizing its strengths, but just being punished by its weak spots. Fewer instructions take less space, as you write. They also execute faster. If you look at the add-on timing for more advanced addressing modes, you'll find that the same instruction with more complex addressing is at least similar, but frequently better, than writing more instructions. This is due to the fact that instruction fetch and decode isn't pipelined in the TMS 9900, as it is in the TMS 9995, so there's always a penalty for fetching yet another instruction to accomplish something. And even if it's just similar, it reduces clutter in the program.

; This is easier to read at a glance
       A    @VAL1,@VAL2

; This does the same, but isn't necessary for the TMS 9900
; (as it is for some simpler processors)
       MOV  @VAL1,R1
       MOV  @VAL2,R2
       A    R1,R2
       MOV  R2,@VAL2

I have to get back to it, but when I was writing a native-code Forth cross-compiler, these were the most interesting things to achieve: using that memory-to-memory architecture as much as possible. The secret I found, in an old paper by Thomas Almy, was to keep track of all literals (real numbers, addresses, constants) in the source code on a literal stack as they became known, and delay emitting code for them until you knew fully what you had to work with. It worked pretty well.
apersson850 Posted September 16, 2020 (edited)

There are many ways to handle such things. The UCSD p-system's Pascal compiler places all constants either in the constant pool or in the real constant pool. The reason for having two of them is that ordinary constants require only byte-flipping if they are of the wrong byte sex, but real values require a real conversion, to adapt to the real format used on the particular machine. This comes back to the portability requirements for code files under the p-system: regardless of which machine they are compiled on, they should run on any other p-system as well. Once data is in the constant pool, there are special p-codes to fetch constants, of certain lengths, stored at certain offsets from the start of the pool. But that's different from generating code for the TMS 9900 itself.

Note that reading constants from a constant pool can easily be done by indexing. There are two basic methods.

           ; Use a constant pool base pointer
    CBASE      EQU  9
    CONST_POOL DATA ...            ; a lot of data

           ; Fetch the constant 26 bytes into the pool
           LI   CBASE,CONST_POOL
           MOV  @26(CBASE),R0

The other method is to have the constant pool base address fixed, then index into it with a register:

           ; Use a fixed constant pool address, then index via a register
    CONST_TO_GET EQU  26
    CONST_POOL   DATA ...          ; a lot of data

           LI   R1,CONST_TO_GET
           MOV  @CONST_POOL(R1),R2

The first method makes it easy to switch constant pools. It's also useful for accessing activation records, if you implement recursion in your assembly code (or generate code for a language which supports recursion). The second is convenient if you want to easily fetch larger blocks from the constant pool, since you can increment the index register and keep the base.
    ; Fetching large constants
    CONST_TO_GET EQU  26
    CONST_SIZE   EQU  80
    CONST_POOL   DATA ...          ; a lot of data
    VARIABLE     BSS  CONST_SIZE

           LI   R1,CONST_TO_GET
           LI   R2,VARIABLE
           LI   R3,CONST_SIZE
    GET_LOOP MOV @CONST_POOL(R1),*R2+
           INCT R1
           DECT R3
           JNE  GET_LOOP

You can use auto-incrementing for the source too, to make it more efficient. It just depends on whether or not you want to preserve the CONST_POOL base address in the code.

    ; Fetching large constants, auto-incrementing the source too
    CONST_TO_GET EQU  26
    CONST_SIZE   EQU  80
    CONST_POOL   DATA ...          ; a lot of data
    VARIABLE     BSS  CONST_SIZE

           LI   R1,CONST_POOL+CONST_TO_GET
           LI   R2,VARIABLE
           LI   R3,CONST_SIZE
    GET_LOOP MOV *R1+,*R2+
           DECT R3
           JNE  GET_LOOP

Edited September 16, 2020 by apersson850
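The two indexing schemes can be modeled in Python, with a bytearray standing in for the constant pool. This is purely illustrative; the pool contents and the 26-byte offset are just the example values from the post.

```python
# Stand-in constant pool: 64 bytes of known data.
pool = bytearray(range(64))

def fetch_word(base, offset):
    # One 16-bit word, big-endian, like the 9900's memory layout.
    return (pool[base + offset] << 8) | pool[base + offset + 1]

# Method 1: movable base pointer, fixed displacement
# (like MOV @26(CBASE),R0 after LI CBASE,CONST_POOL).
cbase = 0
word1 = fetch_word(cbase, 26)

# Method 2: fixed pool address, index held in a "register"
# (like MOV @CONST_POOL(R1),R2 after LI R1,CONST_TO_GET).
r1 = 26
word2 = fetch_word(0, r1)

print(word1 == word2)  # True: both address the same word
```

Both reach the same word; the difference, as the post says, is which part you can vary cheaply at run time: the base (method 1) or the index (method 2).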
Tursi Posted September 17, 2020

16 hours ago, GDMike said:

    What I'd enjoy would be a "9900 assembly" BEST practices video series. That's probably something a lot would be interested in. How cool

It ultimately ends up simpler than you'd think on the TI-99/4A. Like @apersson850 said, most of the time the fastest code is the one with the fewest instructions, no matter how complex those instructions are. The basic tricks that work on most CPUs are true on the 9900 as well, so long as they don't increase the instruction count (for instance, a shift is usually faster than a divide, and since a divide tends to take more setup, the shift also wins on instruction count). Sometimes you just have to think outside the box. This also explains why unrolling a loop is faster, even though on the surface it looks like more instructions: fewer are actually executed. If you code this:

    LP     MOV  *R1+,*R2+
           DEC  R2
           JNE  LP

... and use it to move 8 bytes, then you get 8 hits on the MOV, 8 hits on the DEC, and 8 hits on the JNE, for a total of 24 instructions. But if you unroll it only once:

    LP     MOV  *R1+,*R2+
           MOV  *R1+,*R2+
           DECT R2
           JNE  LP

... then you have 4 hits on each MOV (total of 8), 4 hits on the DECT and 4 hits on the JNE, a total of 16 instructions. Ermm... back to your original program.
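Tursi's instruction-execution arithmetic can be checked with a throwaway Python sketch. The function names are mine, and the counts mirror the 8-iteration example above.

```python
def rolled_hits(iterations):
    # One MOV + one DEC + one JNE executed every time around the loop.
    return iterations * 3

def unrolled_hits(iterations):
    # Two MOVs per pass, so half as many passes:
    # 2 MOV + 1 DECT + 1 JNE per pass.
    return (iterations // 2) * 4

print(rolled_hits(8))    # 24
print(unrolled_hits(8))  # 16
```

The saving comes entirely from halving the number of loop-control instructions (the decrement and the jump); the useful MOV work is the same either way.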
apersson850 Posted September 17, 2020 (edited)

Ehhm, you can't use the same register as both a pointer and a counter (R2). MOV also moves two bytes at a time, so it takes four MOVs to move eight bytes. But I understand what you intended to illustrate.

DIV is usually efficient if you need to divide by something other than a power of two, and especially if you want both the quotient and the remainder. You get both in one fell swoop.

Edited September 17, 2020 by apersson850
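The "one fell swoop" point maps directly onto Python's built-in divmod, which, like the 9900's DIV (quotient in one register, remainder in the next), produces both results from a single operation:

```python
# divmod is the Python analogue of getting quotient and remainder
# together from one DIV instruction.
q, r = divmod(173, 10)
print(q, r)  # 17 3
```

If you only ever need the quotient of a power-of-two division, a shift is cheaper, which is exactly the trade-off discussed above.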
GDMike Posted September 17, 2020 (edited)

I know that MOV basically says "copy" rather than "move". Sometimes that gets confusing when I'm actually wanting something moved, with the original location left zero as a result. That'll never happen, unless I'm MOVing zero.

Edited September 17, 2020 by GDMike
apersson850 Posted September 17, 2020 (edited)

That's true. Such a move is a two-stroke thing.

    ; Move and clear COUNT words
           LI   R1,SOURCE
           LI   R2,DEST
           LI   R3,COUNT
    LOOP   MOV  *R1,*R2+
           CLR  *R1+
           DEC  R3
           JNE  LOOP

    ; Or replace with some arbitrary constant, instead of clearing
           LI   R1,SOURCE
           LI   R2,DEST
           LI   R3,COUNT
           LI   R4,DEFAULT
    LOOP   MOV  *R1,*R2+
           MOV  R4,*R1+
           DEC  R3
           JNE  LOOP

Edited September 17, 2020 by apersson850
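A minimal Python model of the move-and-clear loop above; the function name and the fill parameter are illustrative, not from the thread.

```python
def move_and_fill(source, dest, count, fill=0):
    # Copy count cells from source to dest, replacing each source cell
    # with fill as we go (CLR *R1+, or MOV R4,*R1+ for a nonzero fill).
    for i in range(count):
        dest[i] = source[i]   # MOV  *R1,*R2+
        source[i] = fill      # CLR  *R1+  /  MOV R4,*R1+

src = [10, 20, 30]
dst = [0, 0, 0]
move_and_fill(src, dst, 3)
print(dst)  # [10, 20, 30]
print(src)  # [0, 0, 0]
```

Note the structure of the 9900 version: the source is read without auto-increment on the MOV, so the same address can be cleared by the following CLR, which then does the increment.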
GDMike Posted September 17, 2020 (edited)

Ooops, there it is. Haha. It's just a waste to leave the old location without putting something valuable in it, yup.

    Or replace with some arbitrary constant, instead of clearing

Edited September 17, 2020 by GDMike
apersson850 Posted September 17, 2020

Well, that depends entirely on the reason for moving the data in the first place. There's no reason to load a value that has no purpose at the moment. Sometimes you move a value to some other location to make the first location available for other use. Then you just don't care what's in there until it's time to load something useful. In which case it's a waste of time to clear it.