GDMike Posted September 12, 2020

I'm just sharing something that I'm studying, as I think I can find my issue by reading..

----------------------------------------
 AAAA    M   M   SSSSS
A    A   MM MM   S
A    A   M MM M  S
AAAAAA   M   M    SSSS
A    A   M   M        S
A    A   M   M        S
A    A   M   M   SSSSS
----------------------------------------
Programmer's Documentation
Documentation: 1/19/93
Joe Delekto

Note: This documentation covers the 128k AMS system only. After the AEMS is released, documentation will be available.
----------------------------------------

The AMS expanded memory card is a unique piece of hardware, in that mapping is simple and lends itself well to overlay-structured programming. Because of the AMS design, no memory manager is necessary.

The card itself uses a 17-bit address bus (18 for 512k) in order to access the SRAM on the card. The upper 4 bits of a standard 16-bit address are used to select one of 16 mapper registers. The remaining 12 bits from the address bus are combined with 12 bits taken from the mapper register, in order to give a maximum address bus 24 bits in width. This will allow for a maximum of 16MB to be accessed. (AEMS) On the 128k card, only the 5 least significant bits (6 for 512k) are used from the mapper register. The other pins from the mapper output are unconnected.

"Mapping" is accomplished by changing the value in the mapper register to point to one of 32 pages (64 on 512k). No reading/writing or any transfer of memory is involved with mapping. All that is changed is the pointer to the RAM chip on the address bus. Because of this, mapping can be done in a few clock cycles, using only a couple of instructions. Programs which take advantage of the AMS can be extremely large, with almost no change in execution time.

As with any memory expansion, there are limitations on overlay sizes. It is recommended for the AMS system that the root segment of your program be placed in low memory.
(>2000 - >3FFF) Code overlays, from 4k to 24k in size (in 4k increments), can be mapped into the upper 24k of memory. This means that an 8k root segment can call as many (up to 24k) overlays as necessary. The result is a HUGE program, with structure and modularity. (Most desired in the programming field!)

This document will describe how mapping works, as well as the AMS resident library created by Art Green and myself. I will also go into some detail as to how you can use Charles Earl's Hot Bug debugger to debug AMS code.

Before I get to the meat of this document, I would like to explain why we chose our method of memory expansion. We chose 4k pages for two reasons. First, it made the hardware design simple, and made utility routines short and fast. Second, since our system uses overlay methods, many 4k overlays can reside in memory at once. The overlays can be any size from 4k to 24k in length, falling on 4k page boundaries. The larger the overlay is, the fewer overlays you can have. You can have 4k, 8k, 12k, 16k, or 24k overlays, and overlay size determines how many overlays you can have. (i.e. six 4k, three 8k, etc.) You are NOT required to use only one size of overlay. For instance, you could have one 12k overlay, an 8k overlay, and a 4k overlay (24k total). Keep in mind that most subroutines fall well under 4k of space! This means that MANY subroutines can be placed within just one 4k overlay!

We believe you will find this to be one of the most flexible memory expansion systems ever designed for the TI-99/4A. Many interesting applications, besides large programs, can be developed. We are making all information on the use of the memory available, so that programmers can make full use of its abilities.
----------------------------------------
PART ONE: Map Modes and Registers
==================================

Because the AMS has a 17-bit address bus (18 for 512k), and the TI-99/4A only has a 16-bit address bus, the extra bit(s) need to come from somewhere. These extra bits are taken from the mapper registers. 4 bits are taken from the memory address on the 9900 bus, and used to select one of 16 mapper registers. The remaining 12 bits from the 9900 bus are combined with the 5 bits from the mapper register, to make a new bus with 17 bits. The actual paging process is done by changing the values in the mapper registers, to point to new pages in memory. Here is the address diagram:

* From the 9900 address bus: >A000
  - the 4 MSB select the mapper register
  - the 12 LSB become the 12 LSB of the new address
* From the mapper: the 5-bit output forms the 5 MSB of the new address

  A15 - Mapper Register Select
  A14 -   "      "        "
  A13 -   "      "        "
  A12 -   "      "        "

        Mapper Address Bus
  A11 - MA11      A05 - MA05
  A10 - MA10      A04 - MA04
  A09 - MA09      A03 - MA03
  A08 - MA08      A02 - MA02
  A07 - MA07      A01 - MA01
  A06 - MA06      A00 - MA00

* On the mapper address bus:

        +---+
  A15 - | M | - MA16
  A14 - | a | - MA15
  A13 - | p | - MA14
  A12 - | p | - MA13
        | e | - MA12
        | r |
        +---+

Since only the >A000 - >FFFF range of memory is mapped, registers >A through >F are used in the mapper. Note the mapper register number corresponds with the 4 MSB of the address being accessed. Once the mapper register is loaded with a page number (the 5-bit value which forms the 5 MSB on the AMS bus), any read or write to the 4k block it corresponds to will access that 4k memory page.

For example: I load mapper register 10 (>A) with page number >15. Any time I read/write or access the >A000 block, I will be accessing the 4k page >15. If I were to load mapper register 10 with >15, then perform a CLR @>A02E, I would actually be writing (from the AMS address bus) to the address: >0001502E.
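The worked example above (mapper register >A loaded with page >15, so CLR @>A02E lands on AMS address >1502E) can be checked with a small Python model of the mapper. This is an illustration only, with hypothetical helper names, not TI code:

```python
def ams_address(cpu_addr, mapper_regs):
    """Model of the 128k AMS mapper: a 16-bit CPU address plus the
    selected mapper register yields a 17-bit AMS address."""
    reg_select = (cpu_addr >> 12) & 0xF      # upper 4 bits pick the register
    offset = cpu_addr & 0x0FFF               # lower 12 bits pass through
    page = mapper_regs[reg_select] & 0x1F    # only 5 bits wired on the 128k card
    return (page << 12) | offset

regs = [0] * 16
regs[0xA] = 0x15                             # mapper register 10 holds page >15
assert ams_address(0xA02E, regs) == 0x1502E  # the CLR @>A02E example
```

The >A in >A02E contributes nothing to the result except selecting the register; all 17 output bits come from the 5-bit page and the 12-bit offset.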
Notice how the page/offset are combined on the AMS bus to get a 17-bit address (shown here as 32 bits for clarity). It is worthwhile to note that even though the address was >A000, the >A had no influence whatsoever on the new address. The primary purpose of the >A was to select which mapper register the 5 bits would come from. The same holds true for the entire >A000 - >FFFF range. Because >A - >F are used to select the mapper registers, we have 6 registers to use, and 6 4k boundaries. To load consecutive addresses, just load consecutive page numbers into the mapper registers.

The AMS system works in two modes: map mode and pass mode. Power-up places AMS into pass mode. In pass mode, AMS acts as a plain 32k card, and the mapper passes the actual addresses used to the AMS bus (i.e. a CLR @>A000 will clear the memory location at >A000). There is no difference between pass mode and standard 32k mode. The second mode is map mode. Map mode uses the MSB of an address to select the mapper register, then dumps the register's contents to the AMS bus. Note: when map mode is enabled, it would be a good idea to initialize the mapper registers to known values!

The CRU address for AMS is >1E00. In order to use mapping, some CRU instructions are required to: 1) switch AMS between map/pass modes, and 2) enable/disable register read/writes. Below is the code which changes map modes for AMS.

       LI   R12,>1E00   * AMS CRU Address
       SBO  1           * Enable Mapping
       SBZ  1           * Disable Mapping

This is the only code required to switch between modes. Only 2 instructions are necessary. In order to access the mapper registers, CRU bit 0 must be set. When it is, DSR space is temporarily disabled, so that writing in the >4000 space will set a map register. When CRU bit 0 is set to zero, the original DSR space is recovered, with no side effects. It is recommended that you enable the mapper registers, write their values, and then disable them immediately.
The reason being that forgetting to disable the registers will keep you from accessing any DSR routines. To enable and disable registers, use the following code:

       LI   R12,>1E00   * AMS CRU address
       SBO  0           * Enable Registers
       .
       .                * Set registers here
       .
       SBZ  0           * Disable Registers

In order to load a mapper register with a page number, all you need to do is write to the >4000 block. To determine which mapper register you wish to change, use the following calculation:

       MRAD = 2 * Register# + >4000

So to clear mapper register 10, you would use: 2 * >A + >4000 = >4014

       CLR  @>4014      * Clear Register 10

Note, you can also read from a mapper register, for the purpose of saving previous page values:

SPAGE  BSS  2             * Hold Page #
EXMPL  MOV  @>4014,@SPAGE * Get MR10
       CLR  @>4014        * Clear MR10
       .
       .
       .
       MOV  @SPAGE,@>4014 * Restore Page
       RT

Because writing to the mapper registers is just writing to an address, indirect addressing can be used as well. For example, consider setting up the mapper so that when in map mode, addresses are the same as in pass mode. It is always a good idea to first set up the mapper registers, and then go into map mode. While in map mode, registers can be changed at will to point elsewhere. If your code to do mapping resides in upper memory, take care NOT to change the register where your code is executing. Pointing it to some other place in memory will continue execution on the new page, causing undesired or unknown results. It is possible, though, to point to a new page where code is executing, provided that valid code exists at the current offset on the new page. Below is an example which sets up the mapper registers as normal 32k pass mode, yet places the mapper into map mode. The registers can be changed later to access other pages of memory.
PAGES  DATA >0A00,>0B00,>0C00
       DATA >0D00,>0E00,>0F00
START  LI   R12,>1E00   * AMS CRU
       LI   R1,PAGES    * Page Table
       LI   R2,>4014    * Start at MR >A
       LI   R3,6        * 6 Pages to set
       SBO  0           * Enable MR's
RSET   MOV  *R1+,*R2+   * Write to MR
       DEC  R3          * Dec counter
       JNE  RSET        * Continue
       SBZ  0           * Disable MR's
       SBO  1           * Enable map mode
       END

IMPORTANT: Note that the 5-bit page value is placed in the most significant byte of the mapper register. Because only 5 bits are used in a map register, and because the 2-cycle read/write on the data bus loads the most significant byte last, the mapper is loaded with this value. Therefore, page >18 would be >1800, page >05 would be >0500, etc. It is worth noting that the AEMS addresses page numbers normally, since 12 bits are used instead.

----------------------------------------
PART TWO: Overlay Techniques
============================

Because the AMS system is able to map overlays up to 24k in length, on 4k boundaries, it lends itself well to program development using overlays. First, a root segment is established, which will contain the code to call an overlay. The root segment must remain in memory (without using tricky code to map it out), and will contain the routine used to call an overlay subprogram. We recommend the following:

1) Place the root segment into low memory (8k).
2) Make all overlay calls BLWP routines.

Below is the stub code for the root segment overlay manager, which is used to handle the simulation of a BLWP vector in a mapped environment.
****************************************
* Overlay Manager
* Version 1.0
* R.A.Green
OVMGR  SBO  0           * Enable map regs
       MOV  *R11+,R10   * Get N # pages
       MOV  *R11+,R9    * Get 1st map reg
       MOV  *R11+,R7    * Get 1st page #
OMGR2  MOV  R7,*R9+     * Set mapper reg
       AI   R7,>0100    * Add 1 to page #
       DEC  R10         * Loop for N pages
       JGT  OMGR2       * Finish loop
       SBZ  0           * Disable map regs
       MOV  *R11,*R11   * Get real BLWP vec
       MOV  *R11+,R7    * Get WSP
       MOV  *R11,R9     * Get sub address
       MOV  R13,@26(R7) * Simulate BLWP
       MOV  R14,@28(R7) *
       MOV  R15,@30(R7) *
OMGRW  EQU  $-12        * OVMGR workspace
       LWPI 0           * R6,R7 user wrkspc
       B    @0          * R8,R9 call sub
       BSS  2           * R10
       BSS  2           * R11
       DATA >1E00       * R12 AMS CRU addr
       BSS  6           * R13 - R15
****************************************

Below is the code which replaces the original BLWP call in the root segment. This is done for every subroutine that generates an overlay.

* Overlay Call
* Version 1.0
* R.A.Green
*      BLWP @OSUB
OSUB   DATA OMGRW       * Manager WSP
       DATA $+2         *
       BL   @OVMGR      * Use overlay manager
       DATA N           * # Pages in overlay
       DATA >40xx       * 1st mapper reg addr
       DATA n           * 1st page number
       DATA sub         * REAL BLWP vector
****************************************

In order to generate a call for an overlayed subroutine, the real BLWP call must be replaced by the OSUB information. The overlay generator needs to know:

1) How many pages the overlay is. Remember that it can be 4k-24k in length, broken into 4k pages.
2) Which mapper register to start mapping the 'N' pages in at.
3) The first page number the overlay resides on.
4) The ACTUAL BLWP vector address for the overlayed routine.

To illustrate this, let us say our root segment is in the >2000 - >3FFF block. We have created an overlay, and inside the overlay is a routine called INPUT, for which the BLWP vector starts at >C2E0. The overlay is in page >18 of memory, and is only 4k, or 1 page, in length.
To call the overlay, we would use the following code:

       BLWP @OSUB1      * Call overlay stub
       DATA >0001       * 1 page long (4k block)
       DATA >4018       * >C000 block
       DATA >1800       * 1st page # (only one)
       DATA >C2E0       * BLWP Vector

Note, it would be very useful to have a program loader to load segments of code into different pages. Although such a loader exists for AMS, it is only used for AMS files with special headers for overlay and root segments. A similar loader can be constructed, which loads the overlays into their corresponding pages. The overlay code examples above are the code segments installed (automatically) by the linker. That eliminates the need for passing the arguments to the overlay generator, and keeping track of relative page addresses. You may, however, choose your own method of overlaying. We made the system very flexible, so you can customize your software and choose how you want to map. Keep in mind that other programs may be resident in AMS, and using the linker/loader will ensure that AMS programs are page relocatable, and won't overwrite memory-resident code.

----------------------------------------
PART THREE: Using Hot Bug with AMS
==================================

Most often, overlay programs are tedious to debug. If you have access to Charles Earl's Hot Bug debugger, I recommend you learn how to use it. It is by far one of the best debugging utilities available, and can certainly work well for debugging AMS programs. Since Hot Bug will also load program files, you can use the debugger to change the page map, and load in your overlay code!

Hot Bug Command Summary

  ER  - Edit Register
  EW  - Edit Word
  DM  - Display Memory
  SPC - Set Program Counter
  G   - Go (# of steps)

In order to access the registers, and check the code/data within pages, we need to enable both the mapper and the registers.
Choose a word of memory that does not contain code, to use with the following commands. (For this example, I use >3FF0.)

  1: ER 12 1E00
  2: EW 3FF0 1D00
  3: EW 3FF2 1D01
  4: SPC 3FF0
  5: G 2

  (1: Load Register 12 with CRU >1E00)
  (2: Put SBO 0 at 3FF0  Enable REGS )
  (3: Put SBO 1 at 3FF2  MAP Mode    )
  (4: Set program counter to >3FF0   )
  (5: Execute 2 instructions         )

NOTE: If the mapper is in an unknown state (register values unknown), you will want to set the registers before actually placing it into map mode. Just use G 1 instead, edit the registers (see below), and then G 1 again to get into map mode.

To read/write the mapper registers, use the DM command to look at the >4000 block. Only addresses >4000 - >401E are of interest to us. (Mapper regs 0 - 15.)

NOTE: Even though the upper 24k is mapped using mapper registers 10 - 15, the other mapper registers can be used for temporary storage.

  1: DM 4000

  (1: Display Memory at >4000)

In order to change a register value, just use the EW command. For example, to load mapper register 11 (>B000 block) with page >15, use the following:

  1: EW 4016 1500

  (1: Load mapper register 11 with >15)

Now let's try an experiment. What we will do is write the same page to 2 different mapper registers, and observe what happens. Use the following commands:

  1: EW 4014 1500
  2: EW 4016 1500
  3: DM A000

  (1: Load mapper register 10 with >15)
  (2: Load mapper register 11 with >15)
  (3: Display memory at >A000        )

Note what the data in memory is at >A000. Now, if you use DM B000, you should see the same data you saw before. Let's try something interesting. Use the following commands:

  1: EW A000 FACE
  2: DM B000

  (1: Put value >FACE into >A000)
  (2: Display memory at >B000   )

When you use DM B000, you should get a surprise. When you wrote to A000, you actually changed the word at B000 as well as A000. Why? Because both 4k blocks point to the same page! Perhaps now you can envision some of the interesting tricks you can accomplish with the AMS system.
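The aliasing experiment just described can be reproduced in a few lines of Python. This is a toy model of the card, not TI code: with registers >A and >B pointing at the same page, a write through the >A000 window shows up through the >B000 window.

```python
# Model: 32 sparse 4 KiB pages (a 128k card) and 16 mapper registers.
ram = [dict() for _ in range(32)]
regs = [0] * 16

def write_word(addr, value):
    page = regs[(addr >> 12) & 0xF] & 0x1F
    ram[page][addr & 0x0FFF] = value

def read_word(addr):
    page = regs[(addr >> 12) & 0xF] & 0x1F
    return ram[page].get(addr & 0x0FFF, 0)

# EW 4014 1500 / EW 4016 1500: point registers >A and >B at page >15.
regs[0xA] = 0x15
regs[0xB] = 0x15

# EW A000 FACE: writing through the >A000 window...
write_word(0xA000, 0xFACE)

# ...changes what DM B000 shows, because both windows map page >15.
assert read_word(0xB000) == 0xFACE
```

The surprise in the experiment is exactly this: there is only one physical page, reached through two windows.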
One such application is the arbitrary locating of data buffers! It is also possible to load a memory image file on non-consecutive pages, and yet still load the mapper registers such that the program is contiguous in the upper 24k! If that's so, then it means we can load E/A option 5 program files anywhere inside AMS, and then just map in their pages to the proper blocks in high memory! In this manner, even code with absolute origins becomes page relocatable, at least for paging purposes.

By placing page numbers into the registers, and using Hot Bug's LOAD command, you can load overlay image files. Keep track of the address of the BLWP vector in the overlay, as well as the page you LOAD it into, and how many pages it takes up. This is the information you will need to pass to the overlay generator in your program. See? Loading, debugging, and running overlay code on the AMS system is very feasible, and not difficult.

----------------------------------------

This concludes this section of programmer's documentation. The next document will focus on the memory resident utility routines, which AMS programmers can use in their software. Memory allocation, exit code, memory moves, and far VDP read/write routines are available. Also, AMS programs have access to the E/A option 5 and AMS overlay program file loader. The loader will load either type of file. The exit routines for AMS have the option of keeping the programmer's code resident for instant execution when desired.

We have worked very hard for the past couple of years to make this memory expansion as user friendly as possible. We will continue to supply support for the AMS card. Without the software support to use AMS, it would just be an expensive paperweight.
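A few of the calculations from Parts One and Two can be sanity-checked with a small Python sketch. This is a model only; the helper names are mine, not part of the AMS library. It covers the register address formula MRAD = 2 * Register# + >4000, the page-number-in-MSByte convention, and the DATA words of an overlay call.

```python
def mapper_reg_address(reg):
    """MRAD = 2 * Register# + >4000 (Part One)."""
    return 0x4000 + 2 * reg

def page_to_reg_word(page):
    """The page number sits in the most significant byte of the word
    written to a mapper register, so page >18 is written as >1800."""
    return (page & 0x1F) << 8       # 5 bits wired on the 128k card

def overlay_call_data(n_pages, first_reg, first_page, blwp_vector):
    """The four DATA words of the overlay call in Part Two: page count,
    first mapper register address, first page, and the real vector."""
    return [n_pages, mapper_reg_address(first_reg),
            page_to_reg_word(first_page), blwp_vector]

# The CLR @>4014 example: mapper register 10 lives at >4014.
assert mapper_reg_address(0xA) == 0x4014

# The INPUT overlay example: 1 page, the >C000 block (register >C),
# page >18, BLWP vector at >C2E0.
assert overlay_call_data(1, 0xC, 0x18, 0xC2E0) == [0x0001, 0x4018, 0x1800, 0xC2E0]
```

The last assertion reproduces the DATA >0001, >4018, >1800, >C2E0 sequence from the call example above.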
Asmusr Posted September 12, 2020

4 hours ago, Lee Stewart said:

I just had an epiphany! What @FALCOR4 said earlier about reading SAMS registers and a discussion among [member-'RXB'], [member-'tursi'] and me over in the Classic99 Updates thread led me to the conclusion that my method in fbForth 2.0 of determining the amount of SAMS memory available is flawed! See post #1870 for my thinking. What occurred to me is that checking any putative "highest" bank of an actual SAMS card will succeed, even for the lowest SAMS (128 KiB, highest page = >001F) expected, leading my code to conclude there is 32 MiB of SAMS available. FYI, my method writes a test value to the mapping window (>E000) and starts paging in "highest" pages of SAMS, beginning with 32 MiB's highest page, >1FFF, to see whether the test value is still there. If it fails, the code checks for SAMS at half that value, until it succeeds or fails at 128 KiB. My point is that mapping >1FFF will always map a writable SAMS page for any working SAMS, 128 KiB or higher. If the card only has 128 KiB, mapping page >1FFF will actually map page >001F, because the unattached bits are ignored. I will need to get more clever, so that I am mapping only pages that I expect, or use a clever pattern that will lead me to the correct conclusion in the shortest amount of code—aye, there's the rub! ( sorry, Will ) ...lee

If you write the page number to all the pages you want to test first, e.g. 255, 127, 63, 31, and then check that the values are still there in a second loop, it should work.
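Asmusr's two-pass probe can be sketched in Python against a hypothetical card model in which unwired address bits are ignored (so page p aliases onto p modulo the number of fitted pages). The candidates are written largest first, so on a small card the aliased slot ends up holding the smallest marker, which the verify pass then finds:

```python
def detect_highest_page(mapped_read, mapped_write,
                        candidates=(255, 127, 63, 31)):
    """Two-pass probe: write each candidate page number into that page
    first, then verify in a second loop.  Returns the highest candidate
    whose marker survived, or None."""
    for page in candidates:          # pass 1: write all markers
        mapped_write(page, page)
    for page in candidates:          # pass 2: verify
        if mapped_read(page) == page:
            return page
    return None

def make_card(n_pages):
    """Hypothetical card: address bits beyond the fitted RAM are
    ignored, so page p aliases onto p % n_pages."""
    store = [0] * n_pages
    def write(page, value): store[page % n_pages] = value
    def read(page): return store[page % n_pages]
    return read, write

read, write = make_card(32)           # a 128 KiB card (32 pages)
assert detect_highest_page(read, write) == 31

read, write = make_card(256)          # a 1 MiB card (256 pages)
assert detect_highest_page(read, write) == 255
```

On the 32-page card, pages 255, 127, 63, and 31 all alias onto physical page 31; the last marker written there is 31, so only the "31" probe verifies.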
+TheBF Posted September 12, 2020

7 hours ago, Asmusr said:

If you write the page number to all the pages you want to test first, e.g. 255, 127, 63, 31, and then check that the values are still there in a second loop, it should work.

Could you write then read inside one loop? These new huge cards are getting pretty big for our old 9900. Even now, erasing 1 Mbyte takes significant time.
+TheBF Posted September 12, 2020

I should have read this earlier myself, GDMike. I ended up re-inventing this after quite a long road of trial and error.

PAGES  DATA >0A00,>0B00,>0C00
       DATA >0D00,>0E00,>0F00
START  LI   R12,>1E00   * AMS CRU
       LI   R1,PAGES    * Page Table
       LI   R2,>4014    * Start at MR >A
       LI   R3,6        * 6 Pages to set
       SBO  0           * Enable MR's
RSET   MOV  *R1+,*R2+   * Write to MR
       DEC  R3          * Dec counter
       JNE  RSET        * Continue
       SBZ  0           * Disable MR's
       SBO  1           * Enable map mode
       END
GDMike Posted September 12, 2020 (edited)

My problem isn't actually how to create a mapped area; adding banks to that mapped area is what is giving me fits. I'm not sure the problem isn't my card, as it's worked in the past for me, then stopped during testing, then I asked for you all to chime in, then it worked until I powered the PEB off and on? So I'm really needing source that gives the 240 banks at address >3000 - >3FFF (>4006 SAMS, SMR3) that I can test with. It seems now I can create page >10, but if I make page >11 and read it, it is actually a copy of (or IS) page >10, as anything I map and read back just reads as my page 16 (>10) data. So I'm back to looking at the docs, as I must be missing something?? And I'm having to test and turn off the PEB at each test. Today, even though the address can't be changed, I'll be moving the card to a different port in the PEB; or just reseating may do something too, not sure, but I can rule that out. Thx for that routine, definitely something I can use for initializing the card.

Edited September 12, 2020 by GDMike
+FALCOR4 Posted September 12, 2020

17 hours ago, FALCOR4 said:

Correct. The SAMS circuitry is simplistic, and I don't mean to say that is a bad thing. It's not; it just means that not all possible functionality is implemented, which would require more ICs and board space. When you do a register read, it will put the same page number (repeated) for a 1M segment (>00 to >FF) in both the LSByte and the MSByte from the LS612. The LSByte that is latched (which gives you banks beyond the first 1M) is not connected in such a way that it can be read back. So, you'll only be able to see page numbers for any one particular 1M bank; you won't be able to read back what bank you're in, which would be in the LSByte if it were implemented. If needed, the software will just have to keep track of banks. I just put together another 4M board and am doing a burn-in right now that should run through the night. When it's done, I'll play with it to verify whether what I'm telling you is true or not. I'll report back with what I find.

Finished burn-in on the second 4M board and it passed with flying colors <yipee>. I did a double check on reading back the register values, and it indeed only reads back 1M of pages, and not the bank you may be in. Example:

       LI   R12,>1E00   CRU ADDRESS
       LI   R0,>0A01    SET REGISTER VALUE WITH PAGE >0A AND BANK >01
       SBO  0           ENABLE WRITING TO REGISTERS
       MOV  R0,@>4014   LOAD REGISTER FOR MEM LOCATION >A000
       MOV  @>4014,R1   READ BACK REGISTER: WHAT YOU GET IS NOT >0A01
*                       BUT RATHER >0A0A. THE BANK VALUE >01 DOES NOT
*                       READ BACK.
       SBZ  0           TURN OFF ACCESS TO REGISTERS
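FALCOR4's observation reduces to a one-line model. This mirrors the behaviour he reports on his 4M board, not anything taken from a datasheet:

```python
def sams_4m_readback(written):
    """Model of the observed 4M-board behaviour: the page number
    (MSByte) reads back duplicated into both bytes; the latched bank
    byte is not readable."""
    page = (written >> 8) & 0xFF
    return (page << 8) | page

# Writing >0A01 (page >0A, bank >01) reads back as >0A0A.
assert sams_4m_readback(0x0A01) == 0x0A0A
```

So, as he says, software has to keep track of which bank it selected; the hardware won't tell you.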
+retroclouds Posted September 12, 2020

2 hours ago, TheBF said:

Could you write then read inside one loop. These new huge cards are getting pretty big for our old 9900. Even now erasing 1Mbyte takes significant time.

I didn't even bother to try to erase SAMS before use. As long as your pointers and memory structures are ok, it doesn't matter what is next to what you have in use. I don't see a benefit in initializing memory up front, or am I missing something?
+Lee Stewart Posted September 12, 2020 (edited)

On 9/12/2020 at 2:28 AM, Asmusr said:

If you write the page number to all the pages you want to test first, e.g. 255, 127, 63, 31, and then check that the values are still there in a second loop, it should work.

If you mean writing 255 to a spot in page 255, 127 to the same spot in page 127, ..., I agree that it should work. What I have finally contrived is a little faster and less code, I think, but I will set up both ways to be sure. My way (currently):

1. Initialize mapping with pages >2:>2000, >3:>3000, >A:>A000, >B:>B000, >C:>C000, >D:>D000, >E:>E000, >F:>F000.
2. Write test word to >E000.
3. Start at the page >000E higher than half of the highest SAMS page size (32 MiB): >2000 / 2 + >000E = >100E.
4. Map current SAMS page >xxxE into >E000. If we get past page >001E, no SAMS. Quit with SAMS flag = 0.
5. Check for test word.
6. If equal, shift the left-most set bit right one bit (2nd time, >080E, ...) and go to 4.
7. If not equal, we know the SAMS size. Quit with SAMS flag = highest SAMS page (32 MiB: >1FFF, ..., 128 KiB: >001F).

SAMS flag = highest SAMS page available. 0 indicates no SAMS.

...lee

Edited September 13, 2020 by Lee Stewart
correction
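Lee's halving probe can be simulated in Python (a hypothetical card model where unwired address bits are ignored, so page p aliases onto p AND the card's page mask). Here `map_page_read(p)` stands in for the map-and-check step: map page p into >E000 and report whether the test word written to physical page >E is still visible.

```python
def probe_sams(map_page_read):
    """Sketch of the halving probe above.  Returns the highest SAMS
    page, or 0 for no SAMS."""
    bit = 0x1000                  # start at >100E (half of 32 MiB's >2000)
    while bit >= 0x0010:
        if not map_page_read(bit | 0x000E):
            return 2 * bit - 1    # this address bit is wired: size known
        bit >>= 1
    return 0                      # got past >001E: no SAMS

def make_card(mask):
    """Hypothetical card: page p aliases onto p & mask, so the test
    word in page >E is seen whenever (p & mask) == >E."""
    return lambda page: (page & mask) == 0x000E

assert probe_sams(make_card(0x001F)) == 0x001F   # 128 KiB
assert probe_sams(make_card(0x00FF)) == 0x00FF   # 1 MiB
assert probe_sams(make_card(0x1FFF)) == 0x1FFF   # 32 MiB
assert probe_sams(lambda page: True) == 0        # no SAMS at all
```

On a 128 KiB card, every probe from >100E down to >002E aliases back onto page >E and finds the test word; the first probe that does not (>001E) pins the size, which is the "not equal" exit in step 7.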
+TheBF Posted September 12, 2020

1 hour ago, retroclouds said:

I didn't even bother to try to erase SAMS before use. As long as your pointers and memory structures are ok, it doesn't matter what is next to what you have in use. Don't see a benefit in initializing memory up front or am I missing something?

In ED99 I used a naïve record-based data structure for display speed. This is trouble under 2 conditions:

1. When I list the visible page of records, if they are filled with random stuff (like when you first start the machine) it is pretty ugly.
2. If the previous file content is still in memory and you load a new file, I don't have an end-of-line marker. I just write the contents of each record to the screen, so you could see the old contents if the new line is shorter.

So I fill the space with spaces (purge) before loading a new file. I could add an end-of-line marker, but then there are more things to manage when inserting and deleting, and displaying is slower because you have to read the contents looking for the magic character. Gains and losses...
GDMike Posted September 12, 2020 (edited)

OK, we got it fixed. Mr. FALCOR4 solved my issue. I also found that my R1, which contains the address I pass, wasn't being pulled across to my subs correctly..

Edited September 12, 2020 by GDMike
GDMike Posted September 12, 2020

On 9/10/2020 at 6:44 PM, GDMike said:

*** MAP A BANK
*++ Bank# must be in R1 before calling this routine.
*++ Trashes R3.
MAPPG  LI   R12,>1E00
       SBO  0
       LI   R3,>4006
       MOV  R1,*R3
       SBZ  0
       RT

Lee, I did get this working with a little help from @Falcor4; it seems my R1 was getting lost. I haven't dug that deep into that part yet, just happy my card is good. Thank you for this code! It's spot on for what I need.
apersson850 Posted September 14, 2020

Why

       LI   R3,>4006
       MOV  R1,*R3

instead of

       MOV  R1,@>4006

?
GDMike Posted September 14, 2020 (edited)

I was doing a MOV @>4006 in my early stages. Actually, I had SP1 EQU >4006, and I was doing a MOV R#,@SP1. But somehow, after I submitted code, we started using R1 for the future bank number, whereas I had been using R1 to push >4142 to the screen at location R0. Then, just for clarification, we started making sure R1 was our address and used it to pass on to >4006. So I think it's just for clarity between registers, to make sure I knew the difference.

Edited September 14, 2020 by GDMike
apersson850 Posted September 15, 2020

It's not R1 I'm confused about, but the detour via R3, instead of direct access.
GDMike Posted September 15, 2020 (edited)

Again, I think we were just taking existing code and adjusting what was needed to make it work as is, but I'll write it the way you suggest and make sure it still works. Thx, it'll save a byte or so, and just look better. Yesterday, I was able to finish my init of 240 banks with >2020, and it took about 30 seconds on real steel (RS).

Edited September 15, 2020 by GDMike
apersson850 Posted September 16, 2020

You are of course free to do exactly as you want to. I'm asking to find out if there was a specific reason, or if it was just something that slipped by. I've quite frequently found that people who are experienced with some simpler 8-bit processors write bad code for the TMS 9900, not utilizing its strengths, but just being punished by its weak spots.

Fewer instructions take less space, as you write. They also execute faster. If you look at the add-on timing for more advanced addressing modes, you'll find that the same instruction with more complex addressing is at least similar, but frequently better, than writing more instructions. This is due to the fact that instruction fetch and decode isn't pipelined in the TMS 9900, as it is in the TMS 9995, so there's always a penalty for fetching yet another instruction to accomplish something. And even if it's just similar, it reduces clutter in the program.

; This is easier to read at a glance
       A    @VAL1,@VAL2

; This does the same, but isn't necessary for the TMS 9900
; (as it is for some simpler processors)
       MOV  @VAL1,R1
       MOV  @VAL2,R2
       A    R1,R2
       MOV  R2,@VAL2
GDMike Posted September 16, 2020

What I'd enjoy would be a "9900 assembly best practices" video series. That's probably something a lot of people would be interested in. How cool
+TheBF Posted September 16, 2020

6 hours ago, apersson850 said:

You are of course free to do exactly as you want to. I'm asking to find out if there was a specific reason, or if it was just something that slipped by. I've quite frequently found that people who are experienced with some simpler 8-bit processors write bad code for the TMS 9900, not utilizing its strengths, but just being punished by its weak spots. Fewer instructions take less space, as you write. They also execute faster. If you look at the add-on timing for more advanced addressing modes, you'll find that the same instruction with more complex addressing is at least similar, but frequently better, than writing more instructions. This is due to the fact that instruction fetch and decode isn't pipelined in the TMS 9900, as it is in the TMS 9995, so there's always a penalty for fetching yet another instruction to accomplish something. And even if it's just similar, it reduces clutter in the program.

; This is easier to read at a glance
       A    @VAL1,@VAL2

; This does the same, but isn't necessary for the TMS 9900
; (as it is for some simpler processors)
       MOV  @VAL1,R1
       MOV  @VAL2,R2
       A    R1,R2
       MOV  R2,@VAL2

I have to get back to it, but when I was writing a native-code Forth cross-compiler, these were the most interesting things to achieve: using that memory-to-memory architecture as much as possible. The secret I found, in an old paper by Thomas Almy, was to keep track of all literals (real numbers, addresses, constants) in the source code on a literal stack as they became known, and delay emitting code for them until you knew fully what you had to work with. It worked pretty well.
apersson850 Posted September 16, 2020 (edited)

There are many ways to handle such things. The UCSD p-system's Pascal compiler places all constants either in the constant pool or in the real constant pool. The reason for having two of them is that ordinary constants require only byte-flipping if they are of the wrong byte sex, but real values require a real conversion, to adapt to the real format used on the particular machine. This comes back to the portability requirements for code files under the p-system: regardless of which machine they are compiled on, they should run on any other p-system as well. Once data is in the constant pool, there are special p-codes to fetch constants, of certain lengths, stored at certain offsets from the start of the pool. But that's different from generating code for the TMS 9900 itself.

Note that reading constants from a constant pool can easily be done by indexing. There are two basic methods.

           ; Use a constant pool base pointer
    CBASE      EQU  9
    CONST_POOL DATA ...            ; a lot of data

           ; Fetch the constant 26 bytes into the pool
           LI   CBASE,CONST_POOL
           MOV  @26(CBASE),R0

The other method is to have the constant pool base address fixed, then index into it with a register:

           ; Use a fixed constant pool address, then index via a register
    CONST_TO_GET EQU  26
    CONST_POOL   DATA ...          ; a lot of data

           LI   R1,CONST_TO_GET
           MOV  @CONST_POOL(R1),R2

The first method makes it easy to switch constant pools. It's also useful for accessing activation records, if you implement recursion in your assembly code (or generate code for a language which supports recursion). The second is convenient if you want to easily fetch larger blocks from the constant pool, since you can increment the index register and keep the base.
    ; Fetching large constants
    CONST_TO_GET EQU  26
    CONST_SIZE   EQU  80
    CONST_POOL   DATA ...          ; a lot of data
    VARIABLE     BSS  CONST_SIZE

           LI   R1,CONST_TO_GET
           LI   R2,VARIABLE
           LI   R3,CONST_SIZE
    GET_LOOP MOV @CONST_POOL(R1),*R2+
           INCT R1
           DECT R3
           JNE  GET_LOOP

You can use auto-incrementing for the source too, to make it more efficient. It just depends on whether or not you want to preserve the CONST_POOL base address in the code.

    ; Fetching large constants, auto-incrementing the source too
    CONST_TO_GET EQU  26
    CONST_SIZE   EQU  80
    CONST_POOL   DATA ...          ; a lot of data
    VARIABLE     BSS  CONST_SIZE

           LI   R1,CONST_POOL+CONST_TO_GET
           LI   R2,VARIABLE
           LI   R3,CONST_SIZE
    GET_LOOP MOV *R1+,*R2+
           DECT R3
           JNE  GET_LOOP

Edited September 16, 2020 by apersson850
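The two indexing schemes can be modeled in Python, with a bytearray standing in for the constant pool. This is purely illustrative; the pool contents and the 26-byte offset are just the example values from the post.

```python
# Stand-in constant pool: 64 bytes of known data.
pool = bytearray(range(64))

def fetch_word(base, offset):
    # One 16-bit word, big-endian, like the 9900's memory layout.
    return (pool[base + offset] << 8) | pool[base + offset + 1]

# Method 1: movable base pointer, fixed displacement
# (like MOV @26(CBASE),R0 after LI CBASE,CONST_POOL).
cbase = 0
word1 = fetch_word(cbase, 26)

# Method 2: fixed pool address, index held in a "register"
# (like MOV @CONST_POOL(R1),R2 after LI R1,CONST_TO_GET).
r1 = 26
word2 = fetch_word(0, r1)

print(word1 == word2)  # True: both address the same word
```

Both reach the same word; the difference, as the post says, is which part you can vary cheaply at run time: the base (method 1) or the index (method 2).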
Tursi Posted September 17, 2020

16 hours ago, GDMike said:

    What I'd enjoy would be a "9900 assembly" BEST practices video series. That's probably something a lot would be interested in. How cool

It ultimately ends up simpler than you'd think on the TI-99/4A. Like @apersson850 said, most of the time the fastest code is the one with the fewest instructions, no matter how complex those instructions are. The basic tricks that work on most CPUs are true on the 9900 as well, so long as they don't increase the instruction count (for instance, a shift is usually faster than a divide, and since a divide tends to take more setup, the shift also wins on instruction count). Sometimes you just have to think outside the box. This also explains why unrolling a loop is faster, even though on the surface it looks like more instructions: fewer are actually executed. If you code this:

    LP     MOV  *R1+,*R2+
           DEC  R2
           JNE  LP

... and use it to move 8 bytes, then you get 8 hits on the MOV, 8 hits on the DEC, and 8 hits on the JNE, for a total of 24 instructions. But if you unroll it only once:

    LP     MOV  *R1+,*R2+
           MOV  *R1+,*R2+
           DECT R2
           JNE  LP

... then you have 4 hits on each MOV (total of 8), 4 hits on the DECT and 4 hits on the JNE, a total of 16 instructions. Ermm... back to your original program.
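Tursi's instruction-execution arithmetic can be checked with a throwaway Python sketch. The function names are mine, and the counts mirror the 8-iteration example above.

```python
def rolled_hits(iterations):
    # One MOV + one DEC + one JNE executed every time around the loop.
    return iterations * 3

def unrolled_hits(iterations):
    # Two MOVs per pass, so half as many passes:
    # 2 MOV + 1 DECT + 1 JNE per pass.
    return (iterations // 2) * 4

print(rolled_hits(8))    # 24
print(unrolled_hits(8))  # 16
```

The saving comes entirely from halving the number of loop-control instructions (the decrement and the jump); the useful MOV work is the same either way.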
apersson850 Posted September 17, 2020 (edited)

Ehhm, you can't use the same register as both a pointer and a counter (R2). MOV also moves two bytes at a time, so it takes four MOVs to move eight bytes. But I understand what you intended to illustrate.

DIV is usually efficient if you need to divide by something other than a power of two, and especially if you want both the quotient and the remainder. You get both in one fell swoop.

Edited September 17, 2020 by apersson850
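The "one fell swoop" point maps directly onto Python's built-in divmod, which, like the 9900's DIV (quotient in one register, remainder in the next), produces both results from a single operation:

```python
# divmod is the Python analogue of getting quotient and remainder
# together from one DIV instruction.
q, r = divmod(173, 10)
print(q, r)  # 17 3
```

If you only ever need the quotient of a power-of-two division, a shift is cheaper, which is exactly the trade-off discussed above.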
GDMike Posted September 17, 2020 (edited)

I know that MOV basically says "copy" rather than "move". Sometimes that gets confusing when I'm actually wanting something moved, with the original location left zero as a result. That'll never happen, unless I'm MOVing zero.

Edited September 17, 2020 by GDMike
apersson850 Posted September 17, 2020 (edited)

That's true. Such a move is a two-stroke thing.

    ; Move and clear COUNT words
           LI   R1,SOURCE
           LI   R2,DEST
           LI   R3,COUNT
    LOOP   MOV  *R1,*R2+
           CLR  *R1+
           DEC  R3
           JNE  LOOP

    ; Or replace with some arbitrary constant, instead of clearing
           LI   R1,SOURCE
           LI   R2,DEST
           LI   R3,COUNT
           LI   R4,DEFAULT
    LOOP   MOV  *R1,*R2+
           MOV  R4,*R1+
           DEC  R3
           JNE  LOOP

Edited September 17, 2020 by apersson850
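A minimal Python model of the move-and-clear loop above; the function name and the fill parameter are illustrative, not from the thread.

```python
def move_and_fill(source, dest, count, fill=0):
    # Copy count cells from source to dest, replacing each source cell
    # with fill as we go (CLR *R1+, or MOV R4,*R1+ for a nonzero fill).
    for i in range(count):
        dest[i] = source[i]   # MOV  *R1,*R2+
        source[i] = fill      # CLR  *R1+  /  MOV R4,*R1+

src = [10, 20, 30]
dst = [0, 0, 0]
move_and_fill(src, dst, 3)
print(dst)  # [10, 20, 30]
print(src)  # [0, 0, 0]
```

Note the structure of the 9900 version: the source is read without auto-increment on the MOV, so the same address can be cleared by the following CLR, which then does the increment.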
GDMike Posted September 17, 2020 (edited)

Ooops, there it is. Haha. It's just a waste to leave the old location without putting something valuable in it, yup.

    Or replace with some arbitrary constant, instead of clearing

Edited September 17, 2020 by GDMike
apersson850 Posted September 17, 2020

Well, that depends entirely on the reason for moving the data in the first place. There's no reason to load a value that has no purpose at the moment. Sometimes you move a value to some other location to make the first location available for other use. Then you just don't care what's in there until it's time to load something useful. In which case it's a waste of time to clear it.