
SAMS usage in Assembly


90 replies to this topic

#51 Willsy

    River Patroller

  • 3,009 posts
  • Location:Uzbekistan (no, really!)

Posted Thu Sep 4, 2014 12:58 AM

Oooh... Looking in the Editor Assembler manual, section 7.19, page 125, I see XOP is a Format IX instruction. It has two parameters. My head hurts! I thought it was a single parameter instruction. Going to have to read this carefully!
Edit:
 
 

The effective address of the source operand is placed in Workspace Register 11 of the XOP workspace.

 
The effective address. So, if my workspace (prior to XOP) was >8300, and I do a XOP R1,1 I should see >8302 in R11 of the XOP workspace, right?
 
Time to fire up Classic99, ASM99 and Notepad++
 
<clicks knuckles> :-D


Edited by Willsy, Thu Sep 4, 2014 1:44 AM.


#52 Willsy

    River Patroller

  • 3,009 posts
  • Location:Uzbekistan (no, really!)

Posted Thu Sep 4, 2014 1:45 AM

My interpretation was correct. Just single stepped the following in Classic99:

 

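        def START

START   lwpi >8300      ; set my workspace
        clr r0
        xop r1,1        ; yeah baby that's what I'm talking 'bout
loop    inc r0          ; RTWP from the XOP handler returns here
        jmp loop

* XOP 1 handler. The console ROM's XOP 1 vector is WP=>FFD8, PC=>FFF8,
* so this code runs with workspace >FFD8 (its R11 is at >FFEE).
        aorg >fff8
        nop             ; at this point R11 should be >8302
                        ; >8302=address of R1 (passed in via XOP)
        rtwp            ; see ya

        end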

 

Gotta love Classic99 :)
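For anyone who wants the mechanics spelled out, here is a rough C model of what was just single-stepped. It is a sketch under stated assumptions: the `mem` array, the `Cpu` struct and `xop_reg()` are made up for illustration, only the register-operand form (XOP Rn,n) is modelled, and the XOP 1 vector contents (WP >FFD8, PC >FFF8) are the console-ROM values mentioned in this thread. XOP n fetches its vector from >0040 + 4n.

```c
#include <stdint.h>

/* Hypothetical toy model: the 64K address space stored as 32K words.
   Addresses are byte addresses; words sit at even addresses. */
static uint16_t mem[0x8000];

static uint16_t rd(uint16_t addr)             { return mem[addr >> 1]; }
static void     wr(uint16_t addr, uint16_t v) { mem[addr >> 1] = v; }

typedef struct { uint16_t wp, pc, st; } Cpu;

/* XOP Rn,n for a workspace-register source operand only.
   cpu->pc must already point at the instruction after the XOP. */
static void xop_reg(Cpu *cpu, unsigned src_reg, unsigned n)
{
    uint16_t ea     = (uint16_t)(cpu->wp + 2 * src_reg); /* address of Rn      */
    uint16_t vector = (uint16_t)(0x0040 + 4 * n);        /* XOP n vector (ROM) */
    uint16_t old_wp = cpu->wp, old_pc = cpu->pc, old_st = cpu->st;

    cpu->wp = rd(vector);                    /* new workspace pointer        */
    cpu->pc = rd((uint16_t)(vector + 2));    /* new program counter          */
    wr((uint16_t)(cpu->wp + 22), ea);        /* new R11 <- effective address */
    wr((uint16_t)(cpu->wp + 26), old_wp);    /* new R13 <- old WP            */
    wr((uint16_t)(cpu->wp + 28), old_pc);    /* new R14 <- old PC            */
    wr((uint16_t)(cpu->wp + 30), old_st);    /* new R15 <- old ST            */
    cpu->st |= 0x0200;                       /* set ST6, the XOP status bit  */
}
```

With WP=>8300 and source register 1, the new R11 (>FFEE) ends up holding >8302, which matches the Classic99 result.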



#53 Stuart

    Dragonstomper

  • 692 posts
  • Location:Southampton, UK

Posted Thu Sep 4, 2014 1:48 AM

Willsy wrote:
The effective address. So, if my workspace (prior to XOP) was >8300, and I do a XOP R1,1 I should see >8302 in R11 of the XOP workspace, right?

Yes, I think you're right. Also, if you do XOP @>FF00,1, the value in R11 will be >FF00.

 

Now, what if you do XOP *R1,1?  ;-)



#54 mizapf

    River Patroller

  • 2,516 posts
  • Location:Germany

Posted Thu Sep 4, 2014 5:22 AM

XOP is even more expensive than BLWP because it implies one more memory access: save source operand to R11 of new workspace.

 

Base cycles for TMS9900:

BLWP = 26

XOP = 36

 

Base cycles for TMS9995:

BLWP = 11

XOP = 15

 

(Base cycles apply only when all operands are register accesses and, on the 9995, only when those registers are on-chip.)

 

I don't know whether there are 99/4A consoles without XOP 1. Never heard of them, although I remember the heads-up in the E/A manual.


Edited by mizapf, Thu Sep 4, 2014 5:23 AM.


#55 Willsy

    River Patroller

  • 3,009 posts
  • Location:Uzbekistan (no, really!)

Posted Thu Sep 4, 2014 5:54 AM

mizapf wrote:
XOP is even more expensive than BLWP because it implies one more memory access: save source operand to R11 of new workspace.

 

Good information, Michael, thanks.

 

I think it's a neat instruction, but since the vectors are all in ROM it's of limited use on the 4A. I think I'd rather stick to BLWP. As you say, it's a little cheaper, and I think it's easier to pass parameters using BLWP; at least, it seems that way to me:

 

        BLWP @DO_THING
        DATA SOME_DATA

 

Though I suppose there's nothing stopping you from doing that with XOP.
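Here is a minimal C model of the BLWP-plus-DATA convention, assuming the usual TI idiom of reading the inline word with MOV *R14+,Rn inside the called routine. `blwp()`, `fetch_inline_param()` and the addresses in the test are illustrative only. The key detail is that R14 of the new workspace holds the return address, which points at the DATA word, so reading through it with auto-increment both fetches the parameter and moves the return point past it.

```c
#include <stdint.h>

static uint16_t mem[0x8000];                 /* toy 64K memory, word-addressed */
static uint16_t rd(uint16_t a)             { return mem[a >> 1]; }
static void     wr(uint16_t a, uint16_t v) { mem[a >> 1] = v; }

typedef struct { uint16_t wp, pc, st; } Cpu;

/* BLWP @vector: the two-word vector holds the new WP and the new PC.
   cpu->pc must already point at the word after the BLWP, i.e. the DATA word. */
static void blwp(Cpu *cpu, uint16_t vector)
{
    uint16_t old_wp = cpu->wp, old_pc = cpu->pc, old_st = cpu->st;
    cpu->wp = rd(vector);
    cpu->pc = rd((uint16_t)(vector + 2));
    wr((uint16_t)(cpu->wp + 26), old_wp);    /* R13 <- caller's WP                */
    wr((uint16_t)(cpu->wp + 28), old_pc);    /* R14 <- return address (DATA word) */
    wr((uint16_t)(cpu->wp + 30), old_st);    /* R15 <- caller's ST                */
}

/* Equivalent of "MOV *R14+,Rn" in the called routine: read the inline
   parameter and advance R14 so that RTWP resumes after the DATA word. */
static uint16_t fetch_inline_param(const Cpu *cpu)
{
    uint16_t r14_addr = (uint16_t)(cpu->wp + 28);
    uint16_t where    = rd(r14_addr);
    wr(r14_addr, (uint16_t)(where + 2));
    return rd(where);
}
```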

 

One issue with XOP is that the PC jumps to the end of memory (XOP sets PC to >FFF8) so the first thing (realistically) you would have to do in an XOP routine is branch somewhere else to actually execute your routine. So that's another cost. You do not have that issue with BLWP.

 

I think BLWP is a great instruction. We avoid it because we see it as expensive. However, is it really expensive compared to the 8-bitters, which would typically push data onto a stack on one side of a subroutine call, and pop it off on the other?



#56 Willsy

    River Patroller

  • 3,009 posts
  • Location:Uzbekistan (no, really!)

Posted Thu Sep 4, 2014 5:56 AM

Stuart wrote:
Now, what if you do XOP *R1,1?  ;-)

 

Now you're just showing off :) You're making my head hurt!



#57 Stuart

    Dragonstomper

  • 692 posts
  • Location:Southampton, UK

Posted Thu Sep 4, 2014 6:59 AM

 

Willsy wrote:
Now you're just showing off :) You're making my head hurt!

Confuses the hell out of me too!

 

I would think XOP is slightly quicker than a BLWP if you want to pass a parameter. So with your BLWP followed by a DATA statement above, with the XOP you don't need to MOV your parameter value (from a register or elsewhere) to that DATA location?


Edited by Stuart, Thu Sep 4, 2014 7:05 AM.


#58 mizapf

    River Patroller

  • 2,516 posts
  • Location:Germany

Posted Thu Sep 4, 2014 7:07 AM

XOP *R1,1 means this is a XOP 1 operation (vector FFD8/FFF8), and after the execution, the memory word at the address stored in the old R1 is stored in the new R11.

 

I always found operations like BLWP R4 looking weird, but this only means that the vector is stored in R4 (WP) and R5 (PC).

 

BTW, did you know that on B @>A000, the TMS9900 actually reads the contents of A000 and discards them? (Not the 9995)



#59 Willsy

    River Patroller

  • 3,009 posts
  • Location:Uzbekistan (no, really!)

Posted Thu Sep 4, 2014 7:30 AM

Stuart wrote:
I would think XOP is slightly quicker than a BLWP if you want to pass a parameter. So with your BLWP followed by a DATA statement above, with the XOP you don't need to MOV your parameter value (from a register or elsewhere) to that DATA location?

 

Makes sense. However, unless the XOP routine is very short, you won't have enough memory to fit it in (>FFF8 leaves only four words before the top of the address space), so you then have to branch elsewhere to actually run your code. That may cancel out any advantage? I'm not au fait with the TMS9900 instruction cycle counts, particularly on the 4A's crack-smoking architecture!
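A rough back-of-envelope on that question, counting only the calling side. The BLWP and XOP base figures are the ones mizapf gave; the B and MOV base costs and the extra 8 cycles for an @address operand are assumed from the TMS9900 data manual timing table, and the 99/4A's wait states on its 8-bit bus are ignored, so treat the totals as indicative only.

```c
/* Call-side cost in TMS9900 clock cycles, no wait states.
   BLWP_BASE and XOP_BASE: figures quoted in this thread.
   B_BASE, MOV_BASE, SYMBOLIC_COST: assumed from the data manual. */
enum {
    BLWP_BASE     = 26,
    XOP_BASE      = 36,
    B_BASE        = 8,
    MOV_BASE      = 14,
    SYMBOLIC_COST = 8      /* extra cycles for an @addr operand */
};

/* BLWP route: MOV R1,@PARAM (store into the DATA word), then BLWP @VECTOR. */
static int blwp_route(void)
{
    return (MOV_BASE + SYMBOLIC_COST) + (BLWP_BASE + SYMBOLIC_COST);
}

/* XOP route: XOP R1,1 (register operand), then B @ROUTINE out of >FFF8. */
static int xop_route(void)
{
    return XOP_BASE + (B_BASE + SYMBOLIC_COST);
}
```

Under those assumptions the XOP route comes out at 52 cycles and the BLWP route at 56, so in practice the wait states and how each routine then picks up its parameter would probably decide it.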



#60 Willsy

    River Patroller

  • 3,009 posts
  • Location:Uzbekistan (no, really!)

Posted Thu Sep 4, 2014 7:37 AM



mizapf wrote:
XOP *R1,1 means this is a XOP 1 operation (vector FFD8/FFF8), and after the execution, the memory word at the address stored in the old R1 is stored in the new R11.

 

So... what you're saying is... (we're coders, right... ;) )

        LI R1,>FACE
        XOP *R1,1

At this point, R11 of the XOP workspace contains the value >FACE

Seems to make sense to me.

 

EDIT: Just tried it in Classic99 - you're right Michael. A neat way to pass a value into an XOP routine, rather than the address of a value. Cool.

 

See the screen-shot below:

[Attached screenshot: XOP.png]


Edited by Willsy, Thu Sep 4, 2014 7:38 AM.


#61 mizapf

    River Patroller

  • 2,516 posts
  • Location:Germany

Posted Thu Sep 4, 2014 7:46 AM

Ehm ... sorry, my fault, and you even confirmed it ... :P

 

R11 contains the memory address of the source operand, not the value. You can see that in your example. You loaded R1 with 0xFACE, and R11 contains 0xFACE after the call. So when we assume that the workspace is A000, if you do a XOP R1,1, the new R11 contains A002, and when you load R1 with 0xFACE and call XOP *R1,1, then it contains 0xFACE.

 

[Edit: In machine language we cannot distinguish between a value and an address since we don't have data types; thus, the value 0xFACE in R1 can be interpreted as an address, and by using XOP *R1,1 you pass that address to the XOP routine. Of course, if you like, you can also think of it as a value that you pass to the routine as if you were doing a cast (int) in C.]
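The point can be captured in a few lines of C. This is a hypothetical helper, not code from any real emulator: `effective_address()` returns what XOP would place in the new R11 for the three operand forms discussed here (R1, *R1 and @>FF00). It only computes the address and does not model any bus reads.

```c
#include <stdint.h>

static uint16_t mem[0x8000];                 /* toy 64K memory, word-addressed */
static uint16_t rd(uint16_t a)             { return mem[a >> 1]; }
static void     wr(uint16_t a, uint16_t v) { mem[a >> 1] = v; }

typedef enum { REG, REG_INDIRECT, SYMBOLIC } Mode;

/* Effective address of a source operand = what XOP deposits in the new R11. */
static uint16_t effective_address(uint16_t wp, Mode mode, uint16_t operand)
{
    switch (mode) {
    case REG:          return (uint16_t)(wp + 2 * operand);     /* Rn:    address of the register  */
    case REG_INDIRECT: return rd((uint16_t)(wp + 2 * operand)); /* *Rn:   contents of the register */
    case SYMBOLIC:     return operand;                          /* @addr: the address itself       */
    }
    return 0;
}
```

With the workspace at >A000 and R1 holding >FACE, XOP R1,1 passes >A002, XOP *R1,1 passes >FACE, and XOP @>FF00,1 passes >FF00.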


Edited by mizapf, Thu Sep 4, 2014 8:00 AM.


#62 gregallenwarner

    Chopper Commander

  • Topic Starter
  • 181 posts

Posted Thu Sep 4, 2014 8:56 AM

This is getting way deeper than I ever imagined from the beginning!

 

So let me see what we've got so far. Are we still saying that XOP is slower than BLWP? If that's the case, I'd rather go with BLWP, since passing arguments to a BLWP routine isn't inherently more difficult than passing arguments to an XOP routine. Different, but not harder. Speed is a big concern of mine.

 

Basically, here's what I'm attempting to do: I want to implement a Heap memory structure. I've read the C stdlib source code and found out how malloc() and free() work in the context of a single contiguous block of memory. Of course, malloc() and free() are also running in a virtual memory space provided to them by the operating system, which we don't have on the TI. But working with hard-coded addresses, I believe I could implement a Heap structure in the TI using standard memory.
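As a point of reference for the single-block case, here is a very small first-fit allocator in C over one contiguous arena, with a 16-bit length prefix in front of each block. It is a simplified sketch in the same spirit, not the libc malloc(): `heap_alloc()`, `heap_free()` and the 1 KB arena are hypothetical, and freeing only merges a block with the free block directly after it.

```c
#include <stddef.h>
#include <stdint.h>

#define HEAP_SIZE 1024u                  /* hypothetical arena size in bytes */
static uint8_t heap[HEAP_SIZE];

/* Each block starts with a 16-bit length prefix (big-endian, like the 9900):
   total block size in bytes, always even, with bit 0 meaning "in use". */
static uint16_t hdr(size_t off)                 { return (uint16_t)((heap[off] << 8) | heap[off + 1]); }
static void     set_hdr(size_t off, uint16_t v) { heap[off] = (uint8_t)(v >> 8); heap[off + 1] = (uint8_t)v; }

static void heap_init(void) { set_hdr(0, HEAP_SIZE); }   /* one big free block */

static void *heap_alloc(size_t n)
{
    size_t need = (n + 2 + 1) & ~(size_t)1;               /* prefix + payload, rounded to even */
    for (size_t off = 0; off < HEAP_SIZE; off += hdr(off) & ~1u) {
        uint16_t h = hdr(off);
        if ((h & 1u) == 0 && h >= need) {                 /* first free block that fits */
            if (h - need >= 4) {                          /* split off the unused tail */
                set_hdr(off + need, (uint16_t)(h - need));
                h = (uint16_t)need;
            }
            set_hdr(off, (uint16_t)(h | 1u));             /* mark in use */
            return &heap[off + 2];                        /* address just past the prefix */
        }
    }
    return NULL;                                          /* nothing large enough */
}

static void heap_free(void *p)
{
    size_t   off  = (size_t)((uint8_t *)p - heap) - 2;    /* back up to the prefix */
    uint16_t h    = (uint16_t)(hdr(off) & ~1u);           /* clear the in-use bit */
    size_t   next = off + h;
    if (next < HEAP_SIZE && (hdr(next) & 1u) == 0)        /* merge with a free successor */
        h = (uint16_t)(h + hdr(next));
    set_hdr(off, h);
}
```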

 

Now I want to consider using the SAMS as well. malloc() returns an address pointing to the beginning of the allocated chunk of memory, immediately following the length prefix, so in the SAMS case, I believe I can replace this return value with an address/bank pair. I will need to write a fetch routine to operate on this address/bank pair, so that the application software can be agnostic about where in memory its data lies. The memory management routine will take care of the mapping/banking for it.
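A sketch of the address/bank pair and the fetch routine in C, with the SAMS card simulated by an array. `FarPtr`, `map_window()`, `far_fetch()` and `far_store()` are hypothetical names; on real hardware the mapping step would be a write to the SAMS mapper for the chosen window, which is not modelled here. Remembering which page is currently mapped avoids touching the mapper on every access.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u
#define NUM_PAGES 256u                    /* hypothetical 1 MB card: 256 x 4K pages */

static uint8_t  sams[NUM_PAGES][PAGE_SIZE];   /* simulated SAMS memory           */
static uint16_t mapped_page = 0xFFFF;         /* nothing mapped yet              */
static unsigned map_count;                    /* how often the mapper was hit    */

typedef struct { uint16_t page; uint16_t offset; } FarPtr;   /* address/bank pair */

/* Make a page visible in the (single, simulated) CPU window. */
static uint8_t *map_window(uint16_t page)
{
    if (page != mapped_page) {            /* only remap when the page changes */
        mapped_page = page;
        map_count++;
    }
    return sams[mapped_page];
}

static uint8_t far_fetch(FarPtr p)            { return map_window(p.page)[p.offset]; }
static void    far_store(FarPtr p, uint8_t v) { map_window(p.page)[p.offset] = v; }
```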

 

Of course, what this means is, I'll have a maximum allocation size of 4K when requesting a block from malloc(), since the application software won't be able to tell when it's reached the end of a page, if the memory management function is doing all the management for it. I'll have to think on this some more.
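The 4K ceiling falls out naturally if the allocator simply never hands out a block that straddles a page. A hypothetical bump allocator in C (no freeing, no length prefix, page count chosen arbitrarily) shows the rule:

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define NUM_PAGES 16u                         /* hypothetical: 16 SAMS pages */

typedef struct { uint16_t page; uint16_t offset; } FarPtr;

static uint16_t next_free[NUM_PAGES];         /* bump pointer per page (sketch only) */

/* Hand out a block that lies entirely inside one 4K page, so a caller that
   only knows "page + offset" never has to notice a page boundary. */
static bool far_alloc(uint16_t size, FarPtr *out)
{
    if (size == 0 || size > PAGE_SIZE)
        return false;                         /* bigger than one page: not mappable at once */
    for (uint16_t pg = 0; pg < NUM_PAGES; pg++) {
        if (PAGE_SIZE - next_free[pg] >= size) {
            out->page     = pg;
            out->offset   = next_free[pg];
            next_free[pg] = (uint16_t)(next_free[pg] + size);
            return true;
        }
    }
    return false;                             /* no page has room left */
}
```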

 

I'm sort of a computer science geek, so I like data structures and structured programming and things like this, so I like the notion of having a managed Heap of memory that I can allocate and deallocate at will, avoiding memory leaks that way. Just in case you guys wanted to know what I'm learning all this for!



#63 Willsy

    River Patroller

  • 3,009 posts
  • Location:Uzbekistan (no, really!)

Posted Thu Sep 4, 2014 9:05 AM

Sounds excellent. The TI has been lacking good memory management techniques, especially in regard to the SAMS unit, so this could open up a whole new world!



#64 TheMole

    Dragonstomper

  • 744 posts
  • Location:Belgium

Posted Thu Sep 4, 2014 9:11 AM

A completely transparent malloc/free implementation with paged memory support would be amazeballs! Especially if usable from C :).

 

It could be implemented with different back-ends, so that you could perhaps page in a swap file on a ram disk, SAMS memory, Supercart memory, ... depending on the hardware available without having to write specifically for one type of expansion.
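One way to picture this is a table of function pointers per back-end, which the allocator calls without knowing what hardware sits behind it. The interface below (`PagingBackend`, `map`, `unmap`) is entirely hypothetical: a SAMS back-end might write the mapper in `map()`, and a RAM-disk swap back-end might load and save pages there. The only back-end shown is a toy one backed by an ordinary array.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Hypothetical back-end interface: each expansion type supplies these. */
typedef struct {
    const char *name;
    unsigned    num_pages;
    uint8_t  *(*map)(unsigned page);     /* make a 4K page visible, return where it is */
    void      (*unmap)(unsigned page);   /* e.g. write a swapped page back; may do nothing */
} PagingBackend;

/* Toy back-end: pages are just rows of an ordinary array. */
static uint8_t  store[8][PAGE_SIZE];
static uint8_t *ram_map(unsigned page)   { return page < 8 ? store[page] : NULL; }
static void     ram_unmap(unsigned page) { (void)page; }

static const PagingBackend ram_backend = { "ram", 8, ram_map, ram_unmap };

/* Allocator-side code only ever talks to the interface. */
static int far_read(const PagingBackend *be, unsigned page, unsigned off, uint8_t *out)
{
    if (off >= PAGE_SIZE)
        return -1;
    uint8_t *window = be->map(page);
    if (window == NULL)
        return -1;
    *out = window[off];
    be->unmap(page);
    return 0;
}
```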



#65 gregallenwarner

    Chopper Commander

  • Topic Starter
  • 181 posts

Posted Thu Sep 4, 2014 9:22 AM

I already have a calling convention/call stack of sorts written for my own personal uses, though I'm not sure how compatible it is with the current C compiler for the TMS9900. I haven't even attempted to use a C compiler with the TI yet, so I'm not sure what the binaries it generates even look like, or how to ensure my assembly routines would be compatible with it. If C programming on the TI is a common enough thing, please let me know, so I can investigate it now, rather than try and shoe-horn in that compatibility later!

 

Also, I happen to know that there exist at least two different C compilers for the TI--C99 and a TMS9900 extension for GNU GCC. Correct me if I'm wrong about those two. In any case, what are people currently using the most? Do either of those compilers already have SAMS support built in?

 

I'd be very interested in writing new assembly routines to expand the standard libraries for either of those compilers, if the demand were great enough.



#66 mizapf

    River Patroller

  • 2,516 posts
  • Location:Germany

Posted Thu Sep 4, 2014 9:35 AM

A lot of things are possible, but the good old 64K address space eventually steps on our toes. People like me who dug a bit deeper into Geneve usage and programming already know that problem pretty well, and this is also what I explained on Ninerpedia.

 

At the end of the day, virtual memory addresses can only reasonably be used when invoking special routines. That is, you let the routine do all memory accesses for you in an opaque way (you don't have insight into the actual operations).

 

Generally, suppose you wanted to malloc a 70000-byte array. You would expect to be able to fill that array with a certain value, e.g. FF. But this won't work on the machine level in a transparent way. Apart from the fact that you need more than 16 bits to store the address, you'd somehow need to invoke bank-switching code just in time to replace certain parts of memory, including calculating new offsets and addresses. This is the opposite of "transparent".
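The arithmetic behind that example, as a small C sketch assuming 4K pages: a 70000-byte object needs 17-bit offsets, and a simple byte-by-byte fill has to switch banks every 4096 bytes. `split()` and `remaps_for_fill()` are illustrative helpers only.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Past 65535 a byte offset no longer fits in 16 bits, so split it into a
   4K page number and an offset inside that page. */
static void split(uint32_t vaddr, uint16_t *page, uint16_t *offset)
{
    *page   = (uint16_t)(vaddr / PAGE_SIZE);
    *offset = (uint16_t)(vaddr % PAGE_SIZE);
}

/* Bank switches needed for a byte-by-byte fill of len bytes from start:
   one for the first page, then one at every 4K boundary crossed. */
static unsigned remaps_for_fill(uint32_t start, uint32_t len)
{
    unsigned remaps  = 0;
    uint32_t current = UINT32_MAX;            /* nothing mapped yet */
    for (uint32_t a = start; a < start + len; a++) {
        uint32_t page = a / PAGE_SIZE;
        if (page != current) {                /* the explicit bank switch */
            current = page;
            remaps++;
        }
    }
    return remaps;
}
```

For a 70000-byte fill starting at offset 0 that is 18 bank switches (pages 0 through 17), each of which the machine code has to perform explicitly.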

 

Writing programs in C would not be much easier either, since many of C's memory operations map directly onto machine-language operations.

 

I once thought about a kind of hardware abstraction layer (yes, trying hard to ignore the not-so-impressive clock speed we have to start from) which could be used to create a whole new virtual architecture.

 

Things become completely different once we get to higher-level programming languages, maybe even Forth.



#67 Stuart

    Dragonstomper

  • 692 posts
  • Location:Southampton, UK

Posted Thu Sep 4, 2014 9:43 AM

 

 

gregallenwarner wrote:
Also, I happen to know that there exist at least two different C compilers for the TI--C99 and a TMS9900 extension for GNU GCC.

This one as well: http://atariage.com/...e-ti/?p=2952493



#68 TheMole

    Dragonstomper

  • 744 posts
  • Location:Belgium

Posted Thu Sep 4, 2014 9:49 AM

To be honest, I haven't seen anything written with c99 since I joined AtariAge. As far as gcc is concerned, besides myself I've seen things done by Tursi (libti99, Mr. Chin, his vgmplayer, ...) and Lucien (Rush Hour, Demons, Nyog'Sothep, ...). All-in-all, I think the C community is still pretty small, but if we keep improving the tools and libraries I'm sure the developers will come :).

 

Also, slowly building out a libc style library might help us eventually bring something like GEOS to the TI as well.



#69 TheMole

    Dragonstomper

  • 744 posts
  • Location:Belgium

Posted Thu Sep 4, 2014 9:55 AM

 

Oh, and there is David Pitts' older gcc port as well: http://www.cozx.com/.../tigccinst.html

 

But I think Insomnia's port is the only C compiler that gets any real use these days.



#70 gregallenwarner

    Chopper Commander

  • Topic Starter
  • 181 posts

Posted Thu Sep 4, 2014 10:06 AM

The problem I see with porting the entire libc over to the TI is the limited space. Eventually, you're going to have to start using SAMS bank switching for program code, and as it's already been stated earlier, that's trickier, as you need a sort of trampoline routine just to get in and out of routines that lie in an unmapped bank. So far, my efforts have been limited to implementing only a few certain functions from the libc code in TMS9900 Assembly, as the need arose. It wasn't intended to provide a libc implementation to a C compiler, though, that would be a nice thing to have. As mizapf already stated, the 64K address space is the real killjoy here.
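A toy C model of the trampoline idea: the caller asks a small routine that is never paged out to map the target page, make the call, and put the caller's page back. `far_call()`, `page_code` and the page/slot layout are hypothetical; real code would write the SAMS mapper and branch to an address rather than index a table. Saving the previous page on every call is what lets far calls nest.

```c
/* Toy model: routines "live" in numbered code pages, and only the page
   currently mapped into the code window can be called. All names hypothetical. */
typedef int (*Routine)(int);

static int double_it(int x) { return 2 * x; }
static int negate(int x)    { return -x; }

static Routine page_code[4][2] = {        /* [page][slot] */
    { 0, 0 }, { double_it, 0 }, { 0, negate }, { 0, 0 }
};
static unsigned mapped_page;              /* page visible in the code window */

/* The trampoline itself must sit in memory that is never paged out. */
static int far_call(unsigned page, unsigned slot, int arg)
{
    unsigned saved = mapped_page;         /* remember the caller's page */
    mapped_page = page;                   /* stands in for a SAMS mapper write */
    int result = page_code[mapped_page][slot](arg);
    mapped_page = saved;                  /* map the caller's page back */
    return result;
}
```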

 

Still, at least for my immediate purposes, I think I've learned enough about the SAMS to make limited use of it in my own Assembly programs. It's certainly far from ideal, but I guess if I were trying to do ideal programming, I wouldn't be doing it on a 30+ year old architecture! :-D



#71 RXB

    River Patroller

  • 2,727 posts
  • Location:Vancouver, Washington, USA

Posted Thu Sep 4, 2014 10:09 AM

gregallenwarner wrote:
Also, I happen to know that there exist at least two different C compilers for the TI--C99 and a TMS9900 extension for GNU GCC. Correct me if I'm wrong about those two. In any case, what are people currently using the most? Do either of those compilers already have SAMS support built in?

What about the RAG AMS C compiler package? Or the RAG LINKER or RAG Assembler all set up with the SAMS in mind?



#72 gregallenwarner

    Chopper Commander

  • Topic Starter
  • 181 posts

Posted Thu Sep 4, 2014 10:48 AM

I've heard of the RAG Linker/Assembler, but I've never used them. I'll take a look. Thanks.



#73 Willsy

    River Patroller

  • 3,009 posts
  • Location:Uzbekistan (no, really!)

Posted Thu Sep 4, 2014 10:48 AM

Rich, have you got any docs on those programs? I know of their existence but know nothing of the programs themselves.

#74 TheMole

    Dragonstomper

  • 744 posts
  • Location:Belgium

Posted Thu Sep 4, 2014 11:42 AM

gregallenwarner wrote:
Eventually, you're going to have to start using SAMS bank switching for program code, and as it's already been stated earlier, that's trickier, as you need a sort of trampoline routine just to get in and out of routines that lie in an unmapped bank. ... It's certainly far from ideal, but I guess if I were trying to do ideal programming, I wouldn't be doing it on a 30+ year old architecture! :-D

 

True that!

 

Anyway, didn't want to derail the thread into a discussion about my C wants and needs ;). Although I have to say, as far as C is concerned I think it might be easier to support bank switching for program segments than for data or bss segments. It would take a modification to the compiler, but I can easily imagine a function prologue/epilogue setup that masks the paging for the developer. For data/memory manipulation, Michael is probably right, I see no direct way of fitting that in easily, and definitely not by means of a loadable library.



#75 RXB

    River Patroller

  • 2,727 posts
  • Location:Vancouver, Washington, USA

Posted Thu Sep 4, 2014 10:13 PM

Stuart wrote:
I would think XOP is slightly quicker than a BLWP if you want to pass a parameter. So with your BLWP followed by a DATA statement above, with the XOP you don't need to MOV your parameter value (from a register or elsewhere) to that DATA location?

I would have to disagree with that, Stuart.

In RXB I use a command called CALL EXECUTE(cpu-address)

And it is just a BLWP @cpu-address

 

So what I do is load all the values into the CPU workspace before I call that routine. Thus all registers are pre-loaded and ready to go.

 

That has to be one hell of a lot faster than having to fetch values after the XOP. 8 registers are loaded and ready to execute before I even do the call.





