BillG

Cycle counting question


Please reference Chapter 4 of https://ia800307.us.archive.org/10/items/9900MicroprocessorSeriesFamilySystemsDesignDataBook/TexasInstruments9900MicroprocessorSeriesFamilySystemsDesign_text.pdf

 

Starting with page 4-89 and particularly pages 4-91, 4-92, 4-95 and 4-100

 

The following code sequence appears to take 29 machine cycles:

 

    mov     @Var,R0          ; 11 cycles

    ai      R0,2             ; 7 cycles

    mov     R0,@Var          ; 11 cycles

 

While this only takes 9:

 

    inct    @Var             ; 9 cycles


 

Am I correct?  If so, there would appear to be plenty of opportunities for a compiler to optimize code like this.
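
The comparison above is plain addition, and a small Python sketch makes it easy to repeat for other sequences. The cycle figures are copied from the listing in this post and are not re-derived; the names `LOAD_ADD_STORE`, `IN_PLACE` and `total_cycles` are made up for illustration.

```python
# Cycle figures exactly as quoted in the post; they are not re-derived here.
LOAD_ADD_STORE = [
    ("mov  @Var,R0", 11),
    ("ai   R0,2", 7),
    ("mov  R0,@Var", 11),
]
IN_PLACE = [("inct @Var", 9)]


def total_cycles(sequence):
    """Sum the per-instruction cycle counts of a code sequence."""
    return sum(cycles for _, cycles in sequence)


long_form = total_cycles(LOAD_ADD_STORE)
short_form = total_cycles(IN_PLACE)
print(long_form, short_form, long_form - short_form)  # 29 9 20
```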


You are correct.

 

Which compiler is generating the first code?


I guess my cover is blown, so...

 

I have been working on compilers for the 6502 and 680x among other processors.  I just started to attempt code generation for the 9900.

 

It appears that adding a value from 1 to 6 to a variable can be optimized; likewise for subtracting a value from 1 to 6.
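
One way a code generator might apply this is sketched below in Python. It assumes that `inc`, `dec` and `dect` on a memory operand cost the same 9 cycles quoted earlier for `inct`, and that the general `mov`/`ai`/`mov` sequence costs 29; both constants are taken from the thread, not measured. The function names are hypothetical.

```python
STEP_CYCLES = 9       # assumed cost of inc/inct/dec/dect @Var (figure quoted for inct)
GENERAL_CYCLES = 29   # cost of the mov/ai/mov sequence as quoted in the thread


def step_sequence(delta):
    """Return the shortest list of inc/inct/dec/dect mnemonics that changes a word by delta."""
    two, one = ("inct", "inc") if delta >= 0 else ("dect", "dec")
    twos, ones = divmod(abs(delta), 2)
    return [two] * twos + [one] * ones


def worth_it(delta):
    """True when the in-place sequence is cheaper than load, add immediate, store."""
    steps = len(step_sequence(delta))
    return 0 < steps * STEP_CYCLES < GENERAL_CYCLES


for delta in (1, 5, 6, 7, -6):
    print(delta, step_sequence(delta), worth_it(delta))
```

Under these assumptions three steps cost 27 cycles and four cost 36, which is where the limit of 6 comes from.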


Cool.

 

The 9900 has that interesting memory-to-memory architecture, so a lot of ideas that are used for 6502 or 68xx uPs have to be rethought. :)  (Had a commercial project on 68HC11 years ago)

You can also apply the INC/DEC instructions to array elements using indexed addressing mode, so it is a pretty powerful instruction set.

 

What language are your compilers for?

5 minutes ago, TheBF said:

What language are your compilers for?

Python on the 6502

BASIC on the 6502, 6800 and AVR

Pascal on the 6502, 6800, 8080, AVR, 68000 (maybe 9900)

COBOL on the 6800 (started as a joke)


Not very big right now.  The only things I have working are DISPLAY and PERFORM.

 

PERFORM is rather complicated if you know COBOL.


Only at the syntax level. Never thought about internals.

Why is a COBOL sub-routine call complicated?

6 minutes ago, BillG said:

You can PERFORM something THROUGH something else, so they are not like standard subroutines.

 

https://www.ibm.com/support/knowledgecenter/en/SS6SG3_4.2.0/com.ibm.entcobol.doc_4.2/PGandLR/ref/rlpsperf.htm

 

Wow.

Mind numbing complication for a simple idea.

But hey I write in a language that has no syntax so what do I know.  :) 

 

Thanks for the link.

32 minutes ago, BillG said:

Python on the 6502

BASIC on the 6502, 6800 and AVR

Pascal on the 6502, 6800, 8080, AVR, 68000 (maybe 9900)

COBOL on the 6800 (started as a joke)

Very nice collection.  I first read this as "COBOL on the 68000."  I used a COBOL compiler on my Amiga in college.  Worked well enough until I had to start using RM*COSTAR.


A complication: I was somehow under the impression that the 32K RAM card in the PEB was 16-bit memory, but it is 8-bit.

 

The TI documentation says three machine cycles for a memory access.  Now I read about a 4-cycle penalty due to the 8-bit bottleneck.  Four and not three.  So is the total 7 cycles or is there an extra wait state on both bytes for a total of 8?

 

Cycle counting is further complicated by whether the Workspace is in fast or slow RAM.

48 minutes ago, BillG said:

A complication: I was somehow under the impression that the 32K RAM card in the PEB was 16-bit memory, but it is 8-bit.

 

The TI documentation says three machine cycles for a memory access.  Now I read about a 4-cycle penalty due to the 8-bit bottleneck.  Four and not three.  So is the total 7 cycles or is there an extra wait state on both bytes for a total of 8?

 

Cycle counting is further complicated by whether the Workspace is in fast or slow RAM.

I guess our cover is blown! 🙂

 

Ya, totally sucks, doesn't it.  If you are using the TI-99 "O/S", as it were, you are best to treat the 256 bytes of 16-bit RAM with kid gloves.  There are about 120-ish bytes at the top that you can use.

For example, in most Forth systems the workspace is at >8300, so at least the primary registers are in fast RAM.

The reason, by the way, is that the machine was originally designed for the TMS9995, which was like the 8088: 16 bits internally with an 8-bit bus, allowing cheaper memory parts.

The 9995 wasn't ready in time and so a shoehorn was applied for the 9900... and the rest is ... our life now. :)

 

1 hour ago, TheBF said:

Ya, totally sucks, doesn't it.  If you are using the TI-99 "O/S", as it were, you are best to treat the 256 bytes of 16-bit RAM with kid gloves.  There are about 120-ish bytes at the top that you can use.

For example, in most Forth systems the workspace is at >8300, so at least the primary registers are in fast RAM.

My cross assemblers for most other processors offer an option to display the number of machine cycles each instruction uses.

 

My 9900 assembler does not attempt this because of the complexity of the calculation, and that is before even taking the 8-bit delays into account.  The feature is so helpful that I may still attempt it.


The intention was to use the TMS 9985, but it wasn't ready in time. The TMS 9995 came later; by the time it was ready, the Home Computer had already been cancelled.

 

The TMS 9900 does a memory access in two cycles. But if it accesses the memory expansion, or rather anything outside the 256 bytes of RAM or the 8 K monitor ROM in the console, it accesses that memory byte by byte, adding a wait state for each byte. So instead of two cycles, a memory access is six cycles.

Thus an assembler can't really know the cycle count, since it doesn't know where the code and the workspace are located. The same software will also behave differently, depending on the machine. My main console has 16-bit wide (two cycles per access) memory for the memory expansion, so it runs faster than a standard TI 99/4A.

There are also different additions depending on the addressing mode used, but they can be calculated, as they are consistent.
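
The fast/slow distinction described above could be modelled roughly as follows in Python. This is only a sketch: the address ranges are assumptions based on the usual TI-99/4A memory map (console ROM at >0000 to >1FFF, the 256 bytes of RAM at >8300 to >83FF) and ignore mirrors of that RAM and any console modified with 16-bit expansion memory. The two- and six-cycle figures are the ones given in this post.

```python
# Assumed address ranges for the 16-bit wide memory in a stock console.
CONSOLE_ROM = range(0x0000, 0x2000)      # 8 K monitor ROM
SCRATCHPAD_RAM = range(0x8300, 0x8400)   # 256 bytes of RAM

FAST_ACCESS_CYCLES = 2   # 16-bit access, per the post
SLOW_ACCESS_CYCLES = 6   # byte-by-byte access with wait states, per the post


def access_cycles(address):
    """Cycles for one word access at the given address on an unmodified console."""
    if address in CONSOLE_ROM or address in SCRATCHPAD_RAM:
        return FAST_ACCESS_CYCLES
    return SLOW_ACCESS_CYCLES


print(access_cycles(0x8300))  # workspace in the 256-byte RAM
print(access_cycles(0xA000))  # memory expansion
```

An assembler would still need to be told where the code and workspace end up at run time before it could use a function like this.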

22 hours ago, BillG said:

Please reference Chapter 4 of https://ia800307.us.archive.org/10/items/9900MicroprocessorSeriesFamilySystemsDesignDataBook/TexasInstruments9900MicroprocessorSeriesFamilySystemsDesign_text.pdf

 

Starting with page 4-89 and particularly pages 4-91, 4-92, 4-95 and 4-100...

 

I think you are in the 9940 section. On page 8-23 of that book we find:  "TMS 9900 INSTRUCTION EXECUTION TIMES"

Your jaw will drop when you see the number of clock cycles each instruction uses.

3 hours ago, apersson850 said:

The TMS 9900 does a memory access in two cycles. But if it accesses the memory expansion, or rather anything outside the 256 bytes of RAM or the 8 K monitor ROM in the console, it accesses that memory byte by byte, adding a wait state for each byte. So instead of two cycles, a memory access is six cycles.

Thus an assembler can't really know the cycle count, since it doesn't know where the code and the workspace are located. The same software will also behave differently, depending on the machine. My main console has 16-bit wide (two cycles per access) memory for the memory expansion, so it runs faster than a standard TI 99/4A.

There are also different additions depending on the addressing mode used, but they can be calculated, as they are consistent.

A simpler way is to assume the workspace is in fast memory and everything else is in slow memory.

 

Unfortunately, the tables in the manual do not separate workspace accesses from everything else, so I will have to figure those out.

3 hours ago, senior_falcon said:

I think you are in the 9940 section. On page 8-23 of that book we find:  "TMS 9900 INSTRUCTION EXECUTION TIMES"

Your jaw will drop when you see the number of clock cycles each instruction uses.

You are right.

 

Jaw dropped...

4 hours ago, BillG said:

Unfortunately, the tables in the manual do not separate workspace accesses from everything else, so I will have to figure those out.

One access for the instruction, one for any immediate, one for any symbolic or indexed address (two if it's a destination, like in A @SOURCE,@DESTINATION). The rest are registers.


Consider the case of

    inc     @I

 

The documentation says 10 base cycles and 3 memory accesses, and Table A adds 8 cycles and one memory access for symbolic mode.  The operand is listed as the source instead of the destination.  None of the memory accesses is to the workspace.


That makes sense. Read instruction, read destination, write to destination. Three memory cycles. When the address is symbolic, you need to read the address too, so one more memory cycle for symbolic (and indexed).
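
Putting the numbers from this exchange together, a rough Python sketch of the total for `inc @I` might look like this. The base and Table A figures are the ones quoted in the thread. The 4 extra cycles per slow access is an assumption derived from the six-cycle versus two-cycle figures mentioned earlier, and it presumes the table values already include the two-cycle fast access. `Timing` and `total_cycles` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timing:
    cycles: int      # clock cycles from the data book table
    accesses: int    # memory accesses from the same table

    def __add__(self, other):
        return Timing(self.cycles + other.cycles, self.accesses + other.accesses)


INC_BASE = Timing(cycles=10, accesses=3)        # figures quoted in the thread
SYMBOLIC_EXTRA = Timing(cycles=8, accesses=1)   # Table A adjustment quoted in the thread
WAIT_CYCLES_PER_SLOW_ACCESS = 4                 # assumed: 6-cycle slow access minus 2-cycle fast access


def total_cycles(timing, slow_accesses):
    """Table cycles plus the assumed wait-state penalty for accesses to 8-bit memory."""
    if not 0 <= slow_accesses <= timing.accesses:
        raise ValueError("slow_accesses must be between 0 and the access count")
    return timing.cycles + slow_accesses * WAIT_CYCLES_PER_SLOW_ACCESS


inc_symbolic = INC_BASE + SYMBOLIC_EXTRA
print(inc_symbolic)                    # Timing(cycles=18, accesses=4)
print(total_cycles(inc_symbolic, 4))   # 34: code and variable both in 8-bit memory
print(total_cycles(inc_symbolic, 0))   # 18: everything in 16-bit memory
```

Since `inc @I` never touches the workspace, all four accesses count as slow whenever both the code and the variable live in the memory expansion.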


Even though it is a challenge and I love a challenge, I am getting somewhat discouraged about trying to generate good code for the 9900.

 

Are any of the emulators accurate in counting machine cycles?

On 9/13/2020 at 10:38 AM, TheBF said:

What language are your compilers for?

I forgot that I also started PL/M compilers for the 6800 and 6502.

1 minute ago, BillG said:

I forgot that I also started PL/M compilers for the 6800 and 6502.

That sounds ambitious too.

1 hour ago, BillG said:

Even though it is a challenge and I love a challenge, I am getting somewhat discouraged about trying to generate good code for the 9900.

Are any of the emulators accurate in counting machine cycles?

MAME precisely executes the operations with all cycles as in the tables. To see the cycles, however, you'll have to set a flag and recompile.
