The 7's Problem

+TheBF · November 23, 2019

46 minutes ago, unhuman said:

If I remember correctly, I used HCHAR b/c printing with ; puts an extra space in on the TI, unlike other basics.

Thanks for weighing in. Nice to hear from the author.

When I ran it, the output seemed exactly the same as the original. Try it if you have an emulator and let me know if I missed something.

It also takes a fair bit less code. Always a win in TI BASIC.

+TheBF · November 23, 2019

On 11/18/2019 at 10:17 PM, TheBF said:
I have been using Lee's version of this program to evaluate my screen I/O routines. I found a small improvement.

We don't have to make a counted string since TYPE uses a stack string (addr,len) and we have LENGTH.

It takes a few milliseconds off.
: TYPE-A1 ( -- )
\   LENGTH PAD C!   \ store string length
   PAD               \ copy of PAD to start string storage loop
   A1 1- DUP LENGTH + DO  \ DO A1+length-1 to A1
      I C@           \ get next digit
      48 +           \ convert to ASCII
      OVER C!        \ store ASCII digit in PAD
      1+             \ next PAD location
   -1 +LOOP
   DROP              \ clean up stack
   CR PAD LENGTH TYPE CR ;  \ type number

And after glancing at it again I realized we don't have put the chars into PAD at all!

Lee has nicely made the array printable as is, we just have to type from back to front.

: TYPE-A1
    A1 LENGTH 1- +
    0 LENGTH
    DO
       DUP C@ 48 + EMIT
       1-
    -1 +LOOP
    DROP
 ;

And if I make use of my new words to use VDP hardware auto-increment it becomes this and it runs in 1:03

: TYPE-A1
    VPOS VDPWA!
    A1 LENGTH 1- +
    0 LENGTH
    DO
       DUP C@ 48 + VEMIT ?CR
       1-
    -1 +LOOP
    DROP
 ;

+Lee Stewart · November 23, 2019

3 hours ago, unhuman said:

If I remember correctly, I used HCHAR b/c printing with ; puts an extra space in on the TI, unlike other basics.

TI Basic “PRINT ;” does output a space or ‘-’ before and a space after a number. It does not, however, output any spaces for characters or strings.

...lee

unhuman · November 23, 2019

9 minutes ago, Lee Stewart said:

TI Basic “PRINT ;” does output a space or ‘-’ before and a space after a number. It does not, however, output any spaces for characters or strings.

...lee

Wow... I never knew that. I was always annoyed at that extra space. Prevented some things that other basics allowed when I was a kid...

Edited November 23, 2019 by unhuman

unhuman · November 23, 2019

3 hours ago, TheBF said:

Thanks for weighing in. Nice to hear from the author.

When I ran it, the output seemed exactly the same as the original. Try it if you have an emulator and let me know if I missed something.

It also takes a fair bit less code. Always a win in TI BASIC.

Now you gotta compile it!

+TheBF · November 23, 2019

9 hours ago, unhuman said:

Now you gotta compile it!

I will have to learn how first... or write a compiler

+TheBF · September 1, 2020

This topic is rather long winded but what the heck.

I have taken to using Lee's version of the Sevens Problem, with a few "Camel Forth friendly changes, as a general test of my system builds.

Lately I built a "poor-man's just-in-time compiler addition that requires some changes to the source code.

You use the INLINE[ ] directive on Forth primitives, constants, variables and user-variables and they are compiled as inline machine code. Kind of manual but it works.

So I wondered what it would do.

With the inlining, doing the version that only prints the final results, I finally got down near the times that Lucien gets with his MLC compiler. 9.3 seconds versus 16.1 secs without inlining any code.

Nice to know the tool can make a difference.

Spoiler


\ Lee Stewart's mod of Lucien2's code for the sevens problem

\ Speedup mods for CAMEL99 Forth
\ 1. Used VALUES
\ 2. VTYPE for all short strings
\ 3. Used UM/MOD , native division
\ 4. >DIGIT for digit conversion
\ 5. Redefined PAD as static memory
\ 6. Defined 7*
\ 7. Used inline for as many primitives as possible


NEEDS ELAPSE FROM DSK1.ELAPSE
NEEDS VALUE  FROM DSK1.VALUES
NEEDS MALLOC FROM DSK1.MALLOC
NEEDS VTYPE  FROM DSK1.VTYPE
NEEDS INLINE[ FROM DSK1.INLINE

MARKER /SEVENS

DECIMAL
\ ------------------------------------
180 CONSTANT SIZE
SIZE MALLOC CONSTANT A1  \ A1 digit array
SIZE MALLOC CONSTANT PAD

0 VALUE LENGTH    \ current number of digits in result

HEX
CODE 7*   C044 , 0A34 , 6101 , NEXT, ENDCODE

DECIMAL
: A1*7->A1 ( -- ) \ perform 7 * last result
   0              \ initialize carried digit on stack
   1 +TO LENGTH   \ assume we will increase length by 1 digit
   A1 LENGTH BOUNDS
   DO
      INLINE[ I C@  7* +  0 10 UM/MOD ] \ make result ud..unsigned divide by 10
      INLINE[ SWAP I C!  ]   \ store rem as cur digit..carry on stack
   LOOP
   DROP            \ clean up stack
 \ eliminate leading 0
   A1 LENGTH INLINE[ 1- + C@ 0=  ]   \ highest digit = 0?
   IF
      -1 +TO LENGTH  \ correct digit count
   THEN  ;

: A1$ ( -- addr len)
   PAD DUP              \ PAD & COPY for string storage
   A1 1- DUP LENGTH +
   DO
      INLINE[ I C@ >DIGIT   OVER C!  1+ ]
   -1 +LOOP
    DROP             \ clean up stack
   ( PAD) LENGTH ;

: A1$.TYPE ( --)
   [ A1 1- ] LITERAL LENGTH OVER +
   DO
     INLINE[ I C@ >DIGIT  VPUT  VCOL 1+@ C/L@ ]
     >= IF  CR  THEN
   -1 +LOOP ;

: 7COUNTER ( -- ? )  \ Brian Fox's technique
   0                 \ initialize counter
   A1 LENGTH BOUNDS  \ DO A1 to A1 + length
   DO
    INLINE[ 1+ I C@ 7 = AND  DUP 6 = ]
      IF             \ more than '77777'?
         LEAVE       \ yup..we're done
      THEN
   LOOP
;

DECIMAL
: .POWER ( n -- ) S" SEVEN TO THE POWER OF " VTYPE DECIMAL .  S" IS" VTYPE ;

: RUN      \ V2.58 1:26 ,
           \ v2.59 with 8 line scrolling, 1:02
           \ v2.62 with inline v4  54 sec.
   PAGE
   A1 SIZE 0 FILL
   7 A1 C!
   1 TO LENGTH
   2                 \ starting power
   BEGIN
      7COUNTER 5 <
   WHILE
      A1*7->A1
      DUP            \ dup power for display
      CR .POWER
      1+             \ increment power
      CR A1$.TYPE
      CR
   REPEAT
   DROP ;

DECIMAL
: NOSCROLL
   PAGE
   A1 SIZE 0 FILL
   7 A1 C!
   1 TO LENGTH
   2                 \ starting power
   BEGIN
      7COUNTER 5 <
   WHILE
      A1*7->A1
      DUP            \ dup power for display
      0 0 AT-XY .POWER
      1+             \ increment power
\     CR CR A1$ TYPE    \ 39:08
     CR CR A1$ VTYPE    \ 24:50, W/INLINE4 00:15.35
   REPEAT
   0 7 AT-XY ;

DECIMAL
: FASTRUN         ( 16.1 seconds) ( W/INLINE 9.3 seconds)
   PAGE ." Working..."
   A1  SIZE 0 FILL
   PAD SIZE 0 FILL
   7 A1 C!
   1 TO LENGTH
   2                 \ starting power
   BEGIN
     7COUNTER 5 <
   WHILE
      A1*7->A1
      1+             \ increment power
   REPEAT
   1- CR .POWER
   CR A1$ VTYPE
   0 7 AT-XY
;

DECIMAL

CR .( TYPE: ELAPSE RUN, ELAPSE NOSCROLL or ELAPSE FASTRUN ) CR

+TheBF · June 1, 2022

Revisiting old benchmarks to beat up my newest version of Camel99 Forth and I am back on this old chestnut.

It is a good general workout for a language.

Using my last Lee Stewart version that prints the output array directly, in ITC Forth I now get: 59 seconds on the stopwatch.

Can't use the screen timeout timer because we spend too much time with interrupts disabled doing all that VDP I/O.

A lot of the speedup is because I am now using unused memory for a large screen buffer for fast screen scrolling. (what the hell)

In the new DTC Forth we get 56 seconds on the stopwatch.

BillG · July 23, 2022

This is probably years away from my being able to produce actual performance numbers, but I thought I would throw it out there.

Many trips around the sun ago, I joined a makerspace where I experimented with the Arduino and began to learn Python. I thought Arduino is popular and Python is popular. Why can't we program an Arduino in Python?"

I set out to implement a Python compiler. Yes, a compiler. By not requiring an interpreter implementing the entire language, we have a fighting chance to get some programs to run on the microcontroller of the Arduino.

Initially, I had concerns around the 2K bytes of RAM in the Arduino. Because a Commodore 128 was on display at the makerspace available for me to use, I chose the 6502 as the initial target.

Some arm twisting from a friend wanting me to target the 9900 brought me to this forum where I discovered the Sevens problem. Solving it requires:

* multiplying a large number by 7

* converting the number to ASCII decimal

* searching for "777777" in the string

all of which are built into the Python language. Meaning that I can write the critical parts in assembly language as part of the run-time library, something I cannot do in most other languages. Python has the potential to provide the fastest high-level language solution of the Sevens problem, so I thought.

The Python code would be merely be the "glue" tying these parts together.


number = 1

power=0

string=""

while string.find("777777") < 0:

 

    number = number * 7

    power = power + 1

    string = str(number)

 

print('Seven to the power of', power, 'is', string)

When I got to the point where I was able to compile and run this program, disappointment set in. It crawled.

Analysis showed the vast majority of the time was spent in the int function which converted a number to a string representation. I had initially chosen to implement the obvious approach - dividing the number by 10 to extract each digit. The divisions were killing me.

Some research turned up the Double Dabble algorithm to avoid dividing:

https://en.wikipedia.org/wiki/Double_dabble

I finally got around to reimplementing the faster conversion and got a greatly improved time, going from over 22 billion CPU cycles to a little over 90 million. An Apple II would take just over 90 seconds to find the solution.

I do not know enough to even guess how fast something like this would run on the 9900.

https://talk.dallasmakerspace.org/t/project-log-python-on-the-6502-c64-8080-6800-6809-and-avr/18852/363

In parallel, I experimented with the different approach of performing the calculation in BCD to avoid having to do the conversion. My best solution along those lines is using the Borland dialect of Pascal, doing the arithmetic using unpacked BCD stored in a string, marching a pchar pointer through the string for fast "indexing" and using pos to look for six consecutive bytes of binary 7 in the string. It completes almost instantaneously on my relatively slow netbook.

Unless there is a 9900 Pascal compiler supporting pchar, I cannot test this.

+TheBF · July 23, 2022

This is really interesting to me Bill. I have been noodling on how to make number conversion faster as well.

As you probably know the Forth converter called # traditionally converts a double integer and so does two divisions. Even worse.

And a python compiler , even a subset of the language, would be a big hit around here I suspect. I bet it's no mean feat to accomplish.

I am going to look over the double_dabble very closely.

I have found however that because 9900 consumes so many cycles for every instruction it is hard to get big improvements in speed by replacing DIV with other instructions but some improvements are possible.

We shall see.

Thanks!

Edited July 23, 2022 by TheBF
typo

+TheBF · July 24, 2022

Way back in 2011 Lucien posted a version of this program running on ~~Wycove~~ Weiand Forth. I always wondered how Wycove performed. (I will see if I can run it locally for a Wycove timing)

It was his first cut at it using two arrays, but we have a video record of the time on ~~Wycove~~ Weiand Forth.

I was curious if we (and me particularly) are making progress in Forth on TI-99 so I ran the exact same code with a small amount of translation harness off the top for Camel99.

We also have a video record of the time on TurboForth.

Some progress has been made.

EDIT: Ran the program on Wycove 3.0 and posted timing July 25,2022

\ Camel99 DTC 2:51
\ TurboForth  3:04
\ Camel99 ITC 3:16
\ Wycove  3.0 4:10 
\ Weiand      4:58

Using Lee Stewart's Forth solution, Camel99 2.69 ITC runs the the program Un-optimized, with full text output now in 53 seconds.

The calculation only version runs in 15.8 seconds.

Using my latest version of the inline optimizer and direct to video text output, the full text runs in 45 seconds and "calculation only" version runs in 9 seconds.

Original early version code

Spoiler


\ Original Code by Lucien
\ Camel99 DTC 2:51
\ TurboForth  3:04
\ Camel99 ITC 3:16
\ Wycove      4:58


\ CAMEL Forth harness ......

\ don't need these
\ : CREATE2 ( -- )
\  <BUILDS DOES> ;
\ : CELLS ( n -- n )  2 * ;
\ -1 CONSTANT TRUE
\ 0 CONSTANT FALSE

: ENDIF    POSTPONE THEN ;  IMMEDIATE
: CLS      PAGE ;

\ WYCOVE version begins .............

180 CONSTANT SIZE
 VARIABLE V1
 VARIABLE V2
 VARIABLE V3
CREATE A1   SIZE CELLS ALLOT
CREATE A2   SIZE CELLS ALLOT

: A1*7->A2 ( -- )
  0 V1 !
  0 V2 !
  BEGIN
    A1 V1 @ CELLS + @ 7 * V2 @ + DUP
    10 MOD A2 V1 @ CELLS + !
    10 / V2 !
    V1 @ 1+ DUP V1 !
  SIZE < WHILE REPEAT ;

: A2->A1 ( -- )
  SIZE 0 DO A2 I CELLS + @ A1 I CELLS + ! LOOP ;

: TYPE-A2 ( -- )
  0 V1 !
  FALSE V2 !
  -1 SIZE 1- DO
    A2 I CELLS + @ 48 + DUP
    48 = 0= IF TRUE V2 ! ENDIF
    V2 @ IF
      PAD V1 @ 1+ + C!
      V1 @ 1+ V1 !
    ELSE
      DROP
    ENDIF
  -1 +LOOP
  V1 @ PAD C!
  CR PAD COUNT TYPE CR ;

: TEST-A2 ( -- f )
  0 V1 !
  FALSE V2 !
  SIZE 0 DO
    A2 I CELLS + @
    7 = 0= IF
      0 V1 !
    ELSE
      V1 @ 1+ V1 !
    ENDIF
    V1 @ 5 > IF
      TRUE V2 !
    ENDIF
  LOOP
  V2 @ ;

: 7'S-PROBLEM
  CLS
  A1 SIZE CELLS 0 FILL
  A2 SIZE CELLS 0 FILL
  7 A1 !
  2 V3 !
  BEGIN
    A1*7->A2
    CR ." SEVEN TO THE POWER OF " V3 @ . ." IS"
    V3 @ 1+ V3 !
    TYPE-A2
    A2->A1
  TEST-A2 UNTIL ;

lucien2 · July 24, 2022

It was Weiand (named "Fig Forth" on the title screen), not Wycove.

I don't know if they are more or less the same.

BillG · July 24, 2022

22 hours ago, TheBF said:

I am going to look over the double_dabble very closely.

I have found however that because 9900 consumes so many cycles for every instruction it is hard to get big improvements in speed by replacing DIV with other instructions but some improvements are possible.

I should probably warn you that the wiki writeup on Double Dabble is neither very clear or very optimal. It is oriented toward someone writing conversion code in a high-level language.

There are two ways to convert a number from one base to another. The obvious and most intuitive method is to repeatedly divide the number to be converted by the destination radix to extract digits. But it is the slowest due to the many division operations required.

A better way is to take each "digit" of the number from high to low, multiply the accumulated resultant number (initially 0) by the source radix then add in the new digit. Note that these operations must be done in the destination radix.

For example, 9 is 1001 in binary. It can be viewed as

1 x 8 + 0 x 4 + 0 x 2 + 1 x 1

So you take the highest bit of the number to be converted and add to the resultant number (initially 0) yielding 1.

Then multiply it by 2 yielding 2 and add the next highest bit of the number to be converted yielding 2.

And then multiply it by 2 yielding 4 and add the next highest bit of the number to be converted yielding 4.

Finally multiply it by 2 yielding 8 and add the lowest bit yielding 9.

Double Dabble is an implementation of the second approach but shifts the number left instead of multiplying by 2. All of that fiddling about testing groups of bits and adding a correction factor is nothing more than what is known as "decimal adjust" on hardware supporting it. Unfortunately, the 9900 does not have decimal adjust so you will have to do it.

More discussion and implementation for some common microprocessors can be found at:

https://talk.dallasmakerspace.org/t/project-log-python-on-the-6502-c64-8080-6800-6809-and-avr/18852/364

atrax27407 · July 24, 2022

Weiand and Wycove are completelt different applications of the same language (i.e., FORTH).

+TheBF · July 24, 2022

7 hours ago, lucien2 said:

It was Weiand (named "Fig Forth" on the title screen), not Wycove.

I don't know if they are more or less the same.

Thank you for the correction I will update the post.

Reciprocating Bill · July 24, 2022

1 hour ago, atrax27407 said:

Weiand and Wycove are completelt different applications of the same language (i.e., FORTH).

Actually, Wycove Forth descended from Fig Forth. On page 55 of the Wycove Forth 2.0 manual it states, "Wycove Forth is based on FIG Forth, version 1.0, with several minor differences and extensions for the 99/4A environment."

I like Wycove Forth 3.0, which is quite fast, particularly with 16-bit memory installed. For example, that combination runs Lee's version of the SEVENS benchmark (with slight modifications) in ~ 58 seconds, printing everything to screen.

The 7's Problem

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Join the conversation

Recently Browsing 0 members