IntyBASIC compiler v1.0 crunchy&tasty :)

+Tarzilla · March 11, 2015

The Intellivision has 240 bytes of 8-bit RAM at $100 - $1EF, and 112 words of 16-bit RAM at $2F0 - $35F. That's it, if you exclude GRAM.

The 209 and 51 are what's left after IntyBASIC takes its share for internal variables, stack, etc.

You can see what it's allocating into both spaces by looking at the assembler output. In particular, the listing file has a list of symbols and what addresses they got allocated to. You can see what symbols got allocated in the range $2F0 - $35F, for example, to see what got allocated over there.

Or, you could read intybasic_prologue.asm and intybasic_epilogue.asm.

EDIT: Ok, I tried assembling a simple "hello" program with IntyBASIC 1.02. It consisted solely of PRINT AT 0, "HELLO", followed by "x: GOTO x". IntyBASIC reported:

.
8 used 8-bit variables of 212 available
0 used 16-bit variables of 71 available
.

Off the bat, I'm not sure what 8 variables I used, but there's apparently 204 more 8-bit variables and 71 16-bit variables available to me. So, what did IntyBASIC use in RAM? I assembled and asked the assembler to output a symbol list to hello.sym. Here's the portion that falls in $100 - $1EF. These are all in 8-bit memory:

.
00000100 ISRVEC
00000100 SCRATCH
00000102 _int
00000103 _ntsc
00000104 _rand
00000105 _gram_target
00000106 _gram_total
00000107 _gram2_target
00000108 _gram2_total
00000109 _mode_select
0000010A _border_color
0000010B _border_mask
0000010C _SCRATCH
.

The SCRATCH and _SCRATCH labels aren't actually variables. These keep track of where the allocation pointer is in 8-bit memory. So, it looks like the first 12 locations out of 240 are taken, which should leave 228 locations ($10C to $1EF) available. Not sure why fewer are available.

For 16-bit memory, here's what's allocated between $2F0 and $35F:

.
000002F0 STACK
000002F0 SYSTEM
00000308 _SYSTEM
00000323 _scroll_buffer
00000337 _music_table
00000338 _music_start
00000339 _music_p
0000033A _frame
0000033B _read
0000033C _gram_bitmap
0000033D _gram2_bitmap
0000033E _screen
0000033F _color
00000340 _mobs
00000358 _col0
00000359 _col1
0000035A _col2
0000035B _col3
0000035C _col4
0000035D _col5
0000035E _col6
0000035F _col7
.

Here, the SYSTEM and _SYSTEM labels are what IntyBASIC uses to keep track of what 16-bit space it's using. It looks like the stack takes the first 24 words of 16-bit RAM ($2F0 - $307). The rest is apparently at the top of 16-bit RAM:

SCROLL buffer from $323 to $336

Music stuff from $337 to $339

Current frame number at $33A

Current READ pointer at $33B

GRAM loading variables at $33C to $33D

Current PRINT AT and COLOR variables. (_screen and _color, respectively)

MOB shadow at $340 - $357

MOB collision tables at $358 - $35F

If I counted this correctly, that's an additional 61 words over the 24 words of stack, for a total of 85 words used by IntyBASIC. That should leave only 27 words available for 16 bit variables, not 71. And yet IntyBASIC reports 71?
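A quick way to double-check the counting above, as a small Python sketch. The address ranges come straight from the symbol listings in this post; the variable names are just mine for illustration, not anything IntyBASIC defines.

```python
# Inclusive address ranges taken from the symbol listings above.
RAM8 = (0x100, 0x1EF)    # 8-bit RAM
RAM16 = (0x2F0, 0x35F)   # 16-bit RAM (not counting BACKTAB)


def size(first, last):
    """Number of locations in an inclusive address range."""
    return last - first + 1


# 8-bit side: everything below _SCRATCH ($10C) is already taken.
ram8_total = size(*RAM8)
ram8_used = 0x10C - 0x100
ram8_free = ram8_total - ram8_used

# 16-bit side: stack at the bottom, system variables at the top.
ram16_total = size(*RAM16)
stack_words = size(0x2F0, 0x307)
system_words = size(0x323, 0x35F)   # _scroll_buffer through _col7
ram16_free = ram16_total - stack_words - system_words

print(ram8_total, ram8_used, ram8_free)
print(ram16_total, stack_words, system_words, ram16_free)
```

This gives 228 free 8-bit locations and 27 free 16-bit words for the "hello" program, which is what the hand count says.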

nanochess, can you help us out here?

Are these numbers corrected in IntyBASIC 1.0.3?

As much as we all hate you taking time from finishing the LTO ;-) I enjoy these posts because I'm learning about the debug capabilities of the assembler and other features most people compiling a BASIC program would never bother with... or understand ;-)

freewheel · March 11, 2015

That 352 words of 16 bit System Memory includes the 240 words of Backtab (the 20 by 12 grid of characters making up the Intellivision display) and the 112 remaining words that Joe talks about.

AH! I knew I had to be missing something. Thank you sir!

intvnut · March 11, 2015

That 352 words of 16 bit System Memory includes the 240 words of Backtab (the 20 by 12 grid of characters making up the Intellivision display) and the 112 remaining words that Joe talks about.

AH! I knew I had to be missing something. Thank you sir!

I knew I should have mentioned the BACKTAB explicitly. I did think of mentioning it.

Thanks, catsfolly, for clearing it up.

Still could use some info from nanochess on how the variable accounting works.

catsfolly · March 11, 2015

I added an array to Joe's routine:

dim #bigarray(70)

print "hello world"
z1: goto z1

And it resulted in this same allocation:

0x323                           ORG $323,$323,"-RWB"

0x323                   _scroll_buffer: RMB 20  ; Sometimes this is unused
0x337                   _music_table:	RMB 1	; Note table
0x338                   _music_start:	RMB 1	; Start of music
0x339                   _music_p:	RMB 1	; Pointer to music
0x33A                   _frame:         RMB 1   ; Current frame
0x33B                   _read:          RMB 1   ; Pointer to DATA
0x33C                   _gram_bitmap:   RMB 1   ; Bitmap for definition
0x33D                   _gram2_bitmap:  RMB 1   ; Secondary bitmap for definition
0x33E                   _screen:    RMB 1       ; Pointer to current screen position
0x33F                   _color:     RMB 1       ; Current color
0x340                   _mobs:      RMB 3*8     ; MOB buffer
0x358                   _col0:      RMB 1       ; Collision status for MOB0
0x359                   _col1:      RMB 1       ; Collision status for MOB1
0x35A                   _col2:      RMB 1       ; Collision status for MOB2
0x35B                   _col3:      RMB 1       ; Collision status for MOB3
0x35C                   _col4:      RMB 1       ; Collision status for MOB4
0x35D                   _col5:      RMB 1       ; Collision status for MOB5
0x35E                   _col6:      RMB 1       ; Collision status for MOB6
0x35F                   _col7:      RMB 1       ; Collision status for MOB7

But with this at the end:

0x2F0                   SYSTEM:	ORG $2F0, $2F0, "-RWBN"
0x2F0                   STACK:	RMB 24
0x308                   Q1:	RMB 70	; #BIGARRAY
0x34E                   _SYSTEM:	EQU $

So it looks like the memory from 0x323 to 0x34e is being allocated to two different sets of variables.....
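To put exact numbers on the collision, here is a small Python sketch that intersects the two ranges from the listing. The helper name `span` is mine; the addresses and sizes are the ones shown above (Q1 is 70 words starting at 0x308, and the system variables run from 0x323 through 0x35F).

```python
def span(start, words):
    """Addresses covered by an RMB of `words` words starting at `start`."""
    return range(start, start + words)


bigarray = span(0x308, 70)        # Q1: RMB 70  ; #BIGARRAY
system = range(0x323, 0x360)      # _scroll_buffer through _col7

overlap = sorted(set(bigarray) & set(system))
overlap_first, overlap_last = overlap[0], overlap[-1]

print(hex(overlap_first), hex(overlap_last), len(overlap))
```

The last word of the array sits at 0x34D (0x34E is where _SYSTEM lands, just past it), so 43 words are claimed twice.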

Catsfolly

+nanochess · March 11, 2015

If I counted this correctly, that's an additional 61 words over the 24 words of stack, for a total of 85 words used by IntyBASIC. That should leave only 27 words available for 16 bit variables, not 71. And yet IntyBASIC reports 71?

nanochess, can you help us out here?

Are these numbers corrected in IntyBASIC 1.0.3?

Oops! Something weird is going on here, let me check.

Edit: So far the count of 8-bit variables and available space is right, except for the offset by 8 in used variables. In the source code there was an eight instead of a zero for initializing the count.

Edit 2: The count of 16-bit available space is wrong (I forgot to subtract the 24 words of stack space). The available space depends on whether you use SCROLL and/or VOICE.

+nanochess · March 11, 2015

I've just found a small bug in intybasic_epilogue.asm that hinders the drums in PLAY SIMPLE mode.

Change the code in _activate_drum to this:

@@2:    MVI _music_mode,R0
        CMPI #2,R0
        BNE @@3

Of course, every change is being accumulated for the next version of IntyBASIC, probably v1.0.4

intvnut · March 12, 2015

Oops! Something weird is going on here, let me check.

Edit: So far the count of 8-bit variables and available space is right, except for the offset by 8 in used variables. In the source code there was an eight instead of a zero for initializing the count.

Edit 2: The count of 16-bit available space is wrong (I forgot to subtract the 24 words of stack space). The available space depends on whether you use SCROLL and/or VOICE.

Ok, the 8-bit count starting at 8 rather than 0 makes sense. But what about 212 vs. 228? Your count of available 8-bit variables seems low by 16. With the simple "Hello" test, I see $10C through $1EF available, which is 228 bytes on my calculator.

On the 16-bit space: I guess you RMB the _scroll_buffer whether or not the program uses SCROLL? I haven't tested, but do you decrement the total number of available 16-bit words by 20 when the program uses the SCROLL keyword?

(Personally, I'm still not convinced you need the scroll buffer, although I understand why it's there. I think you can recast the shift-screen-downward scroll as a series of operations that swap with row 0 and eliminate it that way, while still staying ahead of the refresh.)

intvnut · March 12, 2015

Specifically, I think that particular scroll can be done as:

SWAP(0,1)
SWAP(0,2)
SWAP(0,3)
...
SWAP(0,10)
COPY(0,11)
FILL(0, new_data)

If I'm not mistaken, the rows of the screen go through this progression (drawn horizontally as columns here):

0123456789AB
1023456789AB
2013456789AB
3012456789AB
4012356789AB
...
A0123456789B // this is SWAP(0,10)
A0123456789A // this is COPY(0,11)
x0123456789A // this is FILL(0, new_data)
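For anyone who wants to check the progression above without a console handy, here is a Python sketch that models the 12 rows as characters and applies the same SWAP / COPY / FILL sequence. The function name is made up for illustration; this only models row order, not cycle timing.

```python
ROWS = "0123456789AB"   # one character per BACKTAB row, top to bottom


def scroll_down_with_swaps(rows, new_row="x"):
    """Apply SWAP(0,1)..SWAP(0,10), COPY(0,11), FILL(0) and record each state."""
    rows = list(rows)
    trace = ["".join(rows)]
    for r in range(1, 11):                 # SWAP(0,1) through SWAP(0,10)
        rows[0], rows[r] = rows[r], rows[0]
        trace.append("".join(rows))
    rows[11] = rows[0]                     # COPY(0,11): row 0 -> row 11
    trace.append("".join(rows))
    rows[0] = new_row                      # FILL(0, new_data)
    trace.append("".join(rows))
    return trace


swap_trace = scroll_down_with_swaps(ROWS)
for state in swap_trace:
    print(state)
```

The printed states line up with the hand-drawn table, ending in x0123456789A.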

If I did my math right, the SWAP(x, y) operation costs less than 800 cycles, which is more than fast enough. (There's 914 cycles between row fetches on the STIC, and you have even more than that available if you don't blow your entire vertical retrace on other things.) Swapping with row 0 should be acceptable, as the frame that triggers the scroll should be the one that hides row 0 entirely under the top border extension row.

For pushing the display rightward, you just need to unroll once, and alternate between two registers for the value you're "pushing forward":

.

    ; R4 = start of line to push rightward
    MVI@    R4,     R0   ; col => R0
    MOVR    R4,     R5
    REPEAT  9
    MVI@    R4,     R1   ; col+1 => R1
    MVO@    R0,     R5   ; R0 => col+1
    MVI@    R4,     R0   ; col+2 => R0
    MVO@    R1,     R5   ; R1 => col+2
    ENDR
    MVI@    R4,     R1   ; col+18 => R1
    MVO@    R0,     R5   ; R0 => col+18
    MVO@    R1,     R5   ; R1 => col+19

.

Pushing leftward or upward are just straight unrolled memcpy operations.

So... with a little effort, I think you can eliminate the _scroll_buffer.

I haven't looked in awhile... does SCROLL include copies for the 4 diagonals too? You can fit them in this framework, I believe. I'm just asking.

Edited March 12, 2015 by intvnut

intvnut · March 12, 2015

Ok, so I actually went and looked at intybasic_epilogue. (I should have looked at it again before posting, as it's been awhile. I did look at it some time ago.) The rightward shift uses the technique I mentioned above.

The downward shift, I think, does run quite a bit faster than my cascade of SWAPs. It's a pure time-space tradeoff, really. Mine will take around 2/3 of a frame to scroll, while nanochess' will take about 1/3 of a frame, assuming I've counted cycles correctly. Quite a difference, really.

All that said: The bottom row of the display gets discarded in a downward shift. Is there a way to use that row and combine our two techniques somehow? i.e., get closer to nanochess' clearly superior cycle counts while keeping the zero-extra-buffering footprint?

Edited March 12, 2015 by intvnut

intvnut · March 12, 2015

Here's a thought... and then I need to go to bed.

COPY( 5, 11 )
MOVE( 4 .. 0 to 5 .. 1 ) // moved starting at highest numbered row, working backward
COPY( 11, 0 ) // now that 0 is vacated
MOVE( 10 .. 6 to 11 .. 7 ) // moved in highest to lowest row order
COPY( 0, 6 )
Now you can fill 0 with whatever new data you need

This should be almost as fast as nanochess' approach, requiring one extra copy (copying 11 to 0), which costs ~360 cycles. That is, it adds about 8% to 9% to the cost of what's currently in there, but gives you back 20 16-bit variables.
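Here is a Python sketch of the row-11-as-temporary idea, again modeling each row as one character. The helper names (`copy_row`, `move_rows_down`) are mine, and COPY(a, b) is read as "copy row a into row b", as in the steps above.

```python
def copy_row(rows, src, dst):
    """COPY(src, dst): overwrite row dst with row src."""
    rows[dst] = rows[src]


def move_rows_down(rows, top, bottom):
    """Move rows top..bottom down one row, highest-numbered row first."""
    for r in range(bottom, top - 1, -1):
        rows[r + 1] = rows[r]


def scroll_down_row11_temp(rows, new_row="x"):
    rows = list(rows)
    copy_row(rows, 5, 11)        # park row 5 in the row that gets discarded
    move_rows_down(rows, 0, 4)   # MOVE( 4 .. 0 to 5 .. 1 )
    copy_row(rows, 11, 0)        # row 0 is vacated, so park row 5 there
    move_rows_down(rows, 6, 10)  # MOVE( 10 .. 6 to 11 .. 7 )
    copy_row(rows, 0, 6)         # put the old row 5 where it belongs
    rows[0] = new_row            # fill row 0 with new data
    return "".join(rows)


row11_result = scroll_down_row11_temp("0123456789AB")
print(row11_result)
```

The result is the same x0123456789A layout as a plain shift-down, with no storage outside the 12 rows.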

EDIT: This is what I understand the current IntyBASIC shift-display-down copy to be:

COPY( 5, scroll_buf )
MOVE( 4 .. 0 to 5 .. 1 ) // highest to lowest row order
MOVE( 10 .. 6 to 11.. 7 ) // highest to lowest row order
COPY( scroll_buf, 6 )

In terms of row position vs time (again, showing rows as columns; sb[] means scroll_buf):

0123456789AB // start
0123456789AB, sb[5] // copy 5 to SB
0123446789AB, sb[5] // move 4..0 to 5..1, in high-to-low row order
0123346789AB, sb[5]
0122346789AB, sb[5]
0112346789AB, sb[5]
0012346789AB, sb[5]
0012346789AA, sb[5] // move 10..6 to 11..7 in high-to-low row order
00123467899A, sb[5]
00123467889A, sb[5]
00123467789A, sb[5]
00123466789A, sb[5]
00123456789A, sb[5] // copy SB to 6

Did I understand your scroll algorithm correctly?
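The same trace can be reproduced with a few lines of Python. This is only my model of the buffered shift-down as I described it above, not code lifted from intybasic_epilogue.asm; `scroll_buf` here is a plain Python variable standing in for the 20-word buffer.

```python
def scroll_down_with_buffer(rows):
    """Model the buffered shift-down and record the rows after each copy."""
    rows = list(rows)
    trace = []
    scroll_buf = rows[5]                 # COPY( 5, scroll_buf )
    for r in range(4, -1, -1):           # MOVE( 4 .. 0 to 5 .. 1 )
        rows[r + 1] = rows[r]
        trace.append("".join(rows))
    for r in range(10, 5, -1):           # MOVE( 10 .. 6 to 11 .. 7 )
        rows[r + 1] = rows[r]
        trace.append("".join(rows))
    rows[6] = scroll_buf                 # COPY( scroll_buf, 6 )
    trace.append("".join(rows))
    return trace


buffer_trace = scroll_down_with_buffer("0123456789AB")
for state in buffer_trace:
    print(state)
```

Row 0 still holds its old contents at the end; the caller is expected to fill it with new data afterward.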

Edited March 12, 2015 by intvnut

freewheel · March 12, 2015

Does this mean I need to subtract 24 from my available 16 bit variables? I better ask now as I'll be pushing close to full soon.

+nanochess · March 12, 2015

Ok, the 8-bit count starting at 8 rather than 0 makes sense. But what about 212 vs. 228? Your count of available 8-bit variables seems low by 16. With the simple "Hello" test, I see $10C through $1EF available, which is 228 bytes on my calculator.

On the 16-bit space: I guess you RMB the _scroll_buffer whether or not the program uses SCROLL? I haven't tested, but do you decrement the total number of available 16-bit words by 20 when the program uses the SCROLL keyword?

(Personally, I'm still not convinced you need the scroll buffer, although I understand why it's there. I think you can recast the shift-screen-downward scroll as a series of operations that swap with row 0 and eliminate it that way, while still staying ahead of the refresh.)

Oops! Another mistake: somehow I thought the extra ECS PSG was mapped at $01e0-$01ef, but it is at $00f0-$00ff. So be happy, everyone! You have 16 more 8-bit variables than IntyBASIC indicates.

That's right, using SCROLL automatically takes 20 16-bit variables.

If I did my math right, the SWAP(x, y) operation costs less than 800 cycles, which is more than fast enough. (There's 914 cycles between row fetches on the STIC, and you have even more than that available if you don't blow your entire vertical retrace on other things.) Swapping with row 0 should be acceptable, as the frame that triggers the scroll should be the one that hides row 0 entirely under the top border extension row.

For pushing the display rightward, you just need to unroll once, and alternate between two registers for the value you're "pushing forward":

.
    ; R4 = start of line to push rightward
    MVI@    R4,     R0   ; col => R0
    MOVR    R4,     R5
    REPEAT  9
    MVI@    R4,     R1   ; col+1 => R1
    MVO@    R0,     R5   ; R0 => col+1
    MVI@    R4,     R0   ; col+2 => R0
    MVO@    R1,     R5   ; R1 => col+2
    ENDR
    MVI@    R4,     R1   ; col+18 => R1
    MVO@    R0,     R5   ; R0 => col+18
    MVO@    R1,     R5   ; R1 => col+19
.

Pushing leftward or upward are just straight unrolled memcpy operations.

So... with a little effort, I think you can eliminate the _scroll_buffer.

I haven't looked in awhile... does SCROLL include copies for the 4 diagonals too? You can fit them in this framework, I believe. I'm just asking.

In fact, the code for scrolling is based on some example routines written by you; I remember you giving me permission for this.

No, currently no support for diagonal scrolling.

Ok, so I actually went and looked at intybasic_epilogue. (I should have looked at it again before posting, as it's been awhile. I did look at it some time ago.) The rightward shift uses the technique I mentioned above.

The downward shift I think does run quite a bit faster than my cascade of SWAPs. It's a pure time-space tradeoff, really. Mine will take around 2/3rd of a frame to scroll it, while nanochess' will take about 1/3rd a frame, assuming I've counted cycles correctly. Quite a difference, really.

All that said: The bottom row of the display gets discarded in a downward shift. Is there a way to use that row and combine our two techniques somehow? ie. get closer to nanochess' clearly superior cycle counts while keeping the zero-extra-buffering footprint?

I'm trying for the fastest scrolling technique, keeping in mind that the IntyBASIC user is running a compiled program, which is not as efficient as a hand-coded machine-language game.

Here's a thought... and then I need to go to bed.

COPY( 5, 11 )

MOVE( 4 .. 0 to 5 .. 1 ) // moved starting at highest numbered row, working backward

COPY( 11, 0 ) // now that 0 is vacated

MOVE( 10 .. 6 to 11 .. 7 ) // moved in highest to lowest row order

COPY( 0, 6 )

Now you can fill 0 with whatever new data you need

This should be almost as fast as nanochess' approach, requiring one extra copy (copying 11 to 0), which costs ~360 cycles. That is, it adds about 8% to 9% to the cost of what's currently in there, but gives you back 20 16-bit variables.

EDIT: This is what I understand the current IntyBASIC shift-display-down copy to be:

COPY( 5, scroll_buf )

MOVE( 4 .. 0 to 5 .. 1 ) // highest to lowest row order

MOVE( 10 .. 6 to 11.. 7 ) // highest to lowest row order

COPY( scroll_buf, 6 )

In terms of row position vs time (again, showing rows as columns; sb[] means scroll_buf):

0123456789AB // start

0123456789AB, sb[5] // copy 5 to SB

0123446789AB, sb[5] // move 4..0 to 5..1, in high-to-low row order

0123346789AB, sb[5]

0122346789AB, sb[5]

0112346789AB, sb[5]

0012346789AB, sb[5]

0012346789AA, sb[5] // move 10..6 to 11..7 in high-to-low row order

00123467899A, sb[5]

00123467889A, sb[5]

00123467789A, sb[5]

00123466789A, sb[5]

00123456789A, sb[5] // copy SB to 6

Did I understand your scroll algorithm correctly?

Yep, you understood it right

Does this mean I need to subtract 24 from my available 16 bit variables? I better ask now as I'll be pushing close to full soon.

That's right. Sorry for the miscalculation!

You can always check the LST file for variable collision.

freewheel · March 12, 2015

That's right. Sorry for the miscalculation!

You can always check the LST file for variable collision.

But 16 more 8-bits, so it's not a total wash

While sure, I can check other files - I gotta tell you, having console output immediately to remind me of how much RAM I'm using is INVALUABLE. Like you have no idea how much. Instant feedback on any change I make and just a good mental accumulator. So please, if at all possible, have these fixed for 1.0.4 :)

intvnut · March 12, 2015

I'm trying for the fastest scrolling technique, keeping in mind that the IntyBASIC user is running a compiled program, which is not as efficient as a hand-coded machine-language game.

I understand the motivation. I decided to write up a variant of yours that uses row 11 for temporary storage and costs one extra copy.

If I did the cycle counting correctly, mine costs 4760 cycles, while yours costs 4464 cycles. So, it's about 300 cycles more expensive, but it buys back 20 16-bit variables. That might be worth it.
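For the tradeoff itself, a couple of lines of Python arithmetic on those two cycle counts. Only the 4464 and 4760 figures and the 20-word buffer size come from this thread; the rest is derived from them.

```python
buffered_cycles = 4464   # current version using the 20-word scroll buffer
row11_cycles = 4760      # variant that borrows row 11 instead
words_recovered = 20     # 16-bit words freed by dropping the buffer

extra_cycles = row11_cycles - buffered_cycles
overhead_pct = 100 * extra_cycles / buffered_cycles
cycles_per_word = extra_cycles / words_recovered

print(extra_cycles, round(overhead_pct, 1), round(cycles_per_word, 1))
```

So the exact difference is 296 cycles, a bit under 7% on top of the buffered version, or roughly 15 cycles per recovered word on frames that shift the whole display down.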

I've attached a modified intybasic_epilogue.asm that has both versions, guarded by an IF / ELSE / ENDI, with cycle counting notations added to both.

intybasic_epilogue.asm

+nanochess · March 12, 2015

I understand the motivation. I decided to write up a variant of yours that uses row 11 for temporary storage and costs one extra copy.

If I did the cycle counting correctly, mine costs 4760 cycles, while yours costs 4464 cycles. So, it's about 300 cycles more expensive, but it buys back 20 16-bit variables. That might be worth it.

I've attached a modified intybasic_epilogue.asm that has both versions, guarded by an IF / ELSE / ENDI, with cycle counting notations added to both.

Interesting! Especially the cycle counting.

freewheel · March 12, 2015

If I did the cycle counting correctly, mine costs 4760 cycles, while yours costs 4464 cycles. So, it's about 300 cycles more expensive, but it buys back 20 16-bit variables. That might be worth it.

Thumb in the air says it's worth it to me. 20 16-bit vars is a hell of a lot, especially if someone implements both SCROLL and VOICE in a program.

intvnut · March 12, 2015

Thumb in the air says it's worth it to me. 20 16-bit vars is a hell of a lot, especially if someone implements both SCROLL and VOICE in a program.

Putting it in perspective: 300 cycles is ~2% of a frame time. So, instead of scrolling taking 33% of the available cycles in a frame, it takes 35%, and then only on the frames when it has to shift the whole display down.

freewheel · March 12, 2015

Here's a fun one. I don't think I've run into this exact one before as this is by far the largest game I've done yet. Here was the memory map before the fault:

[mapping]
$0000 - $1AFF = $5000
$1B00 - $59FF = $C100

Adding a few more bytes at the end (ie: the segment after $C100) killed the assembler. I actually got a seg fault in as1600.

Usually when I exceed memory ranges, I've seen everything compile and assemble, and puke in the emulator. But this time I managed to kill the assembler itself. Impressive. Obvious fix, another memory re-map in the hordes of DECLEs.

Maybe I hit this before and am just forgetting. Seg faults usually stick in my brain though.

Edited March 12, 2015 by freeweed

intvnut · March 13, 2015

Here's a fun one. I don't think I've run into this exact one before as this is by far the largest game I've done yet. Here was the memory map before the fault:

[mapping]

$0000 - $1AFF = $5000

$1B00 - $59FF = $C100

Adding a few more bytes at the end (ie: the segment after $C100) killed the assembler. I actually got a seg fault in as1600.

Usually when I exceed memory ranges, I've seen everything compile and assemble, and puke in the emulator. But this time I managed to kill the assembler itself. Impressive. Obvious fix, another memory re-map in the hordes of DECLEs.

Maybe I hit this before and am just forgetting. Seg faults usually stick in my brain though.

Wow. Segfault in AS1600! *blush* I wonder what I b0rked.

If you don't mind sharing with me an ASM file that triggers the segfault, I can look to see what's gone pear-shaped, and get things fixed. I prefer real email to PM, although either is fine, I suppose. (I haven't checked my PM box in awhile, though.) My email address is intvnut@gmail.com . I'll keep whatever you share in complete confidence.

Also, please let me know what platform, and if you happen to know, which version of AS1600.

Edited March 13, 2015 by intvnut

freewheel · March 15, 2015

IntyBASIC feature request: logical shift operators. Or exponent calculations. Or both.

I can't remember how to cheat it, but a=x >> y would be so very, very handy right now. Plus the reverse << operator of course. I'll take a=x/(2^y) and a=x*(2^y) if necessary .

Unless I'm completely braindead and this functionality already exists.

Edited March 15, 2015 by freeweed

intvnut · March 15, 2015

IntyBASIC feature request: logical shift operators. Or exponent calculations. Or both.

I can't remember how to cheat it, but a=x >> y would be so very, very handy right now. Plus the reverse << operator of course. I'll take a=x/(2^y) and a=x*(2^y) if necessary .

Unless I'm completely braindead and this functionality already exists.

When you say "X >> Y", you mean a shift whose amount is determined at run time?

Those are trickier than one might like on the Intellivision, since the CPU doesn't offer variable shifts like that.

My suggestion to nanochess would be to code up optimal left/right shift routines and call them, rather than open-code them, if you want the code to be efficient. Looping over a shift instruction is slow. Also, there's some trickiness with shift instructions being non-interruptible, and so you need to code it carefully to prevent display glitches.

To get the discussion started, I'm attaching a prototype I spent literally 5 minutes coding. It may have errors, and you can probably speed up a few of the shifts by a cycle or two here or there, but I think it's a solid starting point.

Thoughts?

shifts.asm

freewheel · March 15, 2015

Hmm. I've never implemented one on bare metal, so to be honest I have no idea how they work internally. What gave me the idea was this entry from the IntyBASIC manual:

A=A*B Note it does multiplication by repeated addition (can be slow)
Multiplication by 2/4/8/16 is internally enhanced as logical shift

And the same for division. So logical shifts are already implemented for at least 1-4 bits I assume? Sadly I need all 7 - I'm not greedy, I don't need this for 16-bit variables

intvnut · March 15, 2015

When you say "X >> Y", you mean a shift whose amount is determined at run time?

Those are trickier than one might like on the Intellivision, since the CPU doesn't offer variable shifts like that.

My suggestion to nanochess would be to code up optimal left/right shift routines and call them, rather than open-code them, if you want the code to be efficient. Looping over a shift instruction is slow. Also, there's some trickiness with shift instructions being non-interruptible, and so you need to code it carefully to prevent display glitches.

To get the discussion started, I'm attaching a prototype I spent literally 5 minutes coding. It may have errors, and you can probably speed up a few of the shifts by a cycle or two here or there, but I think it's a solid starting point.

Thoughts?

Oh, and in case it wasn't obvious from the code, it seems to me like you'd want 3 different shift operators (logical left shift, logical right shift, arithmetic right shift). The IntyBASIC compiler would generate something along these lines:

.

    MVI V0, R0 ; get value to shift into R0
    MVI V1, R1 ; get shift amount into R1
    ANDI #$F, R1
    ADDI #_shl.tbl, R1
    MVII #L1, R5
    MVI@ R1, PC ; do the shift
L1:
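In case the dispatch idea isn't obvious from the assembly, here is a rough Python model of it: mask the shift amount to 0..15, use it to index a table, and jump to the routine found there. The contents of shifts.asm aren't reproduced in this post, so the table below is only my stand-in for `_shl.tbl`, with each entry modeled as a function that shifts a 16-bit value left by a fixed amount.

```python
MASK16 = 0xFFFF   # registers are 16 bits wide


def make_shl(n):
    """Build a stand-in for the fixed 'shift left by n' routine."""
    return lambda value: (value << n) & MASK16


# Stand-in for _shl.tbl: one entry per shift amount 0..15.
SHL_TABLE = [make_shl(n) for n in range(16)]


def shl(value, amount):
    """Model of: ANDI #$F, R1 / ADDI #_shl.tbl, R1 / MVI@ R1, PC."""
    return SHL_TABLE[amount & 0xF](value & MASK16)


print(shl(1, 4), hex(shl(0x8001, 1)), shl(3, 16))
```

Note that, as in the assembly, an amount of 16 wraps around to 0 because of the `ANDI #$F`. The two right shifts would get their own tables in the same style.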

intvnut · March 15, 2015

Hmm. I've never implemented one on bare metal, so to be honest I have no idea how they work internally. What gave me the idea was this entry from the IntyBASIC manual:

A=A*B Note it does multiplication by repeated addition (can be slow)

Multiplication by 2/4/8/16 is internally enhanced as logical shift

And the same for division. So logical shifts are already implemented for at least 1-4 bits I assume? Sadly I need all 7 - I'm not greedy, I don't need this for 16-bit variables

My question is: Does the shift amount vary at run time, or is it constant? If it's constant, then the A*B method is adequate. Just double up the multiplication if you need a larger range: (A*16)*4. (Although, be careful. (A*16)*16 looks like it may trigger a display glitch. It's on the edge.)

For runtime-variable shift amounts (B not known until run time), you need something different than what multiply gives you.
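A tiny Python check of the constant case, just to show that chaining power-of-two multiplies is the same as one bigger shift. The function name and the 16-bit wraparound are my assumptions for the sketch; it says nothing about the display-glitch timing mentioned above.

```python
def times_pow2_chain(a, factors, bits=16):
    """Multiply by each constant factor in turn, wrapping to `bits` bits."""
    mask = (1 << bits) - 1
    for f in factors:
        a = (a * f) & mask
    return a


# (A*16)*4 is a left shift by 4 + 2 = 6 bits.
shift6 = times_pow2_chain(5, [16, 4])
print(shift6, (5 << 6) & 0xFFFF)
```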

Which is it?

freewheel · March 15, 2015

My question is: Does the shift amount vary at run time, or is it constant? If it's constant, then the A*B method is adequate. Just double up the multiplication if you need a larger range: (A*16)*4. (Although, be careful. (A*16)*16 looks like it may trigger a display glitch. It's on the edge.)

For runtime-variable shift amounts (B not known until run time), you need something different than what multiply gives you.

Which is it?

I'm assuming I mean not constant.

Let's say I'm using an 8 bit value to control inventory of things. I want to check an arbitrary item, say item #2. So I'm gonna shift out bit 2 (depending on which way we're counting): X >> 2. But I need to be able to pass "2" as a parameter, because you never know which item I wanna interrogate.

If it's constant, I can just AND &00000010. That's trivial. I'm just trying to avoid a million ANDs and all the associated IF statements to deal with every potential case.

It's "constant" in the sense that it's always going to be the same 7 shifts, I just don't know which I'm going to want at any given time. I suspect that's not what you mean, but gotta be sure

Shifts are just multiplication and division by powers of 2 - so I was just hoping to piggy back onto what's already in place.
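To make the inventory example concrete, here is a Python sketch of the two ways to test "item n" in an 8-bit value: shifting by a run-time amount, or looking up a mask in a small table and ANDing with it. The function and table names are invented for illustration; the table approach is one possible way to avoid both a variable shift and a pile of IF statements, not something the compiler is known to do.

```python
# One mask per bit position: 1, 2, 4, ..., 128.
BIT_MASKS = [1 << n for n in range(8)]


def has_item_shift(inventory, item):
    """Test bit `item` with a run-time shift: (X >> item) AND 1."""
    return ((inventory >> item) & 1) == 1


def has_item_table(inventory, item):
    """Test bit `item` with a mask looked up by index, no shift needed."""
    return (inventory & BIT_MASKS[item]) != 0


inventory = 0b00100110   # items 1, 2 and 5 held
print([n for n in range(8) if has_item_table(inventory, n)])
```

Both functions agree for every 8-bit inventory value and every item number from 0 to 7.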
