Posts posted by bogax
-
True, but if he really needs the lower 2 bits to be cleared (which might not be the case), it's much faster to AND the value with %11111100 than to do more divisions or shifts.
Sure and I would guess that the 0-15 result is more like what he wants
than the multiples of 4.
I was just pointing out that (at least as I understand it) there will
be a difference.
-
Is there any reason for you not to replace the division by 64 followed by the multiplication by 4 with a division by 16?
I'm not very good at math, so I'll do that. I was thinking along the lines of having the number stay below 255 so it will work, coupled with the fact of what I already knew about bB and variables and doing random stuff with them (which isn't very much.) Thank you.
I'm not very familiar with batari Basic, but as I understand it,
it will just do 8-bit math unless you tell it otherwise.
The fractional bits will get truncated.
So (rand/64)*4 will produce a multiple of 4, i.e. one of 12, 8, 4 or 0;
rand/16 will produce a number from 0 to 15 (inclusive).
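To make the difference concrete, here is a small Python model of the two expressions. It is only a sketch: it assumes 8-bit truncating division and a random byte in the range 1..255, and the function names are made up for illustration.

```python
# Sketch: model 8-bit truncating division; names here are illustrative only.
# Assumption: the random byte ranges over 1..255.

def coarse(r):
    """(rand/64)*4 with the fraction truncated after the division."""
    return (r // 64) * 4

def fine(r):
    """rand/16 with the fraction truncated."""
    return r // 16

coarse_values = sorted({coarse(r) for r in range(1, 256)})
fine_values = sorted({fine(r) for r in range(1, 256)})

print(coarse_values)  # only multiples of 4
print(fine_values)    # every value from 0 to 15
```

The first expression throws away the low bits before multiplying, so they never come back.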
-
Oh boy.
Okay, I seem to remember a trick for this that was first done in Combat. I think it went like this, IIRC.
1. Have the scanline you want the missile to appear on stored in ram.
2. Compare the scanline you are on to the scanline the missile is on.
3. Push the processor status onto the enable register. The **trick** (why this works) is that the Zero Flag is set when the compare is equal to the scanline. Since the Missile Enable and Zero Flag occupy the same bit xxxx xxXx, the processor status can be used to enable/disable the missile.
7-8 cycles though?
;stack pointer has to be adjusted beforehand, it needs to point to the enable register (i.e. ENAM0, ENAM1, etc.) so you need to find more cycles before it happens.
CPY scanline ;3
PHP ;4 (already at 7 cycles)
However, you need to readjust the stack register back each time you do this! So again you need to find more cycles.

The simple way to adjust the stack pointer is to waste 4 more cycles (when you can) with a PLA. This will trash the accumulator, so do it when you don't need its contents or are about to reload it.
i.e.
PLA ;4 (reset stack to the missile, but trashes the accumulator)
LDA #whatever (ahhhh, now it doesn't matter.)
The minimum needed for this method is 11 cycles (split into 7 cycles, and 4 cycles), plus the stack pointer adjusted before hand (more cycles), plus a ram register (to compare the scanline to), plus the stack pointer free (i.e. you are not jumping to a subroutine during the kernel, or using the stack pointer for other things.)
PHP is three cycles.
Assuming you were willing to use self-modifying code and reserve the X register
for the purpose, you could do:
TXS ; restore the stack pointer from X
CPY #
PHP
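For anyone wondering why pushing the status register works at all, here is a rough Python model. It is a sketch only: it models nothing but the Z flag after a compare, and the placeholder value for the other status bits is an assumption.

```python
# Sketch of the PHP trick: the Z flag and the missile enable bit are both bit 1.
Z_FLAG = 0x02      # bit 1 of the 6502 status register
ENABLE_BIT = 0x02  # bit 1 of the TIA enable registers

def status_after_compare(y_reg, operand, other_flags=0x30):
    """Model only the Z flag after CPY; other_flags is a placeholder value."""
    p = other_flags & ~Z_FLAG
    if (y_reg - operand) & 0xFF == 0:
        p |= Z_FLAG
    return p

def missile_enabled(scanline, missile_line):
    # PHP stores the status byte wherever the stack pointer points,
    # which here is assumed to be the enable register.
    enam = status_after_compare(scanline, missile_line)
    return bool(enam & ENABLE_BIT)

print([missile_enabled(line, 50) for line in (49, 50, 51)])
```

Only the line where the compare is equal ends up with the enable bit set.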
-
Currently I have 5 different animations, and only 152 bytes of ROM left in bank 1. If I try to add another "if clearstyle" line above, and it's data statement, I'll be -138 bytes or so.
I'm not sure how to interpret that, but it sounds like you're saying that an if-then-assignment statement plus 25 data bytes takes 152 + 138 bytes. That sounds like too much.
.
.
if clearstyle = 6 then block = next_block6[z]
That code above adds a 6th animation.
and..this would be the data statement for it:
data next_block6 0,1,2,3,4,9,14,19,24,23,22,21,20,15,10,5,6,7,8,13,18,17,16,11,12 end
Adding those two lines of code above causes me to be -138 bytes,
It sounds like you're saying that after adding the 5
data blocks and their corresponding
"if clearstyle = then block = next_block6[z]"
statements you have 152 bytes left,
and that adding another block of 25 bytes and another
if-then-assignment uses up that 152 bytes and wants
another 138 bytes so that the if-then-assignment
plus the 25 data bytes adds up to 152+138 = 290 bytes.
I'd expect that the if-then statement might use
maybe 8 (or so) bytes and an assignment and table look
up maybe 10 bytes. so that adding a 6th block your way
would cost maybe 60-70 bytes (or something like that),
much less than 290 bytes
(note that the way I was suggesting should only cost
25 + 2 bytes for each additional block)
I would expect that the changes I suggested (without
adding a 6th data block) would save you 50-60 bytes
or something like that.
But like I said I'm not familiar with batari Basic
so maybe there's a lot more going on than I expect
(or maybe I'm completely misunderstanding what you're
saying)
(if you do try my suggestion I'd be curious as to how
it works out)
-
I have the following code..
levelcomplete
 if clearstyle = 6 then clearstyle = 1
 .
 .
Since that looks like you're wrapping clearstyle at the number
of blocks, then maybe you're incrementing clearstyle.
If so, you could build the increment into a lookup table.
So incrementing clearstyle becomes:
 clearstyle = cs_lut[clearstyle]

 data cs_lut
 1,2,3,4,5,1
end
and you can do away with the if-then wrapping altogether.
Of course, you have to initialize clearstyle.
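Here is the same table walked through in Python, just to show the wrap happening without any comparison. The helper name is made up for illustration.

```python
# Sketch: the increment-with-wrap table from the post, indexed by clearstyle.
cs_lut = [1, 2, 3, 4, 5, 1]  # index 0 also leads to 1, index 5 wraps to 1

def next_style(clearstyle):
    """One table lookup replaces 'add 1, then test and wrap'."""
    return cs_lut[clearstyle]

sequence = []
style = 1  # the variable has to be initialized to a valid index
for _ in range(7):
    sequence.append(style)
    style = next_style(style)

print(sequence)
```

The cost is one table byte per state, and the if-then goes away.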
-
Currently I have 5 different animations, and only 152 bytes of ROM left in bank 1. If I try to add another "if clearstyle" line above, and it's data statement, I'll be -138 bytes or so.
I'm not sure how to interpret that but it sounds like
you're saying that an if-then-assignment statement plus
25 data bytes takes 152 + 138 bytes. That sounds like
too much.
Or maybe you mean it requires 138 bytes, but that still
sounds like a lot.
Maybe a lookup table for clearstyle would help.
(I RTFM a little so maybe this is closer to what
batari Basic wants)
Something like:
levelcomplete
 clearstyle = cs_wrap[clearstyle]
 temp1 = row_offset[clearstyle]
 temp2 = temp1 + 24
 for z = temp1 to temp2
  block = next_block1[z]
  blockcolor[block] = pink
  for y = 0 to 3
   gosub my_drawscreen bank2
   cursorcolor = brown
  next
  blockcolor[block] = blue
 next

 data cs_wrap
 1,1,2,3,4,5,1
end

 data row_offset
 0,0,25,50,75,100
end

 data next_block1
 0,1,2,3,4,9,14,19,24,23,22,21,20,15,10,5,6,7,8,13,18,17,16,11,12
 0,2,4,6,8,10,12,14,16,18,20,22,24,1,3,5,7,9,11,13,15,17,19,21,23
 24,23,22,21,20,15,16,17,18,19,14,13,12,11,10,5,6,7,8,9,4,3,2,1,0
 4,3,9,14,8,2,1,7,13,19,24,18,12,6,0,5,11,17,23,22,16,10,15,21,20
 0,5,10,15,20,21,16,11,6,1,2,7,12,17,22,23,18,13,8,3,4,9,14,19,24
end
-
I'm not familiar with batari BASIC so I don't know
how you'd have to do this or if it would save you
any thing.
Put all the data in a two dimensional array and
make clearstyle the row address (or derive the
row address from clearstyle)
Something like this:
levelcomplete
 if clearstyle = 125 then clearstyle = 0
 for z = clearstyle to clearstyle + 24
  block = next_block1[z]
  blockcolor[block] = pink
  for y = 0 to 3
   gosub my_drawscreen bank2
   cursorcolor = brown
  next
  blockcolor[block] = blue
 next

 data next_block1
 0,1,2,3,4,9,14,19,24,23,22,21,20,15,10,5,6,7,8,13,18,17,16,11,12
 0,2,4,6,8,10,12,14,16,18,20,22,24,1,3,5,7,9,11,13,15,17,19,21,23
 24,23,22,21,20,15,16,17,18,19,14,13,12,11,10,5,6,7,8,9,4,3,2,1,0
 4,3,9,14,8,2,1,7,13,19,24,18,12,6,0,5,11,17,23,22,16,10,15,21,20
 0,5,10,15,20,21,16,11,6,1,2,7,12,17,22,23,18,13,8,3,4,9,14,19,24
end
-
With the first method-- using a single rand value to get temp1 and temp2-- rand would have to be 1 for temp1 to be 0, and rand would have to be between 1 and 7 for temp2 to be 0, so for val to be 0, rand would have to be 1, hence the probability would be 1 out of 255. I guess you would take the smaller probability.
.
.
If that's right, then for val=158 the probability would be 2 out of 255 the first way, or 16 out of 65025 the second way.
Michael
In the case of using a single rand it's just a hard coded multiplication
by .625 which scales 255 to 159.375, close to 160
Obviously you can't map 255 values to 160 values without collisions.
In this case (approximately) every second value comes up
twice and the rest once for all 255 original values.
If you divide the 0-160 range into groups of 2
consecutive values then any group is equally likely
(again, approximately).
If it were me, for this application, I'd just leave it
at that.
Does it really matter if it comes out to position 5 twice
in 255 and position 4 once in 255? (or that 160 is left
out)
I don't know how batari BASIC works.
A more serious problem IMO is if the fractional parts of
the partial products are truncated.
To get a nicer distribution you really need 16 bits both
in the math and the random value (but especially the math).
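If you want to see the collisions rather than take my word for it, a quick tally in Python does it. This is a sketch under two assumptions: the random byte is 1..255, and the multiplication by .625 is done exactly (as r*5/8) with a single truncation at the end, which is not necessarily what the compiled code does with its partial products.

```python
# Sketch: tally how often each scaled position comes up for r = 1..255.
from collections import Counter

def scale(r):
    """Exact r * 0.625, truncated once at the end (assumed ideal case)."""
    return (r * 5) // 8

hits = Counter(scale(r) for r in range(1, 256))

twice = sum(1 for n in hits.values() if n == 2)
once = sum(1 for n in hits.values() if n == 1)
print(min(hits), max(hits), once, twice)
```

Every position from 0 to 159 comes up either once or twice and 160 never appears. Truncating the partial products along the way would distort this further.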
-
Does anyone know where I can obtain detailed instruction timings for the 6502 instructions? I don't just want the total number of cycles that each instruction takes (which I know), but what the 6502 does on each cycle. For example, here is a breakdown of the BRK instruction:
cycle  addr  data
-----  ----  ----
1      PC    00   ; read BRK opcode
2      PC+1  ??   ; read padding byte (ignored)
3      S     PCH  ; store high byte of PC
4      S-1   PCL  ; store low byte of PC
5      S-2   P    ; store status flags with B flag set
6      FFFE  ??   ; low byte of target address
7      FFFF  ??   ; high byte of target address
I'm looking for this level of detail for the JSR, RTS, and the stack manipulation instructions, but it would be nice to know them all for future reference?
Chris
-
Not sure I understand this correctly.
but how about
     BEQ SKIP
     EOR $80
SKIP EOR $80
looks like it would cost some cycles? but might save you some code.
but maybe you can work that saved code into some saved cycles
(fewer jumps or something)
-
Thanks bogax, but kept the numbers in the same order and it works.
Yes, in this case it makes no difference.
I was just pointing out that my loop doesn't work quite the same as yours.
My loop works backwards through the data from higher addresses to lower addresses
and yours works forward from lower addresses to higher addresses.
It's also possible to work forward through the data and still use the flags as set by
INY; the code is just not quite as straightforward (as in easy to immediately understand).
I just thought it was more clear to do it the way I did.
to separate the numbers together, to take just one, I have to subtract (SBC), but as I get another?
Assuming you are still talking about separating the digits, the code you were already
given should work just fine (as far as I can see; if it didn't work, you must have done
something wrong).
The BCD byte is still just a collection of bits; the ADC/SBC instructions just treat them differently.
AND and LSR don't treat them differently.
If you clear the decimal mode they'll be treated as straight binary by ADC/SBC.
-
No need to worry about the extended hex digits A-F, because decimal mode ignores them.
That's a bit of a misstatement. Decimal mode doesn't ignore A-F, it fixes them,
provided you're using BCD. But if (for example) you ADC an immediate non-BCD
number, it may get it wrong (or it may not, which can be useful for binary to BCD
conversion).
Instructions other than ADC/SBC can result in non-BCD values.
Well then the BCD, and to add $04 + $04 + $02 = $10 then put the code if someone helps.
For future reference, instructions that set the value of A, X or Y and read-modify-write
instructions (INC/DEC memory) all set the Z and N flags.
So, though there's nothing wrong with your code, it might be more usual to write
it something like this:
    LDY #$03
    LDA #$00
    SED
    CLC
    JSR SUM

SUM ADC DATA-1,Y
    DEY          ; sets the Z flag
    BNE SUM
    RTS
Note it adds the numbers in reverse order now.
-
Just for the hell of it, here's a couple that do 256 byte tables.
Not as neat, I couldn't think of an elegant way to use y for
the counter.
They're basically the same, one using self-modifying code.

           ldx #$3F
           lda #$00
           tay
           sta cntr_lo
           sta cntr_hi
           sta acc_lo
           clc
           bcc ENTER_LOOP
LOOP       lda cntr_lo
           adc #$10
           sta cntr_lo
           lda cntr_hi
           adc #$00
           sta cntr_hi
           lda acc_lo
           adc cntr_lo
           sta acc_lo
           lda sin+191,y
           adc cntr_hi
ENTER_LOOP sta sin+192,y
           sta sin+128,x
           eor #$FF
           sta sin+64,y
           sta sin,x
           iny
           dex
           bpl LOOP

           ldx #$3F
           lda #$00
           tay
           clc
           bcc ENTER_LOOP
LOOP       lda #$00       ;cntr_lo
           adc #$10
           sta LOOP+1     ;sta cntr_lo
           bcc SKIP
           inc CNTR_HI+1
SKIP       lda #$00       ;acc_lo
           adc LOOP+1     ;add cntr_lo
           sta SKIP+1     ;sta acc_lo
           lda sin+191,y
CNTR_HI    adc #$00       ;cntr_hi
ENTER_LOOP sta sin+192,y
           sta sin+128,x
           eor #$FF
           sta sin+64,y
           sta sin,x
           iny
           dex
           bpl LOOP
-
well... and with this version is there a chance for a fixed point version? f.e. 8.8?
As in 16 bits total with 8 integral and 8 fractional bits?
Not sure there's any point, a parabola is just not that close to a sine.
It only gives you something like 4 bits of accuracy in the worst case
I think the fractional part would be nonsense except for just a few table
entries
I did 256 bytes per quadrant because it looked to me like that was what the
Z80 code was doing and because it fit neatly with using the y register as
the counter.
What exactly do you want to end up with? Four quadrants in 256 entries in 8.8 format?
I might also mention that the code I posted does not result in 2's complement,
although that would be easy enough to do.
-
No one else has responded so I'll take a whack.
I can't read that Z80 gibberish so I'm not really sure what it's doing

You can generate squares by accumulating a constantly varying difference.
That is, the difference between consecutive squares goes up linearly.
So to create a parabola you increment a counter and accumulate
the count as you go.
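The idea in the lines above, sketched in Python rather than 6502 so it can be checked quickly. This is a minimal illustration of the difference method only, not a translation of the table-building routine.

```python
# Sketch: build squares with additions only. The step between consecutive
# squares grows linearly: (n+1)^2 - n^2 = 2n + 1.
def squares(count):
    values = []
    total = 0   # running square
    step = 1    # difference to the next square
    for _ in range(count):
        values.append(total)
        total += step
        step += 2
    return values

print(squares(6))
```

No multiplication is needed anywhere, which is why the approach suits a CPU without a multiply instruction.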
This code is just off the top of my head and not really tested so take it with a grain of salt.
y is used for the counter (and table pointer)
x is used as a table pointer to mirror
eor #$FF flips it around the horizontal axis
sin0 is the first quadrant, sin1 the second etc
The count is accumulated with the low byte in lo and the high byte in a
and/or the table
           ldx #$FF
           lda #$00
           sta lo
           tay
           clc
           bcc ENTER_LOOP
LOOP       tya
           adc lo
           sta lo
           lda #$00
           adc sin3-1,y
ENTER_LOOP sta sin3,y
           sta sin2,x
           eor #$FF
           sta sin1,y
           sta sin0,x
           dex
           iny
           bne LOOP
(Edited to make it look a lot more like a sine and a lot less like an inverted cosine)
-
The problem here is that the levels are generated randomly (along with the secrets) so I do not know in advance what the maximum number will be for a level. One time it might be 26, the next time 38, and so on... in this case a hard-coded look-up table won't work (or at least, could get very big).
Maybe storing the reciprocal in a look-up table for each possible maximum, then using that in an adding loop?
i.e. 1/2, 1/3, 1/4 ... 1/50
Of course, the reciprocal probably requires 2 bytes in order to prevent too much round-off error....
I'm kinda curious what other constraints you have and how much accuracy you need
My approach would be similar to your suggestion (or groovybee's) but I think the thing
to do (if you can) would be to generate 100/t when t is determined and then accumulate
that as secrets are found.
I think you can fit enough accuracy into a byte with scaling but it would be time
consuming (but probably not quite so bad as a full blown division)
On the other hand if you've got enough time when you generate t
and if you already have or could use a GP division why not use that?
Also you might accumulate to 200% then divide by 2 and round for accuracy
The scaling would look something like this (untested):
           ldy t
           sty temp
           lda #$00
           sta percent_of_one_secret_lo
           lda percent_table,y
           bne ENTER_LOOP
LOOP       lsr
           ror percent_of_one_secret_lo
ENTER_LOOP lsr temp
           bne LOOP
           sta percent_of_one_secret_hi
I think you could use a small table and interpolate,
but you'd end up spending 50 cycles or something
and only save a few bytes.
Then when you find a secret you add percent_of_one_secret
to what ever the current percentage is
(I think groovybee's gets it wrong by a couple of
percent in some cases but I'm not sure I'm reading
his code correctly)
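To put numbers on the "generate 100/t once, then accumulate" idea, here is a Python sketch. It assumes an 8.8 fixed-point step computed with a plain division (the general purpose division option), not the table-and-shift scaling, and the function names are made up for illustration.

```python
# Sketch: per-secret percentage kept as 8.8 fixed point (hi byte = whole percent).
def percent_step(t):
    """100/t scaled by 256, truncated; computed once when t is known."""
    return (100 << 8) // t

def percent_found(found, t):
    """Accumulate one step per secret, then round to the nearest percent."""
    total = found * percent_step(t)   # on the 6502 this would be repeated 16-bit adds
    return (total + 0x80) >> 8

for t in (26, 38, 50):
    print(t, percent_found(t // 2, t), percent_found(t, t))
```

With 8 fractional bits the truncation error per step is under 1/256 of a percent, so for up to 50 secrets the rounded total still lands on 100 when everything is found.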
-
A simple shifting approach is more efficient in terms of size (and in many cases, cycles) than any other routine I saw on that thread.
Yes
I just think it's a clever hack (OK, so it's not a killer hack..)
I presume it was originally in C
-
As I was growing up, I kept a notebook full of cool code snippets and ideas. My notebook had been misplaced but I ran across it recently and here is one of the pages which is from a 1987 Dr. Dobbs article by Mark S. Ackerman. "6502 Killer Hacks".
Post your own 6502 Killer Hacks and share them with the rest of us!
.
.
.
Well here is the killer hack. This one is to scrimp on RAM.
Incrementing only the lower 4 bits of a byte (with wrap)
.
.
.
- David
Just joined these forums so sorry if I'm a little late to this party

Here's a couple of my favorites
First the counter
eor something with itself and you get 0
eor something with 0 and you get itself
 lda counter
 inc counter
 eor counter
 and #$F0
 eor counter
 sta counter
Of course you can insert bits from one byte into another
byte (not just from a changed version of itself)
Used eg for setting pixels
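A Python model of the counter trick and of the more general bit merge, for anyone who wants to convince themselves it works. The function names are made up for illustration.

```python
# Sketch: ((a ^ b) & mask) ^ b takes the mask bits from a and the rest from b.
def merge(a, b, mask):
    return ((a ^ b) & mask) ^ b

def inc_low_nibble(counter):
    """Mirror of the lda / inc / eor / and #$F0 / eor / sta sequence."""
    a = counter                       # lda counter (old value)
    counter = (counter + 1) & 0xFF    # inc counter (new value)
    a ^= counter                      # eor counter
    a &= 0xF0                         # and #$F0
    a ^= counter                      # eor counter
    return a                          # sta counter

print(hex(inc_low_nibble(0x3F)), hex(inc_low_nibble(0xA7)))
```

The high nibble comes from the old value and the low nibble from the incremented one, so a carry out of bit 3 never reaches the upper bits.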
=========
Parity is just an xoring of bits
A simple sum is just an xoring of bits
0+0=0
0+1=1
1+0=1
1+1=0
Disregarding the carry obviously
Carry is a way of propagating bits across a byte (sort of)
 000a
+0111
=a???
We can combine the two to get parity and collect "bits" across a byte
 ;parity of A
 sta temp
 asl
 eor temp
 and #b10101010
 adc #b01100110
 and #b10001000
 adc #b01111000
 ;now the parity is in the sign bit
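Here is a step by step Python model of the parity routine, carry included, so it can be checked against a plain bit count. It is a sketch that models only the accumulator and the carry flag.

```python
# Sketch: simulate the parity routine on an 8-bit accumulator plus carry.
def parity_sign_bit(x):
    temp = x                       # sta temp
    carry = (x >> 7) & 1           # asl shifts bit 7 into the carry
    a = (x << 1) & 0xFF
    a ^= temp                      # eor temp: odd bits now hold pair parities
    a &= 0b10101010                # and #b10101010
    s = a + 0b01100110 + carry     # adc: carries push pair parities upward
    a, carry = s & 0xFF, s >> 8
    a &= 0b10001000                # and #b10001000: nibble parities
    s = a + 0b01111000 + carry     # adc: bit 3 ripples up into bit 7
    a = s & 0xFF
    return a >> 7                  # parity ends up in the sign bit

print([parity_sign_bit(x) for x in (0x00, 0x01, 0x03, 0x07, 0xFF)])
```

The carry left over from the asl only ever lands in bit 0, which gets masked off or ignored, so no clc is needed in between.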
=========
Already posted this to a different thread
Rotate two bits left through the carry
 asl
 adc #$80
 rol
Do it twice to swap nibbles
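The same three instructions modelled in Python, to show that the net effect on the accumulator is an 8-bit rotate left by two. This is a sketch of the accumulator and carry only.

```python
# Sketch: asl / adc #$80 / rol on an 8-bit accumulator.
def rot2(a):
    carry = a >> 7                      # asl: old bit 7 goes to carry
    a = (a << 1) & 0xFF
    s = a + 0x80 + carry                # adc #$80: carry lands in bit 0,
    a, carry = s & 0xFF, s >> 8         # old bit 6 comes out as the new carry
    a = ((a << 1) | carry) & 0xFF       # rol: that bit enters at bit 0
    return a

def swap_nibbles(a):
    return rot2(rot2(a))

print(hex(rot2(0x81)), hex(swap_nibbles(0xA5)))
```

Because asl ignores the incoming carry, the sequence can be repeated back to back without a clc.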
============
Kernighan's method for counting set bits in a byte
This code lifted directly from dclxvi in the 6502.org
programming forum
http://forum.6502.org/viewtopic.php?p=6993...highlight=#6993
   TAX
   BEQ L2
   LDX #0
   SEC
L1 INX
   STA SCRATCH
   SBC #1
   AND SCRATCH
   BNE L1
   TXA
L2 RTS
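The core of the method is that ANDing a value with itself minus one clears its lowest set bit, so the loop runs once per set bit. A minimal Python sketch of the same loop:

```python
# Sketch: Kernighan's bit count, one loop pass per set bit.
def count_bits(a):
    count = 0
    while a:
        a &= a - 1   # same as SBC #1 followed by AND SCRATCH
        count += 1
    return count

print([count_bits(x) for x in (0x00, 0x01, 0x80, 0xF0, 0xFF)])
```

In the 6502 version the carry stays set across iterations because subtracting 1 from a non-zero value never borrows, which is why a single SEC outside the loop is enough.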
-
OK, back to divide by seven
Again, untested (yes, I should set up to do that)
sta temp
lsr
lsr
lsr
adc temp
ror
lsr
cmp #$3F
adc #$00
lsr
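Since the routine is marked untested, here is a Python harness that steps through it instruction by instruction and lists any inputs that disagree with a plain integer division. It is a sketch that models only A and the carry. The asserts I have checked by hand are just a few spot values, so run it to see the full picture.

```python
# Sketch: simulate the divide-by-7 sequence and compare against x // 7.
def div7(x):
    temp = x                                     # sta temp
    a = x
    carry = 0
    for _ in range(3):                           # lsr, lsr, lsr
        carry = a & 1
        a >>= 1
    s = a + temp + carry                         # adc temp (carry rounds)
    a, carry = s & 0xFF, s >> 8
    a, carry = (a >> 1) | (carry << 7), a & 1    # ror brings the 9th bit back
    carry = a & 1                                # lsr
    a >>= 1
    carry = 1 if a >= 0x3F else 0                # cmp #$3F
    a = (a + carry) & 0xFF                       # adc #$00
    a >>= 1                                      # lsr
    return a

mismatches = [x for x in range(256) if div7(x) != x // 7]
print(mismatches)
```

An empty list would mean the routine agrees with truncating division over the whole 8-bit range.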
-
Your routine will work if you replace the 2 LSR's with ROR's here:
sta temp
lsr
sec
adc temp
sta temp
lsr < ror
lsr
lsr
lsr
adc temp
lsr < ror
lsr
lsr
lsr
ACK!
Yes, right, those were supposed to be ror's
-
I edited that third routine.
It will still fail at 228 and there's no good way to fix that
with it rotating left instead of right (that I can think of at least)
Oh well, it was only meant to be an illustration, I guess that's
as good an illustration as any
Edit: Also is there a way to go faster (and maybe save more bytes) than the previous /5 I did? If we can find a faster /5 then we can get a faster /10 too.
I think with division into eight bits the simple straight forward way
is going to be the best we can do
here's a division by 10 similar to your division by 15
UNTESTED!
 sta temp
 lsr
 sec
 adc temp
 sta temp
 lsr
 lsr
 lsr
 lsr
 adc temp
 lsr
 lsr
 lsr
 lsr
I expect, if you care to test it, you'll get around to that before I do
hmm wonder if something similar can be done for division by 7
-
Bogax, were you just showing the reciprocal extension, or dividing by 7? I tried the first routine and it failed from numbers 228 up. At 228 they started back at zero. The second routine broke very early. It gave me a one when dividing 6 by seven. I didn't check any more after that.
My bad
I should have said the code was untested and just for illustration
I was trying to point out that most of these routines (obviously not the LUTs)
are multiplication by reciprocals and that the reciprocals are repeating decimals (binimals?)
and the fact that they're repeating could probably be used to advantage.
The first routine is meant to be essentially the divide by 7 routine previously
posted by djmips except that I have it with the dividend starting out in A
instead of in memory (which I also failed to mention).
I figured it was so simple and straight forward it would be good for illustration
Sounds like you used an lsr where you should have an ror?
The second and third routines will fail for values over 227. I believe that
could be fixed by moving the last sta and adc down one shift so you get
the carry into A before you save it to temp like so:
 sta temp
 lsr
 lsr
 lsr
 adc temp
 ror
 sta temp
 lsr
 lsr
 lsr
 lsr
 lsr
 lsr
 adc temp
 ror
 lsr
hmm
dividing 6 by 7 should be like this
 sta temp  00000110.   00000110
 lsr       00000011.0
 lsr       00000001.1
 lsr       00000000.1
 adc temp  00000111.0
 sta temp  00000111.0  00000111
 ror       00000011.1
 lsr       00000001.1
 lsr       00000000.1
 lsr       00000000.0
 lsr       00000000.0
 lsr       00000000.0
 adc temp  00000111.0
 ror       00000011.1
 lsr       00000001.1
 lsr       00000000.1
For the third routine I'll have to look at it again, I might well have screwed
something up
-
Wonder if you could implement a similar process for other numbers too. (still preserving the x and y)
Sure, it's basically just reciprocal multiplication
divide by 7    1/7 = .001001001001001...
 sta temp  1.
 lsr       .1
 lsr       .01
 lsr       .001
 adc temp  1.001        leave the carry alone to round
 ror       .1001
 lsr       .01001
 lsr       .001001
 adc temp  1.001001
 ror       .1001001
 lsr       .01001001
 lsr       .001001001
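The repeating patterns are easy to generate if you want to try other divisors. A small Python sketch (the helper name is made up for illustration) that prints the binary expansion of a fraction:

```python
# Sketch: binary expansion of num/den, one digit per doubling.
from fractions import Fraction

def binary_fraction(num, den, places):
    frac = Fraction(num, den)
    digits = []
    for _ in range(places):
        frac *= 2
        if frac >= 1:
            digits.append("1")
            frac -= 1
        else:
            digits.append("0")
    return "".join(digits)

for divisor in (7, 10, 15):
    print(divisor, "." + binary_fraction(1, divisor, 12))
```

For 1/7 the pattern repeats every 3 bits and for 1/15 every 4 bits. For 1/10 it is a 4-bit pattern after the first bit.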
what I find intriguing is that, since they're all repeating decimals,
you can (sort of) accumulate accuracy
 sta temp  1.
 lsr       .1
 lsr       .01
 lsr       .001
 adc temp  1.001
 sta temp  1.001
 ror       .1001
 lsr       .01001
 lsr       .001001
 lsr       .0001001     < I've skipped an addition here
 lsr       .00001001
 lsr       .000001001
 adc temp  1.001001001
 ror       .1001001001
 lsr       .01001001001
 lsr       .001001001001
obviously doesn't help any in the case of dividing an 8 bit number by 7
but I've gotten to the point where it's less work to rol
 sta temp  xxxxxxxx  1.
 lsr       xxxxxxxx  .1
 lsr       0xxxxxxx  .01
 lsr       00xxxxxx  .001
 adc temp  xxxxxxxx  1.001
 sta temp  xxxxxxxx  1.001
 rol       jjjjjjjx  .00000001001    I've (in effect) jumped a byte to the right
 rol       jjjjjjxx  .0000001001     the j's are junk that would have been
 rol       jjjjjxxx  .000001001      shifted out, now I have to mask them off
 and #$07  00000xxx  .000001001      still slightly faster
 adc temp  xxxxxxxx  1.001001001
 ror       xxxxxxxx  .1001001001
 lsr       0xxxxxxx  .01001001001
 lsr       00xxxxxx  .001001001001
OK, that doesn't help much either, but it might if you were dividing more than
8 bits by a constant. In the cases where the repeating part is some integral
fraction of a byte, as is the case for e.g. division by 10 or 15, I could accumulate
a full byte's worth and not have to do any actual shifting at all.
But I don't know if it's likely to be useful for any case of an 8 bit dividend.
(or useful at all for that matter, but I'm thinking 16 bit binary to decimal)
For example Omegamatrix managed to do a pretty good job on division by 15 with
just some rounding.
Edit:changed some lsr's to the rol's they should have been (second and third rol)
-
How'd you come up with that?
Someone posted a question about bit twiddling in one of the 6502.org forums;
that was part of the answer.
Some interesting stuff there (one of the interesting things is a pointer in one
of the forums to this forum).
That ranks right up there with the following routine someone came up with (on another platform, but adaptable to 6502):
 ; Magical mystery routine (hint: use with acc = 0-15)
 sed
 adc #$90
 adc #$40
 cld
Anyone ever seen anything like that one before?
I believe I saw that posted to comp.sys.cbm some years ago, don't remember where
it was said to have come from.
Don't you need a clc in there?
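On the clc question, a rough Python model of the routine shows what the incoming carry does. This is a sketch under assumptions: it is a simplified model of decimal-mode ADC on an NMOS 6502 (accumulator result and carry only, no other flags), and the function names are made up for illustration.

```python
# Sketch: simplified decimal-mode ADC (assumed NMOS behaviour, A and carry only).
def adc_decimal(a, m, carry):
    lo = (a & 0x0F) + (m & 0x0F) + carry
    if lo > 9:
        lo += 6                               # decimal adjust of the low digit
    hi = (a >> 4) + (m >> 4) + (1 if lo > 0x0F else 0)
    if hi > 9:
        hi += 6                               # decimal adjust of the high digit
    result = ((hi << 4) | (lo & 0x0F)) & 0xFF
    return result, 1 if hi > 0x0F else 0

def to_hex_digit(value, carry=0):
    """sed / adc #$90 / adc #$40 / cld, starting with the given carry."""
    a, carry = adc_decimal(value, 0x90, carry)
    a, carry = adc_decimal(a, 0x40, carry)
    return a

print("".join(chr(to_hex_digit(n)) for n in range(16)))
print(chr(to_hex_digit(0, carry=1)))   # what happens without the clc
```

With the carry clear on entry the sixteen inputs come out as the ASCII hex digits. With the carry set on entry, an input of 0 comes out as the character for 1, so under this model the carry does need to be known going in.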

Math help
in batari Basic
Assuming you mean 7-25 including 7 and 25 in the range
and assuming you don't mind it not being too random,
(and assuming I'm understanding the batari Basic manual
correctly) I think this will do it
or maybe you can lose some of the assignments
The order the math is done in (and hence the parenthesis)
is important to preserve accuracy (like the difference
between the 0-15 range and the multiples of 4).
Again, I'm not very familiar with batari Basic.
It's basically multiplying by 1/16 + 1/128 + 1/256 = .07421875
which is close to but less than 19/255 to get a range of 0-18
and then adding 7
The multiplication is done in such a way as to preserve as much
accuracy as possible in the partial products without over flowing.
eg rand/2 is the 1/16 part but multiplied by 8 which is reduced
to 1/16 by the final division by 8
etc.
any way good luck with it
(I haven't actually tried it.)
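The arithmetic described above can be checked in Python. This is a sketch only: the expression below is one way of arranging the three partial products the way the explanation describes, it is not the code from the post, and it assumes a random byte of 1..255 with 8-bit truncating division.

```python
# Sketch: multiply by 1/16 + 1/128 + 1/256 (= 19/256), then add 7.
# Partial products are kept 8 times too large and divided by 8 at the end
# (an assumed arrangement that follows the description, not the original code).
def partial_sum(r):
    return r // 2 + r // 16 + r // 32     # 8 * (1/16 + 1/128 + 1/256) * r

def scaled(r):
    return partial_sum(r) // 8 + 7

results = {scaled(r) for r in range(1, 256)}
largest_partial = max(partial_sum(r) for r in range(1, 256))

print(min(results), max(results), largest_partial)
```

The partial sum never exceeds 8 bits, and every value from 7 through 25 is reachable.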