-
Content Count
266 -
Joined
-
Last visited
Posts posted by fox
-
-
A and X are where you return the result of your function.
If your asm function is declared void, the values you leave in A and X
don't matter. But if you mix asm and C (as I wrote, better don't do that),
then you should be extremely care about all registers.
You shouldn't use any zeropage location you can imagine ($cb etc).
Instead, you should "importzp" memory locations. I think you can safely
modify 16-bit ptr1..ptr4 and 8-bit tmp1-tmp4 in your asm-only routines.
Moreover, you should not just "RTS" from a C routine. The C code may
need some stack cleanup. Just remove asm("RTS").
-
B PC=FunctionToDebug OR PC=FunctionDoneLabelIf you debug your own code, you can compile in CIM instructions which basically act as breakpoints. For example, use:
dta 2
I guess with Perl (or another scripting language) you could crossreference the exclude list with the labels file and then filter the trace
output accordingly, so yes, that would suffice too.
Exactly. This can be done with a few lines of Perl.
-
I would like to be able to break at the end of an NMI or DLI and get the cycle count for the interrupt, or even a running log of cycle counts.What if another interrupt interrupts your interrupt?
VCOUNT and SP conditions are available in 4.0. You can also set a breakpoint for instructions that write to portb (or any address space area), but you can't set a breakpoint for the value which is written.
Go upgrade to 4.0.

-
One thing I think I'd find useful in development would bea trace-exclude feature, so that code execution going through
know ranges of memory wouldn't be logged, e.g. DLIs.
Linked in with labels exported by your assembler/compiler
this should be straight-forward to put in, though (as with
most debugging features) you'd get a performance hit in
the emulation, but systems tend to be fast enough to cover
this these days.
If you mean that for the TRON command, I don't think it's useful.
Anyway, I recommend you learning basics of the Perl programming language
- you can easily filter any text files with Perl.
Following in from this it would also then be nice to build yourdebugger commands into a script and be able import them.
This would throw away any existing debugger breakpoints etc
and then follow your script.
I don't get it. What script do you mean?
Does anyone else have some ideas to assist their developments?I think A800Win PLus debugger is already too sophisticated.
The old one was good enough for my purposes.
-
Even better, write asm routines in *.s and not *.c files.
See source code of cc65 libraries for lots of examples.
You will also see there how to pass arguments from C to asm - it is completely different from BASIC.
-
It's not impressive at all. This is how normally the multiplication is done on a 6502 (easy and slow way).
The cool way is:
a * b = f(a + b) - f(a - b)
where:
f(x) = x * x / 4 and is of course stored in a precomputed lookup table.
This can be very effectively computed on a 6502, even with signed numbers! (unlike the Russian crap)
I described it in detail and with examples years ago in Polish disk magazine "Syzygy" (issue #5 I think). If someone is interested, I may translate it here. Real-world example is in my portal engine in Numen, where I use both signed and unsigned variations and 16-bit * 16-bit with 32-bit result too. Using the Russian way it would be several times slower.
As for the division I don't know a really fast algorithm. To my knowledge, even in modern CPUs division is much slower than multiplication. Avoid it if you can. If you can't, calculate 1/b once and then multiply a * (1/b) - I use such a technique in this portal engine, too (see Project_calcInverseD for how to quickly compute $2800 / x).
-
Another difference from zlib is how the offsets work.The offsets of Exomizer points to the 'wrong' end of the sequences, the end furthest away. But this has the advantage of supporting sequences that can encode repeating patterns of one or more bytes like this:
(offset 1, length 2345) == rle sequence.
(offset 4, length 16) == four byte pattern repeated 4 times.
It's the same in zlib.
I repeat, can someone do a comparison of compression results: exo vs zlib?
Maybe I can find the uncompressed "Drunk Chessboard" somewhere.
It was 24KB, and 16 KB when compressed with FlashPack. The depack routine was something like 80 bytes long and supported LZ77+RLE+changing destination memory addresses. I think you can't beat its speed.
* Depack routine for FlashPack 2.1 * By Fox of Taquart, 5th May 1997 * Average speed: 30 kB/sec * 4 bytes on page 0 ff equ $fc bt equ $fd ad equ $fe * Do not start from here!!! dep1 tax beq ret lda #$7f dep2 bcc *+3 inx inx sta ad dep3 lda (ad),y put sta $8080,y iny bne dep4 inc ad+1 inc put+2 dep4 dex bne dep3 asl bt bne dep7 asl ff bne dep5 * Routine starts here! start equ * sec jsr get rol @ sta ff dep5 lda #1 bcc dep6 jsr get rol @ dep6 sta bt dep7 jsr get ldx #1 bcc put lsr @ bne dep2 jsr get bcs dep1 tay jsr get sta ad+1 sta put+2 bcc dep7 ! * Set address of packed data here! * (or make your own "get" routine) get lda $ffff inc get+1 bne ret inc get+2 ret rts end of code
-
Negative numbers aren't natively supported in Assembler.Actually they are, maybe just some crappy assemblers don't support negative constants directly. The same problems with signed numbers that you can encounter with assembly language can be seen in C/Java: e.g. you add two positive integers and get a negative result. So understanding the representation of signed integers is good no matter which language you are programming in (except when it's only Atari BASIC).
But, that won't work for multi-byte binary values.Of course it will. Moreover, it's much easier than your "sign & absolute value" method. You add multi-byte signed integers as normal:
lda val1 clc adc val2 sta result lda val1+1 adc val2+1 sta result+1
-
Acually if you use my code with "cpx #$80" and not "cmp #-48",
wrapping problems arise only if the absolute difference exceeds 96,
which is close to the range of signed 8-bit number: -128..127.
So there's not much you can do without widening the input to 16 bits
(problematic).
-
It's even better when you replace "cmp #-48" with "cpx #$80".

-
cursor_add_x .local txa clc adc cursor_x cmp MOD_max_x bcc _set cmp #-48 ; if it's -48..-1 then set to 0 ; I chose -48 because 48+160+48==256 lda #0 bcs _set ; else set to MOD_max_x lda MOD_max_x _set sta cursor_x rts
-
Unlike zlib inflate/deflate, exomizer doesn't do any encoding of uncompressed bytes but it does compress sequences down to the length of one byte to compensate.Do you mean you support one-byte LZ77 sequences?
Another difference compared to zlib inflate/deflate is that the encoding is static and not dynamic. This makes the decruncher simpler and probably faster than zlib deflate.My zlib decompress routine isn't actually that slow, thanks to clever lookup tables.
Actually it's several times faster than most of the other decompressors on Atari.
I've used it in Numen for background decompression while the effects are running.
-
Length of indata: 45533 bytes.
So original = 33,050 bytes, new xex = 22,707 bytes
Was the original 33050 or 45533 bytes long?
Can you "gzip -9" the original file and add 800 bytes for the depack routine,
for a quick comparison?
Or even better, create GZip archive ("Ultra" compression level) using 7-Zip on Windows?
-
The mirroring actually occurs through more then just one page. For example POKEY is enabled when A15=1, A14=1,A13=1,A12=0 and A11=1, the rest of the address bits don't matter. So $E800 will access the same register as $E900, $EA00, etc.Okay, so POKEY is in $E800-$EFFF.
What about the remaining unexplored areas?
-
So, the code that Heaven used is a standard zlib-compatible inflate routine: dynamic Huffman-coded LZ77 with 32KB window (same as in ZIP). As I understand, exo does some kind of Huffman only for offsets and length, not uncompressed bytes, right?
My old FlashPack did only LZ77 2- and 3-byte sequences in 127-byte window
+ RLE and compressed marks of uncompressed bytes (a block of 8 uncompressed bytes is marked with a single bit).
-
The emulator is written in pure assembly language, the main emulator core is completely coded in Thumb and runs from ROM. The CPU emulation is a combination static recompiler & interpretive core -- for those pesky VCS games that run parts of their code out of RAM -- and is coded in ARM assembly running from the internal WRAM of the GBA.Frame rate is a respectable 39fps, and a little more optimisation will get that to a nice 60fps. There is no sound emulation yet
800/XL/XE/5200 are much more complicated than 2600 and you can't reliably use
static recompiler since code is usually in RAM (unlike on 2600).
Coding full 800/XL/XE/5200 in Thumb asm is tedious and I doubt we can reach
100% real speed even at 10fps.
-
I know about these and I'm pretty sure that whole pages are used for mirrored registers. But what's in $C100 - $D3FF, $D500-$E7FF and $E900-$F7FF ?
-
I don't know know what kind of compression exo uses, nor what routine did you use in Boinxx.
-
Could you shortly describe the compression algorithm used?
-
I think GBA is not powerful enough for the Atari800 emulator.
-
I don't have Atari800WinPLus, I have XFORMER 2000XFormer sucks.
As for DOSes, MyDOS and DOS II+ aren't bad.
-
I know that $0000-$3fff is RAM, $4000-$bfff is cartridge, $f800-$ffff is built-in ROM.
What about the remaining area? How is it mapped to chips?
-
Can you tell me how a real 5200 behaves when powered on without a cartridge?
Atari800 emulator (and therefore Atari800Win PLus) shows a rainbow fuji logo and crashes after a while. I don't believe it's what happens on real hardware, but I don't see any code in 5200 BIOS to make it behave differently.
-
Looks like the emulator palettes are based on a PAL paletteThe single reason for this is that the current Atari800 default, jakub.act
was grabbed from PAL machine, just like the previous one, real.act.
If you agree here for a "correct" NTSC palette, I can upload it to the Atari800
emulator repository. Not necessarily one - there may be several NTSC palettes
to choose from.
*.act are plain 768-byte RGB palettes, supported by e.g. Photoshop,
so you should have no problem with creating *.act from a picture
where Atari colors are shown in order (dark on the left, light on the right).
You can then load this *.act file into Atari800 and see how it looks
in different programs.

Russian multiplication
in Atari 5200 / 8-bit Programming
Posted
Fast multiplication
(original article in Polish disk magazine "Syzygy" no. 6)
The problem of multiplication on the 6502 is as old as this processor.
The algorithms that can be found in various asm courses base on bit shifting
and addition, and need about 100 cycles for calculating product of two
8-bit numbers. Such algorithms are not fast at all.
There is an effective solution to this problem: create 2-dimensional array
(multiplication table) that contains all the possible results.
Unfortunately not always one can afford 16 KB memory for this.
What I suggest is an intermediate solution: implement multiplication
using an addition and a function "calculated" using a lookup table.
Consider the following formulas:
(1) a*b = exp( log(a)+log(b) )
(2) a*b = ( cos(x+y)+cos(x-y) )/2
where: x = arc cos (a), y = arc cos (b)
(3) a*b = ( (a+b)^2-a^2-b^2 )/2
(4) a*b = ( (a+b)^2-(a-b)^2 )/4
Which formula to choose? The best would be one:
- simple to calculate,
- true for any a and b,
- that not requires high precision,
- uses a lookup table easy to generate.
These conditions are fulfilled best by the formula (4).
The calculation is very simple:
a*b = f(a+b)-f(a-b)
where: f(x) = (x*x)/4
At first glance, one thinks that you have to store
the fractional part of (x*x)/4 in the lookup table
in order to get an exact result. This is not the case,
because:
1) for even x, x = 2*k :
f(2*k) = k*k
- the result is an integer
2) for odd x, x = 2*k+1
f(2*k+1) = k*k+k+1/4
- the result is an integer + 0.25.
However, if a+b is odd, then a-b is odd too.
The result of f(a+b)-f(a-b) would be therefore same,
no matter if f() includes this 0.25 or not.
Calculating successive squares is easy once you remember
that:
1*1 = 1 = 0+1
2*2 = 4 = 0+1+3
3*3 = 9 = 0+1+3+5
...
That is, you only need to sum successive odd numbers.
The algorithm is general enough to calculate products
of signed numbers.
There is no universal "fastest multiplication" routine,
because universal routines are not fast and vice-versa.
When implementing your own fast multiplication routine
you should consider the following things:
- how big are numbers which you want to multiplicate
- are the numbers signed or not
- where the factors are stored and where the result
should be stored
- what precision you want - for example, you may need
just the high byte of the product and +-1 is acceptable
But to not make this article too theoretical, I present
two routines:
- MULU8 - multiplicates unsigned numbers in A and X registers
- MULS8 - multiplicates signed numbers in A and X
A few words about the routines:
- both routines calculate sums using 6502's indexed addressing mode.
Unfortunately -A needs to calculated separately, which can be done
with EOR #$ff and adding one. Instead of adding 1, we use a separate
lookup table for (x+1)^2.
- for MULU8 one lookup table contains squares of 0..510 and the other
one -255..255
- for MULS8 both tables contain squares of -255..255
- for MULS8 $80 needs to be added to both input numbers, using EOR #$80.
This eliminates the ambiguity that arises when adding signed numbers,
for example:
$60+$60=$c0
-$40+0=$c0
Instead, we have:
$e0+$e0=$1c0 (positive $c0)
$40+$80=$c0 (negative -$40)
If the numbers are small (precisely: abs(A)+abs(X)<128), the EOR #$80
are redundant, but you need to use different lookup tables.
mulu8.asm:
muls8.asm: