mariuszw
Posts posted by mariuszw
-
-
Batman is not possible unless he starts studying Z80 assembly: it wasn't available for the C64, only for the ZX Spectrum and Amstrad CPC.
I already know Z80 assembly from working on the MAME project.

There are still two hi-res isometric games on the C64: Bobby Bearing and Nosferatu. I was actually looking at Bobby Bearing, and for now it looks like this one should also be quite easy to port to the Atari.
-
(...). I wish there were more of these isometric games on the Atari.
That's quite likely to happen

-
Input > Keyboard > Send raw key strokes
Thanks, that did the trick.
BTW: thanks for your awesome Altirra emulator. Makes Atari programming and code optimizing a joy

-
Here is the next version of the Fairlight port. New features:
- screens are colored
- keyboard is working (although there are issues under Altirra: it seems that not all keypresses are properly recognized, and I am not sure if it is my bug. Please test on a real Atari and let me know. Game keys are listed on the title screen. Please do not use the Shift key, it doesn't work.)
- code optimization: the game seems to be up to two times faster (or less slow) than the C64 version, but it still slows down when there is a lot happening on screen. The screen drawing which happens when the hero enters a room is also faster.
Mariusz.
-
Great start!
It's not like Spectrum version but at least it's not slower than C64 version

There is no chance for the code to be slower than on the C64: 1.79 MHz vs. 1 MHz makes the difference.

The code really looks to be converted line-by-line from Z80 to 6502, so I expect it can be optimized quite well later. I also wonder why the original programmer didn't use hardware sprites on the C64. They would fit nicely in the game, making less work for the programmer (no need to convert software-sprite routines), and the game speed would be really good.
BTW: is there an easy way to estimate the real clock of the Atari while executing given software, i.e. how much CPU time (expressed in MHz) is stolen by ANTIC? Here in Fairlight a narrow screen is used (32 bytes), but in character mode. There are also 24 character lines displayed (192 pixel lines).
-
Here is my next C64 to Atari 8-bit port: Fairlight: A Prelude.
For now only joystick is implemented, so you can walk around and fight the guards, but you cannot take and use items etc.
It should work on real Atari, however I tested it under Altirra only.
Enjoy!
-
Hi!,
If I understand the code correctly, you are moving the framebuffer 1 byte to the left, by copying the area from $CF41 to $CF40, of length $CC0. As you unrolled the loop, you use self-modifying code to change the 8 addresses.
Is this Ok?
Well, when you unroll loops on the 6502, it is faster to unroll row-wise instead of column-wise, this is an example:
loop:
  lda $CF41,x
  sta $CF40,x
  lda $D041,x
  sta $D040,x
  lda $D141,x
  sta $D140,x
  ....
  lda $DB41,x
  sta $DB40,x
  inx
  bne loop
As you see, this is much faster because you don't need to increment X on each copy, only once per loop. *BUT*
There is a problem with this example. You can not move in parallel because there is a data dependency from one copy to the next!
But I suspect something: you are moving a rectangular region to the left, so you don't have a data dependency from one row to the next. As each row has 192 columns (24 bytes), the dependency is broken every 24 bytes. You then need to move in multiples of 24 bytes.
So, this should work (SCR is address of screen data, $CF40 in your example):
; First, copy 240 * 13 = 3120 bytes
  ldx #(256-240)
loopBig:
  lda SCR-(256-240)+1,x
  sta SCR-(256-240),x
  lda SCR-(256-240)+240+1,x
  sta SCR-(256-240)+240,x
  lda SCR-(256-240)+2*240+1,x
  sta SCR-(256-240)+2*240,x
  lda SCR-(256-240)+3*240+1,x
  sta SCR-(256-240)+3*240,x
  lda SCR-(256-240)+4*240+1,x
  sta SCR-(256-240)+4*240,x
  lda SCR-(256-240)+5*240+1,x
  sta SCR-(256-240)+5*240,x
  lda SCR-(256-240)+6*240+1,x
  sta SCR-(256-240)+6*240,x
  lda SCR-(256-240)+7*240+1,x
  sta SCR-(256-240)+7*240,x
  lda SCR-(256-240)+8*240+1,x
  sta SCR-(256-240)+8*240,x
  lda SCR-(256-240)+9*240+1,x
  sta SCR-(256-240)+9*240,x
  lda SCR-(256-240)+10*240+1,x
  sta SCR-(256-240)+10*240,x
  lda SCR-(256-240)+11*240+1,x
  sta SCR-(256-240)+11*240,x
  lda SCR-(256-240)+12*240+1,x
  sta SCR-(256-240)+12*240,x
  inx
  bne loopBig
; Now, copy the remaining 144 bytes:
  ldx #(256-144)
loopSmall:
  lda SCR-(256-144)+13*240+1,x
  sta SCR-(256-144)+13*240,x
  inx
  bne loopSmall
Assuming that half the time the "LDA ABS,X" takes 5 cycles, you have 240*((4.5+5)*13+2+5)+144*(4.5+5+2+4) = 33552 cycles. This is about 13% faster than your loop (if you count 4.5 cycles per LDA in your loop also). As the theoretical minimum with "ABS,x" is 29376 cycles (9 cycles per copy), this is already only 14% slower than that.
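The arithmetic above is easy to re-check outside the assembler. The following is only a sketch in Python that plugs in the per-instruction figures exactly as quoted in the post (the variable names are made up for illustration, and the loop overhead terms are taken as written, not re-derived):

```python
# Cycle estimate for the chunked copy, using the figures quoted in the post.
LDA_ABS_X = 4.5            # average: 4 cycles, 5 when a page boundary is crossed
STA_ABS_X = 5
COPY = LDA_ABS_X + STA_ABS_X

# 13 copies per pass plus the loop overhead as given in the post
big_loop = 240 * (COPY * 13 + 2 + 5)
# remaining 144 bytes, one copy per pass
small_loop = 144 * (COPY + 2 + 4)
total = big_loop + small_loop

bytes_moved = 240 * 13 + 144       # should match the $CC0 framebuffer size
minimum = bytes_moved * 9          # 9 cycles per copy, no loop overhead at all

slower_pct = round((total / minimum - 1) * 100)
print(total, minimum, slower_pct)
```

The two loops together cover exactly $CC0 bytes, which is why the 9-cycles-per-copy minimum is computed over the same byte count.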
Daniel.
Thank you very much for your help!
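To see why the chunk size has to be a multiple of 24 bytes, here is a rough Python model of the two copy orders from the quoted post. It is a sketch under assumptions: the buffer contents are arbitrary test data, and the rightmost byte of each row is assumed to be redrawn after the scroll, so differences there would not matter.

```python
ROW_BYTES = 24          # 192 pixels per row
SIZE = 0xCC0            # framebuffer size from the post
CHUNK = 240             # 10 rows, so every chunk starts on a row boundary

def shift_sequential(buf):
    """Byte-by-byte copy in ascending order, like the original loop."""
    out = list(buf)
    for i in range(SIZE):
        out[i] = out[i + 1]
    return out

def shift_chunked(buf):
    """13 interleaved copies of 240 bytes, then the 144-byte tail."""
    out = list(buf)
    for x in range(CHUNK):
        for k in range(13):
            out[k * CHUNK + x] = out[k * CHUNK + x + 1]
    for x in range(144):
        out[13 * CHUNK + x] = out[13 * CHUNK + x + 1]
    return out

# One extra byte, because the copy reads one byte past the moved area.
buf = [i & 0xFF for i in range(SIZE + 1)]
seq = shift_sequential(buf)
chk = shift_chunked(buf)

# Offsets where the interleaved order gives a different byte.
diff = [i for i in range(SIZE) if seq[i] != chk[i]]
only_last_column = all(i % ROW_BYTES == ROW_BYTES - 1 for i in diff)
print(diff, only_last_column)
```

In this model the two orders disagree only at the last byte of a row, which is the column that gets new data after a left scroll anyway. With a chunk size that is not a multiple of 24 (for example 256), the disagreeing bytes would land in the middle of a row.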
-
C= uses static address for origin of the screen (you can select only VIC bank). Atari has display list and possibility of setting any address for any line of screen apparently (constraints given by ANTIC are quite small).
I think best optimization of this routine would be:
1. expand game viewport to 256 pixels
2. change address of screen in display list (on scroll you need to update only one byte per line)
3. use a pointer to the origin of the screen in the tile map
But I'm afraid this requires big changes in the engine.
This is a nice idea and it would definitely save the cost of scrolling, but it has a drawback: double buffering would be required to avoid tearing during display. This would definitely require many changes to the engine.
-
I still can identify the potential for loop unrolling like already mentioned.
Another place:
711C: A2 07 LDX #$07
711E: BC 80 81 LDY $8180,X
7121: BD 10 9F LDA $9F10,X
7124: 91 0B STA ($B),Y
7126: CA DEX
7127: 10 F5 BPL $711E
-->
LDY $8187
LDA $9F17
STA ($B),Y
LDY $8186
LDA $9F16
STA ($B),Y
LDY $8185
LDA $9F15
STA ($B),Y
etc...
And like mentioned in PM: Going to change the PMG via IRQ and GRAF... registers could reduce memory footprint and DMA load.
This is actually my code, already optimized.
The point is that lda $9f10,x has a self-modified address as its argument, and unrolling would create the additional cost of modifying the source addresses, which would eat all the benefit from unrolling.
Can you elaborate a little bit about changing PMG via IRQ? What would the savings be? I'm already not using WSYNC much (actually only two times a frame, to sync the font base address change), so how would using IRQ help here?
-
Just making sure the facts were presented correctly, s'all... if someone else picks up a similar project and believes that there's only four places a bitmap can be it's going to stuff them up if it's in one of the other two that are viable.
I guess any coder attempting to port any C64 game to Atari should know such VIC details.
And http://icu64.blogspot.com/ may help to reveal details about the C64 game display layout without even attempting to disassemble a game.
-
As one of my friends put it, 'nothing beats lda,sta'.

Game looks fast already, I wouldn't spend too much time on extra optimizations. Only thing that comes to mind is to change something in main code. Throw out some extra step, use double buffering or something like that (all this based on not knowing anything on how it works).
Yes, the game is quite fast now. Most important is that it is faster than the C64 version. However, it is still slower than the Spectrum version. Also, I find it entertaining digging through the code and searching for better solutions.
This "change something in main code" will probably take lots of time, because the game would need to be analyzed completely. (For now it is just the C64 version with I/O patched for Atari and a few small improvements here and there.) I would rather spend that time looking at another game to port.

-
Two tables with 'adc #$31' and 'adc #$3c' included with shift

If routine is not executed thousands of times per frame it's probably 'good enough' as it is. You did nice optimization on that lower part that executes 8 times, that should be enough.
ps. Nice trick with 'adc #0' and later 'and' to get lower and higher part of byte. Excellent work!
Thanks. This particular routine is sometimes executed 500 times per frame, so it is worth optimizing. But I don't really want to spend another $200 bytes on lookup tables, and the savings would still not be very big (as you mentioned, optimizing the loop is most important).
Here is another example, the routine which scrolls the screen. The screen in the game consists of a framebuffer of 192 * 136 pixels (at address $CF40 in the C64 code, size $CC0) and a background tilemap of $198 bytes at $C008 in the C64 code. When the game needs to scroll in either direction, these two buffers are moved. Here is one of the scroll procedures:
$8983 A2 00 LDX #$00 ; 2
//------------------------------
L_BRS_($8985)_($898C) OK
//------------------------------
$8985 BD 09 C0 LDA $C009,X ; 4+
$8988 9D 08 C0 STA $C008,X ; 5
$898B E8 INX ; 2
$898C D0 F7 BNE L_BRS_($8985)_($898C) OK ; 3 (14*256) = 3584
//------------------------------
L_BRS_($898E)_($8997) OK
//------------------------------
$898E BD 09 C1 LDA $C109,X ; 4+
$8991 9D 08 C1 STA $C108,X ; 5
$8994 E8 INX ; 2
$8995 E0 97 CPX #$97 ; 2
$8997 D0 F5 BNE L_BRS_($898E)_($8997) OK ; 3 (16*151) = 2416
$8999 A9 CF LDA #$CF ; 2
$899B 8D A7 89 STA $89A7 ; 4
$899E 8D AA 89 STA $89AA ; 4
$89A1 A2 0C LDX #$0C ; 2
$89A3 A0 00 LDY #$00 ; 2
//------------------------------
L_BRS_($89A5)_($89AC) OK
L_BRS_($89A5)_($89B5) OK
//------------------------------
$89A5 B9 41 CF LDA $CF41,Y ; 4+
$89A8 99 40 CF STA $CF40,Y ; 5
$89AB C8 INY ; 2
$89AC D0 F7 BNE L_BRS_($89A5)_($89AC) OK ; 3 (14*256) = 3584
$89AE EE A7 89 INC $89A7 ; 6
$89B1 EE AA 89 INC $89AA ; 6
$89B4 CA DEX ; 2
$89B5 D0 EE BNE L_BRS_($89A5)_($89B5) OK ; 3 (3584+17)*12 = 43,212
//------------------------------
L_BRS_($89B7)_($89C0) OK
//------------------------------
$89B7 B9 41 DB LDA $DB41,Y ; 4+
$89BA 99 40 DB STA $DB40,Y ; 5
$89BD C8 INY ; 2
$89BE C0 BF CPY #$BF ; 2
$89C0 D0 F5 BNE L_BRS_($89B7)_($89C0) OK ; 3 (16*191) = 3,056
$89C2 4C A5 87 JMP L_JMP_($87A5)_($89C2) OK
First it moves the tilemap, and then the framebuffer.
I attempted to optimize these in the Atari version:
first loop:
lda $c009,x ; 4+
sta $c008,x ; 5
inx ; 2
lda $c009,x ; 4+
sta $c008,x ; 5
inx ; 2
lda $c009,x ; 4+
sta $c008,x ; 5
inx ; 2
lda $c009,x ; 4+
sta $c008,x ; 5
inx ; 2
bne loop ; 3 (11*4+3)*64 = 3008, or (11*8+3)*32 = 2912 with 8 copies per pass
first loop of framebuffer:
lda $cf41,y ; 4+
sta $cf40,y ; 5
iny ; 2
lda $cf41,y ; 4+
sta $cf40,y ; 5
iny ; 2
lda $cf41,y ; 4+
sta $cf40,y ; 5
iny ; 2
lda $cf41,y ; 4+
sta $cf40,y ; 5
iny ; 2
bne loop ; (11*4+3)*64 = 3008
lda addr,x ; 4
sta code1 ; 4
sta code2 ; 4
sta code3 ; 4
sta code4 ; 4
sta code5 ; 4
sta code6 ; 4
sta code7 ; 4
sta code8 ; 4
dex ; 2
bne loop ; 3 ; (41+3008)*12 = 36,588
Any ideas to improve these?
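The cycle totals in the comments can be reproduced with a few lines of Python. This is just a sketch of the same arithmetic: `pass_cycles` is a made-up helper name, and like the comments it ignores the extra cycle of a page-crossing LDA (the "4+").

```python
def pass_cycles(unroll, lda=4, sta=5, inc=2, branch=3, length=256):
    """Cycles for one 256-byte pass with `unroll` copies per loop iteration."""
    per_copy = lda + sta + inc
    return (per_copy * unroll + branch) * (length // unroll)

for n in (1, 2, 4, 8):
    print(n, pass_cycles(n))

# Per-page overhead for patching the self-modified addresses.
PATCH_OLD = 6 + 6 + 2 + 3          # two INCs, DEX, BNE (C64 code)
PATCH_NEW = 4 + 8 * 4 + 2 + 3      # LDA addr,x, eight STAs, DEX, BNE (Atari code)

old_total = (pass_cycles(1) + PATCH_OLD) * 12
new_total = (pass_cycles(4) + PATCH_NEW) * 12
print(old_total, new_total)
```

Each doubling of the unroll factor saves less than the previous one, since only the 3-cycle branch is amortized; the 2-cycle index increment per copy stays.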
-
Oh I like the PMG overlays, but if RAM is limited and I have to choose between PMG overlays and rolled out code, I'd take the rolled out code. [ Or other methods that increase speed but use extra RAM ]
Now I get the point. Actually PMG takes around 1KB of RAM (both data and code). The prerotated sprites which I mentioned would need around 3.5KB * 16 = 56KB (bit rotations 0-7 and also flipped versions), so there is no way to fit that data in the standard 64KB RAM.
-
That's great, so the full gameplay is there which is great.
I don't have a video to show you, but when I was outside, one of the guards walked straight through a building and they were visible. I don't know if this is in the original game or it's an Atari specific bug, but I saw it.
I was wondering if you would consider producing several versions of the game with conditional assembly producing the different code? (Therefore it is no extra development work for you). You've already mentioned having a 130XE version, but I would like it to be able to see a 64K version which has those optimisations in but would drop the use of overlays for extra colour. Of course, if you don't want to see a fragmented set of versions, completely ignore me.
I saw the bug with guards walking into the buildings. For now I think it is an original (C64) bug, but that is yet to be proven.
About multiple versions: I am certainly not going to have multiple versions. I prefer to have only one executable and turn features on or off at runtime. The 130XE version with optimized sprites (if I really make it) will detect available memory at runtime and activate different code paths for sprites.
What is the point in having multiple versions? I understand that you don't like the colour overlays and you want to turn them off? But then it will be a simple black-and-white game, like the first preview I posted.
-
About optimizations: I am struggling with the following routine:
//------------------------------
L_JSR_($6EC1)_($6E26) OK
//------------------------------
$6EC1 A2 00 LDX #$00 ; 2
$6EC3 86 50 STX $50 ; 3
$6EC5 0A ASL A ; 2
$6EC6 26 50 ROL $50 ; 5
$6EC8 0A ASL A ; 2
$6EC9 26 50 ROL $50 ; 5
$6ECB 0A ASL A ; 2
$6ECC 26 50 ROL $50 ; 5
$6ECE 69 31 ADC #$31 ; 2
$6ED0 85 4F STA $4F ; 3
$6ED2 A9 3C LDA #$3C ; 2
$6ED4 65 50 ADC $50 ; 3
$6ED6 85 50 STA $50 ; 3
$6ED8 98 TYA ; 2
$6ED9 48 PHA ; 3
$6EDA A0 00 LDY #$00 ; 2
$6EDC A2 08 LDX #$08 ; 2
$6EDE 86 0A STX $0A ; 3
$6EE0 A2 00 LDX #$00 ; 2 (total 53 cycles)
//------------------------------
L_BRS_($6EE2)_($6EF4) OK
//------------------------------
$6EE2 B1 14 LDA ($14),Y ; 5+
$6EE4 21 4F AND ($4F,X) ; 6
$6EE6 91 14 STA ($14),Y ; 6
$6EE8 E6 4F INC $4F ; 5
$6EEA D0 02 BNE L_BRS_($6EEE)_($6EEA) OK ; 3
$6EEC E6 50 INC $50 ; 5
//------------------------------
L_BRS_($6EEE)_($6EEA) OK
//------------------------------
$6EEE C8 INY ; 2
$6EEF C8 INY ; 2
$6EF0 C8 INY ; 2
$6EF1 C8 INY ; 2
$6EF2 C6 0A DEC $0A ; 5
$6EF4 D0 EC BNE L_BRS_($6EE2)_($6EF4) OK ; 3 ; 41 * 8 = 328
$6EF6 68 PLA ; 4
$6EF7 A8 TAY ; 2
$6EF8 60 RTS ; 6 ; 12 cycles
; Total: 393 cycles
New Atari version is here:
asl ; 2
adc #0 ; 2
asl ; 2
adc #0 ; 2
asl ; 2
adc #0 ; 2
tax ; 2
and #$f8 ; 2
adc #$31 ; 2
sta sadr+1 ; 4
txa ; 2
and #$07 ; 2
adc #$3c ; 2
sta sadr+2 ; 4
ldx #$07 ; 2
sty sstorey+1 ; 4 (38 cycles)
sloop ldy mult4,x ; 4
lda ($14),y ; 5+
sadr and $ffff,x ; 4+
sta ($14),y ; 6
dex ; 2
bpl sloop ; 3 24 * 8 = 192 cycles
sstorey ldy #0 ; 2
rts ; 6 ; 8 cycles
; total: 38+8+192 = 238 cycles
mult4 :8 dta [#*4]
Any ideas if this can be improved, apart from creating lookup tables for shifting values three times left?
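For anyone who wants to convince themselves that the rotate-and-mask version computes the same source address as the ASL/ROL version, here is a small Python model of both address calculations. It is a sketch only (function and table names are made up), and it also builds the two lookup tables mentioned as the alternative, to show where the memory cost comes from.

```python
BASE = 0x3C31   # from ADC #$31 / ADC #$3C in both listings

def addr_shift16(a):
    """C64 routine: 16-bit A*8 built with ASL/ROL, then add $3C31."""
    return ((a << 3) + BASE) & 0xFFFF

def addr_rotate(a):
    """Atari routine: rotate A left by 3 within 8 bits, then split with AND."""
    for _ in range(3):
        carry = a >> 7
        a = ((a << 1) & 0xFF) + carry   # ASL, ADC #0: bit 0 is clear, so no overflow
    lo = (a & 0xF8) + 0x31              # AND #$F8, ADC #$31 (carry is clear here)
    carry = lo >> 8
    hi = (a & 0x07) + 0x3C + carry      # AND #$07, ADC #$3C with carry from low byte
    return ((hi & 0xFF) << 8) | (lo & 0xFF)

same = all(addr_shift16(a) == addr_rotate(a) for a in range(256))

# Lookup-table alternative: one table per address byte, indexed by A.
TABLE_LO = [addr_shift16(a) & 0xFF for a in range(256)]
TABLE_HI = [addr_shift16(a) >> 8 for a in range(256)]
table_bytes = len(TABLE_LO) + len(TABLE_HI)
print(same, hex(table_bytes))
```

The tables would replace the whole shift sequence with two indexed loads, at the price of two 256-byte pages.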
-
As this game is still a work in progress, what currently isn't implemented from the original game?
It seems that I have almost everything ready, apart from:
- title screen (proposals are done, I just need to decide which one to use)
- title screen sound (this is being worked on, and proposals are ready)
- sound effects in game (also in progress)
Here is a new version. It features colouring of items on the status panel (thanks to Jose) and also some optimizations (thanks to Phaeron for his awesome Altirra debugger). Scrolling of outside areas is now much faster, masking of invisible backgrounds is also optimized, and there is a small improvement to the sprite rendering routine. The game is much faster when the hero is walking outside buildings, but it still slows down when there are many sprites visible. I am thinking of improving that by prerotating sprites and using the prerotated versions while rendering, but this will require a significant amount of additional memory, so it is going to be a 130XE-only feature.
-
I like it, but I like the previous picture more. This one has too much detail, which does not look good because the picture resolution is not high enough. Thanks for your efforts!
-
-
Sorry for the delay but I wasn't home.
Thanks dmsc for source image.
Here is my conversion, 60 colors:
Perhaps an even better result could be achieved, but first I would like to know if Mariusz likes that image and wants to use it in his game.
The picture is nice and I am thinking about using it. However, it looks like the picture uses lots of CPU time due to mid-scanline color changes, and I wonder if there will be enough CPU cycles left to play music, which I would also like to include in the game.
Now I am waiting for the music to be done, and then I will start implementing the title screen.
-
Here is a new version of Great Escape. It has the PMG overlay implemented, so now it has some more colors. Thanks to Jose Pereira for coloring the game screens. Also, it has a bugfix for the joystick fire button, so now it can actually be played, and items can be picked up, used, and dropped.
-
Played on my 130XE/PAL/1050/Mydos/MyPicodos. Didn't have a clue but was fun walking around

Thanks for the report.
For instructions, please go here: http://c64games.de/phpseiten/spieledetail.php?filnummer=338 and download Manual and Loesung (= Solution). Both the manual and the solution are in English. These should give you some hints about where to actually walk.
-
New version of The Great Escape attached. This version has fixed loading and it is supposed to work on real Atari. 64KB RAM required. If you run it on real Atari, please let me know the result. Also, graphical bugs are fixed.
-
Might be counter productive regarding your memory footprint, but there is some potential by
loop unrolling, loop unrolling and loop unrolling...:
(Examined only the hot-spots)
$6C26 LDX #$A0
$6C28 LDA #$FF
$6C2A STA $035F, X
$6C2D DEX
=>
$6C26 LDX #$50
$6C28 LDA #$FF
$6C2A STA $035F, X
      STA $03AF, X
      DEX
---
Blitter functions for the 4 directions: ($89A5, $8A07, $8A61, $8AAE) like
$8A04 LDX #$0C
$8A06 DEY
$8A07 LDA $CF40,Y ; OK
$8A0A STA $CF41,Y
$8A0D DEY
$8A0E BNE L_BRS_($8A07)_($8A0E) OK
$8A10 DEY
$8A11 LDA $CF40 ; OK
$8A14 STA $CF41
$8A17 DEC $8A09
$8A1A DEC $8A13
$8A1D DEC $8A0C
$8A20 DEC $8A16
=>
$8A04 LDX #$0C
$8A06 DEY
SMCA: LDA $CF40,Y ; (here is some potential left, if you can avoid the page boundary crossing
SMCB: STA $CF41,Y ; for the load - by start address/init adaptations...)
      DEY
SMCC: LDA $CF40,Y
SMCD: STA $CF41,Y
      DEY
      BNE SMCA
      DEY
SMCE: LDA $CF40
SMCF: STA $CF41
      DEC SMCA+2
      LDA SMCA+2
      STA SMCB+2
      STA SMCC+2
      STA SMCD+2
      STA SMCE+2
      STA SMCF+2
---
Of course you could create a version for memory upgraded machines where you fully unroll the 4 moves with absolute addressing...

I also notice sometimes the following corruption of the graphics (seems to be self-healing):
Thanks for the hints about performance. I discovered Altirra's awesome profiler in the meantime and found all that wisdom. There is still some space left for the code, so I may try to unroll the most expensive loops.
The problem you mentioned turned out to be a bug in patching C64 routines, now fixed. Thanks for the report.
Mariusz.
-
I was only saying that because the C64 always has a screen size of 320x200 and the ZX a 256x192.
Yes, I understand that what moves here in software is what is inside, and that is the same size on the two. But the A8 has three possible screen widths: 32, 40 and 48 bytes. Setting the screen to 32 bytes frees cycles and screen RAM. What I don't know, and was asking, is whether this will help with anything 'speed related' when the inside of the field is the same size on the three versions.
But have you already set it to 32 bytes wide? At least for the screen, you have about 8KB for a 40-byte-wide screen, and that would decrease to about 6KB, wouldn't it?
P.S. Do you see any technical explanation, other than maybe the taste of the guy(s) behind the C64 version, for not including the flag on the left side?
Yes, the screen is 32 bytes wide. And it needs 6KB RAM for display.
I am not sure why the flag is missing. It is not a lack of RAM, as the C64 version still has some free RAM left. I think either it made the game even slower so they didn't accept it, or the author just missed the deadline for delivering the game, so he had to cut some features. "Score" in the Spectrum version appears to be expressed in medals and not as a simple numerical value like in the C64 version.
BTW: I know you are doing lots of graphics for Atari. Could you please make the left and right ornaments for the playfield (the yellow vertical bars in the C64 version) as PM/G graphics for Atari? This would allow me to have yellow for the whole ornament like the C64 has.
Mariusz.

New game port: Fairlight
in Atari 8-Bit Computers
Posted
Can you please reveal some more details on this?