
cc65 newbie porting Action! game ("Gem Drop")



So almost 5 years ago, I took the source-code to my game "Gem Drop", written in Action! back in 1997 and ported it to C for cc65.  I got off on a tangent hacking on cc65 header files, buying a house and moving out of state, etc. etc. etc.  I finally got some of my header file hacking included in the upstream cc65 codebase, so this evening I finally went back and updated my code so it can build with cc65 (and its Atari header files) that come with Ubuntu. :)  Yay!

 

My problem is, I'm getting a lot of glitching on the screen when I actually play the game (sometimes your character mysteriously gets redrawn one line above where it should go; sometimes a rogue tile piece appears randomly on the screen, etc.; one time it started totally ignoring keyboard input!)

 


 

None of these things are bugs in the original game... they're quirks of my C port.  Some things to note:

  • I'm using a VBI to flicker between two fonts.  Press [F] to toggle this effect (however, the VBI will still be running!  Edit the code to disable.)
  • Joystick or arrow keys move left/right.  Down to pull down a block (or group, if they match vertically).  Up to throw what you're carrying back up.  Match 3 or more in a row vertically; horizontal matches will then also be counted.
  • Player graphics are used to draw the score-explosions when you make a match.
  • Space to Pause, or console keys or Esc to abort.
  • Title screen / menu is not really operable.  The game supports Genesis game pads (for B / C to grab/throw) as an option.  It will try to detect them at start-up.

 

Is anyone out here willing to take a look at it and see what I might be doing wrong?  The code is here: https://github.com/billkendrick/gemdrop_deluxe  On Linux, I can just run "make run-xex" and it will build the XEX file and launch Atari800 with the appropriate options.  If you're not on Linux, I'm assuming you're versed enough in how to build things in cc65, otherwise why are you reading this? :)

 

Thanks in advance!

 

PS - The original game from 20+ years ago (augh!!!) is here: http://newbreedsoftware.com/gemdrop in case you want to compare...  Trivia: I actually ported it to C once before, back in 1998.  I ported it to X-Window, and soon after to libSDL.  That port runs on a ton of platforms! (see http://newbreedsoftware.com/gemdropx/)

 


52 minutes ago, billkendrick said:

I'm losing steam, but I THINK it might have just been due to using straight C within my VBI. I've rewritten it as inline assembly,

I've found exactly the same thing. For VBI code I use assembler modules; there's too much overhead in the C routines, so they are not as short and quick.


40 minutes ago, TGB1718 said:

I've found exactly the same thing. For VBI code I use assembler modules; there's too much overhead in the C routines, so they are not as short and quick.

Even worse, if you use subroutine calls in the VBI, you will likely trash the stack pointer or other registers CC65 relies on.


47 minutes ago, ilmenit said:

I'd propose to set a breakpoint at the beginning and the end of the VBI code. If the breakpoint at the beginning triggers twice, before the ending one, then you crossed your cycle limit in this procedure.

Alternatively in Altirra, use the menu Debug/Verifier... and set the Recursive NMI execution option


I see multiplication in the main game loop - get rid of that if possible.  As you know 6502 likes addition and subtraction.  If you need to multiply or divide try to do it with shifts.  Better yet to compute at compile time and use an array.
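To make the table idea concrete, here is a minimal sketch in plain C so it can be tried on any compiler. The 40-byte row width, the 24 rows, and the names `ROW_BYTES`, `row_offset`, `screen_offset` and `screen_offset_shift` are assumptions for illustration; none of them come from the Gem Drop source.

```c
#include <stdint.h>

/* Assumed layout: a 40-byte-wide text screen with 24 rows. */
#define ROW_BYTES 40u
#define ROWS      24u

/* Row start offsets (row * 40) written out once as constants,
   so no multiply is needed at run time. */
static const uint16_t row_offset[ROWS] = {
      0,  40,  80, 120, 160, 200, 240, 280,
    320, 360, 400, 440, 480, 520, 560, 600,
    640, 680, 720, 760, 800, 840, 880, 920
};

/* Offset of cell (x, y): one table read plus one addition. */
uint16_t screen_offset(uint8_t x, uint8_t y)
{
    return (uint16_t)(row_offset[y] + x);
}

/* Shift-and-add alternative for the same width: 40*y = 32*y + 8*y. */
uint16_t screen_offset_shift(uint8_t x, uint8_t y)
{
    uint16_t r = y;
    return (uint16_t)((r << 5) + (r << 3) + x);
}
```

Both functions give the same result for rows 0 to 23; which one is actually faster under cc65 is worth checking in the generated assembly rather than assuming.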

 

Function calls with a lot of parameters are horribly expensive with cc65 (and 6502 in general) - inline what you can (or use zero page memory).  You can enable static locals at compile time also.  That may help.  I notice you memcpy for 8 bytes here or 9 bytes there.  It may be faster just to POKE() those values into memory directly - especially if they're static values.  There's a certain point where memcpy makes sense - I'm not sure where that point is though.

 

Have you tried to look at the generated assembly language?  I think you may be horrified by what you find.  Your game loop may be taking too long so the VBI routine is updating the screen based on half-updated game state?

 

Properly tuned C can be pretty quick but you need to follow certain conventions - the end result may make a C programmer frown but it's what you have to do.  I've seen properly tuned C code easily keep pace with Action!.


2 hours ago, damosan said:

I notice you memcpy for 8 bytes here or 9 bytes there.  It may be faster just to POKE() those values into memory directly - especially if they're static values.

That got me thinking: this is between 25% and 35% faster than memcpy for moves of 256 bytes or less, but not as flexible. Even small moves are much quicker.

 

C code:

 

#include <stdio.h>
#include <peekpoke.h>
#include <string.h>

extern void mcpy(void);

extern char size;

void main(void)
{
    int i;
    char *addr=&size-11;     // operand of LDA in .s module (source address)
    char *dest=&size-8;      // operand of STA in .s module (dest address)
    char *num=&size-13;      // operand of LDX in .s module (byte count minus 1)
    
    // 500 loop test run to roughly time how long
    POKEW(19,0); // reset clock
    for(i=0;i<500;i++)
        memcpy((char *)0x7000,(char *)0x6000,5);  // same direction as the assembler version: $6000 -> $7000
    printf("%u %u\n\n",PEEK(19),PEEK(20));

    // now 500 loop test run for new code
    POKEW(19,0);  // reset clock again
    for(i=0;i<500;i++)
    {
        POKEW(addr,0x6000);
        POKEW(dest,0x7000);
        POKE(num,5-1);  // X = count-1: the loop copies indices X down to 0
        mcpy();    
    }
    printf("%u %u",PEEK(19),PEEK(20));
    return;
}

 

Assembler module

    ; Export the routine and the marker byte used for offsets
    .export _mcpy
    .export _size

    .code
    .proc _mcpy: near
        LDX #0          ; operand patched before each call
LOOP:   LDA $1000,X     ; source address, patched before each call
        STA $1000,X     ; destination address, patched before each call
        DEX
        CPX #255        ; loop until X wraps past 0, so index 0 is copied too
        BNE LOOP
        RTS
    .endproc
_size:  .byte 0         ; dummy used for offsets
 


1 hour ago, TGB1718 said:

That got me thinking: this is between 25% and 35% faster than memcpy for moves of 256 bytes or less, but not as flexible. Even small moves are much quicker.

 

 

If you're only going to do a few bytes at a time...

 

	ldx	#0
	lda	$6000,x
	sta	$7000,x
	inx
 
<repeat n times>

...or do you have to use Y for this?  Doesn't matter as the point is you can inline byte copies for a small number of bytes easily.  The code will run quite a bit faster as there are no compares, no jumps, etc.  You can probably use ZP as well to reduce the byte counts and increase speed further.  This is where ca65 comes in handy - you can create a core repeat macro and then create all sorts of byte copies for arbitrary sizes at assembly time.

 

Or make it smarter so that it will try to use inline code but if it cannot then it hops over to memcpy().
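One way the "inline if small, otherwise memcpy()" idea could look in C is sketched below. `COPY_BYTES` is a made-up name, and the cut-off of 4 bytes is an arbitrary assumption; where the real break-even point lies under cc65 would need measuring.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical macro: straight-line byte copies for 0..4 bytes,
   memcpy() for anything larger. With a constant n, a compiler
   may reduce the switch to just the stores that are needed. */
#define COPY_BYTES(dst, src, n)                                   \
    do {                                                          \
        unsigned char *d_ = (unsigned char *)(dst);               \
        const unsigned char *s_ = (const unsigned char *)(src);   \
        switch (n) {                                              \
        case 4: d_[3] = s_[3]; /* fall through */                 \
        case 3: d_[2] = s_[2]; /* fall through */                 \
        case 2: d_[1] = s_[1]; /* fall through */                 \
        case 1: d_[0] = s_[0]; /* fall through */                 \
        case 0: break;                                            \
        default: memcpy(d_, s_, (n)); break;                      \
        }                                                         \
    } while (0)
```

Used as `COPY_BYTES(screen_ptr, tile_data, 3);` (both names hypothetical), the macro avoids a function call for the small case, which is the part that tends to be expensive on the 6502.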


1 hour ago, damosan said:

...or do you have to use Y for this?

You can use X or Y. The difference between the two registers becomes apparent with zero-page indirect addressing. For that you would only use Y, i.e.

LDY #0
LDA ($CB),Y
STA ($CE),Y

Using the X register

LDX #0
LDY #0
LDA ($CB,X)
STA ($CD),Y

doesn't work the same way: X is added to the zero-page address first, and the effective address is read from the pointer stored there. It's good for lookup tables, but obviously there isn't much of zero page left to use, so this instruction is not seen very often.
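For anyone who wants to poke at the difference between the two modes away from the Atari, here is a small C model of how each one arrives at its effective address. The array `mem` and the function names are invented for this sketch; only the address arithmetic reflects the 6502.

```c
#include <stdint.h>

/* 64K of simulated memory; zero page is mem[0x00..0xFF]. */
uint8_t mem[65536];

/* Read a little-endian 16-bit pointer from zero page. The second
   byte is fetched with wrap-around inside page zero. */
static uint16_t zp_pointer(uint8_t zp)
{
    return (uint16_t)(mem[zp] | (mem[(uint8_t)(zp + 1)] << 8));
}

/* LDA (zp),Y : fetch the pointer at zp, then add Y to it. */
uint16_t ea_indirect_indexed(uint8_t zp, uint8_t y)
{
    return (uint16_t)(zp_pointer(zp) + y);
}

/* LDA (zp,X) : add X to zp first, then fetch the pointer found there. */
uint16_t ea_indexed_indirect(uint8_t zp, uint8_t x)
{
    return zp_pointer((uint8_t)(zp + x));
}
```

With $6000 stored at $CB/$CC and $7000 at $CD/$CE, `ea_indirect_indexed(0xCB, 5)` walks through the first buffer ($6005), while `ea_indexed_indirect(0xCB, 2)` selects the second pointer ($7000). That is why (zp),Y suits copy loops and (zp,X) suits tables of pointers.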


  • 2 weeks later...

Thanks all for the comments!  Yeah, I'm sure I can optimize a lot more of the code, but so far it seems to be working alright, now that I've converted the VBI to pure assembly.  So I've released a beta!

 


 

Download: http://newbreedsoftware.com/gemdrop_deluxe/

 

Source code: https://github.com/billkendrick/gemdrop_deluxe

 

If anyone wants to submit patches or pull requests, feel free to send them my way!  I appreciate the experts helping out. ;)

 

PS - So far I've tried it (the XEX build) out on:

  • "Atari800" emulator (stock XL mode, basically)
  • "Atari800" emulator in Atari 800 mode (400/800 OS, 48K RAM)
  • Atari 1200XL via SIO2SD
  • Atari 1200XL via The Ultimate Cart

Thanks!


I have been messing around with CC65 for about a year now. I am discovering that the compiled program takes up more room than I originally expected. It may be possible to do a simple game, but to do anything extraordinary, machine language calls are required to help make things fit into RAM. Assembly routines compile down to much smaller code.

Gem Drop, I believe, is the first game that switches between two character sets on alternating frames to generate more colors. Not many games have been made that use this technique, and not many programmers or graphic artists have mastered it.

Edited by CuloMajia

16 minutes ago, CuloMajia said:

I have been messing around with CC65 for about a year now. I am discovering that the compiled program takes up more room than I originally expected. It may be possible to do a simple game, but to do anything extraordinary, machine language calls are required to help make things fit into RAM.

Don't hijack threads with this. Start a new one and describe where your expectations aren't being met, and others may be able to help.

19 minutes ago, CuloMajia said:

Assembly routines compile down to much smaller code.

Not necessarily; using macros to unroll loops will make a larger amount of code.

(and an assembler assembles)


I applied a few changes to a previous version in C and then passed it through cc65 (with -Osir) to see the resulting code.  It doesn't look too bad.  Instead of using X = X + 1 I used ++X (and --X)  which allows cc65 to use inc/dec vs. the lda/clc/adc.

 

	lda     _flicker
	beq     L0002

	lda     #$04
	sec
	sbc     _FLIP
	sta     _FLIP
	lda     _CHAddr
	clc
	adc     _FLIP
	jmp     L000F
L0002:	lda     _CHAddr
	clc
	adc     #$08
L000F:	sta     $02F4
	inc     _TOGL
	lda     _TOGL
	cmp     #$04
	bne     L000D
	lda     #$00
	sta     _TOGL
	lda     _ExAnim0
	beq     L0011
	sta     $02C0
	dec     _ExAnim0
	jmp     L0012
L0011:	sta     $D000
L0012:	lda     _ExAnim1
	beq     L0014
	sta     $02C1
	dec     _ExAnim1
	jmp     L0015
L0014:	sta     $D001
L0015:	lda     _ExAnim2
	beq     L0017
	sta     $02C2
	dec     _ExAnim2
	jmp     L0018
L0017:	sta     $D002
L0018:	lda     _ExAnim3
	beq     L001A
	sta     $02C3
	dec     _ExAnim3
	rts
L001A:	sta     $D003
L000D:	rts
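Reading the listing back into C gives roughly the following. This is a reconstruction from the assembly above, not the actual Gem Drop source: the locations $02F4 (CHBAS), $02C0-$02C3 (PCOLR0-3) and $D000-$D003 (HPOSP0-3) are replaced by plain variables so it builds on any compiler, the four unrolled ExAnim blocks are folded into a loop, and the name `vbi_sketch` is made up.

```c
#include <stdint.h>

/* Stand-ins for the hardware/shadow locations in the listing. */
uint8_t CHBAS, PCOLR[4], HPOSP[4];

/* Variables named after the listing's symbols. */
uint8_t flicker, FLIP, CHAddr, TOGL, ExAnim[4];

void vbi_sketch(void)
{
    uint8_t i;

    if (flicker) {
        FLIP = (uint8_t)(4 - FLIP);       /* toggles 0 <-> 4 if it starts at either */
        CHBAS = (uint8_t)(CHAddr + FLIP);
    } else {
        CHBAS = (uint8_t)(CHAddr + 8);
    }

    ++TOGL;                               /* becomes a single inc in the listing */
    if (TOGL != 4) return;
    TOGL = 0;

    /* Every fourth call: step each explosion animation. */
    for (i = 0; i < 4; ++i) {
        if (ExAnim[i]) {
            PCOLR[i] = ExAnim[i];         /* colour follows the countdown */
            --ExAnim[i];
        } else {
            HPOSP[i] = 0;                 /* horizontal position 0 */
        }
    }
}
```

Whether cc65 would emit code as tight as the listing for the loop form is not something this sketch shows; the unrolled version in the listing avoids indexed addressing entirely.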

 


11 hours ago, CuloMajia said:

I have been messing around with CC65 for about a year now. I am discovering that the compiled program takes up more room than I originally expected. It may be possible to do a simple game, but to do anything extraordinary, machine language calls are required to help make things fit into RAM. Assembly routines compile down to much smaller code.

Gem Drop, I believe, is the first game that switches between two character sets on alternating frames to generate more colors. Not many games have been made that use this technique, and not many programmers or graphic artists have mastered it.

The probable reason for the size is that you are using std library calls like printf, fopen, etc. Those pull in the entire stdio library even if you just use 1 call. Use conio.h or write your own IOCB calls. CC65 is capable of writing extremely powerful games, all in C, but you need to do it correctly. There's lots of information about how to write optimized CC65 code.


11 hours ago, damosan said:

I applied a few changes to a previous version in C and then passed it through cc65 (with -Osir) to see the resulting code.  It doesn't look too bad.  Instead of using X = X + 1 I used ++X (and --X)  which allows cc65 to use inc/dec vs. the lda/clc/adc.

 

That is really useful. I spent a few hours today getting a game over 35K down to 30K. I had several X = X + 1 and replaced them with X += 1. 
Sorry about hijacking this thread, however information like this does benefit everyone. Sometimes it is very time consuming to port a whole game from one language to another or replace the whole program with assembly. I started using more tables to set up different enemy types instead of a big IF-BLOCK. 
ML code comes in handy if, after all your optimizations, a program still doesn't run on a 64K machine the way you would like. From what I can see, CC65 runs faster than Fast Basic or Mad Pascal, but pure optimized assembly is the way to go if you need a task to do the job in less memory and/or time. 
This is a common clear screen routine for me. I don't exactly unroll code, but write to several base locations + index within a loop. 

	LDA #0
	TAY
Clear_Screen_Loop:
	STA SCREENMEM + $0000,Y
	STA SCREENMEM + $0100,Y
	STA SCREENMEM + $0200,Y
	STA SCREENMEM + $0300,Y
	DEY
	BNE Clear_Screen_Loop
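The table-driven enemy setup mentioned above (tables instead of a big IF-BLOCK) might look something like this in C. Everything here is hypothetical: the struct fields, the three enemy kinds and their values are placeholders, not taken from any real game.

```c
#include <stdint.h>

/* Hypothetical per-type attributes; the fields are illustrative only. */
typedef struct {
    uint8_t speed;
    uint8_t hit_points;
    uint8_t color;
} EnemyType;

enum { ENEMY_SLOW, ENEMY_FAST, ENEMY_BOSS, ENEMY_TYPE_COUNT };

/* One row per enemy type replaces one branch of the IF block. */
static const EnemyType enemy_types[ENEMY_TYPE_COUNT] = {
    /* speed, hit_points, color */
    { 1, 2, 0x26 },   /* ENEMY_SLOW */
    { 3, 1, 0x46 },   /* ENEMY_FAST */
    { 1, 9, 0x86 }    /* ENEMY_BOSS */
};

typedef struct {
    uint8_t x, y, speed, hit_points, color;
} Enemy;

/* Initialise an enemy by copying its type's row from the table. */
void spawn_enemy(Enemy *e, uint8_t type, uint8_t x, uint8_t y)
{
    const EnemyType *t = &enemy_types[type];
    e->x = x;
    e->y = y;
    e->speed = t->speed;
    e->hit_points = t->hit_points;
    e->color = t->color;
}
```

Adding a new enemy then means adding one table row rather than another branch, which is usually where the code-size saving comes from.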

 


> but pure optimized assembly is the way to go if you need a task to do the job in less memory and/or time

 

Put this in big letters at the entrance to this forum. I see people spending time and whatnot using higher-level languages, just to find out that "it doesn't work". Sure, higher-level languages are good and quicker to develop a program in, and therefore have their use. But a DLI routine, for example, is something that calls for an assembly implementation.


No way can you do DLIs and VBIs with higher-level languages. If you just want to change GTIA or ANTIC registers, some will compile into LDA + LDX, just to store to an 8-bit memory location. Most of the BASICs compile and store the line number where something is coming from for debugging purposes.


1 minute ago, CuloMajia said:

No way can you do DLIs and VBIs with higher-level languages. If you just want to change GTIA or ANTIC registers, some will compile into LDA + LDX, just to store to an 8-bit memory location. Most of the BASICs compile and store the line number where something is coming from for debugging purposes.

But cc65 doesn't do that thankfully.  Unless you want it to...

 

11 hours ago, danwinslow said:

The probable reason for the size is that you are using std library calls like printf, fopen, etc. Those pull in the entire stdio library even if you just use 1 call. Use conio.h or write your own IOCB calls. CC65 is capable of writing extremely powerful games, all in C, but you need to do it correctly. There's lots of information about how to write optimized CC65 code.

I'm pretty sure the linker pulls in what it needs - looking at the map file shows this.  If you want printf() style output you can always build your own using variable arguments.  My lw_printf() routine handles strings, bytes, and words and weighs in at around 640 bytes.
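As a rough idea of what a printf replacement built on variable arguments can look like, here is a small sketch. It is not the lw_printf() routine mentioned above: the names `tiny_sprintf` and `put_uint` and the format letters `%b` (byte) and `%w` (word) are invented for this example, and it writes into a caller-supplied buffer rather than to the screen so it stays portable.

```c
#include <stdarg.h>

/* Append an unsigned value in decimal; returns the new write position. */
static char *put_uint(char *out, unsigned int v)
{
    char tmp[10];
    int n = 0;
    do {
        tmp[n++] = (char)('0' + v % 10u);
        v /= 10u;
    } while (v != 0u);
    while (n > 0) *out++ = tmp[--n];   /* digits were produced in reverse */
    return out;
}

/* Hypothetical mini formatter: %s string, %b byte, %w 16-bit word.
   Any other character after '%' (including '%') is copied as-is.
   The caller must supply a buffer large enough for the result. */
void tiny_sprintf(char *out, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    for (; *fmt != '\0'; ++fmt) {
        if (*fmt != '%') { *out++ = *fmt; continue; }
        ++fmt;
        if (*fmt == '\0') break;       /* lone '%' at end of format */
        switch (*fmt) {
        case 's': {
            const char *s = va_arg(ap, const char *);
            while (*s != '\0') *out++ = *s++;
            break;
        }
        case 'b': out = put_uint(out, va_arg(ap, unsigned int) & 0xFFu);   break;
        case 'w': out = put_uint(out, va_arg(ap, unsigned int) & 0xFFFFu); break;
        default:  *out++ = *fmt; break;
        }
    }
    *out = '\0';
    va_end(ap);
}
```

Leaving out field widths, signed values and hex keeps the routine small; how close its compiled size comes to the 640 bytes quoted above under cc65 is untested.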


Hi!

44 minutes ago, CuloMajia said:

No way can you do DLIs and VBIs with higher-level languages. If you just want to change GTIA or ANTIC registers, some will compile into LDA + LDX, just to store to an 8-bit memory location. Most of the BASICs compile and store the line number where something is coming from for debugging purposes.

As a "shameless plug", or just to be on the contrarian side, FastBasic allows you to write DLIs, even dynamic ones ;)  See http://www.vitoco.cl/atari/10liner/WAZERS/ or this video: 

 

and for the original discussion on DLI support, see:

Have Fun!

 


1 hour ago, CuloMajia said:

No way can you do DLIs and VBIs with higher-level languages. If you just want to change GTIA or ANTIC registers, some will compile into LDA + LDX, just to store to an 8-bit memory location. Most of the BASICs compile and store the line number where something is coming from for debugging purposes.

Of course you can. I did a mouse driver running off of a timer interrupt. I've done DLIs. I've done vblank routines. I even wrote a threading system that ran off of vbi. All in straight C. Coded properly, CC65 is more of an advanced macro assembler than a 'high level language'. MADS Pascal does a great job, too.

