damosan

Members
  • Posts

    142
Posts posted by damosan

  1. 16 hours ago, TGB1718 said:

    Try not passing parameters to functions, make as many variables global, even return values, I just tried

    a simple call and saved 30 bytes by not passing a variable and not returning a value.

     

    Use char instead of int where possible.

     

    Unsigned chars to be specific - same applies to words (or unsigned ints).  Sign extension is spendy from a time/space perspective.
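As a rough illustration of both suggestions, here is a minimal sketch of the same small routine written twice. All names (`add_params`, `add_globals`, `g_a`, `g_b`, `g_sum`) are hypothetical, and the actual byte savings will depend on the compiler version and flags.

```c
/* Hypothetical names throughout; shows the coding style, not measured savings. */

/* Conventional version: arguments and result go through the calling convention. */
unsigned char add_params(unsigned char a, unsigned char b)
{
    return (unsigned char)(a + b);
}

/* Global version: inputs and result live in globals, nothing is passed. */
unsigned char g_a, g_b, g_sum;

void add_globals(void)
{
    g_sum = (unsigned char)(g_a + g_b);   /* wraps at 256, like any unsigned char */
}
```

The caller sets `g_a` and `g_b`, calls `add_globals()`, and reads `g_sum`. The trade-off is that the routine is no longer reentrant.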

  2. 14 hours ago, Wrathchild said:

    do try the 'i', 's', 'r' options (and combinations of them)

     

    -O, -Oi, -Or, -Os

    Enable an optimizer run over the produced code.

    Using -Oi, the code generator will inline some code where otherwise a runtime function would have been called, even if the generated code is larger. This will not only remove the overhead for a function call, but will make the code visible for the optimizer. -Oi is an alias for -O --codesize 200.

    -Or will make the compiler honor the register keyword. Local variables may be placed in registers (which are actually zero page locations). See also the --register-vars command line option, and the discussion of register variables below.

    Using -Os will allow the compiler to inline some standard functions from the C library like strlen. This will not only remove the overhead for a function call, but will make the code visible for the optimizer. See also the --inline-stdfuncs command line option.

    It is possible to concatenate the modifiers for -O. For example, to enable register variables and inlining of standard functions, you may use -Ors.

     

    I believe -O optimizes for space while the other options are for speed via inlining.

  3. I've been working on getting Mini-Scheme to compile using CC65 - it is a traditional not-8bit program in that it leverages a rather complex union and many getter/setter/istrue/isfalse/etc. macros.  On any 16bit processor this wouldn't be a problem.  On the 6502 on the other hand...

     

    A few CC65 observations when trying to cram code into the Atari - they are, of course, variations on previous themes.  The difference in this case is leveraging existing code vs. building something from scratch.

     

    • Replace getter/setter/pseudo-function macros with traditional functions where possible.  Focus first on the getter/setter/pseudo-function macros that are used throughout the code base.  In one case I shaved ~2k from the binary by replacing a widely used macro with a function.  Average savings were ~200 bytes for each macro.
    • Replace *s++ = *c; with *s = *c; ++s; - requires less assembly to do the work.  It's not idiomatic C by any stretch but when you're trying to shave bytes from the binary...
    • Don't forget to move critical globals to Page 0.  Doing so shaves a byte from each access to the variable and makes the access faster.
    • Disable select features.  In the case of Mini-Scheme it was neat to see basic threading with call/cc work on an 8bit, but it was so slow that removal was the real answer.
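The first two points can be sketched as follows. The `cell` type and all function names here are hypothetical stand-ins, not Mini-Scheme's actual definitions, and the size difference only shows up when the macro is used in many places.

```c
/* Hypothetical cell type, loosely modelled on a Lisp-style tagged cell. */
typedef struct cell {
    unsigned char flag;
    struct cell *car;
} cell;

/* Macro form: the access code is expanded at every use site. */
#define CAR_MACRO(p) ((p)->car)

/* Function form: one copy of the access code, plus a call at each use site. */
cell *car_fn(cell *p)
{
    return p->car;
}

/* String copy with the store and the increments as separate statements. */
void copy_str(char *dst, const char *src)
{
    while (*src) {
        *dst = *src;
        ++dst;
        ++src;
    }
    *dst = '\0';
}
```

Both forms of the accessor return the same pointer; only the generated code size and call overhead differ.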
  4. A fun experiment is to copy the OS to RAM and then overlay your stuff in areas of the OS you don't plan on using.  I have a cc65 program somewhere that loads some code overlapping the international character set.  Not exactly what you want to do but you get the idea.

     

    It's kind of fun saying "I won't be needing any of the cassette driver stuff so....gone."

     

  5. I like the second approach - you can also shave off the JSR/RTS if you do a JMP and in your routine you simply JMP process_next_function at the end of each routine.  That would save you 6 cycles per function call (two absolute JMPs = 6 cycles vs. JSR/RTS = 12 cycles).

     

    Depends on how your code is structured...

     

    fntable_lo:
    .BYTE <fn1
    .BYTE <fn2
    .BYTE <fn3

    fntable_hi:
    .BYTE >fn1
    .BYTE >fn2
    .BYTE >fn3

    ; A holds the function index on entry
    TAX
    LDA fntable_lo,X
    STA modjsr+1      ; patch low byte of the JMP operand
    LDA fntable_hi,X
    STA modjsr+2      ; patch high byte of the JMP operand
    modjsr: JMP $0000 ; operand is overwritten above
    ; never gets here...

    fn1:
      JMP process_next_function

    fn2:
      JMP process_next_function

    fn3:
      JMP process_next_function
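For comparison, a rough C analog of the table lookup is an array of function pointers. This is only a sketch with hypothetical handler bodies; a C compiler will generally emit a call and return here, so it shows the dispatch-by-index idea and not the JMP chaining saving.

```c
/* Hypothetical handlers; each one records that it ran. */
unsigned char trace[3];

void fn1(void) { trace[0] = 1; }
void fn2(void) { trace[1] = 1; }
void fn3(void) { trace[2] = 1; }

typedef void (*handler)(void);

/* One table of full addresses instead of separate lo/hi byte tables. */
const handler fntable[3] = { fn1, fn2, fn3 };

/* Index the table and call through the pointer. No bounds check. */
void dispatch(unsigned char index)
{
    fntable[index]();
}
```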

     

  6. It'd be a great idea to write a modern Atari assembly language book.  It would, of course, cover the basics of assembly language.  From there it can begin to cover the hardware and how to exploit it with commented examples.  Player/missile graphics, software sprites, multiplexing players, sound, DLIs, VBIs, double buffering, what you can do when you move the OS to RAM, etc.

     

    Sort of a modern De Re Atari...

  7. 4 hours ago, danwinslow said:

    I think that is fairly correct. You can write C in a manner that is very bare metal too, but you have to do it on purpose. I think Action! and C code such as mentioned are very similar in 'height' above bare metal...so then it gets down to which has better code generation and optimization...and I'm sure that's been explored in one of the language benchmarking threads here.

     

    Malloc functions are very interesting and fundamental routines, you would learn a lot if you worked your way through writing your own.

     

    It's been years since I used Action so perhaps this is off base a bit.

     

    Action, I believe, is more efficient by default - the features of the language are geared towards 8bit programming.  With most C compilers on the 8bit you have to deliberately avoid using certain language features to write efficient code, going so far as writing your C code as if it were a verbose assembler.

     

  8.  

    In BASIC you'd have something like this:

     

    200 A = PEEK (106) - 8 : POKE 54279, A : PMBASE = 256*A

     

    PM space has to sit on a 1k boundary (2k for single line resolution) - this serves as an optimization of sorts.  You'll find this often on the Atari.  For example font tables have to align on a 1k block.

     

    Your example reserves 2k of RAM for single line resolution player/missile graphics.  A neat hack is that you can use the first 768 bytes of this block for other purposes.
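The arithmetic in the BASIC line can be sketched in C as below. The function and macro names are hypothetical, and the RAMTOP value of 160 used in the usage note is only an example; the real value depends on the machine and what is loaded.

```c
/* Offsets within the 2k single line resolution P/M area. */
#define PM_MISSILES    768u    /* the 768 bytes before this are unused by the hardware */
#define PM_PLAYER0     1024u
#define PM_PLAYER_SIZE 256u

/* Page number to store in PMBASE (54279), given the value read from RAMTOP (106). */
unsigned char pm_page(unsigned char ramtop)
{
    return (unsigned char)(ramtop - 8);   /* 8 pages of 256 bytes = 2k */
}

/* Address of the P/M area: 256 * page. */
unsigned int pm_address(unsigned char page)
{
    return (unsigned int)page * 256u;
}

/* Address of player n (0-3) in single line resolution. */
unsigned int player_address(unsigned char page, unsigned char n)
{
    return pm_address(page) + PM_PLAYER0 + (unsigned int)n * PM_PLAYER_SIZE;
}
```

With RAMTOP reading 160, the page is 152 and the P/M area starts at 38912, so the free 768 bytes run from 38912 up to the missile data at 39680.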

  9. The better way is to use page zero (or some other unused memory area) to set parameters.  In the assembly file you'll define equates pointing to that memory area.  This will result in smaller/faster code -- CC65 is great but passing parameters the conventional way will generate slower code.

     

    For setting Atari specific registers check out atari.h.  You can set ROWCRS / COLCRS directly from C vs. calling a routine to do so.
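A minimal sketch of setting the cursor by writing the OS variables directly, rather than calling a routine. To keep it runnable anywhere, it writes into a host-side array (`mem`) standing in for the Atari address space; on the real machine the writes would go to actual memory, for example through the definitions in atari.h. The helper name `set_cursor` is hypothetical.

```c
/* Host-side stand-in for the Atari address space so the sketch runs anywhere. */
unsigned char mem[65536];

#define ROWCRS 84u   /* cursor row, 1 byte */
#define COLCRS 85u   /* cursor column, 2 bytes, low byte first */

/* Hypothetical helper: set the cursor position by writing the OS variables. */
void set_cursor(unsigned char row, unsigned int col)
{
    mem[ROWCRS]     = row;
    mem[COLCRS]     = (unsigned char)(col & 0xFFu);
    mem[COLCRS + 1] = (unsigned char)(col >> 8);
}
```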

  10. I have fond memories of Megamax C - I used that compiler to learn K&R C back in '86 or '87(?).  The limitations didn't really affect a newbie hacker that much and it compiled pretty quickly.

     

    When I mess with C on the ST today I tend to use Lattice or Pure C.

     

    If you really want to know how the platform works you'll use assembly.  :)

  11. If I was going about trying to support internationalization, from a font perspective, I'd probably look at copying the OS to RAM and then overwriting the default OS font.  You can also load your custom font into any 1k aligned memory region and tell the OS to go to that region for font data.  The benefit of loading the OS into RAM is that you can basically overwrite any part of the OS you won't be needing.  You'll need to practice some care here but there are significant chunks of RAM you can free up in this way (say the floating point routines, the international character set data in XL/XE machines, the cassette driver, etc.).

     

    In any event then I'd be able to deploy one binary with several 1k font files allowing the user to select their language at runtime (saving their selection for future runs).  There are plenty of source code examples out there of loading custom fonts.
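Pointing the OS at a custom font comes down to storing the font's page number in CHBAS. The sketch below covers only that arithmetic and the 1k alignment check; the function names are hypothetical and the file loading is left out.

```c
#define CHBAS 756u   /* OS shadow register holding the font's page number */

/* Returns 1 if addr is a valid start for a 1k font (1k aligned), else 0. */
int font_address_ok(unsigned int addr)
{
    return (addr % 1024u) == 0u;
}

/* Page number to store in CHBAS for a font located at addr. */
unsigned char font_page(unsigned int addr)
{
    return (unsigned char)(addr >> 8);
}
```

For example, a font loaded at $6000 passes the check and gives a page value of $60, while the ROM font at $E000 corresponds to the default CHBAS value of 224.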

  12. Yeah - I had aligns in there initially but the linker would complain about the code segment not being aligned.  It'd still work of course - I need to modify the linker config to align the data segment to a page.  I made some changes to the assembly (using res) but kept the other tables as they were, as someone new to 6502 assembly and ca65 may be confused by the repeat/endrep stuff.

     

    I'm thinking of new ways to plot pixels where the LDA/ORA/STA sequence will work with absolute addresses instead of lookups.  It's going to take a lot of space of course.

     

     

     

  13. Attached is a link to a graphics mode 8 pixel plotting test harness.  It includes all sorts of ways to plot a single pixel on a graphics 8 screen from the slowest (~13 pixels / jiffy)  to the fastest I've managed (~332 pixels / jiffy) on an NTSC machine (on a PAL machine the counts are ~17 / ~427 pixels per jiffy).

     

    The various methods are all general purpose routines meant to plot a pixel at a given location.  There are two "cheat" tests that are optimized for the use case of filling a screen.  The project also serves as an example of interfacing CC65 with external assembly code, how to pass variables using the zeropage, and how (in some cases) not to plot pixels.  It also includes a routine to plot text on a gr8 screen.
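For readers new to the problem, a general purpose plot reduces to finding the byte and ORing in one bit. The sketch below is a plain C baseline, not one of the routines from the project; it assumes linear screen memory and uses a host-side array (`screen`) in place of real display memory.

```c
#define GR8_HEIGHT    192u
#define GR8_ROW_BYTES 40u    /* 320 pixels / 8 pixels per byte */

/* Host-side stand-in for screen memory (40 * 192 = 7680 bytes). */
unsigned char screen[GR8_ROW_BYTES * GR8_HEIGHT];

/* Bit mask for each of the 8 pixels in a byte; the leftmost pixel is bit 7. */
const unsigned char bitmask[8] = {
    0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01
};

/* Plot one pixel: find the byte, then OR in the pixel's bit. No clipping. */
void plot(unsigned int x, unsigned char y)
{
    unsigned int offset = (unsigned int)y * GR8_ROW_BYTES + (x >> 3);
    screen[offset] |= bitmask[x & 7u];
}
```

The multiply by 40 is the expensive part on a 6502, which is why faster routines usually replace it with per-row address lookup tables.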

     

    This little project was made possible by all the code folks have published over the years as well as the conversations I've had on AtariAge.

     

    Next step is to add sprites...

     

    https://github.com/damosan314/gr8

    gfx.xex

  14. On 5/17/2022 at 4:18 AM, ilmenit said:

    Perfect, I went through the https://llvm-mos.org/wiki/Linker_Script and it should do the work. 

    I would love to see an example of this in action i.e. main program + 1-4 banks.  Ideally any data defined within that bank stays there vs. being lumped together with other vars.  The bank switcher code would have to reside in main memory outside the window of course and require a stack to remember which bank it came from when making the switch to a new bank.
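The bookkeeping side of such a switcher can be sketched in portable C. Everything here is hypothetical: `select_bank` stands in for whatever register write the hardware needs, and nothing below uses llvm-mos linker features or places code in a particular memory region.

```c
#define BANK_STACK_DEPTH 8u

/* Hypothetical state; on real hardware select_bank() would write a banking register. */
unsigned char current_bank;
unsigned char bank_stack[BANK_STACK_DEPTH];
unsigned char bank_sp;

void select_bank(unsigned char bank)
{
    current_bank = bank;   /* stand-in for the hardware register write */
}

/* Remember the current bank, then switch. Returns 0 if the stack is full. */
int bank_enter(unsigned char bank)
{
    if (bank_sp >= BANK_STACK_DEPTH) {
        return 0;
    }
    bank_stack[bank_sp] = current_bank;
    ++bank_sp;
    select_bank(bank);
    return 1;
}

/* Switch back to the bank that was active before the matching bank_enter(). */
void bank_leave(void)
{
    if (bank_sp > 0u) {
        --bank_sp;
        select_bank(bank_stack[bank_sp]);
    }
}
```

A cross-bank call would wrap the target routine between `bank_enter()` and `bank_leave()`, with these functions and their data living in main memory outside the banked window.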
