
shasm65 - a 6502 assembler in sh

Posted by ivop, 30 October 2009 · 781 views

assembly 6502 sh shasm65
Here's the 6502 assembler I mentioned recently on the Atari 8-bit forum. The reasons to write this were:

1.) None of the assemblers I tried could generate correct code when the code is assembled to run in zero page and contains forward references to other code in zero page, for example code that changes its own operands at run time.
2.) I wanted to write an assembler in sh (years ago, I came across osimplay, which I thought was pretty neat).

shasm65 is written in sh, the Unix Bourne shell, but uses a few extensions that are not available in all sh incarnations. So far, I have adapted it to work with bash, zsh (~28% faster than bash) and mksh (ksh93, ~52% faster than bash). ash, dash, ksh (ksh88) and pdksh all fail to work, either because they lack array support or because they do not allow function names to start with a dot. Both issues could be "fixed", but that would make it slower and/or pollute the internal namespace, so I decided against it.

Its syntax is different from all other 6502 assemblers, because an input file is treated as just another shell script, which is sourced by the assembler. Mnemonics are function calls and their arguments are the operands. Labels are defined by using the special function L, and assembler directives are functions starting with a dot, like .org, .byte, .word, et cetera. Labels are referenced as shell variables (ex. jmp $label). Numbers/memory locations can be specified in decimal, hexadecimal (ex. 0xfffe) or octal (ex. 0377).
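
Because operands are ordinary shell words, the three number bases are presumably the ones shell arithmetic already understands. A minimal sketch of that idea, assuming nothing about shasm65's internals (the helper name to_decimal is made up for illustration):

```shell
# to_decimal is a hypothetical helper, not part of shasm65
to_decimal() {
    echo $(( $1 ))      # shell arithmetic accepts 0x.. (hex) and 0.. (octal)
}

to_decimal 0xfffe       # prints 65534
to_decimal 0377         # prints 255
to_decimal 17           # prints 17
```

One consequence of relying on shell arithmetic is that a leading zero always means octal, so 010 is eight, not ten.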

To address the main reason for writing this assembler (see point 1 above), shasm65 uses different mnemonics for some addressing modes. For example, loading A from a zero-page location is lda.z. This way, the assembler knows immediately how many bytes an instruction occupies, even when its operand is a label that has not been defined yet.
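
Why a fixed instruction size helps with forward references can be shown with a toy two-pass assembler. This is only a sketch of the general technique, not shasm65's actual code: the helper emit, the variables pc and out, and the two-pass driver are all assumptions, and only a handful of opcodes are defined. It needs bash (or another shell with arrays and dotted function names).

```shell
#!/usr/bin/env bash
# Toy two-pass sketch. Names such as emit, pc and out are hypothetical.
out=()                                   # assembled bytes as hex strings
pc=0                                     # current address

emit() {                                 # store bytes and advance pc
    local b hex
    for b in "$@"; do
        printf -v hex '%02x' $(( b & 0xff ))
        out+=("$hex")
        pc=$(( pc + 1 ))
    done
}

.org()  { pc=$(( $1 )); }                # set the current address
L()     { eval "$1=$pc"; }               # a label is just a shell variable
nop()   { emit 0xea; }
rts()   { emit 0x60; }
lda.z() { emit 0xa5 "${1:-0}"; }         # zero page: always 2 bytes
jmp()   { local a=$(( ${1:-0} )); emit 0x4c $(( a & 0xff )) $(( a >> 8 )); }

program() {
    .org 0x80
    jmp   $finish                        # forward reference, empty in pass 1
    lda.z $target                        # forward reference into zero page
L target
    nop
L finish
    rts
}

program                                  # pass 1: sizes are fixed, so labels get final values
out=(); pc=0
program                                  # pass 2: operands are now known
echo "${out[*]}"                         # prints: 4c 86 00 a5 85 ea 60
```

In pass 1 the operands of jmp and lda.z are still empty, but because the mnemonic alone fixes the instruction length, the label addresses recorded in that pass are already final.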

addressing modes:

implied           no suffix, ex. cli
accu              .a, ex. rol.a
zp                .z, ex. lda.z 0xfe
zp,x              .z, ex. adc.z 128,x
zp,y              .z, ex. stx.z 64,y
(ind,x)           .i [,x], ex. lda.i [23,x]
(ind),y           .i [],y, ex. cmp.i [017],y
immediate         ., ex. lda. 17
absolute          no suffix, ex. dec 0x6ff
absolute,x        no suffix, ex. inc 0x0678,x
absolute,y        no suffix, ex. ldx $fubar,y
(abs)             .i, ex. jmp.i [0xfffe]
relative          no suffix, ex. beq $loop


directives:

.org start [dest]          start address of next block (optionally loaded at different location)
.byte x y z ...            include literal bytes (no comma but spaces between the arguments)
.word x y z ...            include literal 16-bit little-endian words
.ascii "ascii string"      include literal string
.screen "string"           include literal string of Antic screen codes
.space size                reserve size space
.align b                   align to b-bytes boundary
.binary filename           include _raw_ binary file filename
. filename                 include source file (shell script, library functions, etc.)
L name                     define label

Because both the assembler and the source files it assembles are just shell scripts, you have all of the shell functionality (including calling external applications) as your "macro" language. You can create your own functions, use for loops, tests, if/then/else/fi conditional assembly, arithmetic, all you can think of.

# lines starting with a hash are comments
# the code below demonstrates a few of the features


clear_pages() {    # start number_of_pages
    lda. 0         # value to store
    ldx. 0
L loop
    for foo in `seq $1 0x0100 $(($1+($2-1)*256))` ; do
        sta $foo,x
    done
    inx
    bne $loop
}


L clear_some_mem
# inline unrolled loop to clear 0x4000-0xbfff
    clear_pages 0x4000 $((0xc0-0x40))

L host_info
    .ascii "$(uname -a)"
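
Conditional assembly and generated data, mentioned above but not shown in the example, might look roughly like this. To keep the fragment runnable on its own, .byte is replaced by a stand-in that only records its arguments; the PAL switch and the variable emitted are hypothetical names, not shasm65 features.

```shell
#!/usr/bin/env bash
# Stand-in for the real directive: it only records the bytes it is given.
emitted=""
.byte() { emitted="$emitted $*"; }

PAL=1                       # hypothetical build switch set by the source file

# conditional assembly: pick a constant depending on the switch
if [ "$PAL" -eq 1 ]; then
    .byte 50                # frames per second on PAL
else
    .byte 60                # frames per second on NTSC
fi

# generated data: low bytes of 40*row for the first four screen rows
for row in 0 1 2 3; do
    .byte $(( (row * 40) & 0xff ))
done

echo $emitted               # prints: 50 0 40 80 120
```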

There are two built-in functions:

lsb()     least-significant byte, ex. lda. `lsb $dlist`
msb()     most-significant byte, ex. lda. `msb $handler`
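
What these two functions compute can be sketched in a couple of lines. The definitions below are hypothetical re-implementations for illustration; the real shasm65 functions may differ in detail.

```shell
# Hypothetical re-implementations, not copied from shasm65
lsb() { echo $(( $1 & 0xff )); }          # low 8 bits of a 16-bit value
msb() { echo $(( ($1 >> 8) & 0xff )); }   # high 8 bits of a 16-bit value

dlist=0x9c20              # example address, chosen arbitrarily
lsb $dlist                # prints 32  (0x20)
msb $dlist                # prints 156 (0x9c)
```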

Variable, function and label names should not start with an underscore or a dot. Both are reserved for the assembler itself. Also, shell reserved words cannot be used as names.

shasm65 has the following command line options:

-oFILE      write output (Atari 8-bit $FF$FF binary format) to FILE
-v          verbose output
-?          output credits, license and command line options with a short description and their defaults
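
For readers unfamiliar with the $FF$FF format: a segment consists of the two header bytes $FF $FF, the start address, the inclusive end address (both little-endian), and then the data. The sketch below writes one such segment from the shell. The helpers write_byte and write_segment are made up for illustration and say nothing about how shasm65 itself produces its output.

```shell
# write_byte and write_segment are hypothetical helpers, not shasm65 code
write_byte() {                       # emit one raw byte given its numeric value
    printf "\\$(printf '%03o' $(( $1 & 0xff )))"
}

write_segment() {                    # usage: write_segment start byte...
    start=$(( $1 )); shift
    end=$(( start + $# - 1 ))        # end address is inclusive
    for b in 0xff 0xff \
             $(( start & 0xff )) $(( start >> 8 )) \
             $(( end & 0xff ))   $(( end >> 8 )) "$@"; do
        write_byte "$b"
    done
}

# three bytes (lda #0 / rts) loaded at 0x0600, shown as a hex dump
write_segment 0x0600 0xa9 0x00 0x60 | od -An -tx1
```

The dump shows ff ff 00 06 02 06 followed by the three data bytes a9 00 60.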

So, why a shell script? Well, because I can, it is fun, the code is short (~300 lines), it runs on many, many platforms, it provides a very powerful scripting/macro language and it's fun :)

So, no drawbacks? Yes, there are. Shell scripts are interpreted and therefore shasm65 is a lot slower than the usual assemblers written in C and compiled to native machine code.

I have probably missed describing some features or quirks, but basically, this is it. Have fun :)

Any questions, post below.


DASM is capable of producing zero-page, self-modifying code with forward-references in zero page. I have done this many times as have others.

While it sometimes gives me bogus warnings, it does assemble this code correctly.
This is really neat!