bugbear

  1. I did what Sanny asked (above) and wrote up my project;
  2. There are/have been/were two attempts at a cross-compiling version of Action!: http://gury.atari8.info/effectus/ and http://www.noniandjim.com/Jim/atari/Action_Compiler.html. I do not know how far they got, whether they were finished, or whether they still work. BugBear
  3. Benchmark design is so hard that entire books have been published about it... BugBear
  4. If your purpose is speed (which is what Action is for) you should probably think long and hard before using floating point. Unless you're writing numerical analysis software for the 6502, in which case - good luck with that! BugBear
  5. In the modern age, a GPU can shift gigabytes per second. But (for me) being full-on retro is more fun. BugBear
  6. >> This is how I would bulk copy 4kB (16 pages). Unrolling the outer loop instead of the inner - I love it! BugBear
  7. The context I'm thinking of is a bulk copy (4K or more), e.g. a whole screen being paged in during flyback, that sort of thing. So, simplifying assumptions: source and dest do NOT overlap, source and dest are page aligned, and the size of the copy is a whole number of pages. If we use the "natural" coding (two zero page pointers, indexed by Y) and unroll the loop enough, we can get asymptotically close to 12 cycles per byte:
        lda (SRC),y   ; 5
        sta (DST),y   ; 5
        iny           ; 2
This is simple to understand, and (in fact) doesn't really need all my simplifying assumptions. To go faster, I think we need self-modifying code, using abs,x addressing (4 cycles). Consider the following fragment:
src1:   lda $aa00,x   ; 4
dst1:   sta $bb00,x   ; 4
src2:   lda $aa01,x   ; 4
dst2:   sta $bb01,x   ; 4
src3:   lda $aa02,x   ; 4
dst3:   sta $bb02,x   ; 4
        .
        .
By using incrementing low bytes on the absolute addresses, we avoid the need to increment x, and we've saved a cycle on the load and store. This is 8 cycles per byte. So we unroll the loop, and add "some number" to x each time round the loop. If we make the unrolling a power of 2, this will neatly come out to a page. However, we also need to perform the self-modification for each page, writing the (incrementing) page numbers of SRC and DST into the high byte of each absolute address: SRC goes into src1+2, src2+2, etc. and DST into dst1+2, dst2+2, etc. Each of these STA $xxxx takes 4 cycles. If we unroll by N, where N is one of 4, 8, 16 etc., the copying loop will take N*8 cycles (plus a handful more for the loop back, 8 or 10). The self-modification also costs N*8, but it runs once per page, while the loop runs 256/N times per page. So the total cycles per page, for an unroll of N, is around:
        N*8 + (256/N)*(N*8 + 10)
So if we make N too big, we'll do too much self-modification, and if we make N too small, the loop overhead won't be amortized. What's best?
Setting up this equation in a dirty fragment of perl, I get (rate is cycles per byte):
          4: 2720  rate 10.625000
          8: 2432  rate  9.500000
         16: 2336  rate  9.125000
         32: 2384  rate  9.312500
         64: 2600  rate 10.156250
        128: 3092  rate 12.078125
So 16 appears best. The result (at the hand-waving level) is a block copy running at about 9.1 cycles per byte. I've probably left out some overhead in this, but I think the overall analysis is OK. BugBear
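The perl fragment itself isn't shown; as a sketch, the Python below reproduces the table from the post's own cost model (4-cycle `sta abs` patches, a per-byte copy cost, about 10 cycles of loop overhead per pass). One caveat on the inputs: standard NMOS 6502 timing tables list `STA abs,X` at 5 cycles and `STA (zp),Y` at 6 (indexed stores always pay the extra cycle), so the per-byte cost is nearer 9 for the self-modifying loop and 13 for the `(zp),y` loop. The `per_byte` parameter lets you try both.

```python
# Cost model for the self-modifying, N-way unrolled page copy (post 7).
# Per 256-byte page:
#   self-mod: 2*N "sta abs" patches (src and dst high bytes) -> 2*N*patch
#   copy:     256/N passes of N bytes plus loop overhead     -> (256/N)*(N*per_byte + loop)

def cycles_per_page(n, per_byte=8, loop=10, patch=4):
    """Total cycles to copy one page with unroll factor n."""
    assert 256 % n == 0, "unroll factor must divide the page size"
    self_mod = 2 * n * patch
    copy = (256 // n) * (n * per_byte + loop)
    return self_mod + copy


def table(per_byte=8):
    """(N, cycles per page, cycles per byte) for power-of-two unroll factors."""
    rows = []
    for n in (4, 8, 16, 32, 64, 128):
        c = cycles_per_page(n, per_byte=per_byte)
        rows.append((n, c, c / 256))
    return rows


if __name__ == "__main__":
    for per_byte in (8, 9):          # 8 = the post's figure, 9 = data-sheet STA abs,X
        print(f"per_byte={per_byte}")
        for n, c, rate in table(per_byte):
            print(f"  {n:3d}: {c:4d} rate {rate:.6f}")
```

Expanding the formula gives 8N + 256*per_byte + 2560/N, so the per-byte term doesn't depend on N at all: the best unroll factor sits near sqrt(320), about 17.9, whatever the store timing, and 16 remains the best power of two. With `per_byte=9` the rate at N = 16 comes out at 10.125 cycles per byte.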
  8. The main difficulty that I can't see a way round in ca65 macros is the need for each "structure" to have its own label. In a full textual macro language like m4, one could simply have a counter (to spawn new labels) and a stack, so that each structure uses its own label on entry/exit. Since m4 can redefine (AKA update) macro values as it goes along, this is perfectly doable. But ca65 (like many other assemblers) implements macros at the token level, so I'm not sure it's doable there. Here's an example (I'm happy with how I could implement the comparisons/conditions, BTW):
        IF(<=, 10)
            mva #10, z_cnt
            lda z_inc
            IF(>, 20)
                mva #50, z_hit
            ENDIF
        ELSE
            mva #81, z_cnt
        ENDIF
Which expands to:
        IF(<=, 10)
            mva #10, z_cnt
            lda z_inc
            IF(>, 20)
                mva #50, z_hit
            ENDIF
        if_2_end:
        ELSE
        if_1_else:
            mva #81, z_cnt
        ENDIF
        if_1_end:
The difficulty is spawning, and appropriately using, the if_1 and if_2 labels. Similar issues apply to the other structures. If I'm missing an obvious implementation, I'm all ears! BugBear
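The counter-plus-stack scheme described in post 8 can be sketched outside the assembler; the hypothetical Python `expand` below models the bookkeeping only, it is not a ca65 implementation. Each `IF(` takes a fresh number from a counter and pushes it; `ELSE` places `if_<n>_else:` for the innermost open structure, and `ENDIF` pops it and places `if_<n>_end:`, which reproduces the expansion in the post. (m4's `pushdef`/`popdef` give you that stack directly. In ca65, `.set` symbols together with `.sprintf` and `.ident` look like the raw ingredients for the same trick, but that is an untested suggestion.)

```python
# Model of the counter-plus-stack label bookkeeping for nested IF/ELSE/ENDIF.
# Label names (if_<n>_else, if_<n>_end) follow the post's example expansion.

def expand(lines):
    counter = 0          # spawns a fresh number for every IF
    stack = []           # numbers of the IFs that are still open
    out = []
    for line in lines:
        word = line.strip()
        if word.startswith("IF("):
            counter += 1
            stack.append(counter)
            out.append(line)
        elif word == "ELSE":
            if not stack:
                raise ValueError("ELSE without IF")
            out.append(line)
            out.append(f"if_{stack[-1]}_else:")   # false branch of the innermost IF
        elif word == "ENDIF":
            if not stack:
                raise ValueError("ENDIF without IF")
            out.append(line)
            out.append(f"if_{stack.pop()}_end:")  # close the innermost IF
        else:
            out.append(line)
    if stack:
        raise ValueError(f"unclosed IF(s): {stack}")
    return out
```

One thing the model glosses over: when `IF` is expanded it doesn't yet know whether an `ELSE` will follow, so a real macro has to branch to a label that gets defined later either way, for example by having `ENDIF` also define `if_<n>_else` when no `ELSE` was seen.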
  9. https://en.wikipedia.org/wiki/Assembly_language#Support_for_structured_programming http://wilsonminesco.com/StructureMacros/ BugBear
  10. I should say that this is NOT my notion of a well made ca65 project; it is merely (and only) the smallest edit I could make to the xasm project that actually assembles and runs. This will be the start point for my ca65 exploration. Sadly, it already looks as if the token based macro language of ca65 is not sufficient for me to play around much with "higher level" constructs, so I may have to use another stage of external macro-ing (m4, or something) via the Makefile. BugBear
  11. I have attached the project files, edited into a state ca65 can use. Just typing "make" will do everything, at least on Linux. BugBear pub_cc65_Silence.zip
  12. I should have said; I am working on this project (at the moment) by editing the .asx files. Since my perl script only replaces xasm stuff, if I put ca65 text in there, it goes straight through to the .s file unaltered, where ca65 is happy to assemble it. BugBear
  13. In a possibly ambitious bid to learn about A8 programming, I wanted to pick an assembler. To confirm that I'd picked one that would suit me, I wanted to see what a large project would look like (1-page examples show you nothing). (thread: http://atariage.com/forums/topic/265271-recommend-me-an-assembler/) So I decided to take a look at analmux's "Scrolling MCS, PMU & RMT" project from here: http://atariage.com/forums/topic/229764-source-code-released-scrolling-mcs-pmu-rmt/?do=findComment&comment=3203170
Using my prior experience with GNU Make, I knocked up a Makefile with a pattern rule for assembly:
.SUFFIXES:
.SUFFIXES: .asx .inc .h .c .s .lst .map .o .com
BB=/home/bugbear/code/cc65-master/bin
CC = $(BB)/cc65
LD = $(BB)/ld65
AS = $(BB)/ca65
AFLAGS = -t atari

%.o: %.s
	$(AS) $(AFLAGS) -l $*.lst -o $@ $<
But the .asx files weren't going to play nice. I could have hand-edited them (using search/replace in my favourite editor), but I wanted to be able to diff(1) my altered files against the originals to avoid typos; I don't know enough A8 to debug a project this sophisticated in the face of typos. So I created a perl script to do the bulk of the spade work of turning .asx into .s files. In some cases the generated ca65 code relies on macros, of which more anon.
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;
use Data::Dumper;

my %opt;

sub usage {
    print STDERR "Usage: $0 -i <input> -o <output>", "\n";
    exit 1;
}

# a symbol at the start of the line is either
# being used as a label, or defined by an equ
sub leading_sym {
    my ($sym, $tail) = @_;
    #print Dumper($sym, $tail);
    if($tail =~ /^\s+equ\b/) {      # only "label equ value", not e.g. "lda freq"
        my $ret = "$sym$tail";
        $ret =~ s/\bequ\b/=/;
        return $ret;
    } else {
        return "$sym:$tail";         # ca65 labels need a colon
    }
}

# brace a macro argument that contains a comma, e.g. {dlinit_data,x}
sub quot_param {
    my ($x) = @_;
    return "{$x}" if($x =~ /,/);
    return $x;
}

sub macro2 {
    my ($com, $x, $y) = @_;
    return sprintf("%s\t%s, %s", $com, quot_param($x), quot_param($y));
}

my $sym = "(?:[_a-zA-Z][_a-zA-Z0-9]*)";

sub convert {
    my ($s) = @_;
    $s =~ s/^\*/;/gm;                                   # start of comment
    $s =~ s/\bdta\b/.byte/gm;
    $s =~ s/\borg\b/; org/gm;                           # ORGs are bad, mmkay?
    $s =~ s/icl\s+'([^']+)\.asx'/.INCLUDE "$1.s"/gm;
    $s =~ s/^($sym)(.*)$/leading_sym($1,$2)/egm;
    $s =~ s/(mva|mwa)\s+([^\s]+)\s+([^\s]+)/macro2($1,$2,$3)/egm;
    $s =~ s/\b([a-z]{3}):([a-z]{3})\b/$1\r\n\t$2/gm;    # multiple instructions with :
    $s =~ s/(ror|asl|lsr)\s*@/$1/gm;                    # strip implied addressing mode in @
    $s =~ s/a\(($sym)\)/addr_l_h($1)/gm;                # address as 2 bytes, macro
    return $s;
}

sub proc {
    my ($xasm, $cc65) = @_;
    my $ih;
    open($ih, "<", $xasm) or die "cannot open $xasm";
    my $all;
    {
        local $/;    # set input record separator to undef, but in local scope
        $all = <$ih>;
    }
    close($ih);
    $all = convert($all);
    my $oh;
    open($oh, ">", $cc65) or die "cannot open $cc65";
    print $oh $all;
    close($oh);
}

getopts("i:o:", \%opt) || usage();
usage() if(!exists($opt{o}));
usage() if(!exists($opt{i}));
proc($opt{i}, $opt{o});
In the main, this simply turns asx commands into their equivalent ca65 commands.
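To see what the main rules do to typical XASM lines, here is a partial Python port, written for illustration only (the Perl script remains the real tool; the `org`, `icl`, `@` and `a(...)` rules are left out). It writes the colon-split rule as `([a-z]{3}):([a-z]{3})`, which is what the "multiple instructions with :" comment describes, anchors the `equ` test to the start of the tail so that a line such as `x lda freq` isn't mistaken for an equate, and emits `\n` rather than `\r\n`.

```python
import re

# Partial, illustrative Python port of xasm2ca65.pl (a subset of its rules).
SYM = r"[_a-zA-Z][_a-zA-Z0-9]*"


def leading_sym(m):
    """A symbol at the start of a line is either an equate or a label."""
    sym, tail = m.group(1), m.group(2)
    if re.match(r"\s+equ\b", tail):
        return sym + re.sub(r"\bequ\b", "=", tail, count=1)  # Dlist equ $8600 -> Dlist = $8600
    return f"{sym}:{tail}"                                     # ca65 labels need a colon


def quot_param(x):
    """Brace a macro argument that contains a comma, e.g. {dlinit_data,x}."""
    return "{" + x + "}" if "," in x else x


def convert(s):
    s = re.sub(r"^\*", ";", s, flags=re.M)                    # XASM comment -> ca65 comment
    s = re.sub(r"\bdta\b", ".byte", s)
    s = re.sub(rf"^({SYM})(.*)$", leading_sym, s, flags=re.M)
    s = re.sub(r"(mva|mwa)\s+(\S+)\s+(\S+)",
               lambda m: f"{m[1]}\t{quot_param(m[2])}, {quot_param(m[3])}", s)
    s = re.sub(r"\b([a-z]{3}):([a-z]{3})\b", r"\1\n\t\2", s)  # clc:adc -> two lines
    return s


print(convert("Dlist equ $8600\n\tmva #10 z_cnt\n\tclc:adc #1"))
```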
The macros (which I intend to continue using in standalone ca65 development) are:
; handle some XASM stuff in a cc65 style
.macro mva val, addr
        lda val
        sta addr
.endmacro

; taken from the documentation example in
; "12.4 Detecting parameter types"
.macro mwa src, dest
    .if (.match (.left (1, {src}), #))
        ; immediate mode
        lda #<(.right (.tcount ({src})-1, {src}))
        sta dest
        lda #>(.right (.tcount ({src})-1, {src}))
        sta dest+1
    .else
        ; assume absolute or zero page
        lda src
        sta dest
        lda 1+(src)
        sta dest+1
    .endif
.endmacro

; the well known missing instruction; add without carry
.macro add val
        clc
        adc val
.endmacro

.define addr_l_h(addr) <(addr), >(addr)

With this support in place, I extended the Makefile:
%.s: %.asx
	xasm2ca65.pl -o $@ -i $<
so that make will turn an .asx into a .s when needed. There are a few things this perl doesn't handle, which I hand-expanded:
        mva dlinit_data,x (z_dest),y+
I made a macro for mva, but ca65 doesn't do Y post-increment. There were only 3 instances, which became (e.g.)
        mva dlinit_data,x (z_dest),y
        iny
The project link line is then (with main.o automatically assembled from main.s, which is automatically made from main.asx):
main.com: main.o
	$(LD) -o $@ -S 0x9000 -vm -m $*.map -C ./sil.cfg $^
We shall talk more of sil.cfg. :-)
So far, so good. It assembles, but either doesn't link, or links and crashes (TBH, I forget which). Enter ca65, ld65, MEMORY and SEGMENTS. This is where by far the largest effort went. The original .asx files don't really allocate memory: symbols are directly assigned memory values, and code/data is ORG'd into place. It appears that XASM generates an Atari load chunk for each ORG. ca65 doesn't really put code or data anywhere; it's a "pure" assembler. (Sidebar: ca65 has an "org" directive whose behaviour I consider obscure to the point of deception.) ca65 puts stuff in SEGMENTS, which (at this stage) are just buckets 'o stuff floating in space. Within a segment, code and data are placed in the obvious way.
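The three `(z_dest),y+` post-increments were expanded by hand; with more of them, a pre-pass over the .asx text could do it. The Python below is a hypothetical sketch of such a pass (the name `expand_post_increment` and the regex are illustrative, not part of the project): any indented line ending in `,y+` loses the `+` and gains an `iny` on the next line, matching the hand expansion described in this post. It does not handle a trailing comment or other post-increment forms.

```python
import re

# Hypothetical pre-pass (not part of xasm2ca65.pl): expand XASM's "(zp),y+"
# post-increment into the plain form plus an explicit iny, e.g.
#     mva dlinit_data,x (z_dest),y+   ->   mva dlinit_data,x (z_dest),y
#                                          iny
# Only indented lines ending in ",y+" are touched; trailing comments are not handled.
POST_INC = re.compile(r"^([ \t]+)(\S.*,y)\+[ \t]*$", re.M)


def expand_post_increment(text):
    # Reuse the line's own indentation for the added iny.
    return POST_INC.sub(lambda m: f"{m[1]}{m[2]}\n{m[1]}iny", text)


print(expand_post_increment("\tmva dlinit_data,x (z_dest),y+"))
```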
Variable areas (for scratch tables, pointers etc.) are reserved using .res <bytes>. So the original placement of data tables:
; now declared in cc65 TAB segment, using .res
; Dlist      equ $8600
; Regtab     equ $8700
; Lintabl    equ $8900
; Lintabh    equ $8980
; Bartabl    equ $8a00
; Bartabh    equ $8b00
; shapemaskx equ $8c00
becomes:
        .pushseg
        .segment "TAB"
dlist       .res $100
Regtab      .res $200
Lintabl     .res $80
Lintabh     .res $80
Bartabl     .res $100
Bartabh     .res $100
shapemaskx  .res $400
        .export dlist, Regtab, Lintabl, Lintabh
        .export Bartabl, Bartabh, shapemaskx, helpscreen_dlist
        .popseg
I only exported (made public) the symbols so I could see them in the link map, and confirm that they have gone where they should. NB: throughout the code, dlist is referred to in lower case, but was defined in XASM in mixed case (Dlist)! XASM is case-insensitive; ca65 is case-sensitive.
So, how do we get stuff to an actual place in the memory map using ca65/ld65? I won't repeat the general docs, just give what I did. Since ld65 is VERY general, it just outputs files; it doesn't output Atari files as such. However, its general model is powerful enough that an Atari file can be nicely expressed. Actual areas of memory are described by lines in the MEMORY block of the cfg file. Here's the memory for TAB (above; ignore TAB_H for the moment):
    TAB_H: file = %O, start = $0000, size = $0004;
    TAB:   file = %O, start = $8600, size = $0A00;
Lines in the SEGMENTS block put SEGMENTS (from the source files) into MEMORY areas (again, ignore TAB_H):
    TAB_H: load = TAB_H, type = ro, optional = yes;
    TAB:   load = TAB,   type = rw, define = yes, optional = yes, align = $100;
The names of MEMORY areas and SEGMENTS don't need to match, but they can; it spares my creative faculties. The practical upshot is that the TAB segment from the .s file is placed at $8600, in an area that can hold up to $A00 bytes, which is what we want. But the Atari load model doesn't just haul a file into RAM starting at zero.
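As a sanity check on the TAB conversion, laying the `.res` sizes end to end from the MEMORY start should reproduce the old `equ` addresses and exactly fill the $A00-byte area. A small Python sketch, assuming ld65 packs the reservations back to back (no `.align` between them):

```python
# Check the TAB layout: lay the .res reservations end to end from the
# MEMORY start, as for consecutive .res with no .align between them.
TAB_START = 0x8600   # MEMORY TAB: start = $8600
TAB_SIZE = 0x0A00    # MEMORY TAB: size = $0A00

# (name, bytes reserved) in the order they appear in the TAB segment
TABLES = [
    ("dlist", 0x100), ("Regtab", 0x200), ("Lintabl", 0x80), ("Lintabh", 0x80),
    ("Bartabl", 0x100), ("Bartabh", 0x100), ("shapemaskx", 0x400),
]


def layout(start, tables):
    """Return ({name: address}, total bytes used)."""
    placed = {}
    addr = start
    for name, size in tables:
        placed[name] = addr
        addr += size
    return placed, addr - start


placed, used = layout(TAB_START, TABLES)
assert used <= TAB_SIZE, "TAB segment overflows its MEMORY area"
for name, addr in placed.items():
    print(f"{name:<11} ${addr:04X}")
print(f"end         ${TAB_START + used:04X}")
```

Every table lands on its original `equ` address, and the area ends at $9000, which is where the main code chunk starts.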
It consists of chunks, each of which defines its own memory location (start and end), about which ld65 knows absolutely nothing. Building headers manually is what TAB_H does. The "define = yes" means that symbols for the memory used by the TAB segment are available during linking. This allows headers to be built like this, in the TAB_H segment:
        .import __TAB_LOAD__, __TAB_SIZE__
        .segment "TAB_H"
        .word __TAB_LOAD__
        .word __TAB_LOAD__ + __TAB_SIZE__ - 1
So that's just a two-word thing, calculated from the symbols exported during the link. If you glance up again, you'll see that the TAB_H segment goes in the TAB_H memory area. So this just puts those two numbers in the file, followed by the actual memory image. So we've just made a loadable chunk, with a header. Kewl! It involves a lot of typing, though. The original project used 10 ORG directives, and I didn't fancy it. Enter another perl script:
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;

# print the header-building block for one segment
sub mk_seg {
    my ($s) = @_;
    print qq@\t.import __${s}_LOAD__, __${s}_SIZE__
\t.segment "${s}_H"
\t.word __${s}_LOAD__
\t.word __${s}_LOAD__ + __${s}_SIZE__ - 1

@;
}

print "\t.pushseg\n\n";
foreach my $s (@ARGV) {
    mk_seg($s);
}
print "\t.popseg\n";
This just spews out header-calculating stuff as above, one block per segment named on the command line. The Makefile has:
seg.inc:
	mk_seg.pl FONT MAPY MAPX PMU TAB > $@
Originally I had more segments than this, to correspond to the multiple ORGs in the MAPY data. I also wrote a perl script to dump out the segments from the final executable, and used a binary compare to chase out some typos in my macros and perl converter. I kept at this until my main code chunk ($9000) was identical to the xasm-compiled one. And it all works! Now I have a large, working ca65 project. BugBear
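The header arithmetic can be checked outside the linker. The sketch below packs the same two little-endian words as TAB_H (start, then start + size - 1) and, as a hypothetical helper, strings chunks together into a complete load file. Atari DOS binary load files begin with a $FF $FF marker; the four-byte TAB_H header doesn't include it, so presumably the author's sil.cfg (not shown) emits it elsewhere.

```python
import struct


def chunk_header(load, size):
    """The two words TAB_H emits: __LOAD__ and __LOAD__ + __SIZE__ - 1, little-endian."""
    return struct.pack("<HH", load, load + size - 1)


def build_xex(chunks):
    """Hypothetical helper: $FFFF marker, then header + data for each (load, data) chunk."""
    out = bytearray(b"\xff\xff")
    for load, data in chunks:
        out += chunk_header(load, len(data))
        out += data
    return bytes(out)


# The TAB chunk: $A00 bytes at $8600 gives the header words $8600, $8FFF
print(chunk_header(0x8600, 0x0A00).hex())   # -> 0086ff8f
```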
  14. I eventually declared enough cc65 SEGMENTS (I used a tiny perl script to print them) to fully emulate the load chunks being used in the XASM source. This meant that each chunk should be identical. A quick binary compare of the chunks chased out a couple of bugs in my xasm->cc65 textual converter, and a couple more in the cc65 macros I'm using to emulate some of XASM's handy features. IT LIVES! (and my thanks to all who helped, especially sanny) BugBear
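The segment dump and binary compare mentioned above could look something like this Python sketch (not the author's perl script, which isn't shown): `parse_xex` walks the load file, skipping optional $FFFF markers, and `compare` matches chunks by load address and reports the first differing address. It assumes each load address appears only once per file.

```python
import struct


def parse_xex(data):
    """Split an Atari binary load file into (start, end, payload) chunks."""
    chunks = []
    pos = 0
    while pos < len(data):
        (start,) = struct.unpack_from("<H", data, pos)
        if start == 0xFFFF:                       # $FFFF marker: skip it
            pos += 2
            continue
        (end,) = struct.unpack_from("<H", data, pos + 2)
        length = end - start + 1
        payload = data[pos + 4 : pos + 4 + length]
        if length <= 0 or len(payload) != length:
            raise ValueError(f"bad or truncated chunk at file offset {pos}")
        chunks.append((start, end, payload))
        pos += 4 + length
    return chunks


def compare(a, b):
    """List the chunks (matched by load address) that differ between two files."""
    ca = {start: payload for start, _, payload in parse_xex(a)}
    cb = {start: payload for start, _, payload in parse_xex(b)}
    report = []
    for start in sorted(set(ca) | set(cb)):
        if start not in ca or start not in cb:
            report.append(f"${start:04X}: only in one file")
        elif ca[start] != cb[start]:
            pa, pb = ca[start], cb[start]
            # first differing byte, or the end of the shorter chunk
            diff = next((i for i, (x, y) in enumerate(zip(pa, pb)) if x != y),
                        min(len(pa), len(pb)))
            report.append(f"${start:04X}: first difference at ${start + diff:04X}")
    return report
```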