# bugbear

1. ## Problem using Lightspeed C with Altirra.

Why do you want floating point? BugBear
2. ## Tutorial on Action!

There are/have been/were two attempts at a cross compiling version of Action! http://gury.atari8.info/effectus/ http://www.noniandjim.com/Jim/atari/Action_Compiler.html I do not know how far they got, whether they were finished, or whether they still work. BugBear
3. ## What is a good benchmark for comparing the speed of languages?

Benchmark design is so hard that entire books have been published about it... BugBear
4. ## What is a good benchmark for comparing the speed of languages?

If your purpose is speed (which is what Action is for) you should probably think long and hard before using floating point. Unless you're writing numerical analysis software for the 6502, in which case - good luck with that! BugBear
5. ## Fastest Possible bulk copy?

In the modern age, a GPU can shift gigabytes per second. But (for me) being full-on-retro is more fun. BugBear
6. ## Fastest Possible bulk copy?

> This is how I would bulk copy 4kB (16 pages).

Unrolling the outer loop instead of the inner - I love it! BugBear
7. ## Fastest Possible bulk copy?

The context I'm thinking of is a bulk copy - 4K or more, e.g. a whole screen being paged in during flyback, that sort of thing. So, simplifying assumptions: source and dest do NOT overlap, source and dest are page aligned, and the size of the copy is a multiple of whole pages.

If we use the "natural" coding (two zero page pointers, indexed by Y) and unroll the loop enough, we can get asymptotically close to 12 cycles per byte:

    lda (SRC),y ; 5
    sta (DST),y ; 5
    iny         ; 2

This is simple to understand, and (in fact) doesn't really need all my simplifying assumptions.

To go faster, I think we need self-modifying code, using abs,x addressing (4 cycles). Consider the following fragment:

    src1: lda $aa00,x ; 4
    dst1: sta $bb00,x ; 4
    src2: lda $aa01,x ; 4
    dst2: sta $bb01,x ; 4
    src3: lda $aa02,x ; 4
    dst3: sta $bb02,x ; 4
    .
    .

By using incrementing low bytes on the absolute addresses, we avoid the need to increment x inside the unrolled block, and we've saved a cycle on the load and on the store. This is 8 cycles per byte.

So, we unroll the loop N times, and add N to x each time round the loop. If we make the unrolling a power of 2, this will neatly come out to a page.

However, we also need to perform the self-modification for each page, setting the (incrementing) page location of SRC and DST into the high byte of each absolute address; we need to set SRC at src1+2, src2+2, etc. and DST at dst1+2, dst2+2, etc. Each of these STA \$xxxx takes 4 cycles.

If we unroll by N, where N is one of 4, 8, 16 etc., one pass of the copying loop will take N*8 cycles (plus a handful more for the loop back, 8 or 10). The self-mod also costs N*8 (2*N stores at 4 cycles each), but the self-mod code is run only once per page, while the loop is run 256/N times per page. So the total cycles per page, with an unroll of N, is around:

    N*8 + (256/N)*(N*8 + 10)

So if we make N too big, we'll do too much self-mod, and if we make N too small, the loop overhead dominates and the copy loop won't be fast. What's best?

Setting up this equation in a dirty fragment of perl, I get:

    4:2720 rate 10.625000
    8:2432 rate 9.500000
    16:2336 rate 9.125000
    32:2384 rate 9.312500
    64:2600 rate 10.156250
    128:3092 rate 12.078125

(rate is cycles per byte). So 16 appears best. The result (at the hand-waving level) is a block copy running at 9.1 cycles per byte. I've probably left out some overhead in this, but I think the overall analysis is OK. BugBear
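The perl fragment itself was not posted, so here is a minimal Python sketch of the same cost model, for anyone who wants to play with the numbers. The function and parameter names (`cycles_per_page`, `copy_cycles`, `patch_cycles`, `loop_overhead`) are made up for illustration; the defaults are the figures used in the post (8 cycles per copied byte, 8 cycles of self-mod per unrolled pair, 10 cycles of loop-back overhead). One caveat: the usual 6502 timing tables list `STA abs,X` at 5 cycles rather than 4, so `copy_cycles=9` may be closer to real hardware. Under that assumption the model gives 2592 cycles per page (about 10.1 per byte) for N=16, and 16 is still the best of the unroll factors tried.

```python
def cycles_per_page(n, copy_cycles=8, patch_cycles=8, loop_overhead=10):
    """Estimated cycles to copy one 256-byte page with the loop unrolled n times.

    copy_cycles   -- cost of one lda/sta pair (assumed 8, as in the post)
    patch_cycles  -- cost of patching the two high bytes of one pair (2 * 4)
    loop_overhead -- cost of stepping x and branching back (assumed 10)
    """
    self_mod = n * patch_cycles   # done once per page
    passes = 256 // n             # times the unrolled block runs per page
    return self_mod + passes * (n * copy_cycles + loop_overhead)


def table(unrolls=(4, 8, 16, 32, 64, 128), **model):
    """Return (n, cycles per page, cycles per byte) for each unroll factor."""
    rows = []
    for n in unrolls:
        total = cycles_per_page(n, **model)
        rows.append((n, total, total / 256))
    return rows


if __name__ == "__main__":
    # Same layout as the figures quoted in the post.
    for n, total, rate in table():
        print(f"{n}:{total} rate {rate:f}")
```

Calling `table(copy_cycles=9)` reruns the model with the slower store; `loop_overhead` can be varied the same way to see how sensitive the optimum is.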
8. ## Moving a project from XASM to ca65; a beginner writes

The main difficulty that I can't see a way round in ca65 macros is the need for each "structure" to have its own label. In a full textual macro language, like m4, one could simply have a counter (to spawn new labels) and a stack, so that each structure uses its own label on entry/exit. Since m4 can overdefine (AKA update) macro values as it goes along, this is perfectly doable. But ca65 (like many other assemblers) implements macros at the token level, so I'm not sure it's doable.

Here's an example (I'm happy with how I could implement the comparison/conditions, BTW):

    IF(<=, 10)
      mva #10, z_cnt
      lda z_inc
      IF(>, 20)
        mva #50, z_hit
      ENDIF
    ELSE
      mva #81, z_cnt
    ENDIF

Which expands to:

    IF(<=, 10)
      mva #10, z_cnt
      lda z_inc
      IF(>, 20)
        mva #50, z_hit
      ENDIF
    if_2_end:
    ELSE
    if_1_else:
      mva #81, z_cnt
    ENDIF
    if_1_end:

The difficulty is spawning, and appropriately using, the if_1 and if_2 labels. Similar issues apply to the other structures. If I'm missing an obvious implementation, I'm all ears! BugBear
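To make the counter-plus-stack idea concrete, here is a rough Python sketch of how an external preprocessing stage could do the label bookkeeping. It is not ca65 macro code and not m4; the function name `expand` is hypothetical, and it only handles the label spawning shown in the example above (it leaves the IF/ELSE/ENDIF lines in place and does not generate the compare or branch instructions).

```python
def expand(lines):
    """Append if_N_else / if_N_end labels after ELSE / ENDIF lines.

    A counter spawns a fresh number for each IF; a stack remembers which
    number belongs to the structure currently open, so nesting works.
    """
    counter = 0
    stack = []   # label numbers of the IFs that are still open
    out = []
    for line in lines:
        word = line.strip()
        out.append(line)
        if word.startswith("IF("):
            counter += 1
            stack.append(counter)
        elif word == "ELSE":
            if not stack:
                raise ValueError("ELSE without matching IF")
            out.append(f"if_{stack[-1]}_else:")
        elif word == "ENDIF":
            if not stack:
                raise ValueError("ENDIF without matching IF")
            out.append(f"if_{stack.pop()}_end:")
    if stack:
        raise ValueError("IF without matching ENDIF")
    return out


if __name__ == "__main__":
    source = [
        "IF(<=, 10)", "mva #10, z_cnt", "lda z_inc",
        "IF(>, 20)", "mva #50, z_hit", "ENDIF",
        "ELSE", "mva #81, z_cnt", "ENDIF",
    ]
    print("\n".join(expand(source)))
```

Because the outer IF is seen first it gets number 1 and the nested one gets 2, which matches the if_1 / if_2 naming in the expansion above.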
9. ## Moving a project from XASM to ca65; a beginner writes

https://en.wikipedia.org/wiki/Assembly_language#Support_for_structured_programming http://wilsonminesco.com/StructureMacros/ BugBear
10. ## Moving a project from XASM to ca65; a beginner writes

I should say that this is NOT my notion of a well made ca65 project; it is merely (and only) the smallest edit I could make to the xasm project that actually assembles and runs. This will be the starting point for my ca65 exploration. Sadly, it already looks as if the token-based macro language of ca65 is not sufficient for me to play around much with "higher level" constructs, so I may have to use another stage of external macro-ing (m4, or something) via the Makefile. BugBear
11. ## Moving a project from XASM to ca65; a beginner writes

I have attached the project files, edited into a state that ca65 can use. Just typing "make" will do everything, at least on Linux. BugBear pub_cc65_Silence.zip
12. ## Moving a project from XASM to ca65; a beginner writes

I should have said; I am working on this project (at the moment) by editing the .asx files. Since my perl script only replaces xasm stuff, if I put ca65 text in there, it goes straight through to the .s file unaltered, where ca65 is happy to assemble it. BugBear