Content Count: 798
Days Won: 1
Everything posted by Sheddy
-
Here's a lift from the AtariWinPlus emulator Help describing the different cart types and how to use them. The Megacart format is the closest fit if you want 16K banks rather than 8K. I think you'll find Sunmark can make a real physical Megacart if you need one, and I believe Bryan has some experience making XEGS carts. Note that a 16K cart occupies $8000-$BFFF, so it starts at $8000 rather than at $4000 as on the 5200 ($4000 tends to be used for extra RAM banks, which is what Rybags is explaining). I can tell you how to assemble carts with ATasm, but I don't know how to do it with other assemblers. Which one are you using - is it DASM? If so, the 2600 guys will know how to do that; maybe the RORG directive is what you need?
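To make the 16K-bank layout concrete, here is a minimal Python sketch of how a raw multi-bank image lines up with the $8000-$BFFF window. It assumes a plain binary with no .CAR header, banks stored in bank order, and unused space padded with $FF; `file_offset` and `build_image` are hypothetical helper names, and the cart's bank-switching register is not modelled.

```python
# Minimal sketch, assuming a raw ROM image (no .CAR header) made of consecutive
# 16K banks, each assembled to run in the $8000-$BFFF cart window.
BANK_SIZE = 0x4000                     # 16K per bank
WINDOW_START = 0x8000                  # start of the 16K cart window
WINDOW_END = WINDOW_START + BANK_SIZE  # exclusive end: $C000


def file_offset(bank: int, cpu_addr: int) -> int:
    """Offset in the raw image of the byte seen at cpu_addr when `bank` is mapped in."""
    if not WINDOW_START <= cpu_addr < WINDOW_END:
        raise ValueError(f"${cpu_addr:04X} is outside the 16K cart window")
    return bank * BANK_SIZE + (cpu_addr - WINDOW_START)


def build_image(banks: list, fill: int = 0xFF) -> bytes:
    """Pad each assembled bank (bytes) to exactly 16K and concatenate in bank order."""
    image = bytearray()
    for n, data in enumerate(banks):
        if len(data) > BANK_SIZE:
            raise ValueError(f"bank {n} is {len(data)} bytes, over 16K")
        image += data + bytes([fill]) * (BANK_SIZE - len(data))
    return bytes(image)


img = build_image([b"\xA9\x00", b"\x60"])  # two tiny "banks"
print(len(img), file_offset(1, 0x8000))    # 32768 16384
```

So the first byte of bank 1 sits straight after the 16K of bank 0, and every bank is assembled as if it starts at $8000.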
-
If you don't want to convert from MAC/65, you could try ATasm, which is compatible with MAC/65 syntax. [Edit: OK, it's better with a working link!]
-
Thank you. I had just begun to get a little worried when you didn't reply quickly, as this topic is dear to my heart. English is supposed to be my first language, but it often still comes out wrong.

I was trying to think how best to place and set up the pointers (zero page pairs) for your 16x16 sprites on the 32 byte wide screen. In your quick calculations you were allowing 8 extra cycles of set up for each pointer, and I was struggling to see how to do it that fast. I got stuck trying to put all the pointers on the left side of the sprite, each spaced further down the column, and couldn't work out how to set them up quickly. If they are spaced exactly 128 bytes apart, though, there may be a quick way to set them up. My initial code example in the post, showing "adc #64" rather than "adc #128", was not really a very good explanation!

This is where the pointers would be, down the left side, 4 lines apart (P is a pointer):

    P>XXXX   (4 lines)
    P>XXXX   (4 lines)
    P>XXXX   (4 lines)
    P>XXXX   (4 lines)

[Edit: the cycle count for the 4 pointers above needs 16 "ldy #" or "iny". Set up of the zero page pairs for the special case of 128 byte spacing can be 9 cycles per extra pointer, I think. Add in the 64 "lda #"/"sta (),y" pairs (64x7=448 cycles) and I make the total about 507 cycles!]

I think I see now that what you're saying is that if we set the pointers up across the screen, they are quicker to set up. For some reason I didn't think that would be any good, but it's not bad!

    PPPP
    vvvv
    XXXX
    XXXX
    XXXX
    XXXX

With the 16x16 sprite (all pointers at the top) there is still the problem of the bottom half of the sprite with only 4 pointers. We would have to "inc" all 4 pointers and do all the "ldy #"/"iny" for the bottom half again. Even with the very quick pointer set up, I think that would work out to about 524 cycles total. If there are an extra 4 pointers set up the same way for the bottom half it is better: no more pointer "inc"s and none of the extra "ldy"s. The pointer set up is again very quick (I make it an extra 26 cycles), and I think the total will then be down to 514 cycles for the whole sprite. With different pointer positions, can things be better though? That is the question...

[Edit: I still don't quite see where you are going with your example in the quote above, but anything is worth a try for some more speed!]
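The 16-register-load and 507-cycle figures for the left-column layout can be checked with a small Python model. It assumes a 16x16 sprite is 4 bytes wide by 16 lines, with a pointer every 4 lines (128 bytes apart) on a 32-byte-wide screen; because every pointer uses the same set of Y offsets, each "ldy #"/"iny" serves 4 stores. The cycle costs are the post's (2 per Y load, 9 per extra pointer, 7 per "lda #"/"sta (),y"); standard 6502 timing tables list STA (zp),Y at a fixed 6 cycles, which would make that pair 8 cycles, so `store_cost` is a parameter. All names are illustrative.

```python
# Sketch: 16x16 sprite = 4 bytes wide x 16 lines on a 32-byte-wide screen,
# with a zero-page pointer every 4 lines (128 bytes apart) down the left edge.
SCREEN_W = 32
SPRITE_W, SPRITE_H = 4, 16
LINES_PER_POINTER = 4


def pointer_and_y(row, col):
    """Which pointer reaches sprite byte (row, col), and the Y offset it needs."""
    ptr = row // LINES_PER_POINTER
    y = (row % LINES_PER_POINTER) * SCREEN_W + col
    return ptr, y


def distinct_y_values():
    """All pointers share these Y values, so each Y load serves 4 stores."""
    return sorted({pointer_and_y(r, c)[1]
                   for r in range(SPRITE_H) for c in range(SPRITE_W)})


def cycle_estimate(y_loads, extra_pointers, stores,
                   y_cost=2, pointer_cost=9, store_cost=7):
    """Post's figures: LDY #/INY = 2, extra pointer = 9, LDA #/STA (zp),Y = 7."""
    return y_loads * y_cost + extra_pointers * pointer_cost + stores * store_cost


ys = distinct_y_values()
print(len(ys), cycle_estimate(len(ys), 3, SPRITE_W * SPRITE_H))  # 16 507
```

With `store_cost=8` the same layout comes to 571 cycles; the ranking against the other layouts in the thread is what matters.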
-
Good point. I work out that a 16x16 strip sprite on a 256 byte wide screen would take 531 cycles. Even with no extra cycles for going over page boundaries, and maybe avoiding more accumulator loads because of duplicates in the strip, it's just not going to be worth it because of all those LMS instructions (one per screen line in the display list).
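The post doesn't show how 531 was reached. One breakdown that reproduces it is sketched below, assuming a zero-page pair for every line at 5 cycles each ("inx", "stx zp+n"), one Y setting per column, and the 7-cycle "lda #"/"sta (),y" figure used elsewhere in the thread. Treat it as a reconstruction, not the original working; the function name is hypothetical.

```python
def strip_sprite_cycles(width_bytes=4, height_lines=16,
                        pair_cost=5, y_cost=2, store_cost=7):
    """Hypothetical breakdown for a strip sprite on a 256-byte-wide screen."""
    extra_pairs = height_lines - 1       # one zero-page pair per line (INX; STX zp+n)
    y_settings = width_bytes             # one LDY #/INY per column (strip)
    stores = width_bytes * height_lines  # LDA #; STA (zp),Y for every byte
    return extra_pairs * pair_cost + y_settings * y_cost + stores * store_cost


print(strip_sprite_cycles())  # 531
```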
-
Hmm, NRV, converting my old 2 buffer zero page set up to a single buffer, I can't get to 8 cycles per pair when they are not spaced 256 apart. Did you have something special in mind for that? I'm thinking along these lines (assume A holds the 1st pointer's low byte and X holds its high byte). Is there a better way?

        clc
        adc #64
        sta zp+2
        bcc b1
        inx
        clc
    b1  stx zp+3
        adc #64
        sta zp+4
        bcc b2
        inx
        clc
    b2  stx zp+5
        ...etc.

Maybe the 256 spaced zero page pairs are actually quicker for the 16x16 sprite then?

[Edit: I suppose when they are spaced 128 apart we can do an "eor #$80" rather than an add, so no clear carry is needed, but that doesn't help enough if there is an arbitrary value in the 1st pointer's low byte?]
[Next Edit: Ah - maybe if we had different routines for odd and even y positions...]
[Final Edit: not odd and even, just +ve/-ve (bit 7 of the 1st pointer's low byte clear or set); then it can be done in just over 9 cycles on average. So yes, 128 spacing is still better!]
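The +ve/-ve idea for 128-byte spacing can be checked with a small Python model. It assumes pointers at base, base+128, base+256, and so on: the low bytes simply alternate between L and L EOR $80, and the high byte of pointer k is the base high byte plus k/2 (rounded down) when bit 7 of L is clear, or (k+1)/2 when it is set. The model compares that against plain 16-bit addition for every possible low byte; the function names are illustrative only.

```python
def pointers_by_addition(base, count, spacing=128):
    """Reference (low, high) pairs computed with full 16-bit adds (the ADC/INX route)."""
    out = []
    for k in range(count):
        addr = (base + k * spacing) & 0xFFFF
        out.append((addr & 0xFF, addr >> 8))
    return out


def pointers_by_eor(base, count):
    """128-byte spacing without ADC: one EOR #$80, then only INX/STX for high bytes."""
    low, high = base & 0xFF, (base >> 8) & 0xFF
    flipped = low ^ 0x80               # the only arithmetic on the low byte
    negative = 1 if low & 0x80 else 0  # selects the "-ve" routine
    out = []
    for k in range(count):
        lo = flipped if k % 2 else low
        hi = (high + (k + negative) // 2) & 0xFF
        out.append((lo, hi))
    return out


# Both methods agree for every possible low byte of the first pointer.
for base in range(0x3000, 0x3100):
    assert pointers_by_eor(base, 8) == pointers_by_addition(base, 8)
```

In 6502 terms, the "+ve" routine bumps the high byte (INX) before every even-numbered pointer (2, 4, ...), and the "-ve" routine before every odd-numbered one (1, 3, ...).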
-
Sorry for any potentially jerkish comments in there. It's amusing that you found the better zero page pair spacing when I should have been able to use it as the example the first time. I'll use the excuse that I'm not used to working with a 32 byte wide screen. Please share if you find better places to position the zero page pointers than down the left side. By the way, I don't think the 256 byte wide screen will give you any great advantages with 16x16 sprites (or 8x8) when drawing only 1 screen buffer. It works better for me because otherwise I'd have to set up zero pages for 2 screen buffers.
-
By "fully strip sprite route" I just mean every line of the sprite has a zero page pair [Edit: missed "with a 256 wide screen"], allowing Y to be set only once per column (strip). Yes, the memory does fill up fast, and there will be (a lot of) wasted space that is not very useful. I use 85 lines of screen, so I have to use 21.25K of contiguous RAM to do it (85 x 256 bytes). For me there is maybe less waste due to using the much loathed screen flickering for more colours: each of my screens is 48 bytes wide (for clipping) and there are 4 screen buffers instead of the usual 2. Some of the 64 bytes remaining on each line will just be used for other buffers/variables etc., but unfortunately most will be wasted.

The big benefits I see from this width of screen (assuming you can spare the zero pages) are that it's really great for tall sprites:
- the low byte of each zero page pair is fixed and never needs changing (the Y register can do it all)
- setting up each zero page high byte only takes 5 cycles (inx, stx zp+3, inx, stx zp+5... for however many lines high)
- as you've already seen, setting Y only once per byte of sprite width makes a huge difference in size and cycles
- the Y register can pass the screen buffer offset plus the x position; you just do an "iny" between each column of the sprite
- there's a much higher probability of duplicated non-mask (opaque) bytes where the accumulator doesn't need loading again (I often get hundreds of duplicates in the larger Space Harrier sprites)
- there need be absolutely no extra cycle penalties for crossing page boundaries (these can add up to a lot for large sprites)
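A rough Python model of the 256-byte-wide layout described above. It assumes every screen line starts on a page boundary, the four 48-byte buffers sit side by side in each line (buffer n at offset n*48), and the sprite is opaque; `strip_plan` and its parameters are hypothetical. It shows the fixed low bytes, the one-INX-per-line high bytes, a single Y per column, and how few accumulator loads a column needs when its stores can be grouped by value.

```python
BUFFER_W = 48  # per the post: 4 buffers x 48 bytes in each 256-byte line


def strip_plan(first_page, buffer_index, x, sprite):
    """sprite: list of rows, each a list of opaque byte values.

    Returns (zero-page pairs, Y per column, LDA # count per column)."""
    height, width = len(sprite), len(sprite[0])
    # One pair per sprite line: low byte always $00, high byte = that line's page.
    pairs = [(0x00, (first_page + r) & 0xFF) for r in range(height)]
    # Y carries buffer offset + x position; INY between columns.
    ys = [buffer_index * BUFFER_W + x + c for c in range(width)]
    # Low byte 0 plus Y <= 255 means no store ever crosses a page.
    assert all(lo + y <= 0xFF for lo, _ in pairs for y in ys)
    # Y is fixed within a column, so stores can be grouped by value:
    # the accumulator is reloaded once per distinct byte in the column.
    lda_per_column = [len({row[c] for row in sprite}) for c in range(width)]
    return pairs, ys, lda_per_column


sprite = [[0x00, 0x55, 0x55, 0x00]] * 8 + [[0x55, 0xFF, 0xFF, 0x55]] * 8
pairs, ys, loads = strip_plan(0x40, 1, 10, sprite)
print(pairs[0], pairs[-1], ys, loads)  # (0, 64) (0, 79) [58, 59, 60, 61] [2, 2, 2, 2]
```

Grouping by value ignores the order the stores happen in, which is fine for opaque data; masked sprites would need each byte's own mask, so the saving there is smaller.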
-
That's a nice optimization! But you have opened that door, now we must go through it (what have you done..). I thought about using more pointers, but I was too lazy to do the cycle counting. You are right about the 16x16 version, but not only that, the 8x8 can also be improved:

Assuming your sprite has 16 bytes, your unrolled code will have 16 "ldy #", and every time you add a new pointer you eliminate half of the "ldy #" still present in the code. With that you save 2 cycles and 2 bytes for every "ldy #" eliminated, but you also add the 8 cycles and 5 bytes (approx.) of the new pointer initialization.

Balancing that for the 8x8 sprite, 16 bytes:
- adding the first extra pointer:
--> we save (16/2)*2 = 16 cycles, and (16/2)*2 = 16 bytes
--> we add 8 cycles and 5 bytes
= we have a net gain of 8 cycles and 11 bytes per sprite (useful for the 64 sprites demo!)

Balancing for a 16x16 sprite, 64 bytes:
- adding the first extra pointer:
--> we save (64/2)*2 = 64 cycles, and (64/2)*2 = 64 bytes
--> we add 8 cycles and 5 bytes
- adding the second extra pointer:
--> we save (32/2)*2 = 32 cycles, and (32/2)*2 = 32 bytes
--> we add 8 cycles and 5 bytes
- adding the third extra pointer:
--> we save (16/2)*2 = 16 cycles, and (16/2)*2 = 16 bytes
--> we add 8 cycles and 5 bytes
= we have a net gain of 88 cycles and 97 bytes per sprite

(We also eliminate the "inc zp+1" that was in the middle of my code; that's an extra saving.)
(If we add another pointer we will only save some bytes, but we will complicate the code too much.)

By the way, for the 16x16 sprite I could distribute the 4 pointers this way:
pointer1 = top left byte offset
pointer2 = pointer1 + 256
pointer3 = pointer1 + 1
pointer4 = pointer1 + 256 + 1

or this other way:
pointer1 = top left byte offset
pointer2 = pointer1 + 256
pointer3 = pointer1 + 2
pointer4 = pointer1 + 256 + 2

Please get out of your corner (and throw rocks at mine, now).

Thanks, NRV. *throwing snowball rather than rock for fun* You're right about the 8x8 sprite; I made a quick calculation, mistakenly thinking the cycles would work out the same as the original code. On the 16x16 sprite: yep, quite right - I'm glad you worked that out yourself (just dropping every nuance of an idea in one go is no way to properly explore it). [Edit: Rereading: Oops - no sarcasm was intended (and I didn't intend to come across as a pretentious jerk by saying that).]

As you've worked out, it's all a case of balancing the size of the sprite against how many zero pages it is worth setting up beforehand for the screen width. For a very wide sprite it sometimes works out best to have a zero page pair every line; for smaller ones, every other line, etc. This is why I refer to it as not a fully strip sprite method (but it is when you have a pair every line). Since I've been using very variable size sprites, I let a utility work out the best line spacing for me for the maximum number of zero pages I can spare.

I would have all the zero pages pointing to the left side of the sprite; that way you can always do an "iny" instead of a "ldy #" as you go horizontally across the sprite. It's not so much of a problem with the sprite sizes you're using, but the extra cycles from going over page boundaries (I just call them "page faults") have to be taken into account when deciding the optimal line spacing; sometimes it may be worth using less of the Y register range. For example, with arbitrary positioning of the original 16x16 sprite, you'll get just over 28 extra cycles on average due to page faults. With the 4 pointer version, you'll only get a little over 12 cycles on average.
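Both sets of figures in this exchange can be reproduced with a short Python sketch. `extra_pointer_gain` follows NRV's balance (each new pointer removes half of the remaining "ldy #", saving 2 cycles and 2 bytes each, at a cost of about 8 cycles and 5 bytes). `expected_page_faults` follows the post's page-fault model: with the pointer's low byte L equally likely to be any of 0-255, an access at offset y crosses a page when L + y >= 256, which happens for y of the 256 values, so the average is the sum of the Y offsets divided by 256. That 1-cycle penalty is how indexed reads such as LDA (zp),Y behave (relevant to masked sprites); standard 6502 tables list STA (zp),Y at a fixed 6 cycles either way. Both function names are illustrative, and the "4 pointer version" is taken to be the left-side layout with pointers 128 bytes apart.

```python
def extra_pointer_gain(sprite_bytes, extra_pointers,
                       ldy_cycles=2, ldy_bytes=2, ptr_cycles=8, ptr_bytes=5):
    """Net (cycles, bytes) saved by adding pointers, per NRV's balance."""
    remaining = sprite_bytes  # one LDY # per sprite byte to start with
    cycles = size = 0
    for _ in range(extra_pointers):
        removed = remaining // 2
        remaining -= removed
        cycles += removed * ldy_cycles - ptr_cycles
        size += removed * ldy_bytes - ptr_bytes
    return cycles, size


def expected_page_faults(y_offsets):
    """Average page crossings over all 256 possible pointer low bytes."""
    return sum(y_offsets) / 256


# Original 16x16: one pointer, rows 0-7, then INC zp+1 and the same Y values again.
original = [32 * r + c for r in range(8) for c in range(4)] * 2
# 4 pointers down the left side, 128 bytes apart: each covers 4 lines.
left_column = [32 * r + c for r in range(4) for c in range(4)] * 4

print(extra_pointer_gain(16, 1), extra_pointer_gain(64, 3))  # (8, 11) (88, 97)
print(expected_page_faults(original), expected_page_faults(left_column))  # 28.375 12.375
```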
-
That soon adds up to quite a lot if you have to do it every scan line. For a 40 byte wide screen it'd probably be better to reference 6 lines using the Y register (6 x 40 = 240 bytes, within Y's 0-255 range), then add 240 to the pointer.
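A quick check of the 40-byte case: a full 40-byte line must fit within Y's 0-255 range, so one pointer covers 256 // 40 = 6 lines, and the pointer then steps by 6 x 40 = 240, which still fits in a single-byte ADC immediate. The helper names are hypothetical, and `advance` models the usual CLC/ADC/BCC/INC sequence rather than code from the post.

```python
def full_lines_per_pointer(screen_width):
    """Whole screen lines addressable from one pointer with Y in 0..255."""
    return 256 // screen_width


def advance(pointer, step):
    """16-bit pointer add, as CLC / LDA zp / ADC #step / STA zp / BCC skip / INC zp+1."""
    low = (pointer & 0xFF) + step
    high = ((pointer >> 8) + (low >> 8)) & 0xFF  # carry goes into the high byte
    return (high << 8) | (low & 0xFF)


lines = full_lines_per_pointer(40)
print(lines, lines * 40, hex(advance(0x20F0, lines * 40)))  # 6 240 0x21e0
```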
-
I would think dedicating some more zero pages to the job still makes sense for the larger 16x16 (and bigger) sprites, even if you didn't want to go down the fully strip sprite route. (This is exactly what I've been doing until fairly recently, before moving over to the 256 byte wide screen strip sprites.) For the 8 line high sprites I don't see any advantages over what you're doing already - like I said before, very nice! - it would seem to be optimal.

Using a 16x16 non-mask (opaque) sprite as an example, the way things are at the moment it can best case be done in 581 cycles (no extra cycles for page "faults"):
9 cycles per byte * 4 bytes * 16 lines = 576
5 cycles for the "inc zp+1" in the middle
= 581 total

Setting up another zero page pair instead of the "inc zp+1" gives the bonus of not having to load the Y register at all for the lower part of the sprite. The extra overhead for setting up another zero page pair is minimal, as the low bytes are the same and the high byte can take the same cycles as an "inc zp":
9 cycles per byte * 4 bytes * 8 lines = 288
7 cycles per byte * 4 bytes * 8 lines = 224
3 cycles for sta zp+2
5 cycles for inx, stx zp+3 (instead of inc zp+1)
= 520 total

You'd also save some memory, of course.

I can also recommend that it is definitely worthwhile writing a utility to help compile the sprites (especially with bigger sprites, it's too easy to miss duplicates and other optimizations).

As always, everyone feel free to rip this apart if I've made any incorrect assumptions! *Puts on tin foil hat and hides in corner*
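The two totals above as a small Python calculator. The per-byte costs are the post's (9 cycles with an "ldy #", 7 without); they're parameters so other timing assumptions can be tried. Function names are illustrative.

```python
def single_pair_cycles(width=4, height=16, per_byte=9, inc_cost=5):
    """One zero-page pair, LDY # on every byte, INC zp+1 halfway down."""
    return per_byte * width * height + inc_cost


def two_pair_cycles(width=4, height=16, with_ldy=9, without_ldy=7,
                    low_store=3, high_setup=5):
    """Second pair for the lower half: its stores reuse Y, so no LDY there."""
    half = height // 2
    return (with_ldy * width * half + without_ldy * width * half
            + low_store + high_setup)


print(single_pair_cycles(), two_pair_cycles())  # 581 520
```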
-
Very nice! I'm doing something similar on the Space Harrier cart (for 64K Ataris), but using a strip sprite method (the 130XE-only demo doesn't do this). I haven't fully examined your code (too many macros for me at the moment!), but if it's pure speed you're after, you may get better mileage by dedicating some more zero pages to the job, combined with a 256 byte wide screen. It works well for me, especially on large objects: you set the Y register only once per column, and if you've got any duplicate non-mask data bytes in the strip, each value only needs loading once. Anyway, just a thought; my way might not be as good for what you're doing.
-
Thanks Chris, it's always a pleasure to work with a coder as talented as you. I'm really looking forward to seeing the game with the music in it. Hang in there, Chris. I know you'll figure out the RMT quirks. =)

More perseverance than talent. Just to let you know, it's working fine now - speed testing soon. Other testing shows we won't need separate 60Hz versions of the songs, which is good.
-
...I'm just wondering: RMT takes approx. 5 to 10% of CPU time. Did you reserve this much time for playing music? The Stereo mode songs especially take nearly double the CPU time (maybe 15% or something). Maybe afterwards, when the tunes are finished, a faster music player should be written (of course only for the in-game songs).

It's a good point, and I haven't managed to get RMT into the game yet (I've been trying to get it working for a few days now - me using ATasm, not XASM, doesn't help!), but the overhead for 4 track songs looks like it won't cause a major problem. You are right about the Stereo mode though: even using a minimal set of RMT features ("rmt_feat.a65") for each song and assembling the "optimised player" (see the RMT asm_src folder), it does need a lot of CPU time. There probably would be some noticeable slowdown. Until I get the code working in there I won't know if it's a big problem or not.
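To put the 5-15% figures into cycles, here is a rough Python sketch based on the Atari's 114 machine cycles per scan line, with 262 lines per NTSC frame and 312 per PAL frame. These are raw totals before ANTIC DMA (including memory refresh) takes its share, so the time actually free for the CPU is lower; it also assumes the percentages refer to the whole frame. `frame_budget` is a hypothetical helper.

```python
CYCLES_PER_LINE = 114
LINES_PER_FRAME = {"NTSC": 262, "PAL": 312}


def frame_budget(system, percent):
    """(total raw cycles per frame, cycles corresponding to `percent` of them)."""
    total = CYCLES_PER_LINE * LINES_PER_FRAME[system]
    return total, round(total * percent / 100)


for system in ("NTSC", "PAL"):
    for pct in (5, 10, 15):
        total, used = frame_budget(system, pct)
        print(f"{system}: {pct}% of {total} cycles = {used} cycles per frame")
```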
-
I'd just like to say thanks again, Sal. You've saved me many months of messing around and ending up with something not as good. I'm really impressed by how close things are to the original now - better than I thought POKEY would be able to do. You and RMT are a great combo! Keep up the good work.
-
I should say thanks too. I'll bear your comments in mind when deciding how best to implement the music.
-
Very nice. The drums are really good now on the Outrun tune and this one.
-
Thanks - this is a really good idea
-
Yes, I'm working on it regularly now - moving it over to work on Atarimax carts at the moment. Maybe in a month or so I'll be able to show a couple more stages.
-
Ahhh...
-
I haven't tried this with RMT yet. It has an SFX feature, but I don't know how it works, or whether it needs a separate channel. I'm not sure what you mean about CONSOL? If you mean using it in a similar way to sample playback, there isn't really time left in the code to do that. A lot of the SFX are high frequency and would need a high sample rate. There are a whole bunch of moving DLIs going on, and the main code is pretty tied up with drawing most of the time for something like a kernel. If that's not what you meant, then I'm curious!
-
Interesting, Sal, thanks! A pretty close version. Emkay's version has the awkward HardSynth tuning issues to work around (not knocking it - I love your instrument experiments, Emkay). For those interested, attached is the original arcade version for comparison: sharrir1.zip
-
Don't worry - you can do DLIs on the blank lines too. BTW, nice - looking good!
-
Well... it's moving along, slowly. I do keep changing how things are done in my code though, which doesn't help either... Back on topic: Zone Ranger has a fantastic sprite engine, but maybe it's not best suited to this game? The sprites aren't solid, so only the outline needs to be drawn, which could speed things up over the ZR method.
-
very nice
-
it's a shame it causes a blank line when used - it would be much more useful if it didn't.
