
Tursi

Posts posted by Tursi


  1. 6 hours ago, Asmusr said:

    One of the things I learned from watching "HIGH SCORE" on Netflix was that some of the games for the SNES (Star Fox) had a co-processor on the cart. I have several ideas where something like that could be useful. The thing I don't understand is how a co-processor on a cart would interact with the normal CPU.

    On the SNES and the Genesis, the system was basically able to get a frame buffer image from the co-processor (over-simplified, but essentially). On the TI we don't have the bandwidth to do exactly that.

     

    Interaction between the CPUs isn't too hard though... a common approach is to have a small amount of shared memory that both processors can access (so on the TI, it'd have to be on the cartridge) that can be used to pass information back and forth. It's probably enough to have space for data, and a single byte for control of that data (so that as long as you write that control byte last, you don't need to be too worried about the integrity of the data). For data that must be carefully controlled, I've used Peterson's algorithm successfully in the past and found it pretty simple: https://en.wikipedia.org/wiki/Peterson's_algorithm
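    A minimal sketch of the shared-memory idea, modeled in Python: a bytearray stands in for the RAM on the cartridge, and the layout (control byte, length byte, payload) plus the names `Mailbox`, `send` and `receive` are hypothetical. What it shows is the ordering: the writer fills in the data and sets the control byte last, and only the reader clears it.

```python
from typing import Optional

# Hypothetical layout of a small shared RAM window on the cartridge:
#   byte 0 = control byte, byte 1 = payload length, bytes 2.. = payload.
CTRL, LENGTH, DATA = 0, 1, 2
EMPTY, FULL = 0x00, 0x01


class Mailbox:
    """Single-slot mailbox: one side writes, the other side reads."""

    def __init__(self, size: int) -> None:
        self.mem = bytearray(size)  # stands in for the shared cartridge RAM

    def send(self, payload: bytes) -> bool:
        """Writer side. Returns False if the previous message is still pending."""
        if self.mem[CTRL] != EMPTY:
            return False
        if len(payload) > min(255, len(self.mem) - DATA):
            raise ValueError("payload does not fit in the mailbox")
        self.mem[DATA:DATA + len(payload)] = payload
        self.mem[LENGTH] = len(payload)
        self.mem[CTRL] = FULL  # written LAST, so the reader never sees half a message
        return True

    def receive(self) -> Optional[bytes]:
        """Reader side. Returns None if nothing has been published yet."""
        if self.mem[CTRL] != FULL:
            return None
        n = self.mem[LENGTH]
        payload = bytes(self.mem[DATA:DATA + n])
        self.mem[CTRL] = EMPTY  # hand the slot back to the writer
        return payload
```

    On real hardware this relies on each CPU's writes reaching the shared RAM in program order. If both sides ever need to write the same fields, that's where something like Peterson's algorithm comes in instead.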

     


  2. Looking really sharp, and great speed.

     

    For the sprites, I know some systems like the Apple sometimes used compiled sprites - rather than copying a texture, they used code to draw the object directly. Not sure off the top of my head how you'd put scaling into that, but with that concept transparency isn't a concern - that's just the part you don't draw. ;)
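    To illustrate the compiled-sprite idea (not how any particular Apple title did it), here's a Python sketch that turns a bitmap into straight-line code, one store per opaque pixel. The flat frame buffer and the `base`/`stride` parameters are assumptions of this sketch; transparent pixels simply never generate a store.

```python
def compile_sprite(pixels, width, transparent=0):
    """Generate straight-line drawing code for a sprite ("compiled sprite").

    pixels is a flat, row-major list of colour values, `width` per row.
    Transparent pixels produce no code at all, so they cost nothing to draw.
    Returns (draw_function, generated_source).
    """
    lines = ["def draw(fb, base, stride):"]
    for i, colour in enumerate(pixels):
        if colour == transparent:
            continue  # nothing to emit: the background shows through
        row, col = divmod(i, width)
        lines.append(f"    fb[base + {row} * stride + {col}] = {colour}")
    lines.append("    return None")  # keeps the function valid even if fully transparent
    source = "\n".join(lines)
    namespace = {}
    exec(source, namespace)  # "compile" the sprite into a real Python function
    return namespace["draw"], source
```

    The generated source can be inspected directly; on the 9900 the equivalent would be a run of byte or word stores with no per-pixel test.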


  3. 18 hours ago, apersson850 said:

    Ehhm, you can't use the same register as a pointer and counter (R2). MOV also moves two bytes at a time, so it takes four MOVs to move eight bytes.

     

    But I understand what you intended to illustrate.

    Yeah, yeah, yeah. I don't think I'll debug my code anymore, I'll just post it to the internet. ;)


  4. It's exactly what you're doing, I just run GROMCFG. In fact I thought I was the only person using GROMCFG and was quite salty about it. ;)

     

    You enable UberGROM emulation in Classic99 by just defining a file to load in the INI with type 'U'. If you are building a new one from scratch, it can just be a dummy file. However, Classic99 doesn't emulate the embedded Editor/Assembler loader, so you /also/ need to include an EA GROM (or just use Playground to load it, which is easier now).

     

    This is the config I was using:

     

    [usercart32]
    name=Ubergrom test
    ; The UberGROM code load doesn't work today, but the U is needed
    ; to activate the emulation. Included is an E/A GROM relocated to
    ; >2000 so it can replace TI BASIC. A future version of Classic99
    ; will fix the UberGROM powerup emulation so that you can properly
    ; get into it. Copy the EA2K.BIN into the Classic99 mods folder.
    ; Do not load editor or assembler from this, they will not function.
    ; This is actually a part of my MPD project. ;)
    rom0=U|6000|2000|D:\classic99\MODS\EA2k.BIN
    rom1=G|2000|2000|D:\classic99\MODS\EA2k.BIN

    I'm not sure if the comment there is still true (about the code load not working), but you'll notice that my U line is a dummy load anyway. The EA2K.BIN is, as the comment notes, Editor/Assembler running from TI BASIC space so it won't conflict with whatever you're building.

     

    EA2K.zip

     

    Once that much is in place, you can run GROMCFG, do the configuration and save the result all on the PC.

     

    (Edit: Actually, I'm quite sure that comment is not true, cause I run some UberGROM configs from the INI without issue ;) )

     

     


  5. 16 hours ago, GDMike said:

    What I'd enjoy would be a "9900 assembly" BEST practices video series.

    That's probably something a lot would be interested in. How cool

    It ultimately ends up simpler than you'd think on the TI-99/4A. Like @apersson850 said, most of the time the fastest code is the code with the fewest instructions - no matter how complex those instructions are. The basic tricks that work on most CPUs hold on the 9900 as well, so long as they do not increase instruction count (for instance, a shift is usually faster than a divide, and a divide also tends to need more setup, so the shift wins on instruction count too). Sometimes you just have to think outside the box.

     

    This also explains why unrolling a loop is faster: on the surface it looks like more instructions, but fewer are actually executed. If you code this (R1 = source, R2 = destination, R3 = byte count):

    LP
     MOVB *R1+,*R2+
     DEC  R3
     JNE  LP

    ... and use it to move 8 bytes, then you get 8 hits on the MOVB, 8 hits on the DEC, and 8 hits on the JNE, for a total of 24 instructions executed. But if you unroll it only once:

    LP
     MOVB *R1+,*R2+
     MOVB *R1+,*R2+
     DECT R3
     JNE  LP

    ... then you have 4 hits on each MOVB (total of 8), 4 hits on the DECT and 4 hits on the JNE - a total of 16 instructions.
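    The counting generalizes: with the structure assumed here (one decrement per pass, such as DEC, DECT or AI for larger steps, plus one JNE), each extra level of unrolling removes loop overhead. This is a rough Python model of instruction counts only; real cycle counts also depend on addressing modes and memory wait states.

```python
def instructions_executed(nbytes: int, unroll: int) -> int:
    """Count instructions executed by a MOVB copy loop unrolled `unroll` times.

    Each pass executes `unroll` MOVB instructions, one decrement of the
    counter (DEC, DECT, or an AI for larger steps) and one JNE.
    Assumes nbytes is a multiple of unroll.
    """
    if unroll < 1 or nbytes % unroll:
        raise ValueError("nbytes must be a positive multiple of the unroll factor")
    passes = nbytes // unroll
    return passes * (unroll + 2)
```

    For 8 bytes this gives 24 (no unrolling), 16 (unrolled once), 12 (four MOVBs per pass) and 10 (fully unrolled into one pass).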

     

    Ermm... back to your original program. ;)

     


  6. 17 hours ago, BillG said:

    In your opinion, is it accurate?

    In my opinion, it's accurate. I used that cycle counting to play back audio and video at the correct rate on hardware.

     

    GROM cycle delay is the only thing that is approximated right now. I don't take into account the differences between the various internal GROM states.

     


    But MOV and MOV4 will write to the wrong address... assuming DST is odd, it's not incremented before the first MOV (only src is), so the mov r3,*dst+ will actually write to (dst-1). Actually, since the assumption was that src is even and a movb is used to increment it, the MOV will read from the wrong address too, and you'll get two copies of the first byte written starting at (dst-1). The gotcha that hit many of us was forgetting that MOV is incapable of accessing an odd address: it always truncates the address to 15 bits, ignoring the lowest bit.

     

    * assume src = A000 (bytes 11,22,33,44), dst = B001
    
    * cache first byte
     movb *src+,r3            * read A000, r3=11xx, src=A001
    loop:
    * fill two src words:
    * grab a word
     mov  *src+,r4            * read A000, R4=1122, src=A003
     movb r4,@r3lb            * MSB 11, r3=1111
     mov  r3,*dst+            * write 1111 to B000, dst=B003
    

     

    Even though the addresses are right, MOV is not capable of accessing them.

     

    I decided to run it a little farther, cause it seemed silly to me that you'd miss that... and as long as you can overlook the first byte being early... the data does land in the right place. Sorry about that! Fire alarm testing today, that's my excuse. ;)
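    The trace can be replayed with a small Python model of 9900 word accesses, where MOV drops the low address bit and words are big-endian. The memory dictionary and helper names are made up for this sketch, and r3's unknown low byte is taken as >00.

```python
def read_word(mem: dict, addr: int) -> int:
    """Word read as the 9900 does it: the low address bit is ignored (big-endian)."""
    addr &= 0xFFFE
    return (mem.get(addr, 0) << 8) | mem.get(addr + 1, 0)


def write_word(mem: dict, addr: int, value: int) -> None:
    """Word write: again the low address bit is ignored."""
    addr &= 0xFFFE
    mem[addr] = (value >> 8) & 0xFF
    mem[addr + 1] = value & 0xFF


def first_iteration():
    """Replay the first pass of the trace: src = >A000 (11 22 33 44), dst = >B001."""
    mem = {0xA000: 0x11, 0xA001: 0x22, 0xA002: 0x33, 0xA003: 0x44}
    src, dst = 0xA000, 0xB001
    r3 = mem[src] << 8               # movb *src+,r3  (low byte of r3 taken as 00)
    src += 1
    r4 = read_word(mem, src)         # mov *src+,r4   (reads >A000, not >A001)
    src += 2
    r3 = (r3 & 0xFF00) | (r4 >> 8)   # movb r4,@r3lb  (MSB of r4 into r3's low byte)
    write_word(mem, dst, r3)         # mov r3,*dst+   (writes >B000/>B001)
    dst += 2
    return mem, src, dst, r3, r4
```

    It matches the comments in the trace: R4 = >1122, R3 = >1111, src = >A003, dst = >B003, and the word lands on >B000/>B001, so >B001 gets the right byte with an early copy at >B000.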

     

     


  8. Hmmm... that's probable. TIDIR will cache information from a disk image, so if Classic99 changes it underneath (due to a write), TIDIR will not know about the change and will perform its next operation based on the cached data. (Classic99 doesn't cache anything on purpose, to minimize that risk -- but that doesn't mean the TI software you are running doesn't!)

     

    This tends to be less of an issue with files than disk images, but I think in the end your advice is fair - be careful if multiple programs are accessing the same files.

     


  9. 14 hours ago, InsaneMultitasker said:

    I played with the classic99 window a bit more, does this look right now?  I'll apply the fixes from #64 tomorrow or later this week. 

    You can use the fixed scales in the view menu to have it calculate an integer scaled size. :) At least if 1-4x is acceptable.

     


  10. The simplest way I know to size memory is to map page 0 and the page under test (to different addresses, of course). Write to the page under test and see if it appears on page 0. If so, you just wrapped around - the previous page was the last one. That's probably the fastest way if it works.

     

    In either case, you only need to write once to each page. The goal is not a memory test, just to see how much there is. 32MB would only be 8192 pages... but yeah, I guess even that is getting up there for our little CPU. The GPL version could drop a tiny assembly test into scratchpad though, that would be quick enough.
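    A sketch of the wraparound check in Python, assuming a card whose page register simply ignores the bits beyond its installed size (so page N of an N-page card is page 0 again) and 4K pages. `WrappingCard` and `size_memory` are hypothetical names; a card that doesn't wrap would need a different test.

```python
PAGE_SIZE = 4096  # assumption: 4K pages, as on SAMS-style cards


class WrappingCard:
    """Hypothetical memory card whose page register ignores bits beyond its size,
    so requesting a page past the end wraps around to page 0."""

    def __init__(self, physical_pages: int) -> None:
        self.pages = [bytearray(PAGE_SIZE) for _ in range(physical_pages)]

    def page(self, n: int) -> bytearray:
        return self.pages[n % len(self.pages)]


def size_memory(card: WrappingCard, max_pages: int) -> int:
    """Return the number of pages, writing only once to each page under test."""
    page0 = card.page(0)                    # stays mapped at its own address
    original = page0[0]
    for n in range(1, max_pages):
        card.page(n)[0] = original ^ 0xFF   # write something page 0 doesn't hold
        if page0[0] != original:            # it showed up in page 0: we wrapped
            page0[0] = original             # restore page 0
            return n                        # pages 0..n-1 exist
    return max_pages
```

    For example, a 256-page card reports 256 pages (1MB), and 32MB of 4K pages is 32 * 1024 * 1024 / 4096 = 8192 pages.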

     

     

     


  11. For a start we can assume it's not your computer. If you get a chance, spend some time on trying to come up with a 100% reproduction case. If you can do that and I can follow the steps, I can fix it! But right now I don't know where to start. 😕

     


  12. 1 hour ago, mizapf said:

    I just checked WHTech. The cassette is on https://ftp.whtech.com/Cassettes/Mini_Memory/

     

    Lines_LBLA.wav: original recording

    Lines_LBLA_remastered.wav: digitally recreated (not sure whether it loads on real iron)

     

    Of course, disk is faster, so this is for the "original feeling". MAME does load each of them, I can't say whether Classic99 likes them. Tursi?

    Should work, never tried. Cassette support is only there as a curiosity in what is a development tool, so my interest in supporting it is low. ;)  But I figured if the 1200bps loader worked (and it did), most things should.

     


  13. 3 hours ago, Asmusr said:

    That's cute, but back in the day I recorded my NES composite video directly into the line-in on my boombox, and my video was nearly as watchable as his with no effort at all. ;) (Nearly because the sync signals were pretty much non-existent, so it slipped all the time, and definitely would not play on a modern device).


  14. 4 minutes ago, jonesypeter said:

    Sorry to resurrect a very old thread, but could anyone share the cassette tape with the line-by-line assembler for the Mini Memory?  I'm using Classic99.  Many thanks.

    It's included with Classic99 as a disk file rather than a tape. It's "DSK1.MM_LBLA.OBJ", assuming you haven't moved the disks around (DSK1 folder otherwise). Steps for loading it are on page 53 of the manual.

     


  15. This is where I got the Dragon's Lair boxes. They have a fun 3D designer. I didn't want to use the thin cereal-box cardboard so many releases get, so I went full corrugated. The downside is that the print is /not/ as sharp, but if you keep your text big enough it looks decent and is nice and solid. :)

     

    https://packola.com/

     

    (Edit: dug out my records, and they worked out to under $4 each.)


    The GROM banking that is the default on the console works by also capturing a few of the CPU address lines and determining which "base" address was used to access the GROM, treating each base as a unique 40k bank of GROM. (It's 40k because the console GROMs 0, 1 and 2 respond to /all/ bases, so unless you want to force an override of them, they aren't included.) So the normal access address is >9800, the next base would be >9804, then >9808, and so on until we run out of address space in the reserved memory. (The other GROM port addresses are offset the same way.) The console itself scans the first 16 bases for GROM headers.
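    As a quick illustration of the base arithmetic, here's a Python helper that lists the GROM ports for a given base. The port addresses (>9800 read data, >9802 read address, >9C00 write data, >9C02 write address) are the standard base-0 ones; the helper and its names are just for illustration.

```python
# GROM port addresses for the default base (base 0).
GROM_READ_DATA = 0x9800
GROM_READ_ADDR = 0x9802
GROM_WRITE_DATA = 0x9C00
GROM_WRITE_ADDR = 0x9C02
BASE_STEP = 4  # each base moves every port up by 4 bytes


def grom_ports(base: int) -> dict:
    """Return the four GROM port addresses used to talk to a given base."""
    offset = base * BASE_STEP
    return {
        "read_data": GROM_READ_DATA + offset,
        "read_addr": GROM_READ_ADDR + offset,
        "write_data": GROM_WRITE_DATA + offset,
        "write_addr": GROM_WRITE_ADDR + offset,
    }


# Console GROMs 0-2 answer at >0000->5FFF in every base,
# leaving >6000->FFFF (40K) to be banked per base.
BANKED_BYTES = 0x10000 - 0x6000
```

    So the 16 bases the console scans run from >9800 up to >983C.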

     

    One nice thing about your concept is that you don't need to tie in the extra address lines, and in fact all that memory can live in a single base - so you could still have multiple bases of very large GROM. The address increment isn't really a huge deal - after changing bank you'd be well advised to set the GROM address again anyway.

     

    I might steal the idea if there's no objection. ;)

     


  17. 31 minutes ago, Asmusr said:

    But that won't work if the source and destination words are not aligned 🙂.

    Is there nothing you can do to change that? You're leaving nearly 50% of your performance on the floor otherwise. Maybe two sets of texture data, one even aligned and one odd aligned? ROM is cheaper than cycles. :)
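    A rough Python sketch of the two-copies idea: keep one copy of each texture starting on an even offset and one starting on an odd offset (padded by one byte), then pick whichever matches the destination's alignment so the bulk of the copy can use word moves. The helper names are hypothetical, and it assumes each copy itself begins at an even ROM address.

```python
def build_copies(texture: bytes):
    """Store the texture twice: once starting on an even offset, once on an odd one.

    Assumes each copy is placed at an even ROM address, so the offset parity
    inside the copy equals the address parity.
    """
    even_copy = bytes(texture)           # texture starts at offset 0 (even)
    odd_copy = b"\x00" + bytes(texture)  # one pad byte: texture starts at offset 1 (odd)
    return even_copy, odd_copy


def pick_source(copies, dst_addr: int):
    """Choose the copy whose start parity matches the destination, so the
    bulk of the transfer can use word moves. Returns (buffer, start offset)."""
    even_copy, odd_copy = copies
    if dst_addr & 1:
        return odd_copy, 1
    return even_copy, 0
```

    The cost is double the ROM for textures, which is the trade being suggested.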

     

     

     

     

     

     


  18. This is one of the very, very few cases I'd suggest using something like Duff's Device (although it's only called that in C, where it's an abomination).

     

    You can write a lead-in and a lead-out to handle initial and trailing bytes, then jump to the correct place in an unrolled loop (calculated with masking) that copies words. Since you're incrementing source and destination, you want to copy as many words as you can in any case.

     

    Pseudocode only at the moment, but something like:

    - odd start address? Deal with a single byte to make it even
    - calculate the word count div 8, rounded up (shift) - this is the number of passes
    - calculate the word count mod 8 (mask), and use it to jump to the correct start point in the loop
    - repeat:
    -   move 1 word
    -   move 1 word
    -   move 1 word
    -   move 1 word
    -   move 1 word
    -   move 1 word
    -   move 1 word
    -   move 1 word
    -   decrement count, loop if not zero
    - still an odd byte left? Deal with the single final byte

     

    Not sure if I explained it well, but looking up the device will explain better. If you normally copy fewer than say 8 bytes, a mod 4 might work better than 8.
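    Here is the pseudocode modeled in Python, as a sketch rather than a TI implementation. Python has no switch fall-through, so jumping into the unrolled body is modeled by starting the first pass at a computed slot; the function name and the bytearray memory model are assumptions. It also assumes source and destination share the same alignment, since otherwise word moves can't be used at all.

```python
def duff_copy(mem: bytearray, src: int, dst: int, count: int) -> int:
    """Copy `count` bytes within `mem` from src to dst, Duff's-Device style.

    Returns the number of word moves performed. Source and destination must
    have the same alignment (both even or both odd) for word moves to work.
    """
    if (src ^ dst) & 1:
        raise ValueError("source and destination alignment differ")
    # Lead-in: odd start address? Move a single byte to make it even.
    if count and src & 1:
        mem[dst] = mem[src]
        src, dst, count = src + 1, dst + 1, count - 1
    words, tail = count >> 1, count & 1
    moves = 0
    if words:
        passes = (words + 7) >> 3          # div 8, rounded up (shift)
        entry = (8 - (words & 7)) & 7      # mod 8 (mask) picks the entry point
        while passes:
            # Unrolled body of 8 word moves; the first pass starts part-way in,
            # which is what the jump into the loop does in assembly.
            for _slot in range(entry, 8):
                mem[dst:dst + 2] = mem[src:src + 2]
                src, dst, moves = src + 2, dst + 2, moves + 1
            entry = 0
            passes -= 1
    # Lead-out: still an odd byte left? Move the single final byte.
    if tail:
        mem[dst] = mem[src]
    return moves
```

    With 20 bytes left after the lead-in (10 words), the first pass enters two slots from the end and the second pass runs all eight, for 10 word moves in total.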

     

     


  19. Note that warning is an indication that some of our disk devices won't work right with that DSRLNK, since the CRU base isn't loaded where it's expected. Not sure if I ever wrote down /which/ devices, though. ;) But nearly all the warnings are a result of being bitten one way or another!

     

     


  20. 1 hour ago, Toucan said:

    Parsec maybe? With the multi-colored twinkling star background?

    The Parsec stars use actual colors. :) The vertical stripes on the ground show substantial aliasing, but the consensus was that wasn't done on purpose.

     

    Nothing on the TI I'm aware of uses the fringing on purpose.

     
