
Tursi

Posts posted by Tursi


  1. Ah, but it's not over until the Strip Poker game is released...

     

    I took a stab at this actually.... but couldn't quite get the results I wanted (she kept winning!!)

     

    Hehe, actually, I didn't do the card side, though it looks like there's more than enough room in 30 lines of code. ;)

     

    Anyway, there are way too few characters available in XB to get the graphics I was going for, so I dropped it. Also, couldn't decide on the best colors (so I put in options! heheh...) Even though she didn't turn out well, not sure if I /should/ post screenshots.. maybe source then (NSFW):

     

    1 DATA 1F3FF7E4F8B8C6ECE0A81C8CCCE4E4671CE6FB2D98485F79F3776F0FCDE76004,00000080E061678FE7E1E9EFEFC70F0F010F0D1D7FFFFFFEFFFFFFFFFDFDFDFF
    2 DATA F0F0ECE834F4F4F47EBFBDDFDDCFEFFE000000000000000000000080C0E06011,00000000000000000000000000071FFF0000000000000000000000007EFFFFBF
    3 DATA 6323313918181C0C0E0606070303010126B380CCE1F379E4473F7F3F9F8FCEEF,3F7FFFFEFFFFFFFF7FCFDFDEEEF0FCFFFFF3FBFF7E98E3FFFFFF7F7FFFFF7F1F
    4 DATA FEFF7F7E7EFFEFF3FCFFFFFFF7FFFEFE314F8F00C19FBD10CCCCD1F1F3939133,F3FFFFFFFBFDFEFFFFDFDFFFDBEDFC06FFFFFFFFFFFF7F9FCDFBFFFDFFFFFF7F
    5 DATA 00000000000000000000000000000000F1586D3C16080C070000000000000000,FF3FF7FF7FB31FDF0000000000000000C3F0FFFFFFFFFFFF0000000000000000
    6 DATA FCF81EF7F1FCFFFF0000000000000000666FC0C18F4FE7E30000000000000000,03007FFFFFFFFFFF0000000000000000FFFF1FC7F1FCFFFF0000000000000000
    7 DATA 1E0663C09010004000000080000000060C2600080840030911130F0C0C802004,00000000006003070700000040410000010A010D3CFEFDF0C60FC3C98919380C
    8 DATA 9070CCE8242464D01218199888C8C8C600000000000000000000000040002010,00000000000000000000000000031F7B0000000000000000000000003EFFF182
    9 DATA 630300310000080C0000040202000000000100C0E04108E047073B3500010C87,14304446C2C180786C8686CEC46060E61E22607A3C00C1FF0C00014140002000
    10 DATA 6463202040C0C100080F0CCC262706042104040080040C1084C00030A0800000,A202031F0340C00000000000000864040E5F3D8CCDEF6E0C0D01010000000000
    11 DATA 0000000000000000000000000000000080000120000000060000000000000000,2000E0E030300C18000000000000000000200C01000000000000000000000000
    12 DATA 08000087200006040000000000000000000000808104E0E00000000000000000,00003CC000000000000000000000000000000000000000000000000000000000
    50 C(0)=9 :: D(0)=10 :: C(1)=5 :: D(1)=6 :: C(2)=3 :: D(2)=4 :: C(3)=11 :: D(3)=12 :: C(4)=15 :: D(4)=16
    51 PRINT "0-CRAB":"1-AVATAR":"2-FROG":"3-SIMPSONS":"4-ANTIQUE" :: ACCEPT VALIDATE("01234")SIZE(1):A$ :: B=C(VAL(A$)):: F=D(VAL(A$))
    100 CALL CLEAR :: FOR A=12 TO 35 :: READ A$ :: CALL CHAR(A*4,A$):: NEXT A :: FOR A=0 TO 47 :: R=INT(A/16)*2+(A AND 1):: G$(R)=G$(R)&CHR$(A+48):: NEXT A
    110 FOR A=3 TO 14 :: CALL COLOR(A,B,1):: NEXT A :: G$(0)=SEG$(G$(0),1,5):: G$(4)=" "&SEG$(G$(4),2,7):: CALL SCREEN(2):: CALL MAGNIFY(3)
    120 FOR A=0 TO 4 :: PRINT G$(A):: NEXT A :: FOR A=1 TO 12 :: CALL SPRITE(#A,92+A*4,F,145+INT((A-1)/4)*16,17+((A-1)AND 3)*16):: NEXT A
    130 GOTO 130

     

    As for the smack talk, I get enough of that in the Jaguar section. ;)


  2. Parsec is still somewhat technically unsurpassed. Scroll, speech, scratchpad, pixel perfect terrain collision detection while trying to maintain gameplay, that’s something! The prototype Vanguard had very nice full screen scrolling using some of the VDP “tricks”.

     

    Moon Patrol offers similar smooth scrolling of its mountains with parallax to boot, but it doesn't get much attention. Of course, it doesn't offer the rest of that, which was indeed well done. I never saw Vanguard, I wouldn't mind seeing what tricks it used.

     

    We (well, I) took Parsec's scroll routine apart on the Yahoo list and I think it could have been optimized some. It hasn't been beaten, I feel, more because nobody's done it than because it's the best that can be done. Parsec achieved some kind of legendary status for that little piece of smooth scroll code. ;)

     

    I take apart the Parsec code here: http://tech.groups.yahoo.com/group/ti99-4a/message/62752

     

    Since that's the point of interest, I'll copy and paste that one. The point was that someone mentioned that Parsec copied the scroll routine to scratchpad RAM every single frame.

     

    Parsec does indeed write and execute a small piece of scratchpad RAM every frame, from 0x83E0. However, there's a lot of code executing in the scratchpad, which does not get written repeatedly. Here's a function at >8354:

     

    65D0 bl @>8354
    8354 movb @>8800,@>833c(R3)
    835A inct R3
    835C jlt >8354
    835E b *R11
    8360 mov @>833c(R4),R1
    8364 src R1,0
    8366 movb R1,@>8c00
    836A inct R4
    836C jlt >8360
    836E b *R11
    

    >8360 contains another subroutine:

     

    6714 bl @>8360
    8360 mov @>833c(R4),R1
    8364 src R1,0
    8366 movb R1,@>8c00
    836A inct R4
    836C jlt >8360
    836E b *R11
    

    Parsec does not appear to write the scroll routine to scratchpad every frame. I tested this by adding code that would flag all executed bytes in the scratchpad RAM, then emit debug if they were written to. After the game was up and scrolling, I breakpointed the emulator and cleared those flags, then resumed. There was no debug emitted. When I started the game, that's when I saw the messages about data at >83Ex being executed and written.

     

    The code that ends up there, and thus gets rewritten, looks like this at game start:

     

    7F34 bl @>83e0
    83E0 movb @>9000,R10
    83E4 jmp >83e6
    83E6 jmp >83e8
    83E8 jmp >83ea
    83EA b *R11

     

    Note those three JMPs are NOPs, so all this does is a delay. Since it's reading from >9000, this is probably just a little routine to safely get the speech status without touching the 8-bit bus. This is the only code that is written to scratchpad every frame, the rest lives there permanently. It's only updated when it thinks it might be speaking.

     

    By breakpointing on that code, it looks like Parsec has a 5-frame scroll sequence, and it writes and calls this function for every frame. The five frames seem to be:

     

    -Scroll first quarter of scenery
    -Scroll second quarter of scenery
    -Scroll third quarter of scenery
    -Scroll fourth quarter of scenery
    -Scroll stars and animate ship flame


  3. K, short on space, what's the cycles situation? With TI graphics being highly compressible, even simple compression like RLE would make a lot of room. If cycles are tight, since you're just short of fitting, compress/process just enough, x line/characters needed to fit everything without sacrificing characters.

     

    You're sort of crossing ideas here, though. For the scroll trick I am describing to work, you have to have everything predefined. Each table /must/ take 2k because that is the granularity of the VDP. Pre-defining it allows a very fast scroll - just a couple of cycles until you cross a character boundary and have to redraw it.

     

    You could try compression, but unfortunately even plain uncompressed writes to the VDP would struggle to redefine the entire character set in a single frame (2k of data won't fit in vblank, so the update would be visible). You could do it if you restricted the number of characters - but then you are in the same situation as I described above. ;)

     

    Restricting the number of characters is not necessarily a killer, it just depends on your background detail. Also, you can do a combination of the two - you have enough characters to define one screen, and dynamically update the character set as you move through the world. You just have to leave space in your character sets for the other VDP tables.. the SIT is the largest at 768 bytes, so that's 96 characters, leaving 160 characters available for the pre-defined scroll. You'd likely want sprites... you could use those 96 characters from one of the other tables for a fully distinct sprite table. That's enough characters for 24 of the 32 sprites, if you don't mind dynamically loading the patterns (which can be more easily staged in vblank). You could likewise double-buffer the sprite patterns, or even more than that, since you'd probably have 6 of the tables with unused space. :)
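The character budget in that last paragraph can be checked with a few lines of Python. This is only a sketch of the arithmetic: the constant names are made up for illustration, and it assumes 8-byte patterns and 16x16 sprites (four patterns each).

```python
# Character budget when a 2k pattern table shares its space with other VDP tables.
# All names here are illustrative, not from any real library.
BYTES_PER_PATTERN = 8        # one 8x8 character
TABLE_BYTES = 2048           # VDP pattern table granularity (2k)
SIT_BYTES = 768              # screen image table, 32 columns x 24 rows

chars_per_table = TABLE_BYTES // BYTES_PER_PATTERN       # 256
chars_lost_to_sit = SIT_BYTES // BYTES_PER_PATTERN       # 96
chars_for_scroll = chars_per_table - chars_lost_to_sit   # 160

# Assumption: 16x16 sprites, which use 4 patterns each.
sprites_in_spare_slots = chars_lost_to_sit // 4          # 24

print(chars_lost_to_sit, chars_for_scroll, sprites_in_spare_slots)
```

Changing SIT_BYTES to the size of whichever table you tuck into a given pattern table shows how many characters that table costs you.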


  4. Sorry that there is not yet a more intuitive way... core functionality has always been high priority in the very limited time I get to work on it. :)

     

    There is one shortcut - if you are loading ROMs in the 'V9T9' format, which is also what MESS uses, you'll note they end with C.BIN, D.BIN or G.BIN. If all the files in the cart end like that, Classic99 has an auto-loader under Cartridge->User->Open, just select any one of the three files and it will find the others.
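To show what "find the others" means for that naming scheme, here is a rough Python sketch. It is not Classic99's actual code; the function name is hypothetical and it only derives candidate filenames, without checking that they exist.

```python
import os

def sibling_rom_names(path):
    """Given one V9T9-style file (xxxC.BIN, xxxD.BIN or xxxG.BIN), return
    the candidate names for all three, keyed by the type letter."""
    folder, name = os.path.split(path)
    stem, ext = os.path.splitext(name)
    if ext.upper() != ".BIN" or not stem or stem[-1].upper() not in "CDG":
        raise ValueError("not a V9T9-style name: " + name)
    base = stem[:-1]   # drop the trailing type letter
    return {kind: os.path.join(folder, base + kind + ext) for kind in "CDG"}

print(sibling_rom_names("PARSECC.BIN"))
```

A loader would then open whichever of those files actually exist and place each one in the matching memory type.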

     

    If it's not in that format, or it's one of the newer bank switch carts, then there's no autodetect, and you must use the INI file method. It's on the TODO list to add the MESS RPK format, which will simplify things nicely (if I can find a non-GPL unzip library....)

     

    As for why they are separate files.. just legacy. Each file is a dump from a single type of memory, usually. C is the Cartridge ROM at >6000, up to 8k. D is the second bank of Cartridge ROM at >6000, up to 8k, used for Extended BASIC (and Atarisoft carts). G is GROM space, usually concatenated together into a single file, loading at >6000 and supporting up to 24k.

     

    The MESS RPK format zips the files together and includes an XML layout file, which is surely the best way to go forward. XML is nicely extensible so we should be able to add manuals, screenshots, and cover scans to it as well. ;)

     

    (edit: and I need to update that doc, apparently.. as far as I remember it supports up to 16 ROMs per cartridge now ;) )


  5. To put a little more detail... in SSA, the bosses get characters 121-255. Because they are a block of graphics in a fixed layout, I only actually need one extra character per horizontal row to do the smooth scrolling - the largest boss is 12 rows by 10 columns, which is 120 characters. Since there are 12 rows, I need 12 extra characters for it to shift into, so it uses a total of 132 characters.

     

    My game defines pattern tables at >1000, >1800, >2000, and >2800. The offset of >0800 is chosen because this is the minimum step in the VDP.

     

    When the boss needs to come in, I load the default patterns into the first table. I then have a small piece of code that preshifts by 2 pixels into each of the other character sets. (The patterns from 0-120 are static in all four tables, so they don't appear to change). It causes a noticeable hiccup in the game, but I will smooth this out by doing it over a number of frames.
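For anyone who wants to experiment with the preshift off the target machine, here is a minimal Python sketch of the idea. It is not the code from the game: the function name and the data layout (a list of 8-byte patterns per row of the boss) are assumptions for illustration.

```python
def preshift_row(chars, shift):
    """Shift a horizontal run of 8x8 patterns right by `shift` pixels (0-7).

    `chars` is a list of 8-byte patterns, left to right. The result has one
    extra pattern on the end to catch the pixels pushed out of the last one.
    """
    if not 0 <= shift <= 7:
        raise ValueError("shift must be 0-7")
    out = [bytearray(8) for _ in range(len(chars) + 1)]
    for line in range(8):
        # Join this pixel line of every character into one wide integer,
        # then add 8 spare bits on the right for the extra character.
        wide = 0
        for ch in chars:
            wide = (wide << 8) | ch[line]
        wide = (wide << 8) >> shift
        # Split it back into bytes, rightmost character first.
        for i in range(len(out) - 1, -1, -1):
            out[i][line] = wide & 0xFF
            wide >>= 8
    return [bytes(p) for p in out]

# One solid character shifted by 2 pixels spills into the extra slot.
solid = bytes([0xFF] * 8)
left, right = preshift_row([solid], 2)
print(left.hex(), right.hex())
```

Calling it with shifts of 0, 2, 4 and 6 for each boss row gives the contents of the four pattern tables.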

     

    Ultimately, when moving the boss across the screen, I track its column by pixel value divided by 2 (so, from 0-127). We need to get two values from that: the pattern table to select, and the horizontal column to draw at. The pattern table is easy, we just take the two lower bits with an AND. The value to write to the VDP register is the base value (>1000 = >0800*2, so '2' is the base) + that, so: (0x02 + (bc&0x03)). That gives the correct sub-character offset. The actual character is easy, too... it's the column divided by 4, which is just a shift operation (bc>>2). Draw the boss normally at that column, and it's done.

     

    With the calculations done that way at draw time, I never need to consider the scroll mechanism when moving the boss, I just update the 'bc' (boss column) as I choose.
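The two calculations described above fit in a few lines. This Python sketch just restates them; the function and constant names are illustrative, and the register in question is the pattern table base register (register 4 on the TMS9918A).

```python
PATTERN_TABLE_BASE = 0x02   # >1000 / >0800, the first scroll table

def boss_draw_params(bc):
    """bc is the boss column in 2-pixel units (0-127).

    Returns (value for the pattern table base register,
             character column to draw the boss at)."""
    if not 0 <= bc <= 127:
        raise ValueError("bc must be 0-127")
    register_value = PATTERN_TABLE_BASE + (bc & 0x03)   # which preshifted table
    column = bc >> 2                                    # whole-character column
    return register_value, column

for bc in (0, 1, 4, 5, 127):
    print(bc, boss_draw_params(bc))
```

Moving the boss is then only a matter of changing bc and recomputing both values at draw time.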

     

    The other nice thing... even with all that, there's still plenty of room for a full sprite pattern table in VRAM, too. So this doesn't limit anything, really, it just uses a little more of the often-unused VRAM space. My VRAM map looks like this:

     

    // VRAM map:
    // >0000 Screen Image Table
    // >0300 Sprite Descriptor Table
    // >0380 Color Table
    // >03A0 (Unused)
    // >0800 Sprite Pattern Table
    // >1000 Pattern table (scroll 0)
    // >1800 Pattern table (scroll 2)
    // >2000 Pattern table (scroll 4)
    // >2800 Pattern table (scroll 6)
    // >3000 (Unused)

     

    There's still a bit over 5k left there. Unfortunately each pattern table takes 2k, so you can't do 8 full tables, it would use all 16k of VRAM (unless you were willing to sacrifice some characters for the other tables, of course! Might not be a bad alternative to my half-table suggestion above.) Scrolling by 2 pixels only needs 4 tables, which is just 8k.
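As a sanity check on those numbers, this Python sketch adds up the unused regions of the map above and the VRAM cost of each scroll step size. The variable and function names are mine; the addresses are taken straight from the map.

```python
VRAM_SIZE = 0x4000          # 16k
TABLE_SIZE = 0x0800         # 2k per pattern table

# (start, name) pairs from the VRAM map; each region runs to the next start.
# A name of None marks an unused region.
regions = [
    (0x0000, "Screen Image Table"),
    (0x0300, "Sprite Descriptor Table"),
    (0x0380, "Color Table"),
    (0x03A0, None),
    (0x0800, "Sprite Pattern Table"),
    (0x1000, "Pattern table (scroll 0)"),
    (0x1800, "Pattern table (scroll 2)"),
    (0x2000, "Pattern table (scroll 4)"),
    (0x2800, "Pattern table (scroll 6)"),
    (0x3000, None),
]

free_bytes = 0
for (start, name), (end, _) in zip(regions, regions[1:] + [(VRAM_SIZE, None)]):
    if name is None:
        free_bytes += end - start

def scroll_table_bytes(step_pixels):
    """VRAM needed for one full pattern table per shift position."""
    return (8 // step_pixels) * TABLE_SIZE

print(free_bytes, scroll_table_bytes(2), scroll_table_bytes(1))
```

The 1-pixel case comes out equal to VRAM_SIZE, which is why eight full tables leave no room for anything else.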


  6. I wonder what could be done by just storing 8 versions of the background so you can scroll smooth? I think you would need a background with a repetitive nature too.

     

    I've been messing around with this concept.

     

    I did the math out of curiosity to see what it would take to do a fast smooth scroll. There's enough VRAM in Graphics0 for 4 copies of the full character set. If you wanted, then, you could have 4 different shift positions for 128 characters (since a single character needs two slots when it's shifted). Then you can "scroll" the screen by 2 pixels at a time just by changing the base address of the pattern table. I'm successfully using this technique in my Coleco Super Space Acer game right now to move the bosses smoothly horizontally (next release will show that).

     

    If you wanted single pixel accuracy, you could divide the character set in half, and have each table define "two" sets of 128 characters (in theory only, not in hardware). Scrolling would then entail changing the pattern table base and redrawing the screen (768 bytes). A little more expensive, but still feasible in vblank. However, with the pre-shifting, you'll only have 64 characters to work with.

     

    There are a couple of limitations with this. The first is that if you have two characters touching each other, you always need them touching each other through the scroll, so each combination would need to be accounted for. The second is that you're somewhat out of luck when it comes to color sets - first because they'll be artificially smaller due to the character overlap, second because you can't put two different colors side-by-side (you'll need at least one character of background color between them), otherwise colors will distort as they "scroll".


  7. The problem with whtech is they require prepending http:// to an ftp url, which is.. unexpected.

     

    Yeah... I don't like that interface... but the true FTP interface is still available too. :) ftp://ftp.whtech.com ;)

     

    digisynt.dsk isn't in the subdirectory anymore, and whtech's search and a google site search come up with no results for it. There's still reconstructing it from peekbot if need be.

     

    Maybe.. but I don't really see the need. I'd rather spend time on a 4-bit playback routine than resurrect an old 1-bit one. ;) I guess some people would like to play with it.. but the most basic loop is only a dozen or so assembly instructions. :)

     

    What's the highest precision timer available?

     

    The 9901 has a timer countdown mode, but if you want it to trigger an interrupt you run into a big problem - the interrupt is completely handled inside the console ROM with no user hooks. The console assumes that /any/ interrupt can only come from one of three places. Either it comes from the VDP for vertical blank, a peripheral card for peripheral interrupt, or the 9901 timer for cassette operations. This is all hard-coded.

     

    Jeff Brown documented a trick whereby you could get control from the cassette routine for the 9901 timer. It requires disabling VDP interrupts and setting a few flags to confuse the system into jumping to your code. There's a fair bit of overhead in using this trick as the system checks numerous interrupt sources (including all peripheral cards) before giving you control. As always, Thierry Nouspikel documents it quite well here: http://nouspikel.group.shef.ac.uk//ti99/tms9901.htm#Timer%20mode

     

    I remember Jeff using it for playback of sampled sound, but I seem to recall he wasn't impressed with the performance of that method.


  8. http://ftp.whtech.com is up right now.

     

    Lately people have been reporting it up and down a lot... I wonder if there's a problem with the server.

     

    This topic comes up every couple of years on the groups, yeah. Most TI topics do come around in cycles, it seems. :) (That's probably true with all systems).

     

    Hm. Digisynt, huh? 1994 would post-date the free package I uploaded to the Ottawa TIUG BBS (which would have been '92 or so), but I doubt that it's a rip-off (since it's an obvious idea and his utility set differs, plus the author wasn't trying to profit by making it commercial). Just disappointing that it was reviewed as "the only" solution. I never was very good at getting my stuff out there. ;)

     

    I can't try the game right now... I would expect that it freezes the game while it plays the sample, especially since it's running from BASIC (without a good interrupt timer, which we don't /really/ have, there's no other way). I did a test once that showed I could still get reasonable sound quality leaving sprite automotion enabled, though... ;)


  9. The floppy is the slowest that I've found, yet its 31.5KB/s is more than the 12K needed for 12bit @ 8KHz, so bandwidth doesn't appear to be the limiting factor.

     

    You're assuming some kind of DMA, I think. There isn't any.. the CPU needs pretty tight control over the FDC to read data from the disk. Have a read at http://nouspikel.group.shef.ac.uk/ti99/disks.htm

     

    In addition, seek time, finding the sector you want on the diskette, and stepping the head all take very notable amounts of time. 31.5K/s is great but there's no practical case in which you can sustain it. The entire floppy disk itself is only 90K per side.

     

    If you can make it interleave useful work with the disk controller, that would be very awesome code to see. There's plenty of delay loops in there to fill, and nobody has attempted it on the TI to my knowledge. Unfortunately, there are other controller cards than just this one (Corcomp and Myarc also made common disk controllers, I believe with different chips), so you'd be limited. But even tied to a TI controller, it would be difficult. There are no useful interrupts you can use so you'll have to cycle count everything, and I suspect getting the precision needed for audio playback on top of that may be too difficult, even if you're trying to play back just a single track (ie: without stepping the head).

     

    Nothing is impossible, of course. One thing that I see that is interesting, is that the FDC /does/ support a "read track" command. It appears you tell it to read the track, wait 15ms, then come in and read the data out a byte at a time. Potentially you could use that, interleaving the playback of audio with copying read data from the FDC to the playback buffer (or stepping the head to the next track).

     

    The story isn't over of course, you had said the CPU was already at 100%, so the possible playback rate is going to be less from adding streaming handling.

     

    The CPU can be at whatever you want it to be for sample playback, it just affects your sampling rate. So long as it doesn't drop too low, as to make it unrecognizable.

     

    Thierry's doc really makes me wonder if streaming from disk might be possible after all. A 90k SSSD floppy has 2k of sector data on a track. Some parsing of the track data is needed, but if you picked out one byte between each sample, and did 4-bit samples, then you'd have 4096 samples (at least). At 8kHz that gives you half a second to step, read, and parse a track. As long as it can be done without halting the playback process....
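The half-second figure is easy to re-derive. Here is a small Python sketch of that arithmetic; the function name is illustrative, and the 2048-byte track size is the "2k of sector data" figure from the paragraph above.

```python
TRACK_DATA_BYTES = 2048      # "2k of sector data on a track"

def track_playback(data_bytes, bits_per_sample, sample_rate_hz):
    """Return (samples held in the buffer, seconds of audio they cover)."""
    samples = data_bytes * 8 // bits_per_sample
    return samples, samples / sample_rate_hz

samples, seconds = track_playback(TRACK_DATA_BYTES, 4, 8000)
print(samples, seconds)
```

Whatever time that returns is the whole budget for stepping the head, reading the next track and parsing it, all without stalling playback.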

     

    Damn you, now I'm thinking about how to do it. ;)

     

    3.3MHz/34 cycles per byte~=95KB/s tr for preformatted streaming from cartridge, yes? Next up would be estimated cycles for parsing packed.

     

    Playback from cartridge will be the same speed as playback from (normal) RAM (they both have the sample multiplexer/waitstate interface) - at this point just build it and see what you get. ;)


  10. After looking at the pics of the motherboard it's obvious I was over thinking what TI did. I was expecting a custom chip and found something more like a simple 16 bit IDE to 8 bit CPU interface. What I suggested with the RAM upgrade was way more complex and didn't make much sense since the buss doesn't go through a special chip that does both the multiplexing and wait states. Not sure why it takes so many wait states but then I'm not familiar with the 9900.

     

    Have a read through http://nouspikel.group.shef.ac.uk/ti99/titechpages.htm - nobody has documented the hardware of the machine better. He has a fantastic description of the wait state generator, including what happens when you disable each of the waits and a way to run the entire system with NO wait states at all.

     

    BTW, on the 128K upgrade... I'd try modding it to bank out the ROMs leaving RAM in its place. Then bank the upper portion of RAM as well. That might prove most useful. That would leave 3 RAM banks in the RAM area and 1 in the ROM area.

     

    There's plenty of ways to do it. However, banking out the ROMs is a much bigger task. All I did was attach spare RAM pins to spare IO pins. It was never meant to be a useful mod and my document attempts to emphasize this.

     

    The reason it's not stable is that 'high' on the 9901 is only 2V.. not quite high enough to be reliable on the RAM chips. I recently disabled that mod in my machine so my chips just run as normal 32k.

     

    So how much difference in speed is there on a machine with the 16 bit RAM upgrade?

     

    The speed boost is usually considered to be about 50% on average, since all access to external devices, sound, and VDP still trigger wait states.

     

    One thing about the 9900, its register setup would probably work well with how GCC works if someone wanted to do a port.

     

    I started it, based on the PDP-11 port, and had some success, but got busy with other projects and left it half-finished: http://harmlesslion.com/hl4m/viewtopic.php?f=1&t=324

     

    My problem is I really didn't follow a lot of what GCC was doing. The point at which I left it I knew what the next issue was, but I had to go back and redo the conditionals, because I'd backed myself into a corner. To pick it up again I'd probably be mostly starting over, since I don't remember much of what I learned.


  11. If I followed the info correctly, the 9900 is missing A15, but technically it's missing the bit that would normally select the odd or even BYTE since the CPU has a 16 bit data buss. FWIW, on most CPUs that's A0.

     

    More or less correct. TI numbered the CPU bits backwards, so on the 9900 the "missing" bit (the LSB) is A15.

     

    To get around this there is interface hardware between the CPU and RAM that inserts wait states while it multiplexes the buss to load two consecutive BYTEs from the 8 bit buss to form a 16 bit word the CPU normally sees.

    External logic tells the multiplexer whether an address is 8 or 16 bit.

     

    Close enough for government work. ;)

     

    The RAM expansion linked to above replaces the two 128 x 8 bit scratchpad RAMs on the 16 bit buss with two larger (32K x 8 bit?) RAMs. At the very least that provides access to the 16 bit data buss, many of the address lines and some control logic. Then the other interface & board mods alter the memory mapping hardware that tells the multiplexer what addresses are 8 or 16 bit and maps the new RAM based on switch settings.

     

    (edit) My mistake, I didn't read the mod description well enough. This version DOES replace the onboard RAM chips. This isn't too big a change from the discrete logic version - the PAL would make it easier. The new 32k RAM chips provide enough RAM for the entire memory space, it's just when you enable it that counts.

     

    The other changes change the circuitry that enables/disables the multiplexer and wait state generator, yes.

     

    The 8/16 bit mode switch must trigger several things.

    First, it tells the multiplexer if the RAM address being accessed is 8 or 16 bit so it goes into the desired mode.

    <edit> technically it tells the PAL to tell the multiplexer.

    Second, it changes how the RAMs respond. The extra address line (chip select, whatever) from the multiplexer is ignored if the mode is 16 bit and both chips trigger at once to form the two halves of a 16 bit word on the data buss.

     

    No. All the switch does is restore the original performance of the machine by optionally allowing the wait state generator to still respond to 32k memory accesses. The memory access remains a 16-bit access after doing the modification, the machine just waits the usual amount of time before continuing. It has to, because the multiplexer circuit can't get access to the full memory on the 16-bit bus.

     

    However, if the mode is 8 bit, the PAL uses this new address line to determine which chip responds on the data buss.

     

    As far as I know, the guts of that modification are the same as the discrete logic version, which Mainbyte also covers here:

     

    http://www.mainbyte.com/ti99/16bit32k/32kconsole.html

     

    Part 1 doesn't cover the speed switch, but I posted a detail of how this mod works here:

     

    http://harmlesslion.com/text/128k%20On%2016%20Bit%20Bus.pdf

     

    (disclaimer - I need to update this - the 128k is not stable and I know why, just haven't updated the doc. The 32k description is solid.)

     

    You are totally thinking software... I'm thinking about what's going on at the hardware level. To the program it's transparent except for the wait states and some differences in the memory map.

     

    Byte access on the 9900 is always a word access. The processor reads the entire word, and modifies the requested byte internally, then writes the whole word back out. The CPU itself has no concept of the multiplexer circuitry or even any need of it, it just knows SOMETHING keeps telling it to slow down! ;)
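To make the read-modify-write behaviour concrete, here is a toy Python model of a byte write. It is only an illustration, not an emulator core: memory is a plain list of 16-bit words, the function name is made up, and it relies on the 9900 being big-endian (the even address is the high byte).

```python
def write_byte(words, addr, value):
    """Model a 9900 byte write as a full-word read, modify, write."""
    index = addr >> 1
    word = words[index]                  # the CPU always reads the whole word
    if addr & 1:
        word = (word & 0xFF00) | (value & 0xFF)          # odd address: low byte
    else:
        word = (word & 0x00FF) | ((value & 0xFF) << 8)   # even address: high byte
    words[index] = word                  # and writes the whole word back

memory = [0x1234]
write_byte(memory, 0, 0xAB)
write_byte(memory, 1, 0xCD)
print(hex(memory[0]))
```

On the 8-bit side of the console the multiplexer turns each of those word accesses into two byte-wide bus cycles, which is where the extra wait states come from.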


  12. Matthew, me, and a couple of others put those questions together, Matthew led the action. Karl's pretty awesome.

     

    What you mention as incorrect information I think is a misunderstanding of what is being said.

     

    The NES PPU is probably best described as "inspired" by the 9918 -- it's not a clone and is not compatible in any way at all, but it functions in a similar manner and even back in the day there were rumours that it was based on the 9918.

     

    Sega used their own variants of the 9918, first in the Master System, then the Game Gear, then the Genesis. These three chips all vary some. The Master System and Game Gear are reportedly compatible with the 9918, the Genesis drops the 9918 modes. But they are a different line than the Yamaha 9938 and 9958, and are not compatible with those either.


  13. VDP bandwidth estimates are a different trick than audio chip estimates -- the ColecoVision and MSX comparisons are only really useful for the VDP, which has its own timing requirements. The sound chip doesn't (well, it actually does, but one issue at a time), so only the TI comparisons are really valid, because it's entirely a CPU/architecture issue, that is, how fast can you move a byte from point A to point B.

     

    Samples won't usually be stored in bytes. If you wanted to preformat them for the sound chip, though, you could achieve that max throughput. Most of the timing discussion occurred on one of the Yahoo groups six months ago, and I had the logic analyzer out to get the true values. Unfortunately I only put a tiny summary at the end of the thread at 99er.net:

     

    The fastest instruction for moving data to the VDP is something like MOVB R1,*R13, and takes 26 cycles to execute, which is just over 8.6uS.

     

    This would be valid for the sound chip, too, which is subject to the same issues with the wait states and multiplexer. Of course, that instruction as it stands is not very practical, you'd need a way to get different data into R1. It's more likely you'd use MOVB *R1,*R13 as the fastest useful instruction, and that would still need external hardware (likely) to put different data at R1's pointer. But if you did that, the indirection adds another 4 cycles, IIRC, and another memory access. If that's to cartridge memory, then add another 4 cycles for wait states, and we're up to 34 cycles per sample.
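Turning those cycle counts into rates is simple arithmetic. This Python sketch assumes a 3 MHz CPU clock, which is what makes 26 cycles come out at just over 8.6uS; the names are illustrative.

```python
CPU_HZ = 3_000_000   # assumed clock: 26 cycles at 3 MHz is about 8.67 microseconds

def sample_timing(cycles_per_sample):
    """Return (microseconds per sample, samples per second) for a cycle count."""
    microseconds = cycles_per_sample * 1_000_000 / CPU_HZ
    rate_hz = CPU_HZ / cycles_per_sample
    return microseconds, rate_hz

for label, cycles in (("MOVB R1,*R13", 26), ("MOVB *R1,*R13 from cartridge", 26 + 4 + 4)):
    us, rate = sample_timing(cycles)
    print(f"{label}: {cycles} cycles, {us:.2f} uS, {rate:.0f} bytes/s")
```

These are ceilings for preformatted data only; any unpacking or loop overhead adds cycles and pulls the rate down.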

     

    In most useful scenarios, though, the audio data is going to be packed, either 4 bits or 1 bit, meaning you will need to split the byte and merge in the sound chip control bits.
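Here is what "split the byte and merge in the control bits" looks like for 4-bit data, as a Python sketch. It assumes the nibbles are already stored as attenuation values (0 = loudest, 15 = silent) with the first sample in the high nibble; that packing and the function name are my choices for illustration. The >90 volume command for the first voice, with each later voice adding >20, follows the sound chip's command byte layout.

```python
def volume_commands(packed, channel=0):
    """Split one byte holding two 4-bit samples into two sound chip
    volume command bytes for the given voice channel (0-2)."""
    if not 0 <= packed <= 0xFF:
        raise ValueError("packed must be a byte")
    if channel not in (0, 1, 2):
        raise ValueError("channel must be 0, 1 or 2")
    control = 0x90 | (channel << 5)     # volume command for this voice
    first = packed >> 4                 # high nibble plays first (assumption)
    second = packed & 0x0F
    return control | first, control | second

print([hex(b) for b in volume_commands(0x3A)])
```

On the real machine this is a shift, a mask and an OR per sample, and those extra instructions are exactly what eats into the theoretical maximum.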

     

    IMO theoretical maximum is not very useful in this instance. Practical maximum (ie: an actual application which we can group optimize, maybe) is probably more useful.

     

    Disk access is pretty tough.. I would be surprised if anyone could do a system streaming audio from the floppy. I would be very sure that such an action would require direct control of the FDC.


  14. Note that memory is an issue. The TI doesn't have a lot, and sampled sounds eat it fast. At 8kHz, a 4-bit sample consumes 4000 bytes per second (1-bit samples would be 1000 bytes per second). A 12-bit sample (using the proposed logarithmic approach) would consume 12000 bytes per second. You can lower or raise the sample rate, of course.
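Those memory figures follow directly from rate times bit depth. A quick Python sketch (function names are illustrative; the 32k figure in the second helper is just an example buffer size, not a claim about how much of it is free):

```python
def bytes_per_second(sample_rate_hz, bits_per_sample):
    """Storage consumed by one second of packed samples."""
    return sample_rate_hz * bits_per_sample / 8

def seconds_in(buffer_bytes, sample_rate_hz, bits_per_sample):
    """How much audio fits in a buffer of the given size."""
    return buffer_bytes / bytes_per_second(sample_rate_hz, bits_per_sample)

for bits in (1, 4, 12):
    print(bits, "bit:", bytes_per_second(8000, bits), "bytes/s,",
          round(seconds_in(32 * 1024, 8000, bits), 2), "s in 32k")
```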

     

    The second issue is there is no way to accurately time the sample rate in the TI. You will have to do it entirely in software. To that end, it's usually easier to time your code and resample the audio outside of the TI to match what you're playing. ;) The only timer in the machine is the 9901, but the overhead of tricking it into interrupting your own code generally costs too much time. (Maybe polling it would be okay, I've never tried that..)


  15. TMNT used the cassette port to sample its audio, and a small assembly routine to play it back. As such, it's 1 bit audio and takes 100% CPU. I didn't measure the bit rate at the time, but playing with it in the last couple of years suggested around 9kHz. For Sound F/X, Barry was saying he got around 11kHz, and he plays 4-bit audio. Both use the same technique - set the sound generator to the highest frequency, and then toggle the volume level.

     

    Emulators may need support, because for the most part, they can't accurately emulate the 40kHz-or-so output at the top of the sound chip's range, causing aliasing and other issues (to do so, they would have to be sampling at 80kHz or better, which most sound cards won't do). By default, Windows itself only mixes at 22kHz, meaning anything over 11kHz will be screwed up. There are ways around this if the emulator is clever, of course. I don't know if MESS manages to work with these programs, but I believe in the past at least, it did not.

     

    Classic99 used to work around this by emulating a DAC mode... if the frequency on the sound chip was set above a certain threshold (I think it was 22kHz), then audio changes were put into a DAC stream instead of trying to manipulate the frequency that high. This worked. However, all that code has been ripped out in favor of the new sound engine. The new sound engine should be able to reproduce the sampled sound, but it doesn't have a high enough accuracy right now, as it processes only on the vertical blank. So Classic99 can not play any sampled sound right now, pending my replacement of the timing system.

     

    Note that all this is less of a problem on some other systems that can simulate a true DAC, instead of just trying to set the frequency so high you don't notice. The TI sound chip can ONLY output square waves. Even clones of the same chip, like that used in the Sega Master System, can output a flat line which is better for playback of digital audio.

     

    As far as the hardware goes, sampling from the cassette port works, but at 1-bit, just like the Apple 2. (It was the Apple that inspired the digitizer I wrote and used for the TMNT title music.) These days, though, it seems silly to reduce quality that much, when PCs make it so easy to create 4-bit samples. These will sound a lot better. For better volume, Sound F/X drives all three voice channels with the same volume at the same time.
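
    Here is one way the PC-side conversion to 4-bit could look, as a Python sketch. It is not how Sound F/X or any other existing tool does it. It assumes unsigned 8-bit input, and it uses the datasheet's 2 dB-per-step attenuation (0 = loudest, 15 = off) to pick the closest level rather than just chopping off the low four bits. The function names are made up.

```python
# Relative output amplitude for each 4-bit attenuation value:
# 2 dB per step, with 15 meaning "off".
LEVELS = [10 ** (-2.0 * a / 20.0) for a in range(15)] + [0.0]


def to_attenuation(sample8):
    """Map an unsigned 8-bit sample (0..255) to the nearest attenuation nibble."""
    target = sample8 / 255.0
    return min(range(16), key=lambda a: abs(LEVELS[a] - target))


def volume_byte(channel, attenuation):
    """Build the sound chip volume command byte for channel 0..3.

    Channel 0 silent is >9F, channel 1 is >BF, channel 2 is >DF,
    and the noise channel (3) is >FF.
    """
    return 0x90 | (channel << 5) | attenuation
```

    To drive all three voices at the same level, the player would write `volume_byte(0, a)`, `volume_byte(1, a)` and `volume_byte(2, a)` for each sample, where `a = to_attenuation(sample)`.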

     

    An interesting theory (recently proven on the ColecoVision and long ago proven on the MSX) is that the sound chip is capable of better than 4-bit accuracy. This is because the attenuation on a channel is in dB, which is a logarithmic scale, rather than linear. This means that different combinations of volume across the three voice channels will in fact produce "in-between" volume levels. My first pass at the math showed good density of evenly spaced volumes up to about 9 or 10 bits of resolution, but the ColecoVision fellow (was it Daniel? Sorry.. I forget, but he's here!) disagreed with my math, and I wasn't sure of his, so I need to revisit it ;). Anyway, if you want to pursue that, start with the datasheet and do your own math. :) Either way it sounds even sharper than 4-bit. My own thinking was that you could improve your apparent sample rate too, by "chasing" the correct level and changing just one channel per sample interval, doing the largest changes first, and then the smaller ones. That would be a bit lossy and would need testing (but would be a little ADPCM-like ;) ).
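
    If you want a starting point for that math, here is a Python sketch that enumerates every combination of the three voice channels and collects the distinct summed levels. It rests on two assumptions: the datasheet's 2 dB-per-step attenuation, and the three channels mixing as a simple linear sum with equal weight. Real hardware may not behave quite that ideally, so treat the output as something to check, not an answer. The function names are made up.

```python
from itertools import combinations_with_replacement

STEP_DB = 2.0
# Relative amplitude per attenuation value; 15 means the channel is off.
AMPLITUDES = [10 ** (-STEP_DB * a / 20.0) for a in range(15)] + [0.0]


def combined_levels():
    """Return {summed level: one attenuation triple that produces it}, sorted.

    Channel order doesn't matter for the sum, so only unordered
    triples are tried. Sums are rounded so float noise doesn't
    create fake "distinct" levels.
    """
    levels = {}
    for triple in combinations_with_replacement(range(16), 3):
        total = round(sum(AMPLITUDES[a] for a in triple), 9)
        levels.setdefault(total, triple)
    return dict(sorted(levels.items()))


def largest_gap(levels):
    """Biggest jump between two neighbouring levels."""
    values = list(levels)
    return max(b - a for a, b in zip(values, values[1:]))
```

    `len(combined_levels())` tells you how many distinct levels you get, and `largest_gap()` shows how uneven the spacing is, which is the part that decides how many usable bits this really works out to.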

     

    There's one other way to output 1-bit audio -- toggling the 'Audio Gate' pin on the 9901, which controls whether audio from the cassette port is mixed with the output audio. Turning it on and off causes a tiny click, even if nothing is connected, so you can output 1-bit audio this way. The game Perfect Push uses this technique on its title page to say "Golden Games Presents Perfect Push".


  16. Complaining about the lack of comments, especially in a retro project? ;)

     

    Not every contrary post is a complaint. :) I think people are just showing their own take on a concept.

     

    This project is, of course, one of the most impressive ones I've had the pleasure of seeing running. There are so many neat little features in it.


  17. Clarification needed..... The new Cl99 runs great on the machine at the house. On my office machine it runs incredibly slow but that is due to the machine.

     

    If the old version runs well and the new version runs slowly, the only place that is likely at fault is the audio system. Options->Audio. Just make sure the SID card is off (very CPU intensive) and set the audio sampling rate back down to 22050Hz -- that's what the old version ran at. Hell, kick it down to 11025Hz if you still need some performance back. Nothing else in there should be taking more CPU than the old version.


  18. So how exactly do you indicate to the linker where you want the segments? Seems a bit cumbersome. Wouldn't it be easier to have the segment address as part of the statement?

          DSEG >8320
    SCORE  BSS  2       * Uninitialized data, 2 bytes
    TIMER  DATA 0       * Initialized data, this probably can't be done in a ROM-based program
    .
    .
    .
    

     

    Also, thinking about it now, for a cartridge you would not be able to have initialized data outside of the ROM address space. How would the assembler know what to produce without knowledge of what was RAM and what was ROM? Probably should start a different thread for this...

     

    How you would specify it depends on the linker, but the whole point of segments is that the code doesn't worry about where it goes. It's like having multiple RORG bases. (Otherwise, you would just use AORG anyway.)

     

    It may seem cumbersome after working solely with a single address, but it solves problems nicely when your needs get more complicated, like with overlays.

     

    Usually initialized data is handled by a piece of startup code that copies the values into the right place - I've seen that used more in C than anywhere else. But it was just an example.

     

    I don't plan on going into much further detail.. I know I've hijacked a lot of people's threads, sorry about that! :)
