R0ger

Everything posted by R0ger

  1. I was talking about how to start the kernel. I use DLI, this active waiting is the other option. It's just important to remember that even a DLI won't fire EXACTLY. It will wait for the current instruction to finish. At the beginning of this project I made a mistake. I had a simple main thread .. just a one-jump loop. I fine-tuned the kernel, and then I added RMT to the main loop. And the whole kernel just started to dance! In my test case I could only get a 0..2 cycle delay, but with real code running in the main thread it could be 0..5 (7-cycle instructions are rare). That's what led me to better kernel syncing, and it allowed me more control in the end. Moral of the story: if you tune a kernel, DLI or even IRQ, make sure the main thread uses all possible instruction lengths.
  2. Yes, this trick using GTIA modes is well known, and was used even back in the day. Doing the same in 4 color mode, that's the difference. I'm not even saying it's new, it was done before. My contribution, I think, is experimenting with 6 color modes (charmode+PMG) and flickering, and the focus on true color source images, i.e. photos.
  3. By manual waiting I mean actively waiting for some specific VCOUNT value. Typically you have CMP address, BNE, which should be an 8 cycle loop. A DLI can only be triggered at the end of an instruction, so that's a possible 7 cycles. So the precision you get is similar. You have to use WSYNC anyway. As for available time, yes. Let's talk bitmap mode. That's 40 cycles for the playfield, 9 for refresh, 1 for DLIST, and 5 for PMG IIRC. But that doesn't mean you have 60 cycles to do whatever you want. For 40 of those remaining cycles you are already displaying the data. You can't store colors or charset (well, unless you want to make a change in the middle of the line). You can load values into registers though, and write them just as the line finishes. Also most of the values have to be written about 2 cycles before they are used. I don't analyze it much beforehand. I just try something simple and then keep adding changes till it breaks. Then I know what I can do, and I can fine-tune things like the order, whether I can reuse some value, and so on. Altirra with max overscan turned on is great for this, as debugging the process is, well .. not simple .. but doable. You can see the image as it's being generated, you see the position of the beam, and in the history window you see the cycle number on the current scanline. And then there's the Altirra hardware reference, which describes the timing in great detail. Basically, without Phaeron none of this would be possible 🙂 Good luck fine-tuning kernels on real hardware.
  4. I think on a badline I can do 5 changes, that is 30 cycles. 5 times LDA #constant, STA address. On a normal line I do 1 more. Not sure how much more is possible. In many cases you don't have to do all the changes before the image starts. But in this mode I have to. I have about 20 cycles of reserve on normal lines, but that has to cover the time when Antic is reading bytes, which is also 20 cycles. So take these as rough figures. You have to fine-tune it anyway. Also you have to stabilize the kernel at the beginning. It's started with a DLI in my case, but it could be done by simply waiting too. In both cases there will be some range of cycle offsets between frames. So you have to do WSYNC on the first line, or even on the first 2. I think in some cases WSYNC can return 1 cycle off. But after that, with no IRQ, the CPU to ANTIC timing is 100% stable and repeatable. Which helps immensely. Often in a DLI or kernel you manage 1 change less just because of the jitter. If you look at these images in Altirra with artifacting and frame blending disabled and full overscan enabled, you will see a small jitter in the upper left corner. That's the 1 cycle difference after the first WSYNC. But after that you can see the changes in COLBK from black to blue .. and it's rock solid. You can also see how the timing is just ever so slightly different around a badline.
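A rough way to check the cycle figures quoted in the two posts above is to add them up. The sketch below is only back-of-the-envelope arithmetic, not anything taken from the actual kernel: the DMA numbers are the ones given from memory in the post (40 playfield, 9 refresh, 1 DLIST, 5 PMG), and the function and dictionary names are made up for illustration. The 114 cycles per scanline and the 2 + 4 cycle cost of LDA #imm / STA abs are standard 6502/ANTIC figures.

```python
# Back-of-the-envelope scanline budget (illustrative names, not kernel code).
CYCLES_PER_SCANLINE = 114  # machine cycles per scanline on the Atari 8-bit

# DMA costs as quoted in the post ("IIRC" figures, bitmap mode).
DMA_CYCLES = {"playfield": 40, "refresh": 9, "dlist": 1, "pmg": 5}

LDA_IMMEDIATE = 2  # cycles for LDA #constant
STA_ABSOLUTE = 4   # cycles for STA address


def free_cycles(dma=DMA_CYCLES):
    """CPU cycles left on one scanline after ANTIC has taken its share."""
    return CYCLES_PER_SCANLINE - sum(dma.values())


def cost_of_changes(count):
    """Cycles needed for `count` register changes done as LDA #imm / STA abs."""
    return count * (LDA_IMMEDIATE + STA_ABSOLUTE)


if __name__ == "__main__":
    print("free cycles per line:", free_cycles())
    print("5 changes (badline):", cost_of_changes(5))
    print("6 changes (normal line):", cost_of_changes(6))
```

This only counts cycles. It says nothing about where on the line they fall, which is the real constraint described above.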
  5. Well, if you do several changes on every line, WSYNC and a kernel is better than a DLI. A DLI is less predictable and takes a lot of time too. It's more like .. fill all 3 registers with values. Do WSYNC. Write the registers to hardware, and move whatever more you need. On normal lines Antic eats a cycle for every byte of image (and PMG and DLIST and memory refresh). That corresponds to every second CPU cycle. On a badline Antic eats 2 cycles for every character. It reads what character there is on the screen, and also the first microline of the character set. So it eats EVERY CPU cycle while it's displaying characters. You have time just during HBLANK. It might not seem like a big difference, as I have to do all changes outside the picture anyway .. but on a normal line I can load the registers for the next line. That's 6 cycles. I can't do that on a badline. And as I said, I don't do WSYNC on a badline. I don't have time for that. Also, in character mode I have to change not only 4 color registers, but also the charset (every 3 lines, I do it on every line anyway). That's on the line before the badline though. I don't use WSYNC on chset lines either, there is a 16 cycle reserve on these lines (there is even more on normal lines, where WSYNC is perfectly fine). On badlines there is no reserve. Also it's all LDA #, the colors are constants in the kernel code. Reading them from memory wouldn't be possible either. I suggest debugging it in Altirra. Turn on max overscan, so you can see where the beam is when the changes are being made. I think it's not 100% utilized in this version. I think I had one more hardware change in there at some point. Not sure. It is pretty close to 100% though. I was thinking about where it could be pushed further. First, the palette could be changed for every line. That wouldn't even cost any CPU, as the numbers are part of the kernel already. But besides that? Not sure.
  6. There will be a converter. And I think I'm getting close. The biggest problem is the large number of options and parameters every step has, which have to be fiddled with for best results. But I'm settling down, abandoning some experimental features which didn't work, finding universal sets of parameters and so on, so it's getting simpler. At the moment it's nowhere near being usable by normal people, but that won't be that hard. Not promising anything, of course.
  7. There is no problem with end of line jitter. There is a WSYNC on every line, except badlines. And the kernel is static, the same for every image. Only the bitmaps, charmaps (which char is inverted) and PMG data are different. It is however unrolled in character modes, as I said. Badlines are, well .. bad.
  8. Sorry 😁 I wanted to make proper thread, but since I didn't do it in several months, and you started asking .. it's easier to simply dump it here.
  9. More samples. Only the best mode. Sorry for the dump.
     face.8x11F.xex ww.8x11F.xex arle.8x11F.xex coco.8x11F.xex mandel.8x11F.xex nier4.8x11F.xex pikachu.8x11F.xex predator.8x11F.xex robocop.8x11F.xex terminator.8x11F.xex test.8x11F.xex tt.8x11F.xex
  10. Let's take it in steps. It's something I worked on for months, and the final stage is quite complex, unfortunately.

      First step is 4 hues, 4 brightness values. Odd lines are hues, even lines are values. PAL mixing gives me 16 colors. My first approach was red, green, blue, black for hues. But then I realized I don't need black (which would give me various grays). Gray is not a very common color, and can easily be dithered using the other colors. So instead of black I use yellow, as the yellow I get by dithering red and green is quite bland. You can also use other hues if the picture is missing one color. Like that Pong demo .. it doesn't have green, so the palette uses yellow, red, blue and orange IIRC. This basic mode does not use char mode (but it can), and does not use PMG. Also it does not flicker, and has the usual memory size. There is a kernel routine which changes 4 color registers every line, so it's pretty CPU intensive. It's the simplest mode though. The Lambo picture in this mode looks like this (XEX below). I call this mode RGBY.4x4.

      Now .. 4 hues is OK. 4 values is not. That's what makes the picture look rough. Using PMG and char mode we can get 6. 4 in every individual character, of course. You can look at it as every character being either dark, average, or bright. If we had levels 1,2,3,4,5,6 .. the average chars will use 2,3,4,5. A bright char will swap 2 for 6. And a dark one will swap 5 for 1. Hues are the same in all characters. I tried to somehow swap them too, but it's not really needed. This mode obviously only works in char mode, and it consumes all PMG (stretched to max size). It needs extra memory for the PMG and the character map. The kernel is pretty much the same, I don't have to change any extra color. Badlines complicate it hugely though. In this mode the kernel has to be unrolled, the loop cannot be fitted into it. Maybe it could be done on normal lines, there is always room for improvement. Btw. the Pong demo uses character mode and 5 brightness values, as the PMG is used for the ball. So it's something between 4x4 and 4x6. Character mode, the number of changes needed per line, and scrolling make the kernel code nightmare-level complex. Lambo in this mode looks like this. I call this mode RGBY.4x6.

      Another step is the interlacing. There isn't really interlace in the Amiga sense, the hue and value lines do not swap. I don't want to increase resolution. The previous modes give a reasonable 160x100 with square pixels. Ideal. I want to increase color and brightness depth. Lately I call these modes flicker modes instead of interlaced. By flickering red and blue, I get an extra purple hue. By flickering 2 neighboring brightness levels, I get something in between, and so on. This approach doubles the number of hues, it gives me one more between each pair of neighbors on the hue circle. And I get twice as many minus one brightness values. I only flicker neighboring values, to reduce the perceived flickering. So instead of 4 values I have 7, and instead of 6 I have 11.

      If you use this flickering approach for brightness values, there is one problem. One of the fields (one of the two pictures you flicker) covers even values from the source image, and the other covers odd values. One of the two is on average 1 step brighter. That makes the picture flicker quite visibly. But it can be drastically improved by exchanging pixels between the fields in a checkerboard pattern. This can then collide with the dithering used in the picture itself, but since I used Floyd-Steinberg, specially modified to break any repeated patterns, it's no problem. The kernel and the working of the individual frames are the same as with the non-flickered variants. The palettes are the same too. It all takes twice as much memory. All the smart stuff is done during the conversion. Here is the Lambo in flickered 4x4, which I call RGBY.8x7F. And here is flickered 4x6, which I call RGBY.8x11F.

      Recently I made some more small improvements, but it's nothing drastic. I can make a better gray in flickering mode, and I improved the character boundary artifacts in character modes. But let's not dive too deep into this now. Also, I had these pictures prepared for my talk about the mode at Atariada 2019.

      And here are the XEXes. In emulation you need PAL artifacting, and the same refresh rate as your screen. Or frame blending. 50Hz emulation on a 60Hz screen will create unrealistic 10Hz flickering, it's not supposed to look like that! On real hardware, keep in mind that it won't flicker at all on modern LCDs. It should look decent on CRT TVs. The flickering is somewhat more visible on CRT monitors.
      lambo.4x4.xex lambo.4x6.xex lambo.8x7F.xex lambo.8x11F.xex
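The level counting and the checkerboard exchange described in this post can be sketched in a few lines of Python. This is only one reading of the description, not the actual converter: the function names are hypothetical, brightness is given in half-steps (0 .. 2n-2 for n hardware levels), and each pixel is split into the two neighboring hardware levels whose average is the target, with the roles of the two fields swapped on every other pixel.

```python
# Sketch of the flicker idea (hypothetical names, not the real converter).

def flicker_levels(base_levels):
    """Flickering only neighboring levels adds one in-between step per pair."""
    return 2 * base_levels - 1


def split_fields(image):
    """Split an image of half-step levels into two fields to be alternated.

    Each pixel v becomes the pair (v // 2, (v + 1) // 2). Without the swap,
    field B would always hold the brighter value; swapping on a checkerboard
    spreads the brighter values evenly over both fields.
    """
    field_a, field_b = [], []
    for y, row in enumerate(image):
        row_a, row_b = [], []
        for x, v in enumerate(row):
            low, high = v // 2, (v + 1) // 2
            if (x + y) % 2:          # checkerboard exchange
                low, high = high, low
            row_a.append(low)
            row_b.append(high)
        field_a.append(row_a)
        field_b.append(row_b)
    return field_a, field_b


if __name__ == "__main__":
    print(flicker_levels(4), flicker_levels(6))
    a, b = split_fields([[3, 3], [3, 3]])
    print(a, b)
```

On a flat area with an odd half-step value, both fields end up with the same total brightness, which is the point of the exchange.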
  11. They all merge better if they are the same brightness. The problem is you want yellow brighter and red darker to get more saturation. So it's most visible with this pair. But you can fine-tune it for your application 🙂
  12. Well .. 4 colors is too few. But 6? Enough for many cases. I'm using a 6 color mode in my PuyoPuyo remake. Full thread here: https://atariage.com/forums/topic/284327-puyo-puyo-wip It's character mode with something like the C64 color map. I have black and white in every character. Then red and blue, which can be swapped for every character. Blue can be swapped for yellow with inversion, and red can be swapped for green with PMG. It also uses a DLI to load the PMG data, instead of DMA. It takes some extra time, but the data manipulation is way faster that way, as I only use 1 byte height-wise per character row (DMA would need 4). And you can achieve more colors with patterns and PAL mixing. Especially the purple comes out very nice.
  13. By other you mean you, right ? 🤣
  14. Make your original song, send it into compo, maybe next time you can win too 😀 Yes, compo rules are not same as in-game music. Duh.
  15. Hardly. Yet again you have no idea what you are talking about. 😀
  16. Right, that might be. We will know when the XEX is released. I thought the mode transition is quite far on the left side .. I did it in the Sails of Doom splash screen, and it wasn't so easy. But I did 2 mode changes per line, and here there is only one, so it seems it won't be that much of a problem.
  17. Hm .. interesting ! The Medical Machine is quite clear, but that Invasion ? How is it done ?
  18. Some 'special' modes can be used in games too. Like that 4 color hue/brightness interleaved mode. Here is a small demo. It's character mode, 4 hues, 5 brightness levels, which makes 20 colors. No flickering, it doesn't go well with movement. Btw. this has the most complicated kernel handler I have ever written 😁 pong.xex
  19. That's pretty much impossible, I'm afraid. So far I can handle simple DLIs with a severe penalty to timing precision (see the irq/nmi thread in the programming section). Rasta converter seems out of the question atm.
  20. Added simple drums loop to demonstrate the 4th channel really can be used for anything 🙂 IRQ.xex
  21. If by 'powerless' you mean low volume, it can be, it depends a lot on the audio system you use. Those are rather low frequencies. TV or cheap earphones will not play it well. You need subwoofer or decent earphones.
  22. Some new progress! I can now somewhat combine this new bass with RMT. At the moment I use a simple approach: I expect RMT to play A distortion on channel 1. I then take the frequency and volume, and play this new bass on channel 1 instead. There is some basic PWM, but there is no control over it, and it is not even reset based on the notes in RMT. That will come later. I can use E or C bass in RMT .. and then change the instrument to use A before export. I play the notes 2 octaves below the original frequency.

      I don't use the naive 'skip 7 play 1' approach anymore. I convert every source wavelength into a new 16 bit wavelength. Let's say I get divisor 9F from RMT. That means wavelength $A0, 4 times that is $280. Which means I should do an IRQ with the FF divisor twice, and then one IRQ with the 7F divisor. There is still one complication. In some cases I could get a really small divisor in the lower byte. For example if I wanted a $201 wavelength. I wouldn't be able to serve the IRQ fast enough. So instead of 2xFF+0 I can do FF+80+80. So I actually use a table with 3 bytes for every source frequency. This gives me nicely spaced IRQs, with very little CPU overhead.

      I'm also after all the noises and crackling which can occur. One source was that I played RMT in VBI, as usual. That leads to a very long NMI handler, which delays the IRQ too much. So this example calls RMT from the main thread, syncing using VCOUNT. Short DLI or VBI handlers are no problem. I also use shadow registers for all 6 values I use in the IRQ bass, and they are updated only on the down edge. Changing them directly, disregarding the actual phase of the bass, was another source of minor crackling. At the moment it should sound rather clear. I suggest turning off channels 2 and 3 in Altirra, and turning on the audio monitor, to clearly see and hear what's going on.

      Please note you still need real HW or Altirra. We discussed Atari 800 IRQ timing with Petr Stehlik for like 2 hours this spring, it's rather clear where the problem is. It just wouldn't be easy to fix, and it might break easy portability to old systems like the Atari ST. Who knows, maybe one day 🙂

      The song is just some unfinished idea, not even sure how original it is. I mean I don't know what song it is, but I still have a feeling it is some song 🙂 But it proved useful for this testing. The lowest frequency is about 30Hz. Probably too low. But hey, it's a test. IRQ.xex
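The divisor splitting described in this post could be prototyped offline along these lines. It is a sketch under assumptions, not the actual table generator: it takes period = divisor + 1 as in the $9F / $A0 example, the `MIN_PERIOD` threshold below which an IRQ is treated as "too short to serve" is a made-up value, and the function names are hypothetical. When the remainder is too short, the last full period plus the remainder is split into two roughly equal halves, which gives FF, 80, 7F for $201 (close to the FF+80+80 mentioned above).

```python
# Sketch of splitting a long wavelength into up to three timer divisors.
# Assumptions: period = divisor + 1; MIN_PERIOD is an illustrative guess.
MIN_PERIOD = 0x40   # hypothetical shortest period the IRQ handler can serve
FULL_PERIOD = 0x100  # period produced by divisor $FF


def split_period(total, min_period=MIN_PERIOD):
    """Return a list of 1..3 divisors whose periods add up to `total`."""
    full, rest = divmod(total, FULL_PERIOD)
    if rest == 0:
        periods = [FULL_PERIOD] * full
    elif rest >= min_period or full == 0:
        periods = [FULL_PERIOD] * full + [rest]
    else:
        # Remainder too short: merge it with one full period and halve that.
        merged = FULL_PERIOD + rest
        periods = [FULL_PERIOD] * (full - 1) + [merged - merged // 2, merged // 2]
    if not 1 <= len(periods) <= 3:
        raise ValueError("wavelength does not fit a 3-entry table")
    return [p - 1 for p in periods]


def bass_divisors(rmt_divisor, octaves_down=2):
    """Divisors for a note played `octaves_down` octaves below the RMT note."""
    return split_period((rmt_divisor + 1) << octaves_down)


if __name__ == "__main__":
    print([hex(d) for d in bass_divisors(0x9F)])   # the $280 example
    print([hex(d) for d in split_period(0x201)])   # the short-remainder case
```

A real table would also have to decide what to do with wavelengths longer than three full periods, which this sketch simply rejects.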
  23. Not really. IMHO the C64 is generally superior. Yes, it has way fewer colors. Yes, it has less flexible modes. But for most games you need a 4 color GFX mode and sprites. Both are way better on the C64. And it's certainly not a problem of a few kB less. There are areas where the Atari theoretically has an edge, especially 3D graphics, but it's typically the same game with a somewhat better FPS on the Atari.
  24. You can also mix 4 hues and 4 brightness levels in 4 color mode. If you add character mode and PMG, you get 6 brightness levels. And using 2 frame flickering you can get 8 hues and 11 brightness levels. That can make images like this: Also check the Rasta convert thread ..
  25. And there are much faster methods compared to bit by bit multiplication. We just need to know what kind of numbers these are.