
Chilly Willy

Members
  • Content Count

    842
  • Joined

  • Last visited

  • Days Won

    1

Posts posted by Chilly Willy


  1. I'm reminded of the moon landing hoax conspiracy jockeys whose inability to believe in or comprehend the reach of human attainment or the capability of old computer technology makes them impervious to overwhelming physical evidence. And they can never be convinced of the truth, since they will retro-fit everything they are told or shown so that it adheres to their reality of what is possible and what is not. If NASA physically transported them to the moon and showed them the US flag planted in the soil, they would still find something to complain about.

    But the moon landing WAS fake! It was CG generated at Area 51 on alien computers that all modern computers are (poorly) derived from!

    tinfoilhat.gif


  2. thx. tried... but got garbage of the first pm data line?

     

    but vdelay set once? or each scanline via DLI?

    Once in the vblank is fine.

     

     

    VDELAY doesn't actually make anything move, it just tells GTIA to ignore PM data on even lines.

     

    Antic does PM DMA on every scanline whether in single or double mode, making it available for GTIA to snoop the data. Ignoring the data on even lines provides the same function as a downward 1-pixel scroll.

    Using VDELAY in single line res isn't really useful, it's just like throwing half the data away.

    I wonder if that could be useful for something... use vdelay to hide half the lines...


  3. All Y motion is done by actually moving the data for the players/missiles. The vdelay is to allow single line movement when using two-line resolution players/missiles. Set vdelay for one line, then move the data one line and reset vdelay, then set vdelay, then move the data one line and reset vdelay, etc. When in one-line resolution, you just always move the data one line.
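
    A minimal sketch of that bookkeeping in C, assuming two-line resolution. The struct and function names are made up for illustration; the only hardware facts relied on are that GTIA's VDELAY register holds one bit per object (players 0 to 3 in bits 4 to 7) and that a set bit displays the object one scanline lower.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not from any library: where the shape data goes and
 * whether the VDELAY bit is needed for a given scanline position. */
typedef struct {
    unsigned row;        /* offset into the player's data area, in two-line units */
    uint8_t  vdelay_bit; /* mask to set in VDELAY, or 0 to leave the bit clear */
} PlayerY;

static PlayerY player_y_two_line(unsigned scanline, unsigned player)
{
    PlayerY p;
    assert(player < 4);                       /* players are numbered 0..3 */
    p.row = scanline >> 1;                    /* each data byte covers two scanlines */
    p.vdelay_bit = (scanline & 1u)            /* odd position: delay by one line */
                 ? (uint8_t)(0x10u << player) /* players 0..3 use bits 4..7 */
                 : 0;
    return p;
}
```

    Stepping the scanline position by one then alternates between setting the bit and, on the next step, clearing it while moving the data by one byte, which is the sequence described above.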


  4. Hi,

     

    Well it's a nice idea - I might do something along those lines for engine programming, although so many techniques covered in those threads have since lost value for all but developing on retro hardware (maybe not quite, but close).

     

    Still it's good practice to crunch these problems, for working in other areas - there may be some middle ground that has some interest value.

    Besides the educational value, it does indeed have real value for those of us still working on old consoles. For example, this would be awesome for Saturn homebrew as the hardware acceleration is of warped quads. The SH2 seems like a good target given the small tidbits tossed out in the thread at AF. You have a fast multiplier, and multiply and accumulate operations. You also have two SH2s, allowing some work to be offloaded - kinda like the DSP in the Falcon.


  5. Regarding thrift stores, of course the supply depends on what people recently donated. How many years are "a number of years ago", are we talking 5, 10, 15 years ago? I'm sure every now and then you might find a fair priced vintage computer, but it is not like you should expect them to generally have a stock of those. I can only speak about the thrift store I visit most frequently, just within the last year they've had a loose (and overpriced at the time I visited) Atari 1040ST, a boxed Amiga 1200 that I grabbed as soon as I saw it, a Victor 286 laptop that at first was so expensive I almost cried out laughing but eventually was discounted in steps so a friend of mine bought it. That is besides all the modern, 21st century PCs, the random modern games console and so on. Still given a reasonable output, I don't count on finding any 16-bit or other vintage computers/consoles when I visit the store bi-weekly.

    About 10 years. Yes, it's not like they keep a "stock" of old computers, it's just what's been donated that hasn't been sold or thrown away yet. The store I visited in Havasu was in the process of throwing out all the 8-bit computers and everything below a Pentium as far as PCs were concerned. So I also got an Apple IIc+ with 5.25" external drive and a 3.5" external drive and a B&W monitor and a color monitor for $10 total. I got a C64 for free. I got an SX-64 "portable" C64 for $5. I really should have picked up a IIgs, but wasn't much interested at the time.

     

    The most commonly donated computers will be systems that are almost worthless at the time of donation, but that the person still spent a lot on. At the time, that was Pentium systems. They would clean them up, reinstall Windows 95 on them, pair them with a 15" VGA monitor, and sell them for $50. These days, it's probably P3 and K7 systems, along with any kind of CRT monitor.


  6. Well thanks for the comments on progress, especially given this is a Jag area and not a Falcon/ST forum - although I should say thanks also to all the A.F. members who put up with me over there :)

    I read through the thread over at A.F. - you are doing some awesome things. I was wondering if perhaps you could write up a little article on a modern approach to a software renderer... like that old Dr. Dobbs series Chris Hecker did so long ago showing the proper way to do an affine mapper, but now updated to pseudo-perspective correct. A whole new generation could use tips from the pro in a sadly neglected area of graphics.

    • Like 1

  7. Where is this St. Vincent store located? IIgs's for $5.00, that sounds like a source of cheap 65816 chips and other parts.

    http://www.svdpusa.net/find/find.thriftstoredt.php

     

     

    I wonder if this is a regional thing. St Vincent DePauls is not active in Canada, but last month I was in the (huge) store in Great Falls, Montana, and there was utterly nothing related to computers or video games. (FWIW, the Salvation Army store located just down the street had a few PC/PSX/Xbox games, but no consoles or computer hardware either.)

    It's a US thing as far as I know. Won't help some folks...

     

    Goodwill is probably better for old consoles and computers than Salvation Army. I pick up keyboards and shit sometimes at Goodwill. What you'll find will vary wildly from one place to another, so all I can suggest is check and cross your fingers. :)


  8. Copying small blocks with the blitter is a bad idea. Writing all the registers to copy a small block is slower than doing a 'brute force copy'.

     

    Big blocks are no problem, but smaller ones...

    Yeah, that was my main concern about blitter acceleration - the size of the blocks. Maybe for 8x8 blocks, but not smaller. I was going to get it going with just the GPU to start, then experiment with the blitter. Of course, the GPU has some provisions for moving phrases at a time. I'll have to see if that can be applied here as well. One issue about using 16-bit pixels (CRY or RGB565) is that motion compensated blocks can be on any word boundary, which on many processors means you have to move by words to avoid alignment restrictions. I used 32 bit RGB on the N64, so all moves are longs. If I optimize the N64 more, I'll probably take advantage of the 64 bit registers for moving when the alignment is right for that. I need to go back through the Jaguar hardware manual and review all the various rules on alignment for the GPU.


  9. I never ask anyone to buy me stuff, but I gladly accept donations. :) I have bought most of my consoles off ebay for cheap... except the Dreamcast BBA, which is the most expensive thing I have (even more than my PS3), and the Atari Jaguar. I'm saving towards buying a JagCD from B&C, but it'll be a while... maybe next spring.

     

    And performance issues are what make a task fun! If it's easy, it's not nearly as fun. :D

    • Like 2

  10. Are you porting a video decoder just for fun? Good, that's the spirit!

     

    Are you porting a video decoder and hoping someone other than you will actually use it? Don't get your hopes up too much. The whole "tools are too complicated to use" excuse is a red herring. Easy-to-use tools are great things, but even cranky tools won't stop anyone who actually wants to create something. If that is too much of an obstacle for you*, the Jaguar isn't the console you should be coding for in the first place.

     

    (* generic "you", not Chilly Willy)

    I've ported quite a few things that nobody ever wound up using... and some things a lot of people use. You never really know ahead of time, so yes, it's better if you port things for yourself and/or fun, and then if someone else uses it, all the better. The main purpose behind porting ROQ is actually to make a "universal" format for old consoles. 32X, Saturn, Dreamcast, Jaguar, PS1, PS2, PSP, N64, NGC... I encode whatever into a ROQ avi and I can then play it on any of my old consoles. Don't know if I'll use it for anything more than a video player, but you never know. I've got a bunch of videos like Hellsing Ultimate Abridged I've been watching on my N64. It's pretty cool to see this stuff on these old consoles. I've encoded some music videos that I can leave running while I do other things... my player handles video files just like audio files - it plays the whole directory (in order or shuffled). I start it playing episode 1 of a series and it just keeps on playing until I stop it or it runs out of files.


  11. It's like always in the Jaguar community, too much talking about what we could do, and so few people actually DOING and releasing something. (like, a cinepak player ready to use for coders maybe? did that ever happen before my package release?)

    So let's see, coding a realtime ROQ decoder in GPU assembly, not talking about the Jag CD data streaming (which is, not that obvious, trust me)

    I just wonder how much time this will take to code, for someone who, I guess, almost never coded for the Jaguar/JagCD.

    But you can surprise me :)

     

    JagChris > I use the cinepak player in my last 3 Jaguar CD Games, so it's already convenient to use for me :) (seems like I'm the only user anyway)

    I'm not COMPLETELY reinventing the wheel, I'm just trying to make it a little nicer which HOPEFULLY entices some folks into using it. We have a very recent thread in this forum from someone having real problems getting the current cinepak player to work. Part of that was problems in even making a compliant video file.

     

    I already showed working C code for ROQ, and it's damn simple. Making a GPU assembly version, which might not even be needed, is straight-forward to anyone who has done any assembly. As for streaming CD code, yes, that is a difficult part of the task, but maybe that part I can reuse directly (provided non-conflicting licenses on the different code libraries). You did an excellent job on making alterations to the existing player code to allow ADPCM while still keeping the original cinepak decoder. I basically just want to make the video part a bit better just as you wanted to make the audio part a bit better. If we keep making little parts a little better, the whole becomes better and easier for folks.

     

    And yes, I've not written anything for the Jaguar before. I've only written stuff for the SNES, SMS, Genesis, Sega CD, 32X, Saturn, Dreamcast, N64, GameCube, PSX, PS2, PS3, PSP, Amiga, Mac (68K and PPC), PC (from 286 to modern), Atari 8-bit... yeah, I'll NEVER be able to make something for the Jaguar. ;) :D

     

    My main limitation is available time for all the tasks I set myself. If anything, I try to do too much. Some things wind up on the back burner while I work on something else that has caught my fancy. Right now, that something else that caught my fancy is ROQ video.

    • Like 1

  12. If you have a charitable organization in your town like St Vincent DePauls (Catholic version of Salvation Army), they often have donated computer equipment at ridiculously low prices. The last one I visited had Pentium PCs with a 15" monitor for $50. I got a complete Apple IIc+ with extra drive, two monitors (color & B&W), and some games for $10! They had an entire wall of Apple IIgs systems for $5 each.


  13. Did anyone use my cinepak "library" to produce a game?

    So, don't bother spending weeks on a ROQ decoder, nobody will use it :)

    I don't care if no one ever uses it, I just want something other than the binary blob cinepak driver for folks. It's also something "fun" to work on.

     

    Also, the hoops you have to jump through to get a stream you can use with the binary blob driver are probably WHY no one uses your cinepak tools. It's a pain. I want a player that takes a plain avi with regular cinepak/roq. No conversions needed. Which answers TXG above - I COULD make a tool that goes through the ROQ stream converting YUV to CRY beforehand, and it would make playback faster; however, it's one more thing that encourages people not to try it at all. We want support for PLAIN streams with no jumping through hoops.

    • Like 2

  14. Your setting the flags early doesn't crash because the int doesn't occur until much later. If other ints could occur, it could fail. What I think might be the problem with the correct order is this (from the doc):

     

    If Jerry asserts DSP bus request one cycle after a previous bus request it is possible for it to see the end of the previous bus grant for one cycle, and this can mean that Jerry writes occur with the wrong data. The work-around is to ensure that Jerry is off the bus before performing a write, either by leaving a long period of bus inactivity, which is usually greater than the maximum possible period of object processor bus ownership; or to perform a load and perform an operation on the loaded data so that the score-board unit can ensure the load has completed.


  15. Remember that decoding the frame IS just moving data. The GPU has a much better bus than the DSP. You could feed the compressed audio to the DSP for decompressing to take a little pressure off the GPU, but you really want the GPU (or GPU+BLITTER) handling the video. The code I posted above for decoding the frame moves 32-bit RGBA entries from the codebooks. You'll halve the bandwidth by using CRY mode instead. That was why I posted about converting RGB to CRY when decoding the codebooks. If that takes too long, maybe going with 16-bit RGB would be better. My point was you only need to convert the codebook entries to CRY, not every single pixel.
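
    To illustrate converting once per codebook entry rather than once per pixel, here is a minimal sketch that repacks a decoded 2x2 codebook into 16-bit entries. It assumes 32-bit entries laid out as R, G, B, A from the top byte down and a generic RGB565 target; the function names are hypothetical, and the Jaguar's own 16-bit pixel layout or CRY would need a different packing step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: 0xRRGGBBAA in, generic 5-6-5 RGB out. */
static uint16_t pack_rgb565(uint32_t rgba)
{
    uint32_t r = (rgba >> 24) & 0xFF;
    uint32_t g = (rgba >> 16) & 0xFF;
    uint32_t b = (rgba >>  8) & 0xFF;
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Convert every pixel of every 2x2 codebook entry (4 pixels per entry).
 * The frame decoder then only copies the already converted 16-bit values. */
static void convert_codebook(const uint32_t *src, uint16_t *dst, size_t entries)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < entries * 4; i++)
        dst[i] = pack_rgb565(src[i]);
}
```

    With 256 entries of four pixels each, that is at most 1024 conversions per codebook update, no matter how large the frame is.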


  16. Don't forget you'll have to run this for every output pixel, and the GPU has only 4 KB of local memory where you have to fit code, data, and LUTs. And you can only read/write it with 32-bit accesses; if you need smaller data, either you pad it and waste memory, or have to waste cycles with shifting & masking.

     

    I don't want to rain on your parade, but I wouldn't be too optimistic about this until you've actually written and benchmarked the code.

    Actually, the code I posted is for each entry of the 2x2 codebook, making it a max of 1024 times per frame. Actually unpacking the frame is like this:

     

    static int roq_unpack_vq(unsigned char *buf, int size, unsigned int arg, quit_callback quit_cb)
    {
        int status = ROQ_SUCCESS;
        int mb_x, mb_y;
        int block;     /* 8x8 blocks */
        int subblock;  /* 4x4 blocks */
        int i;
    
        /* frame and pixel management */
        unsigned int *this_frame;
        unsigned int *last_frame;
    
        int line_offset;
        int mb_offset;
        int block_offset;
        int subblock_offset;
    
        unsigned int *this_ptr;
        unsigned int *last_ptr;
        unsigned int *vector;
    
        /* bytestream management */
        int index = 0;
        int mode_set = 0;
        int mode, mode_lo, mode_hi;
        int mode_count = 0;
    
        /* vectors */
        int mx, my;
        int motion_x, motion_y;
        unsigned char data_byte;
    
        if (dc)
        {
            sync_audio();
            display_show(dc);
        }
        while (!(dc = display_lock()));
        current_frame = (int)dc - 1;
        if (current_frame == 0)
        {
            this_frame = frame[0];
            last_frame = frame[1];
        }
        else
        {
            this_frame = frame[1];
            last_frame = frame[0];
        }
    
        mx = (arg >>  8) & 0xFF;
        my =  arg       & 0xFF;
    
        for (mb_y = 0; mb_y < mb_height && status == ROQ_SUCCESS; mb_y++)
        {
            line_offset = mb_y * 16 * stride;
            for (mb_x = 0; mb_x < mb_width && status == ROQ_SUCCESS; mb_x++)
            {
                mb_offset = line_offset + mb_x * 16;
                /* macro blocks are 16x16 and are subdivided into four 8x8 blocks */
                for (block = 0; block < 4 && status == ROQ_SUCCESS; block++)
                {
                    block_offset = mb_offset + (block / 2 * 8 * stride) + (block % 2 * 8);
                    /* each 8x8 block gets a mode */
                    GET_MODE();
                    switch (mode)
                    {
                    case 0:  /* MOT: skip */
                        break;
    
                    case 1:  /* FCC: motion compensation */
                        GET_BYTE(data_byte);
                        motion_x = 8 - (data_byte >>  4) - mx;
                        motion_y = 8 - (data_byte & 0xF) - my;
                        last_ptr = last_frame + block_offset +
                            (motion_y * stride) + motion_x;
                        this_ptr = this_frame + block_offset;
                        for (i = 0; i < 8; i++)
                        {
                            *this_ptr++ = *last_ptr++;
                            *this_ptr++ = *last_ptr++;
                            *this_ptr++ = *last_ptr++;
                            *this_ptr++ = *last_ptr++;
                            *this_ptr++ = *last_ptr++;
                            *this_ptr++ = *last_ptr++;
                            *this_ptr++ = *last_ptr++;
                            *this_ptr++ = *last_ptr++;
    
                            last_ptr += stride - 8;
                            this_ptr += stride - 8;
                        }
                        break;
    
                    case 2:  /* SLD: upsample 4x4 vector */
                        GET_BYTE(data_byte);
                        vector = cb4x4[data_byte];
                        for (i = 0; i < 4*4; i++)
                        {
                            this_ptr = this_frame + block_offset +
                                (i / 4 * 2 * stride) + (i % 4 * 2);
                            this_ptr[0] = *vector;
                            this_ptr[1] = *vector;
                            this_ptr[stride+0] = *vector;
                            this_ptr[stride+1] = *vector;
    
                            vector++;
                        }
                        break;
    
                    case 3:  /* CCC: subdivide into four 4x4 subblocks */
                        for (subblock = 0; subblock < 4; subblock++)
                        {
                            subblock_offset = block_offset + (subblock / 2 * 4 * stride) + (subblock % 2 * 4);
    
                            GET_MODE();
                            switch (mode)
                            {
                            case 0:  /* MOT: skip */
                                 break;
    
                            case 1:  /* FCC: motion compensation */
                                GET_BYTE(data_byte);
                                motion_x = 8 - (data_byte >>  4) - mx;
                                motion_y = 8 - (data_byte & 0xF) - my;
                                last_ptr = last_frame + subblock_offset +
                                    (motion_y * stride) + motion_x;
                                this_ptr = this_frame + subblock_offset;
                                for (i = 0; i < 4; i++)
                                {
                                    *this_ptr++ = *last_ptr++;
                                    *this_ptr++ = *last_ptr++;
                                    *this_ptr++ = *last_ptr++;
                                    *this_ptr++ = *last_ptr++;
    
                                    last_ptr += stride - 4;
                                    this_ptr += stride - 4;
                                }
                                break;
    
                            case 2:  /* SLD: use 4x4 vector from codebook */
                                GET_BYTE(data_byte);
                                vector = cb4x4[data_byte];
                                this_ptr = this_frame + subblock_offset;
                                for (i = 0; i < 4; i++)
                                {
                                    *this_ptr++ = *vector++;
                                    *this_ptr++ = *vector++;
                                    *this_ptr++ = *vector++;
                                    *this_ptr++ = *vector++;
    
                                    this_ptr += stride - 4;
                                }
                                break;
    
                            case 3:  /* CCC: subdivide into four 2x2 subblocks */
                                GET_BYTE(data_byte);
                                vector = cb2x2[data_byte];
                                this_ptr = this_frame + subblock_offset;
                                this_ptr[0] = vector[0];
                                this_ptr[1] = vector[1];
                                this_ptr[stride+0] = vector[2];
                                this_ptr[stride+1] = vector[3];
                                GET_BYTE(data_byte);
                                vector = cb2x2[data_byte];
                                this_ptr[2] = vector[0];
                                this_ptr[3] = vector[1];
                                this_ptr[stride+2] = vector[2];
                                this_ptr[stride+3] = vector[3];
                                this_ptr += stride * 2;
    
                                GET_BYTE(data_byte);
                                vector = cb2x2[data_byte];
                                this_ptr[0] = vector[0];
                                this_ptr[1] = vector[1];
                                this_ptr[stride+0] = vector[2];
                                this_ptr[stride+1] = vector[3];
                                GET_BYTE(data_byte);
                                vector = cb2x2[data_byte];
                                this_ptr[2] = vector[0];
                                this_ptr[3] = vector[1];
                                this_ptr[stride+2] = vector[2];
                                this_ptr[stride+3] = vector[3];
                                break;
                            }
                        }
                        break;
                    }
                }
            }
        }
    
        /* if client program defined a quit callback, check if it's time to quit */
        if (quit_cb && quit_cb())
            return ROQ_USER_INTERRUPT;
    
        /* sanity check to see if the stream was fully consumed */
        if (status == ROQ_SUCCESS && index < size-2)
        {
            status = ROQ_BAD_VQ_STREAM;
        }
    
        return status;
    }
    
    Notice that it's merely a bunch of moves. Should be pretty easy to convert to assembly and maybe use the blitter... if it's needed. I'll try it without the blitter to start and see what I can get away with.
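
    The GET_MODE() and GET_BYTE() macros aren't shown above. Here is a sketch of what they plausibly do, written as functions so it compiles on its own: modes are 2-bit codes packed sixteen bits at a time, low byte first, and consumed from the top bits down. That layout and the names ModeReader, get_byte, and get_mode are assumptions based on how common open-source ROQ decoders read the stream, not necessarily the exact macros used with the code above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reader state; mirrors the index/mode_set/mode_count locals. */
typedef struct {
    const unsigned char *buf;
    size_t size;
    size_t index;       /* next byte to read */
    unsigned mode_set;  /* current 16-bit word holding eight 2-bit modes */
    int mode_count;     /* bits of mode_set not yet consumed */
    int error;          /* set when the stream runs out */
} ModeReader;

static unsigned get_byte(ModeReader *r)
{
    assert(r != NULL && r->buf != NULL);
    if (r->index >= r->size) {
        r->error = 1;   /* same role as setting status to a bad-stream code */
        return 0;
    }
    return r->buf[r->index++];
}

static unsigned get_mode(ModeReader *r)
{
    if (r->mode_count == 0) {
        unsigned lo = get_byte(r);  /* low byte comes first in the stream */
        unsigned hi = get_byte(r);
        r->mode_set = (hi << 8) | lo;
        r->mode_count = 16;
    }
    r->mode_count -= 2;
    return (r->mode_set >> r->mode_count) & 0x03;  /* top two unread bits */
}
```

    Under these assumptions, the bytes 0x1B, 0xE4 form the word 0xE41B and yield the modes 3, 2, 1, 0, 0, 1, 2, 3 in that order.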

  17. Nice explanation, can't wait to see your first implementation running :-)

    A pure optimized RISC decoder would rule. Can ROQ also do 24-bit color?

    ROQ is YUV, so it can easily do 24-bit color. My N64 player runs in 32-bit output. The code book conversion looks like this:

     

            /* unpack the YUV components from the bytestream */
            for (j = 0; j < 4; j++)
                y[j] = *buf++;
            u  = *buf++;
            v  = *buf++;
            u -= 128;
            v -= 128;
            /* CCIR 601 conversion */
            u1 = (88 * u) >> 8;
            u2 = (453 * u) >> 8;
            v1 = (359 * v) >> 8;
            v2 = (183 * v) >> 8;
            /* convert to RGBA8888 */
            for (j = 0; j < 4; j++)
            {
                /* CCIR 601 conversion */
                r = y[j] + v1;
                g = y[j] - v2 - u1;
                b = y[j] + u2;
                if (r < 0) r = 0;
                else if (r > 255) r = 255;
                if (g < 0) g = 0;
                else if (g > 255) g = 255;
                if (b < 0) b = 0;
                else if (b > 255) b = 255;
    

    At that point, you can make the codebook entry 24 bit RGB, 16 bit RGB, 15 bit RGB, 16 bit CRY... RGB to CRY conversion is pretty fast:

     

    uint16_t rgb2cry(int32_t r, int32_t g, int32_t b)
    {
    	uint16_t	intensity;
    	uint16_t	color_index;
    
    	intensity = r;						/* start with red */
    	if(g > intensity)
    		intensity = g;
    	if(b > intensity)
    		intensity = b;					/* get highest RGB value */
    	if(intensity != 0)
    	{
    		r = (uint32_t)r * 255 / intensity;
    		g = (uint32_t)g * 255 / intensity;
    		b = (uint32_t)b * 255 / intensity;
    	}
    	else
    		r = g = b = 0;					/* R, G, B, were all 0 (black) */
    
    	color_index = (r & 0xF8) << 7;
    	color_index += (g & 0xF8) << 2;
    	color_index += (b & 0xF8) >> 3;
    
    	return (uint16_t)(((uint16_t)cry[color_index] << 8) | (uint8_t)intensity);
    }
    

    We've already got R, G, and B as 8-bit ints, so then converting to CRY is simple. I'll probably replace that 255 / intensity calculation with a table lookup... use a small fixed point number so that you get a multiply and a shift instead of the divide.
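
    A sketch of that table idea, assuming a 16-bit fraction rather than a very small one: with the reciprocal rounded up, multiply-and-shift gives the same result as c * 255 / intensity for every c <= intensity, while a shorter fraction would trade some exactness for smaller table entries. The names recip_table, init_recip, and scale_to_full are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

#define RECIP_SHIFT 16

/* recip_table[i] holds ceil(255 * 2^16 / i); entry 0 is unused. */
static uint32_t recip_table[256];

static void init_recip(void)
{
    recip_table[0] = 0;  /* intensity 0 means black; the caller handles it */
    for (uint32_t i = 1; i < 256; i++)
        recip_table[i] = ((255u << RECIP_SHIFT) + i - 1) / i;
}

/* Replaces c * 255 / intensity with one multiply and one shift.
 * Valid for 1 <= intensity <= 255 and c <= intensity, which always holds
 * when intensity is the largest of the three components. */
static uint32_t scale_to_full(uint32_t c, uint32_t intensity)
{
    assert(intensity >= 1 && intensity <= 255 && c <= intensity);
    return (c * recip_table[intensity]) >> RECIP_SHIFT;
}
```

    The table costs 1 KB as written; storing 16-bit entries or a coarser fraction would shrink it at the price of occasional off-by-one results.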

    • Like 1

  18. All Atari code restores the flags AFTER the jump. In fact, it (correctly) points out that if you enable ints any other place but after the jump, another int can occur and corrupt the registers (in particular, r30 - the register you're getting ready to jump through), leading to a crash. For example, here's the int exit code for the Jerry CD audio:

     

    clean_up:			; do the housekeeping, per Leonard
    	bclr	#3,r29		; clear IMASK
    	bset	#10,r29		; set I2S interrupt clear bit
    	load	(r31),r28	; get last instruction address
    	addq	#2,r28		; point at next to be executed
    	addq	#4,r31		; update the stack pointer
    	jump	(r28)		; and return
    	store	r29,(r30)	; restore flags
    
    Something else must be causing the crash, not the restore after jump.
    • Like 2

  19. So when you use something like

    #define A1_BASE     (long *)(BASE+0x2200)    /* A1 Base Address */

    That memory area is in the domain of the blitter and is part of how the blitter is brought into action?

     

    Yes. The blitter has two pointers used as a source and destination for blit operations: A1 and A2. You can switch which is the source and which is the destination, depending on whether you're "just blitting" or if you're trying to do affine mapping. Note, the Jaguar has no rasterizing hardware - you use the GPU to do that, while the blitter draws the rasterized line in affine mode.

    • Like 1

  20. Wow, we do have one... :-o Thanks, Chilly! ;-)

    No problem. Like theloon said, the forum can be a little slow, and the program is kinda spread over a number of releases, so some folk need a little help getting going. There's no real docs on it, but the old release has examples. So don't skip downloading the old arcs or you'll miss the examples, which are about all the documentation you'll find. Also read threads where people ask about issues - you'll find good info in those threads. It could really do with a nice wiki page, though. That's one of the best things about Batari Basic - it has a REALLY great doc wiki.
