
Useful features to programmers


Delicon


I have been working with Glenn Saunders on hardware to support the programming contest he is sponsoring. The goal is to produce a cheap SuperCharger cartridge with a VCS-accessible serial port (used in place of the original's audio input), so that programmers can develop their games and players can play them on real hardware. So far we have succeeded with our serial interface design. As a bonus, the design makes it easy to add a couple of other features, mainly a queuing system, a la Pitfall II, where queues are loaded with data outside of screen drawing time and then read back while drawing the screen. 100 circular queues of up to 256 bytes each would be available. Accessing the queues would be very similar to normal SuperCharger SRAM reading and writing.
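As a rough illustration of the queue idea, here is a small software model of one circular queue in C. It is only a sketch of the behaviour described above, not the cartridge's implementation; the names `Queue`, `queue_write` and `queue_read` are made up for this example.

```c
#include <stdint.h>

#define QUEUE_SIZE 256  /* "up to 256 bytes each" */

/* Hypothetical model of one circular queue. */
typedef struct {
    uint8_t data[QUEUE_SIZE];
    uint8_t wr;  /* next write position; an 8-bit index wraps at 256 */
    uint8_t rd;  /* next read position */
} Queue;

/* Store a byte, e.g. during vertical blank. */
void queue_write(Queue *q, uint8_t value)
{
    q->data[q->wr++] = value;
}

/* Read the next byte back, e.g. once per scan line in the kernel. */
uint8_t queue_read(Queue *q)
{
    return q->data[q->rd++];
}
```

Because both positions are 8-bit values, they wrap around on their own after 256 accesses, which is what makes the queue circular.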

 

Glenn is also interested in adding math coprocessing queues. These would function the same as the normal queues, except that you would load them with the values you want processed and then read back the answer. My question is: what functions would be needed? Specifics would be nice, for example 16-bit multiplication with a 32-bit answer. Also, how fast would you expect results? I am using a relatively fast processor, so basic calculations would be available to the VCS by the next clock cycle. More complex calculations, such as square roots, would not be. How many VCS clock cycles would be a reasonable wait for an answer?

 

Also feel free to suggest other features, no promises as low cost is critical, but our design is very versatile and many features may be added with simple software additions.

 

Thanks,

 

Vern


Hi there!

 

Uihjah... do you think you may be able to add highly customized functions more or less on request? For example, if someone finds that for game x algorithm y could be amazingly sped up if some coprocessing queue z existed, could you just add it once you get the specs? Or are we talking about a final design that needs to be nailed down right now?

 

Greetings,

Manuel


Uihjah... do you think you may be able to add highly customized functions more or less on request? For example, if someone finds that for game x algorithm y could be amazingly sped up if some coprocessing queue z existed, could you just add it once you get the specs? Or are we talking about a final design that needs to be nailed down right now?

 

That would be possible, but I think it would go against the spirit of Glenn's design. The idea being, a cheap, simple, easy to use solution. Custom processing queues would require the user to reprogram embedded software to play a specific game. Although possible, this is what we are trying to avoid.

 

The way the design is set up now, queues will be activated in banks by a means similar to the SuperCharger control byte. This will keep older games compatible while letting newer games access the new features.

 

If your coprocessor queues are generic enough, I am sure they can be accommodated. Things that I expected people to want are multiply, divide, roots, maybe some 2D transformation functions, and maybe some general sound manipulation stuff. I don't program the VCS, so I don't really know what would be helpful.

 

Let me know what you had in mind.

 

Vern


I think I have a solution Cybergoth.

 

No promises here; it's just an idea, not tested. How complex is your custom function? What if I could give you a couple of queues that would be user programmable? They would work in pairs, a program queue and a function queue. In the program queue you would load a Reverse Polish Notation function, which would never have to be changed again. Then simply load the corresponding function queue with your values and BAM, you should have a custom answer. Of course the cost of all this, depending on the length of the program queue, could be a large increase in calculation time.
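To make the program queue / function queue pairing a bit more concrete, here is a minimal sketch in C of how an RPN program could be run against a list of input bytes. The opcode values, the 16-entry stack, the 16-bit results and the name `rpn_eval` are all assumptions made for this example; nothing here reflects a decided format.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical opcodes stored in the program queue. */
enum { OP_END = 0, OP_ARG, OP_ADD, OP_SUB, OP_MUL };

/* Runs an RPN program against a list of 8-bit inputs and returns the
   16-bit value left on top of the stack (0 if the program is malformed). */
uint16_t rpn_eval(const uint8_t *program, const uint8_t *args, size_t nargs)
{
    uint16_t stack[16];
    size_t sp = 0, next_arg = 0;

    for (; *program != OP_END; program++) {
        if (*program == OP_ARG) {
            /* Push the next value from the data ("function") queue. */
            if (sp == 16 || next_arg == nargs) return 0;
            stack[sp++] = args[next_arg++];
        } else {
            /* Binary operator: pop two operands, push the result. */
            if (sp < 2) return 0;
            uint16_t b = stack[--sp];
            uint16_t a = stack[--sp];
            switch (*program) {
            case OP_ADD: stack[sp++] = (uint16_t)(a + b); break;
            case OP_SUB: stack[sp++] = (uint16_t)(a - b); break;
            case OP_MUL: stack[sp++] = (uint16_t)((uint32_t)a * b); break;
            default: return 0;
            }
        }
    }
    return sp ? stack[sp - 1] : 0;
}
```

For example, (x + y) * z would be stored once as ARG ARG ADD ARG MUL END, and each new calculation would only need x, y and z loaded into the data queue.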

 

Give me some examples so I can put together some testing and get some values.

 

Vern

 


Also feel free to suggest other features, no promises as low cost is critical, but our design is very versatile and many features may be added with simple software additions.


 

This sounds like a very interesting project. I would personally find the following functions very useful:

  1. A fast method for reversing the bit order of a byte (PF calculations)
  2. A Binary->BCD convertor
  3. Divide & Mod functions for non-powers of 2 (also on BCD numbers)
  4. Fast multiplication
  5. Proper random number generation
  6. Collision detection (point inside a region, and region overlap)
  7. PAL<->NTSC colour value convertor
  8. String -> Sprite Data Convertor (for displaying text)
  9. A time clock
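To pin down what the first two items mean, here is a small C sketch of both operations. The function names are made up for this example, and `to_bcd` is assumed to handle only 0..99 (two BCD digits in one byte).

```c
#include <stdint.h>

/* Reverse the bit order of a byte: bit 0 <-> bit 7, bit 1 <-> bit 6, ... */
uint8_t reverse_bits(uint8_t b)
{
    b = (uint8_t)(((b & 0xF0) >> 4) | ((b & 0x0F) << 4));  /* swap nybbles */
    b = (uint8_t)(((b & 0xCC) >> 2) | ((b & 0x33) << 2));  /* swap bit pairs */
    b = (uint8_t)(((b & 0xAA) >> 1) | ((b & 0x55) << 1));  /* swap neighbours */
    return b;
}

/* Convert a binary value 0..99 to packed BCD, e.g. 42 -> 0x42. */
uint8_t to_bcd(uint8_t n)
{
    return (uint8_t)(((n / 10) << 4) | (n % 10));
}
```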

Chris



  1. A fast method for reversing the bit order of a byte (PF calculations)
  2. A Binary->BCD convertor
  3. Divide & Mod functions for non-powers of 2 (also on BCD numbers)
  4. Fast multiplication
  5. Proper random number generation
  6. Collision detection (point inside a region, and region overlap)
  7. PAL<->NTSC colour value convertor
  8. String -> Sprite Data Convertor (for displaying text)
  9. A time clock

 

Chris these are great ideas.

 

First four seem very possible.

 

Random numbers though can be difficult no matter where you are. I may be able to pull a sufficiently pseudo random number for you.

 

Give me some more specs on your collision detection algorithm.
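For reference, the two tests named in the suggestion (point inside a region, and region overlap) could look like this for axis-aligned rectangles. This is only a sketch of one possible spec; the `Rect` layout and the function names are assumptions for this example.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical region: top-left corner plus width and height, in pixels. */
typedef struct { uint8_t x, y, w, h; } Rect;

/* True if the point (px, py) lies inside the rectangle. */
bool point_in_rect(uint8_t px, uint8_t py, Rect r)
{
    return px >= r.x && px < r.x + r.w &&
           py >= r.y && py < r.y + r.h;
}

/* True if the two rectangles share at least one pixel. */
bool rects_overlap(Rect a, Rect b)
{
    return a.x < b.x + b.w && b.x < a.x + a.w &&
           a.y < b.y + b.h && b.y < a.y + a.h;
}
```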

 

I don't know much about color conversion, but if it's just a lookup table I don't see a problem with that.

 

For the String to Sprite, seems like you could just use the normal queues for that. Load the queues up with your sprites then just read them out.

 

Clarify a time clock. Do you mean general timers? A real time counter? An actual date and time clock? We won't be able to do the actual real world clock. Timers and second counters would be possible though. Give me some more info.

 

Vern


Would it be possible to make a queue that let you push opcodes onto it? That seems like it would offer an easy way to 'program' the other processor from the Atari side. Not sure if that's possible.

 

What I suggested above isn't actually the same as your suggestion. I thought about your way; it's possible, but not practical. It would mean allowing a cartridge to change the actual embedded code. My gut just screams bad news.

 

My way is to let someone load a function into one queue and the data into another queue, then the answer gets spit out. This method will be slower, but it's much safer. It still seems like it would be super useful for performing some more complex calculations. My main question about this is what type of precision is needed? Are these 8-bit calculations with 8-bit answers, 8-bit calculations with 16-bit answers, maybe 16-bit and 16-bit? That's where I need programmers' help. Give me the specifics.

 

Vern


The goal is to produce a cheap SuperCharger cartridge with a VCS accessible serial port (to be used in place of the audio input of the original), to help programmers to develop their games and players to play the games on real hardware.  So far we have succeeded in our serial interface design, the bonus is the design facilitates easily adding a couple other features; mainly a queuing system, a la Pitfall II.  Where queues are loaded with data during non screen drawing time and then read back while drawing the screen.  100 circular queues of up to 256 bytes each would be available for use.  Accessing the queues would be very similar to normal SuperCharger SRAM reading and writing.

 

I am currently working on a 2600-compatible cartridge which will hopefully be cheap (under $20, maybe under $15) and will have 32K RAM, 64K flash (programmable in-circuit via header) and a Xilinx 95C36xl (initially) or 95C72xl CPLD running off a 14.31818MHz oscillator. The former should be able to implement most individual types of bank-switching (except Supercharger). The latter would allow some super-cool features.

 

Even on the former, I'm planning on implementing a bank-switching mode with some special features to help people who want to use a Stella-Sketch-style kernel. To start with, address space will be divided into three partitions:

 

-1- 1000-10FF : Any 256 byte page out of RAM (or possibly out of flash)

-2- 1100-17FF : Most of any 2K block out of RAM or flash

-3- 1800-1FFF : Any 2K block of flash

 

When blocks 1 and 2 are selected as RAM, they may be write-protected or not as convenient. When write-protected, RAM may be read via any addressing mode, and code may be run from it.

 

When RAM is not write-protected, code CANNOT be run from it, but it may be read via any addressing mode EXCEPT those involving a page crossing, and may be written using either indexed or indirect-indexed [but not indexed indirect] addressing mode. Read-modify write operations are permitted using absolute and indexed-indirect [yeah, I know nobody uses that one] addressing modes only. Note that all RAM may be written and read at the same address.

 

Now for the fun part: when the Hires Helper function is turned on, addresses in the range $1F00-$1FFF will update bits 8-11 of the page address for the address range $1000-$10FF: bit 3 of the address will go to bit 11 and bits 4-6 will go to bits 9-11.

 

If addresses $1F10-$1F6F contain a repeated sequence $01,$02,$04,$08,$10,$20,$40,$80 etc. and $1F90-$1FEF contain $FE,$FD,$FB,$F7,$EF,$DF,$BF,$7F, then a pixel may be plotted at screen coordinate X,Y of a Stella-Sketch style screen (assuming screen X coordinates 16-175) via the instruction sequence:

  LDA $1F00,X
  ORA $1000,Y
  STA $1000,Y

and erased via

  LDA $1F80,X
  AND $1000,Y
  STA $1000,Y

Additional functions would provide additional acceleration for object drawing using pre-shifted shapes.

 

BTW, for those who think that approach to line-drawing is too slow, I may do a 95C72-based cart with a Hires Helper II feature. The sequence for drawing pixels there would be:

  CMP $1F00,X
 CMP $1000,Y

Can't beat that one, especially since it doesn't even trash the accumulator! The implementation would be that a read operation $1F00-$1FFF latches A7 and the data. The next write to $1000-$10FF would perform a one-cycle read-modify-write using either an AND or OR operation and the indicated data. A little tricky, but should be feasible with a 14Mhz clock. If this sort of approach were supported in hardware, a game like Battlezone or Red Baron could probably be done at about 10fps.


- A fast method for reversing the bit order of a byte (PF calculations)

Just use a 256-item lookup table.

- A Binary->BCD convertor

What would this accomplish that could not be done easily via other methods?

- Divide & Mod functions for non-powers of 2 (also on BCD numbers)

Could be handy for some things, but would seem expensive relative to the benefit received. Note that division by a constant may be easily done via table lookup.

- Fast multiplication

A little cheaper in hardware than division, but table lookup is good here, even for multiplication of non-constants.
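One well-known way to multiply two non-constant bytes by table lookup is the quarter-square method, which uses the identity a*b = floor((a+b)^2/4) - floor((a-b)^2/4). Whether this is the exact table scheme meant here is an assumption; the sketch below, with made-up names, just shows that two lookups and a subtraction are enough.

```c
#include <stdint.h>

/* quarter[n] = floor(n*n / 4) for n = 0..510 (the largest possible a+b). */
static uint16_t quarter[511];

void init_quarter_table(void)
{
    for (uint32_t n = 0; n < 511; n++)
        quarter[n] = (uint16_t)((n * n) / 4);
}

/* 8 x 8 -> 16 bit multiply using two table lookups and one subtraction. */
uint16_t mul8(uint8_t a, uint8_t b)
{
    uint16_t sum  = (uint16_t)(a + b);
    uint16_t diff = (uint16_t)(a > b ? a - b : b - a);
    return (uint16_t)(quarter[sum] - quarter[diff]);
}
```

The result is exact because a+b and a-b are always both even or both odd, so the rounding in the two table entries cancels out. On a 6502 the table would normally be split into low-byte and high-byte halves.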

- Proper random number generation

Unless there's a clock source which is asynchronous to the 2600, this isn't going to be any better than what could be accomplished with good code.

- Collision detection (point inside a region, and region overlap)

Can be done with code about as easily as the numbers could be gotten to external hardware.

- PAL<->NTSC colour value convertor

Just use a 256-byte lookup table.

- String -> Sprite Data Convertor (for displaying text)

Not sure what you mean here. Displaying text isn't all that hard if the character sets are in ROM, unless you're expecting to hack more than one character per sprite (e.g. show eight 6x8 characters). Seems a bit too specialized to justify special hardware.

- A time clock

A battery-backed clock might be fun, but I don't see much that would justify the expense. Any other sort of time-keeping should be manageable within the kernel itself.

 

Others may disagree with me, but hardware "helpers" are only worthwhile if they allow one of three things:

  1. A task can be performed with so much less code than would otherwise be necessary, that a task which would be unfeasible even in 64K of code becomes feasible in a small fraction of that.
  2. A task within the kernel becomes faster.
  3. A task outside the kernel which would otherwise take a prohibitive amount of time is doable in a small fraction of that time.

Note that the more complex a hardware helper is, the more effectively it needs to perform one of the above tasks to be worthwhile.

 

Pitfall II has perhaps the most famous "hardware helper" chip. The music part of the chip performs calculations that would otherwise take about 40-60 cycles/scan line and reduces them to IIRC seven [if DC had done it right, it would have been six]. This clearly takes them from the realm of non-feasible to feasible. The sprite data fetcher accelerates sprite drawing by a few cycles, which is necessary because even the seven-cycle burden imposed for the music would otherwise pose a problem.


Others may disagree with me, but hardware "helpers" are only worthwhile if they allow one of three things:

  1. A task can be performed with so much less code than would otherwise be necessary, that a task which would be unfeasible even in 64K of code becomes feasible in a small fraction of that.
  2. A task within the kernel becomes faster.
  3. A task outside the kernel which would otherwise take a prohibitive amount of time is doable in a small fraction of that time.

Note that the more complex a hardware helper is, the more effectively it needs to perform one of the above tasks to be worthwhile.

 

I think our goals are different here. You are trying to make things that were previously impossible, possible through help and optimization. Glenn's goal is just to make it easier to program games, which lines up well with his reasons for sponsoring the programming contests: more and better games for everyone to enjoy. We have an opportunity to add a bunch of features for no increase in cost. Even if it's as simple as reversing a bit order that could be done with a lookup table, it makes sense. I have 2K of microprocessor space to fill with functions that would help people make games, and I want to add things that programmers actually want.

 

Vern


A task within the kernel becomes faster.

I have to agree with this one, and I think this can be done in such a way that it's both easier to program and much, much faster.

 

For instance, if we could set up a bunch of independent 256-byte stacks, maybe 10 of them or so, we could display sprites both fast and easy. With just 4 stacks, we could push, say, 192 bytes into each of them for player 0, player 1, and the colors of each, which would mostly be zeros but we'd strategically place our sprite data in the right place as well. It would be easy to create these stacks outside the kernel, then in the kernel, you'd just do:

 

LDA stack1 ; pull from stack 1 (absolute = 4 cycles; or could this be mapped to zero page somehow?)

STA GRP0

LDA stack2

STA GRP1

LDA stack3

STA COLUP0

LDA stack4

STA COLUP1

 

This assumes we can map the stacks to fixed absolute addresses. No need for skip/switchdraw here, easier to program and understand than either, and this saves cycles over both. We'd have a whopping 48 cycles left for other stuff after both players with color change every line, and this is a 1LK.
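The 48-cycle figure can be checked with a little arithmetic: each LDA from an absolute address takes 4 cycles, each STA to a TIA register (zero page) takes 3, and a scan line is 76 CPU cycles long. A tiny C sketch of that sum, with names invented for this example:

```c
/* Cycle counts for the 6502 instructions used in the kernel above. */
enum {
    CYCLES_PER_SCANLINE = 76,  /* CPU cycles in one scan line */
    LDA_ABSOLUTE = 4,          /* LDA from a fixed 16-bit address */
    STA_ZERO_PAGE = 3          /* STA to a TIA register such as GRP0 */
};

/* Cycles left on a scan line after `pairs` LDA/STA pairs. */
int cycles_left(int pairs)
{
    return CYCLES_PER_SCANLINE - pairs * (LDA_ABSOLUTE + STA_ZERO_PAGE);
}
```

With four pairs this gives 76 - 4 * 7 = 48, before counting any loop overhead.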

 

If we do more stacks, we could even have a bitmapped display in RAM, using something like the 6-digit score display, but we'd have some cycles left over since we don't need indirect addressing.


For instance, if we could set up a bunch of independent 256-byte stacks, maybe 10 of them or so, we could display sprites both fast and easy.  With just 4 stacks, we could push, say, 192 bytes into each of them for player 0, player 1, and the colors of each, which would mostly be zeros but we'd strategically place our sprite data in the right place as well.  It would be easy to create these stacks outside the kernel, then in the kernel, you'd just do:

 

You should get about 100 stacks of up to 256 bytes each, set at absolute addresses. That much is pretty much set. The issue is that the cart has more room to expand and all that's needed is to write some code. Give me some features you want in addition to the queues/stacks. Do you think the custom functions idea would be useful at all? What kind of math coprocessing would be useful?

 

The features that are pretty much set are the queues/stacks, normal SuperCharger functions, and serial port access. The serial port is handled the same as writing and reading any SuperCharger SRAM location. The only difference is that, to read a byte, you first have to make a separate read to see if there is any data waiting. This read will tell you how many bytes are waiting to be read by the VCS.

 

Vern


You should get about 100 stacks of up to 256 bytes each, set at absolute addresses.  That much is pretty much set.  The issue is that the cart has more room to expand and all thats needed is to write some code.  Give me some features you want in addition to the queues/stacks.  Do you think the custom functions idea would be useful at all?  What kind of math coprocessing would be useful?


Oh... I missed that part about the queues, which would do basically the same as a stack except they would be FIFO instead of LIFO. Yes, these would definitely allow us to do great things.

 

As far as coprocessing goes, maybe a waveform generator would be nice. Like we give it a couple of frequencies that we want, the wave shape or even digitized samples if there's space, use the "always on" AUDCx, and all we do is pull a value off of the wave and depending on when we pull it, it "samples" the waveform and gives us back a value that we can store into AUDV0, i.e.:

 

LDA wave

STA AUDV0

 

in every kernel line will give us ~15.6 kHz audio, or every other line for ~7.8 kHz, though it's only 4-bit sampling, it should still sound decent.


I think our goals are different here.  You are trying to make things that were previously impossible, possible through help and optimization.  Glenn's goal is just to make it easier to program games, which lines up well with his reasons for sponsoring the programming contests, more and better games for everyone to enjoy.  We have an opportunity to add a bunch of features for no increase in cost.  Even if its as simple as reversing a bit order that could be done with a lookup table, it makes sense.  I have 2K of microprocessor space to fill with functions that would help people make games, I want to add things that programmers actually want.


 

I'm not opposed to adding features that can be done without cost, but fail to see how something like "LDY myshape,x // LDA bitflip,y // STA PF2" could possibly be made much easier via anything reasonable you'd put in the hardware. I suppose you could have a hotspot which would read, bit-flipped, the last byte read from zero page (in which case, if myshape was in ZP, the sequence would be "LDA myshape,x // LDA magicbitfliphotspot // STA PF2"), which would have the advantage of not trashing Y. Doesn't seem like a big win, though.

 

If your goal is to provide microprocessor-ish features that would be useful within a kernel, I can list a few:

  • A somewhat generalized music-generation system. A bit of a pain to use, no matter how friendly you may make it, but for things like title screens or intra-level music, it could be handy. Let me know what sort of processing power you have and I'll suggest an implementation.
  • An ADPCM decoder. Not sure how well such a thing can work with a 4-bit output, but it may be handy anyway.
  • For Stell-a-sketch kernel games, a set of graphics primitives (line drawing and area fill) that could run on a multi-buffered screen. Include options for drawing on even rows only, odd rows only, or both; when line drawing on even- or odd-rows only, allow 'wrong' pixels to be pulled to either the "paired" pixel or the nearest.
  • Rather than having queues that need to be reloaded every frame, have a number of 'fetchers' that read data from a common address space. Each fetcher should include a 16-bit pointer register, an 8-bit mask, and an 8-bit match register. If ((ptr xor match) and mask) is non-zero, the fetcher should return a zero regardless of the data at ptr. This would allow all needed objects to be put in memory once, at the start of the program, and called upon at will.
  • Have a range of hotspots that, when read, would produce a 5-260-cycle delay by stuffing the succeeding 5-255 read cycles with data required to delay 2-257 cycles (i.e. a sequence of "nop" and/or "nop 0" instructions) followed by a jump to the address of the read before the hotspot read, plus one (e.g. if the hotspot range was $800-$8FF and the code at $1234 executed an "LDA $800,x" with X=3, the next eight memory fetches should be "EA 04 04 [repeated] 00 XX [zero-page read] 4C 37 12").
  • Include a handy timer which counts scan lines (units of 76 main CPU clocks).
  • Following a read of a particular ZP address, if a timer has expired, latch the address of the previous code fetch, then make the next three code fetches be "4C xx yy" (where xx and yy are programmable). Make the captured address, plus one, visible at a pair of consecutive hotspots (so code can return from interrupt via JMP to that location). This would allow code to test for an 'interrupt' condition using only two bytes/three cycles instead of five bytes/six cycles.

Those are just a few ideas to get started. I have no idea which of those would fall within the abilities of whatever chip you're using, but all of those would allow things to be done that would otherwise not be possible.
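The fetcher idea from the list above can be modelled in a few lines of C. The post does not say which byte of the 16-bit pointer is compared against the 8-bit match register or whether the pointer advances on each read, so this sketch assumes the low byte is compared and the pointer auto-increments; both are guesses, and the names are made up.

```c
#include <stdint.h>

/* Hypothetical model of one data fetcher. */
typedef struct {
    uint16_t ptr;    /* address of the next byte in the common address space */
    uint8_t  mask;   /* which bits of the compared byte matter */
    uint8_t  match;  /* value those bits must have for data to come through */
} Fetcher;

/* Returns the byte at ptr, or zero when ((ptr xor match) and mask) is non-zero. */
uint8_t fetch(Fetcher *f, const uint8_t *memory)
{
    uint8_t low = (uint8_t)(f->ptr & 0xFF);  /* assumption: low byte is compared */
    uint8_t value = ((low ^ f->match) & f->mask) ? 0 : memory[f->ptr];
    f->ptr++;                                /* assumption: auto-increment per read */
    return value;
}
```

With mask $F8 and match $10, for example, the fetcher would return real data only while the low byte of the pointer is in $10-$17 and zeros everywhere else, so an 8-line sprite stored once in memory shows up on just those lines.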


I'm not opposed to adding features that can be done without cost, but fail to see how something like "LDY myshape,x // LDA bitflip,y // STA PF2" could possibly be made much easier via anything reasonable you'd put in the hardware. I suppose you could have a hotspot which would read, bit-flipped, the last byte read from zero page (in which case, if myshape was in ZP, the sequence would be "LDA myshape,x // LDA magicbitfliphotspot // STA PF2"), which would have the advantage of not trashing Y. Doesn't seem like a big win, though.

To be honest I don't see the benefit of it either. That's because I don't program for the VCS, so I only have a high-level understanding of why some of these suggestions would be useful. My plan is to gather as many suggestions as possible, then prioritize them and implement them. If one turns out not to be possible to implement, it gets tossed.

 

As for the overall design, I am using an Atmel ATtiny2313 microcontroller running at 14.7456MHz. The controller has access to the 8-bit VCS data bus and the lower 7 or 8 bits of the VCS address bus. That means the processor is limited to recognizing 128 or 256 contiguous address 'hotspots'. This setup allows the queues/stacks and the serial port interface. I can do a lot with the processor; my only limitation is that when one of the address hotspots gets hit, the processor has to respond within about 7 of its clock cycles. That means all microcontroller processes must be interruptible. Another limitation is that the actual bulk data is stored in the cartridge's SRAM, not the microcontroller's SRAM, so the microcontroller can't randomly access the bulk data. It can snatch bytes along the way as it sees necessary.

 

 

A somewhat generalized music-generation system.  A bit of a pain to use, no matter how friendly you may make it, but for things like title screens or intra-level music, it could be handy.  Let me know what sort of processing power you have and I'll suggest an implementation.

Would this be anything like batari's suggestion? Give me some more details.

 

An ADPCM decoder.  Not sure how well such a thing can work with a 4-bit output, but it may be handy anyway.

 

I would imagine that this would require way too much code space to implement. I only have about 2K. Would the data set be too large using straight PCM?

 

For Stell-a-sketch kernel games, a set of graphics primitives (line drawing and area fill) that could run on a multi-buffered screen.  Include options for drawing on even rows only, odd rows only, or both; when line drawing on even- or odd-rows only, allow 'wrong' pixels to be pulled to either the "paired" pixel or the nearest.

I need more information to know if this is possible. Are you saying that the entire screen would be built using vector type functions?

 

Rather than having queues that need to be reloaded every frame, have a number of 'fetchers' that read data from a common address space.  Each fetcher should include a 16-bit pointer register, an 8-bit mask, and an 8-bit match register.  If ((ptr xor match) and mask) is non-zero, the fetcher should return a zero regardless of the data at ptr.  This would allow all needed objects to be put in memory once, at the start of the program, and called upon at will.

I am not sure I fully understand this one either, but I think my limited address range would make this impossible.

 

Have a range of hotspots that, when read, would produce a 5-260-cycle delay by stuffing the succeeding 5-255 read cycles with data required to delay 2-257 cycles (i.e. a sequence of "nop" and/or "nop 0" instructions) followed by a jump to the address of the read before the hotspot read, plus one (e.g.. if the hotspot range was $800-$8FF and the code at $1234 executed an "LDA $800,x" with X=3, the next eight memory fetches should be "EA 04 04 [repeated] 00 XX [zero-page read] 4C 37 12".

My limited address range makes this impossible.

 

Include a handy timer which counts scan lines (units of 76 main CPU clocks).

I may be able to do this. I may have enough registers in my CPLD to create a clock pulse off changes in the VCS address bus.

 

Following a read of a particular ZP address, if a timer has expired, latch the address of the previous code fetch, then make the next three code fetches be "4C xx yy" (where xx and yy are programmable).  Make the captured address, plus one, visible at a pair of consecutive hotspots (so code can return from interrupt via JMP to that location).  This would allow code to test for an 'interrupt' condition using only two bytes/three cycles instead of five bytes/six cycles.

My limited address range makes this impossible.


As for the overall design, I am using an Atmel ATtiny2313 microcontroller running at 14.7456MHz. The controller has access to the 8-bit VCS data bus and the lower 7 or 8 bits of the VCS address bus. That means the processor is limited to recognizing 128 or 256 contiguous address 'hotspots'. This setup allows the queues/stacks and the serial port interface. I can do a lot with the processor; my only limitation is that when one of the address hotspots gets hit, the processor has to respond within about 7 of its clock cycles. That means all microcontroller processes must be interruptible. Another limitation is that the actual bulk data is stored in the cartridge's SRAM, not the microcontroller's SRAM, so the microcontroller can't randomly access the bulk data. It can snatch bytes along the way as it sees necessary.

 

I'm not quite sure I understand how your queues are supposed to access the SRAM on the cart. Otherwise, would there be any problem using 14.31818Mhz? Serial baud rates might be 3% slow, but that's well within tolerance for a normal PC and it would make some things (like a 76-cycle timer) much easier.
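The numbers behind this can be worked out directly. 14.31818MHz is four times the NTSC colour clock, and the NTSC 2600's CPU runs at one twelfth of it, so a 76-cycle scan line is a whole number of crystal ticks. The C sketch below (names invented here, and assuming the UART divisor stays the same when the crystal is swapped) computes the baud error and the ticks per scan line for either crystal.

```c
#define NTSC_XTAL_HZ  14318180.0  /* 4 x the 3.579545 MHz NTSC colour clock */
#define UART_XTAL_HZ  14745600.0  /* crystal in the current design */
#define CPU_HZ        (NTSC_XTAL_HZ / 12.0)  /* NTSC 2600 CPU clock, about 1.19 MHz */

/* How much slower (in percent) the serial port runs on the NTSC crystal. */
double baud_error_percent(void)
{
    return (1.0 - NTSC_XTAL_HZ / UART_XTAL_HZ) * 100.0;
}

/* Crystal ticks in one 76-cycle scan line. */
double ticks_per_scanline(double xtal_hz)
{
    return 76.0 * xtal_hz / CPU_HZ;
}
```

This works out to a baud rate about 2.9% slow, and to 912 ticks per scan line with the 14.31818MHz crystal versus roughly 939.2 with the 14.7456MHz one, which is why a timer running off the latter would drift against the scan lines.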

 

(Music generation)

Would this be anything like batari's suggestion?  Give me some more details.

 

Batari posted his post while I was writing mine.

 

For a four-voice generator, the basic algorithm would be:

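 #include <stdint.h>

 typedef uint16_t ui16;    /* unsigned 16-bit */
 typedef int8_t   sb8;     /* signed 8-bit */

 ui16 freq[4], phase[4];   /* per-voice frequency step and phase accumulator */
 sb8 *table[4], *table2;   /* per-voice wave tables and the output mapping table */
 sb8 outputsample;

 void next_sample(void)    /* wrapper name is illustrative */
 {
   int i;
   ui16 sum;

   sum = 0x8080;           /* bias the sum to mid-scale */
   for (i = 0; i < 4; i++)
   {
     phase[i] += freq[i];
     sum += table[i][phase[i] >> 8];   /* top 8 bits of the phase index the wave table */
   }
   outputsample = table2[sum >> 8];    /* top 8 bits of the sum index the output table */
 }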

In practice, the loop should be unrolled of course. The >>8 may be replaced with a larger shift value if 256-byte tables would be too large.

 

(ADPCM)

 

I would imagine that this would require way too much code space to implement.  I only have about 2K.  Would the data set be too large using straight PCM?

 

Probably not too bad, though in that case it would be helpful if the micro could unpack data stored as two nybbles/byte so the 6502 didn't have to worry about doing a shift or table-lookup on every other byte.
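Unpacking two 4-bit samples per byte is simple enough to sketch in C. The nybble order (high nybble played first) and the function name are assumptions for this example.

```c
#include <stdint.h>
#include <stddef.h>

/* Expand n packed bytes into 2*n samples of 4 bits each, one per output byte. */
void unpack_nybbles(const uint8_t *packed, size_t n, uint8_t *out)
{
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = packed[i] >> 4;    /* assumption: high nybble comes first */
        out[2 * i + 1] = packed[i] & 0x0F;  /* then the low nybble */
    }
}
```

Each output byte is already in the 0-15 range that AUDV0 takes, so the 6502 would only need a load and a store per sample.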

 

(Graphics primitives)

I need more information to know if this is possible.  Are you saying that the entire screen would be built using vector type functions?

The Stella-sketch kernel displays a bitmap screen. My Supercharger minigame, SDI, uses something similar but adds color by turning alternate scan lines red and green. To make this work, your CPU would have to have at least 2K (and preferably 8K) of RAM that it could access fairly randomly.

 

(Fetchers instead of queues)

I am not sure I fully understand this one either, but I think my limited address range would make this impossible.

 

How exactly were you planning to implement your queues? Tell me that and I'll tell you how my idea would be different.

 

(scan-line counting timer)

I may be able to do this.  I may have enough registers in my CPLD to create a clock pulse off changes in the VCS address bus.

 

Simpler just to use a 14.31818Mhz crystal. Have a certain hotspot initialize your counter at least once/frame, and you should be able to keep pretty accurate time for the next 1/60 second without resynchronization.

 

(magic delay function)

My limited address range makes this impossible.

 

Didn't think that would work either, though there would have been something sorta neat about implementing interrupts on a system with no /IRQ or /NMI.


OK, it appears that my suggestions were not very popular! In my defense, I was intending the bit-reversal to be used in conjunction with the queues, i.e. you load a queue with data and then read it back in bit-reversed form during the kernel. As noted, a single bit-reversal operation would not be particularly efficient! On reflection, I think it would probably be best if the functions were user-programmable, as we are unlikely to come up with a set that will make everyone happy.

 

Chris


I'm not quite sure I understand how your queues are supposed to access the SRAM on the cart.  Otherwise, would there be any problem using 14.31818Mhz?  Serial baud rates might be 3% slow, but that's well within tolerance for a normal PC and it would make some things (like a 76-cycle timer) much easier.

No problems with the frequency. I am cheating with the way the microcontroller interfaces to the VCS. That's why it only has access to 7 or 8 address bits (this will be decided after I figure out what features to implement). The actual baud rate to the PC will be up to 115k (testing was done at 115k). Bytes can be read by the VCS as fast as it can fetch. Glenn wrote some bursting test code to make 100 reads from the serial port as fast as possible. No problems found. I don't know the actual throughput to the VCS, but it's only limited by how fast the VCS can read an address in cart space.

 

For a four-voice generator, the basic algorithm would be:

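 ui16 freq[4], phase[4];    /* per-voice frequency step and 16-bit phase accumulator */
 sb8 *table[4], *table2;    /* per-voice lookup tables; table2 maps the sum to the output */

 int i;
 ui16 sum;

 sum = 0x8080;
 for (i = 0; i < 4; i++)
 {
   phase[i] += freq[i];                 /* advance this voice's phase */
   sum += table[i][(phase[i] >> 8)];    /* top 8 bits of the phase index the table */
 }
 outputsample = table2[sum >> 8];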

In practice, the loop should be unrolled of course.  The >>8 may be replaced with a larger shift value if 256-byte tables would be too large.

This seems possible, I will have to run a couple tests. You may have to wait a couple VCS clocks to get the answer (you would start the sampling then come back a couple clocks later to retrieve it), is that a huge problem?

 

How exactly were you planning to implement your queues?  Tell me that and I'll tell you how my idea would be different.

It's not so easy to explain. The microcontroller identifies that the VCS has accessed a queue hotspot and then massages the address that the SRAM sees with its queue counter (the current location in the 256-byte queue). The micro's pretty removed; the work is done by the VCS and the CPLD (which handles the timing of the control lines, same as normal SRAM access).
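A rough C model of the queue addressing described above may help. It is only a sketch: the names (`queue_address`, `queue_rewind`), the one-page-per-queue SRAM layout, and the single shared read/write counter are assumptions for illustration, not the actual CPLD/microcontroller design.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_QUEUES 100   /* from the first post: 100 circular queues */
#define QUEUE_SIZE 256   /* of up to 256 bytes each */

/* Assumed layout: each queue occupies its own 256-byte page of SRAM. */
static uint8_t sram[NUM_QUEUES * QUEUE_SIZE];
/* One position counter per queue; uint8_t wraps 255 -> 0 on its own. */
static uint8_t queue_pos[NUM_QUEUES];

/* Address the SRAM would see when the VCS touches queue q's hotspot:
   the queue's base address combined with its current counter. */
static uint16_t queue_address(unsigned q)
{
    uint16_t addr;
    assert(q < NUM_QUEUES);
    addr = (uint16_t)(q * QUEUE_SIZE + queue_pos[q]);
    queue_pos[q]++;      /* advance; the 8-bit wrap makes the queue circular */
    return addr;
}

/* Hotspot write: store a byte at the queue's current position. */
static void queue_write(unsigned q, uint8_t value)
{
    sram[queue_address(q)] = value;
}

/* Hotspot read: fetch the byte at the queue's current position. */
static uint8_t queue_read(unsigned q)
{
    return sram[queue_address(q)];
}

/* Hypothetical reset hotspot: point the queue back at its first byte. */
static void queue_rewind(unsigned q)
{
    assert(q < NUM_QUEUES);
    queue_pos[q] = 0;
}
```

In this model a game would fill a queue with `queue_write` outside the display kernel, call `queue_rewind`, and then pull the bytes back with `queue_read` while drawing the screen.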

 

Simpler just to use a 14.31818MHz crystal.  Have a certain hotspot initialize your counter at least once/frame, and you should be able to keep pretty accurate time for the next 1/60 second without resynchronization.

My clock is 14.7456MHz, optimized for serial communications. I can make some calculations and see if my clock intersects the VCS clock close enough that it could be useful.
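The arithmetic behind the two crystal choices can be sketched in C. This assumes an NTSC console, where the 6507 runs at the colour-burst frequency (3.579545 MHz) divided by 3 and a scanline lasts 76 CPU cycles; the function names are made up for illustration.

```c
#include <assert.h>

#define VCS_CPU_HZ      (3579545.0 / 3.0)  /* NTSC 6507 clock, about 1.193 MHz */
#define CYCLES_PER_LINE 76                 /* CPU cycles per scanline */

/* Microcontroller clock ticks that elapse during one VCS scanline. */
static double ticks_per_scanline(double crystal_hz)
{
    assert(crystal_hz > 0.0);
    return crystal_hz / VCS_CPU_HZ * CYCLES_PER_LINE;
}

/* Fractional baud-rate error when a divisor chosen for nominal_hz
   is driven by actual_hz instead (negative means slow). */
static double baud_error(double actual_hz, double nominal_hz)
{
    assert(nominal_hz > 0.0);
    return actual_hz / nominal_hz - 1.0;
}
```

With these assumptions a 14.31818 MHz crystal gives exactly 12 ticks per CPU cycle (912 per scanline), while 14.7456 MHz gives about 12.36 ticks per CPU cycle (roughly 939.2 per scanline), so a scanline counter would need a fractional accumulator. Going the other way, `baud_error(14318180.0, 14745600.0)` comes out near -0.029, which matches the "3% slow" estimate quoted earlier.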

 

Thanks for the suggestions,

 

Vern


OK, it appears that my suggestions were not very popular!

Chris, the suggestions were great. This is all unknown territory and the only way to get a decent picture is to feel out everything. It should be possible to implement a few of your suggestions. At least you made a few suggestions.

 

Keep them coming, and comment on others' suggestions, please! This is difficult for me because of my lack of VCS programming experience. I just don't know what will help or what is needed. The more discussion we have, the better the outcome.

 

Vern


This seems possible, I will have to run a couple tests.  You may have to wait a couple VCS clocks to get the answer (you would start the sampling then come back a couple clocks later to retrieve it), is that a huge problem? 


A game would probably read one sample value per scanline at most. So you'll have at least 76 cycles to generate a new sample value. But in the display kernel scanlines it would be difficult for a VCS game to spare the cycles needed to trigger the sample generation. Therefore it would be better if you could return the current sample value immediately and then generate a new sample value for the next read later.

 

The game would need 3 cycles to write the sample value into the TIA volume register, and it takes a couple more cycles to set up a new instruction to read from any of the hardware registers on your cartridge, so you should have enough uninterrupted time to generate a new audio sample after a value was read.

 

 

Ciao, Eckhard Stolberg


The game would need 3 cycles to write the sample value into the TIA volume register, and it takes a couple more cycles to set up a new instruction to read from any of the hardware registers on your cartridge, so you should have enough uninterrupted time to generate a new audio sample after a value was read.

That's a great idea. A read would trigger the processing of the sample for the next read. I think that would work. The only issue is that the first value ever read would be garbage. I don't think this will be a problem though.
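The read-triggered scheme could look roughly like this in C. It is a sketch only: `sample_port`, `sample_port_read` and the ramp used as a stand-in generator are hypothetical, and the real sample generation would replace `generate_sample`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the real mixing work: a simple ramp,
   masked to 4 bits because the TIA volume registers are 4 bits wide. */
static uint8_t generate_sample(uint8_t *state)
{
    return (uint8_t)((*state)++ & 0x0F);
}

typedef struct {
    uint8_t latched;  /* value handed out by the next read */
    uint8_t state;    /* generator state */
} sample_port;

/* Called when the VCS reads the (assumed) sample hotspot. */
static uint8_t sample_port_read(sample_port *p)
{
    uint8_t out;
    assert(p != 0);
    out = p->latched;                         /* answer immediately */
    p->latched = generate_sample(&p->state);  /* then prepare the next value */
    return out;
}
```

Whatever happens to be in `latched` before the first read is what that read returns, which is the "garbage first value" mentioned above. A game could simply throw away one read during setup.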

 

Vern


Hi there!

 

How about something for fractional movement?

 

Like you feed it with an x and a y starting position, an angle and a step width, and you could just read a new pair of resulting x and y values each frame. (Or just x or just y in case the angle is 0 or 90 degrees.)

 

Admittedly you can live without a function like that as well, but it'd greatly simplify a lot of things.
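One way such a fractional-movement helper might work is with 8.8 fixed-point arithmetic, sketched below in C. Everything here is an assumption for illustration: the `mover` type, the 16 headings of 22.5 degrees each, and the step width expressed in 1/256ths of a pixel per frame.

```c
#include <assert.h>
#include <stdint.h>

/* sin(heading) * 256 for 16 headings of 22.5 degrees each. */
static const int16_t sin16[16] = {
    0, 98, 181, 237, 256, 237, 181, 98,
    0, -98, -181, -237, -256, -237, -181, -98
};

typedef struct {
    int32_t x, y;    /* position in 8.8 fixed point */
    int32_t dx, dy;  /* per-frame step in 8.8 fixed point */
} mover;

/* heading 0 points along +x, heading 4 (90 degrees) along +y.
   step is the step width in 1/256ths of a pixel per frame. */
static void mover_init(mover *m, uint8_t x, uint8_t y,
                       unsigned heading, int32_t step)
{
    assert(m != 0 && heading < 16);
    m->x = (int32_t)x << 8;
    m->y = (int32_t)y << 8;
    m->dx = step * sin16[(heading + 4) & 15] / 256;  /* cosine = sine + 90 degrees */
    m->dy = step * sin16[heading] / 256;
}

/* Advance one frame and return the whole-pixel coordinates. */
static void mover_step(mover *m, uint8_t *x, uint8_t *y)
{
    m->x += m->dx;
    m->y += m->dy;
    *x = (uint8_t)((uint32_t)m->x >> 8);  /* drop the fractional byte */
    *y = (uint8_t)((uint32_t)m->y >> 8);
}
```

On the cartridge, `mover_init` would correspond to the game loading the helper with its parameters, and each per-frame read of x and y would correspond to one `mover_step`.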

 

Greetings,

Manuel


Another thought...don't know if this would work or not - but a playfield scrolling helper would be nice. Possible use: You feed it the starting zero-page RAM location and it scrolls the next six RAM locations like so (example of left scroll):

 

  asl PFNew
  ror PF6Data
  rol PF5Data
  ror PF4Data
  lda PF4Data
  lsr
  lsr
  lsr
  lsr
  ror PF3Data
  rol PF2Data
  ror PF1Data

Since that takes over half a scanline to execute (46 cycles) it can eat up a lot of time in a hurry.
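For reference, here is a C model of what the instruction sequence above does to the six bytes, which is roughly what a cartridge-side helper would have to reproduce. The function names and the array layout (`pf[0]` for PF1Data through `pf[5]` for PF6Data) are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* 6502-style rotate left through carry: returns the bit shifted out. */
static unsigned rol8(uint8_t *v, unsigned carry)
{
    unsigned out = (unsigned)(*v >> 7);
    *v = (uint8_t)((*v << 1) | carry);
    return out;
}

/* 6502-style rotate right through carry: returns the bit shifted out. */
static unsigned ror8(uint8_t *v, unsigned carry)
{
    unsigned out = (unsigned)(*v & 1);
    *v = (uint8_t)((*v >> 1) | (carry << 7));
    return out;
}

/* One left-scroll step over pf[0..5] (PF1Data..PF6Data), feeding in
   the top bit of *pf_new, mirroring the assembly line by line. */
static void scroll_left(uint8_t pf[6], uint8_t *pf_new)
{
    unsigned c;
    assert(pf != 0 && pf_new != 0);
    c = rol8(pf_new, 0);       /* asl PFNew   */
    c = ror8(&pf[5], c);       /* ror PF6Data */
    c = rol8(&pf[4], c);       /* rol PF5Data */
    (void)ror8(&pf[3], c);     /* ror PF4Data */
    c = (pf[3] >> 3) & 1u;     /* lda + 4x lsr: bit 3 ends up in carry */
    c = ror8(&pf[2], c);       /* ror PF3Data */
    c = rol8(&pf[1], c);       /* rol PF2Data */
    (void)ror8(&pf[0], c);     /* ror PF1Data */
}
```

The 46-cycle figure follows from standard 6502 timings: seven zero-page read-modify-write instructions at 5 cycles each, one zero-page `lda` at 3 cycles, and four accumulator `lsr` instructions at 2 cycles each.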

