
7.16mhz 1200XL


bob1200xl


I don't see the XL7 as a platform to run existing games at high speed.

 

Oh I don't know about that. Some games should continue to run at the same speed but just *better* overall. Particularly games that render frame by frame (Flight Simulator II comes to mind, or perhaps the converted Knight Lore?), should be able to render more FPS making for a smoother gameplay experience.


Something I am wondering about as a retro-game developer: how fast can you write to the ANTIC/GTIA/PIA registers now with a 7.16 MHz CPU? I know these chips still need to run on the original bus. Do they get updated once per clock cycle now? I am asking because I am wondering whether all the color, player/missile position and other registers can be changed in a DLI fast enough that the changes do not spill over several scan lines. You can usually change 3 or 4 registers before the next change starts appearing on another scan line.

I have experimented with sequences that use fewer cycles, like storing the X and Y registers in page 0 instead of pushing them to the stack, and using a self-modifying NMI routine that jumps directly to the DLI routine instead of being vectored through memory location 512. But you can still only do so much in a DLI before you notice a color changing mid-line or on the next line.

Someone said that a 65816 CPU can change these registers as fast as the equivalent of 1 clock cycle on the main bus. I know it takes 4 cycles to store something outside of page 0 (absolute addressing, no indexing), so loading immediate values and storing them probably won't be faster than that limit. I know people were looking to run these 65816 chips faster, but now I can see where that may start running into issues on the hardware side when attempting to change more than one register per main bus cycle.
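To put a number on the page 0 trick mentioned above, here is a minimal Python sketch that tallies the standard 6502 cycle counts for saving and restoring X and Y both ways. The helper name `total_cycles` and the two instruction lists are only illustrative; a real DLI may order things differently.

```python
# Standard 6502 cycle counts for the instructions involved.
CYCLES = {
    "PHA": 3, "PLA": 4,
    "TXA": 2, "TYA": 2, "TAX": 2, "TAY": 2,
    "STX zp": 3, "STY zp": 3, "LDX zp": 3, "LDY zp": 3,
}

def total_cycles(instructions):
    """Sum the cycle counts of a list of instruction names."""
    return sum(CYCLES[name] for name in instructions)

# Save and restore X and Y through the stack (A is assumed to be saved already).
stack_version = ["TXA", "PHA", "TYA", "PHA", "PLA", "TAY", "PLA", "TAX"]
# Save and restore X and Y in two zero-page locations instead.
zero_page_version = ["STX zp", "STY zp", "LDX zp", "LDY zp"]

stack_cost = total_cycles(stack_version)
zero_page_cost = total_cycles(zero_page_version)
print(stack_cost, zero_page_cost, stack_cost - zero_page_cost)
```

On a stock machine that works out to 22 cycles against 12, so the page 0 version frees 10 of the 114 cycles in a scan line.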


Hardware registers would be the same story as ROM - 1 "long" or 4 "short" cycles.

 

So, what I said before about being able to do many more colour changes per scanline isn't quite as rosy as we might think.

 

I guess it'll bring up a whole new aspect of cycle counting. Potentially you could "lose" 3 or 7 short cycles for a store operation depending on when the instruction starts.

 

In any case, GTIA always has that half "long" cycle lag between when you store a colour and when it comes into effect.

Edited by Rybags

Can you make some code that will test this? You will not be able to store consecutively to I/O hardware but I would expect that you will complete more changes than a stock machine. You write the routines - I'll try it for you...

 

Bob

You'd need "optimised" RAM-based code, such that your stores to hardware regs don't waste the "before" cycles, i.e. cycles 1, 2 and 3.

 

Normally, if you want to do multiple colour changes then you load 2 or 3 registers and store in quick succession.

 

I'd guess in this instance, you might benefit most by just using 2 registers in some cases, since LDA #data and LDX #data will use 4 cycles exactly.

Then your store instruction is another 4 cycles each.

 

Assuming we've started that sequence on cycle 0, then the STA gtiareg instruction would also start on cycle 0, with the actual store on cycle 3 which means it is delayed until cycle 0 again (1 cycle wasted).

 

Same again with the STX gtiareg.

 

If for example we've optimised the code such that the first store happens exactly on cycle 0, we still have the situation where the second store would happen on cycle 3. But, we have saved 1 short cycle compared to the first situation.

 

Maybe using A, X and Y and staggering stores and subsequent reloads in a certain fashion, the cycle wastage could be minimised.

 

Of course, there's also DMA and Refresh to consider, so it all gets a bit complicated.

 

Still, it's a significant saving in that you can execute an LDA # and LDX # instruction sequence in half the time a normal 6502 takes for a single LDA # instruction.

 

 

With this, I am assuming that any read/write operations to HW Regs or ROM are always delayed such that they occur on Cycle 0.
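The two cases above can be checked with a small Python model. This is only a sketch under the assumptions stated in this thread: four short cycles per long cycle, hardware writes held until "cycle 0", and the write itself taking one long cycle. DMA and refresh are ignored, and the names `PROGRAM` and `run` are made up for illustration.

```python
SHORT_PER_LONG = 4  # assumed ratio of fast CPU cycles to original bus cycles

# (mnemonic, 6502 cycle count, writes to a hardware register?)
PROGRAM = [
    ("LDA #imm", 2, False),
    ("LDX #imm", 2, False),
    ("STA $D0xx", 4, True),
    ("STX $D0xx", 4, True),
]

def run(program, start_phase=0):
    """Return (short cycles elapsed, short cycles spent stalled)."""
    t = start_phase
    stalled = 0
    for _name, cycles, hw_write in program:
        if hw_write:
            t += cycles - 1               # opcode and operand fetches run at full speed
            wait = (-t) % SHORT_PER_LONG  # hold the write until the next "cycle 0"
            stalled += wait
            t += wait + SHORT_PER_LONG    # the write itself takes one long cycle
        else:
            t += cycles
    return t - start_phase, stalled

print(run(PROGRAM, 0))  # sequence starts on cycle 0
print(run(PROGRAM, 1))  # sequence starts on cycle 1, so the first store lands on cycle 0
```

Starting on cycle 0 the model gives 20 short cycles with 2 lost to stalls. Starting one cycle later gives 19 with 1 lost, which is the single short cycle saved in the second case. For comparison, the same four instructions take 12 bus cycles (48 short cycles) on a stock 6502.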

Edited by Rybags

You can probably just use one register: LDA #xx, STA $Dxxx, which takes 6 cycles total. If stores to the hardware area went through at 4x speed, that would update a register every 1.5 clock cycles. We can have Bob or someone do DLI tests on a 65816 and see what happens; after all, we are all guessing at this point.
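For what it's worth, the 1.5 figure can be compared with what happens if hardware writes are stretched the way described earlier in the thread. A rough Python sketch, assuming four short cycles per long cycle, writes held until the next long-cycle boundary and then taking one full long cycle, and no DMA or refresh (`pair_cost` is just an illustrative name):

```python
SHORT_PER_LONG = 4  # assumed: four fast CPU cycles per original bus cycle

def pair_cost(start_phase):
    """Short cycles used by one LDA #imm / STA $Dxxx pair starting at start_phase."""
    t = start_phase
    t += 2                          # LDA #imm
    t += 3                          # STA abs: opcode plus two address bytes
    wait = (-t) % SHORT_PER_LONG    # stall until the next long-cycle boundary
    t += wait + SHORT_PER_LONG      # the write occupies one whole long cycle
    return t - start_phase

naive_long_cycles = 6 / SHORT_PER_LONG
costs = [pair_cost(phase) for phase in range(SHORT_PER_LONG)]
steady_state_long_cycles = pair_cost(0) / SHORT_PER_LONG
print(naive_long_cycles, costs, steady_state_long_cycles)
```

Every pair ends on a long-cycle boundary, so back-to-back pairs always start at phase 0 and cost 12 short cycles each. Under these assumptions that is 3 long cycles per register, about twice as fast as the 6 cycles of a stock 6502 rather than four times. Only a test on real hardware will say whether the assumptions hold.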


First of all, after reading the answers to some of the questions posted here, I have to say that I agree with keeping this project simple instead of trying to add features like extra memory, etc. It seems like a shame not to use the 65816's extra addressing ability, but keeping the scope of the project smaller just seems better. That said, there are still a few questions that I thought I'd throw out there:

 

-I understand why there may be problems with this board working with a 130XE because of its memory scheme, but why would it not work with the 400, 800, 65XE, and XEGS? If it won't work, it won't work... I was just wondering what differences there were in the hardware.

 

-Is PBI access done at high speed or slow speed? If high speed, could there be problems with devices being too slow or cables being too long?

 

-How hard would it be to un-install? Do I just stick the RAMs back in their sockets and the old CPU back in? Are there any board mods (such as cut traces)?


Bob,

 

Here is a ZIP file with a few games in ATR files... can you test them on the XL7 machine?

Does Colossus Chess calculate moves faster? Does Mercenary draw its wireframe graphics faster? Do the wireframe graphics in Assault Force move faster? Does the performance of the snooker or Leaderboard golf games improve?

 

Games_to_Test_on_XL7.zip


OK - I'd like to do DLI tests but STA $Dxxx is going to give me an error, I think. Does anyone have the code to try DLIs?

 

Bob

The XE machines should work, it's just that they will need different mechanical and electrical layout. I don't think you will be able to use the same board and you may have some added or modified logic. I don't know - have not looked in an XE. Same thing goes for a 400/800, only more of a challenge.

 

The PBI is an open issue. You could run them either from RAM or ROM, at high speed or low, but existing devices would all run low. The 1200XL7s we have so far do not have PBIs.

 

You have to cut traces on the 1200XL in order to isolate the SO pin. You also have to disable the old 3.58mhz clock by removing components. (you could cut a trace if you prefer) Five wires are then added to the bottom of the board. Pull the memory circuits. Plug in the new CPU board.

 

To remove the upgrade, plug the old memory ICs back in, remove the added wires, add a wire jumper to SO, re-install the clock components (a transistor and a resistor), and plug in the old CPU.

 

Bob

Quick and dirty DLI program (load from BASIC - the ATR only contains the SAVEd Basic program, and not DOS).

The DLI attempts to reuse the same player in 5 instances per scanline.

 

I guesstimated the delays so it probably won't work perfectly without some tweaking on your part.

 

To make it easy, just change the values of H1 through H5 in the program. They correspond to Player X positions onscreen.

Just edit lines 30-70 and GOTO 30 once you've run the program already and want to tweak the values.

 

By tweaking the HPos values, you might be able to get them all onscreen. Probably best to do them in left to right order (ie H1, H2 etc).

If anything, I'd probably guess that they need to be separated a little more from what they are currently set at - onscreen values range from 48 to 200.

 

XL7Test.zip

 

The DLI code is fairly simple too - here's hoping it will kinda work.

 

0600: 48        PHA             ;save A
0601: 8A        TXA
0602: 48        PHA             ;save X
0603: 8D 0A D4  STA $D40A       ;WSYNC - wait for the end of this scanline
0606: 98        TYA
0607: 48        PHA             ;save Y
0608: A0 20     LDY #$20        ;repeat for 32 scanlines
060A: A9 30     LDA #$30        ;first position (H1)
060C: 8D 0A D4  STA $D40A       ;WSYNC
060F: 8D 00 D0  STA $D000       ;HPOSP0
0612: A9 50     LDA #$50        ;second position (H2)
0614: A2 08     LDX #$08        ;delay loop
0616: CA        DEX
0617: D0 FD     BNE $0616
0619: 8D 00 D0  STA $D000       ;HPOSP0
061C: A9 70     LDA #$70        ;third position (H3)
061E: A2 04     LDX #$04        ;delay loop
0620: CA        DEX
0621: D0 FD     BNE $0620
0623: EA        NOP
0624: EA        NOP
0625: 8D 00 D0  STA $D000       ;HPOSP0
0628: A9 70     LDA #$70        ;fourth position (H4)
062A: A2 04     LDX #$04        ;delay loop
062C: CA        DEX
062D: D0 FD     BNE $062C
062F: EA        NOP
0630: EA        NOP
0631: 8D 00 D0  STA $D000       ;HPOSP0
0634: A9 90     LDA #$90        ;fifth position (H5)
0636: A2 04     LDX #$04        ;delay loop
0638: CA        DEX
0639: D0 FD     BNE $0638
063B: 8D 00 D0  STA $D000       ;HPOSP0
063E: 88        DEY
063F: D0 C9     BNE $060A       ;next scanline
0641: A9 00     LDA #$00
0643: 8D 00 D0  STA $D000       ;HPOSP0 - move the player offscreen
0646: 68        PLA
0647: A8        TAY             ;restore Y
0648: 68        PLA
0649: AA        TAX             ;restore X
064A: 68        PLA             ;restore A
064B: 40        RTI
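As a starting point for tweaking the delays, the gap between successive HPOSP0 stores in the listing can be counted from the standard 6502 timings. This is a rough Python sketch: the 4x speed-up is an idealised assumption that ignores DMA, refresh and any stretching of the hardware writes, and the names used are only for illustration.

```python
def delay_loop_cycles(count):
    """Cycles for LDX #count / DEX / BNE back to the DEX (no page crossing)."""
    ldx = 2
    taken = (count - 1) * (2 + 3)  # DEX plus a taken BNE on every pass but the last
    last = 2 + 2                   # DEX plus the BNE that falls through
    return ldx + taken + last

LDA_IMM, NOP, STA_ABS = 2, 2, 4

# (delay loop count, number of NOPs) between successive HPOSP0 stores in the listing
segments = [(8, 0), (4, 2), (4, 2), (4, 0)]

gaps = [LDA_IMM + delay_loop_cycles(n) + nops * NOP + STA_ABS for n, nops in segments]

ASSUMED_SPEEDUP = 4             # idealised: every CPU cycle runs four times faster
COLOR_CLOCKS_PER_BUS_CYCLE = 2  # one original CPU cycle spans two colour clocks
gaps_in_color_clocks = [g / ASSUMED_SPEEDUP * COLOR_CLOCKS_PER_BUS_CYCLE for g in gaps]

print(gaps)                  # CPU cycles from one HPOSP0 store to the next
print(gaps_in_color_clocks)  # the same gaps in colour clocks at the assumed speed
```

The gaps come out at 47, 31, 31 and 27 CPU cycles, or 23.5, 15.5, 15.5 and 13.5 colour clocks at an ideal 4x. Since the HPOS values are also in colour clocks, comparing these figures with the spacing between the H values gives a first idea of how far the loop counts or positions need to move.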


I agree with simple too.

 

Just got done catching up on this thread.

 

I like it! IMHO, the extra speed will enable lots of interesting and new things without really changing the machine all that much.

 

Watching for now to see what works and what does not. At some point in the future, I would very much like to mod a machine, maybe an 800 XL, or 130 XE, and enjoy this.


post-14708-1220813526_thumb.jpg

 

This is what you see if you run it under Atari BASIC using a ROM OS... the 'players' blink at maybe 10hz.

 

post-14708-1220813704_thumb.jpg

 

This is what you see if you run a RAM OS using Atari BASIC... no blinking.

 

Bob

See, now this would be quite desirable for BBS operations or potentially (with the ethercart) for making the 1200XL an FTP or web server. If the CPLD code could be made available, it would allow us all to tinker around and potentially create variations of the code to have mega-speed 1200XLs where graphics/compatibility aren't an issue.

 

 

As for the board as a whole - I LIKE IT !!! :-)

 

Curt

 

There is no such provision for doing that - it would mean re-programming the CPLD. You'd be on your own there...

 

Bob

 

 

I'd buy one too!

 

Would it be possible to vary the speed? I can imagine situations where 200% or even 125% of original speed would be desirable...


I have no problem with releasing the code but I will wait until ABBUC is over. If someone would like to make these things for people, they may not want the code out there - then, we have to decide if we want products or playthings.

 

Glad you like it!

 

Bob

I haven't looked at this in hardware, but I think what is happening is that we are executing the code ahead of the scan line. If I set line 30 to H1=40 and line 50 to H3=44, I see the whole player at 44 and the player at 40 in only the first scan line. It appears that we set HPOS way before the scan line reaches that point - eventually setting it to zero before any players are drawn.

 

Bob

IMHO, this is a worthy discussion.

 

In this community, there exists a small set of users capable of just doing this. Of this set, there are a significant fraction who value their time and would be highly likely to purchase a kit, instead of just rolling their own.

 

The rest of the community then are targets for a product, kit or service.

 

My only worry about the code being out there is too many variations.

 

If we are to see some development on this, and a user base, perhaps it's wise to establish said kit, product and/or service before the code sees greater distribution. In the end, having it out there is probably good, but maybe not so good right at the start.


This is a cool project. I agree with the other poster that not using the full capabilities of the 65816 is a waste, though. But it's cool that so much backwards compatibility can be kept while still netting some speed increases. I thought that would have been impossible. The RAM separation and dual clocking remind me of the "fast RAM" vs. "chip RAM" solution on Amigas. I wonder how this compares to John Harris' approach to acceleration.


I don't know if it's really a waste not to utilize the '816 fully... after all it is a fairly goofy chip overall. I'd even go so far as to suggest that perhaps a current-generation 65C02 would be a better choice: for a 64K machine, the extra instructions that the 65C02 has (some of which are missing from the '816 by the way) are just as useful, if not more so in some cases. And WDC's 65C02 goes up to 14MHz, just like the '816.


I don't know if it's really a waste not to utilize the '816 fully...

 

That depends on whether you just want to accelerate existing software or write new stuff. The Chimera (if it ever gets finished) will have 256K minimum SRAM, and more likely 512K. If the 2600 can get that much RAM in 2008, seems reasonable to at least have that much linear RAM on something like this, if it can be done reliably. Actually, just having a RISC share the hardware with the base 6502 like in Chimera might be better as far as new programs go.

Edited by mos6507

Bob: I calculated the delay constants in my head quickly, so they're probably a bit out.

 

Maybe I should have just done a colour change instead.

 

If you edit the data and change any "0" before "208" to a "26", then it'll just change the background colour instead.

 

The H1.. H5 values will then simply be the colour values it stores.
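The edit described above (every "0" that sits before a "208") turns each STA $D000 into STA $D01A, which is COLBK, the background colour register. A small Python sketch of the same patch, applied to the bytes transcribed from the DLI listing posted earlier; the function name `retarget_stores` is made up, and the actual BASIC DATA lines hold the same values in decimal.

```python
# Bytes of the DLI routine, transcribed from the listing at $0600.
DLI = bytearray.fromhex(
    "48 8A 48 8D 0A D4 98 48 A0 20 A9 30 8D 0A D4 8D 00 D0 A9 50"
    " A2 08 CA D0 FD 8D 00 D0 A9 70 A2 04 CA D0 FD EA EA 8D 00 D0"
    " A9 70 A2 04 CA D0 FD EA EA 8D 00 D0 A9 90 A2 04 CA D0 FD"
    " 8D 00 D0 88 D0 C9 A9 00 8D 00 D0 68 A8 68 AA 68 40"
)

def retarget_stores(code, old_lo=0x00, new_lo=0x1A, page=0xD0):
    """Return a copy with every STA $D000 (141, 0, 208) changed to STA $D01A."""
    patched = bytearray(code)
    count = 0
    for i in range(len(patched) - 2):
        if patched[i] == 0x8D and patched[i + 1] == old_lo and patched[i + 2] == page:
            patched[i + 1] = new_lo  # 26 decimal: low byte of COLBK
            count += 1
    return patched, count

patched, count = retarget_stores(DLI)
print(count, patched.hex(" "))
```

Six stores are changed: the five inside the loop and the final one after it, which then sets the background to 0 instead of moving the player offscreen.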


SD3.2D, SDX 4.20 and 4.21 all work in high speed/low speed SIO on the XL7. As long as you are using the serial clock generator and not bit-banging (like they seem to do in the 810 and 1050), you're OK. If you're trying to count cycles, you're probably going to croak.

 

This is probably pure luck, in that your serial devices happen to be fast enough. Every SIO driver in the world contains a busy loop that provides a delay between asserting the COMMAND line on the SIO bus and sending the actual command to the device. When you clock the CPU 4 times faster, this loop is 4 times shorter, of course. When you clock the CPU 5 or 8 times faster, or even more (as on the accelerator I have), the loop becomes shorter still. So no worries: loop calibration is necessary anyway to compensate for this.
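To illustrate the calibration point, here is a minimal Python sketch of how the iteration count of a busy loop has to scale with the clock. The 5 cycles per iteration (a DEX/BNE style loop) and the 1000 microsecond target are hypothetical values chosen for the example, not figures taken from any real SIO driver, and the sketch assumes the loop runs entirely at the faster clock.

```python
import math

STOCK_HZ = 1_789_773    # NTSC Atari CPU clock
FAST_HZ = 4 * STOCK_HZ  # roughly the 7.16 MHz discussed in this thread

def loop_delay_us(iterations, clock_hz, cycles_per_iteration=5):
    """Delay in microseconds produced by a fixed-count busy loop."""
    return iterations * cycles_per_iteration / clock_hz * 1e6

def iterations_for(delay_us, clock_hz, cycles_per_iteration=5):
    """Smallest iteration count that gives at least delay_us at this clock."""
    return math.ceil(delay_us * clock_hz / (cycles_per_iteration * 1e6))

TARGET_US = 1000  # hypothetical delay, for illustration only

stock_iterations = iterations_for(TARGET_US, STOCK_HZ)
fast_iterations = iterations_for(TARGET_US, FAST_HZ)
print(stock_iterations, fast_iterations)
print(loop_delay_us(stock_iterations, FAST_HZ))  # uncalibrated loop on the fast CPU
```

A count tuned for the stock clock gives only about a quarter of the intended delay at 4x, so a driver would have to scale the count by whatever speed-up it actually measures.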

Edited by drac030

The OS (and others, I'd guess) uses a pessimistically long delay to cater for a worse than worst case device lag.

 

Given that Bob's upgrade isn't strictly 4 times faster and that ROM code doesn't run much quicker, it wouldn't make a lot of difference.

 

But, RAM-based code is a different story, although if it's running in normal Gr. 0 then it's a bit closer to a normal machine.

 

Maybe trying the various SIO turbo modes with DMA turned off might unearth any potential incompatibilities.
