New 4MB RAM expansion


Simius

Haven't tried alternate OSes yet, but I had some fun playing with the different memory configurations and a 1MB ramdisk on SpartaDOS 3.2d, using John Pickens' Hyperspeed ramdisk driver, in my 600XL.

 

The coolest part is that you don't need to upgrade the main memory to 64KB; you remove the onboard RAM chips entirely... and the BASIC & OS ROMs too :D Makes for a fun picture to show.

 

Hopefully you have the firmware that works with SIDE2. I am waiting for a USB Blaster from eBay/China to re-program the Altera CPLD, supposedly by connecting to that red block on the board. Never done that before, but I've got the Windows software installed and ready to go when it arrives!

 

Being without SDX is kinda crippling. Looking forward to getting that going so I can play with stuff like 4MB Axlon and the linear ramdisk driver.

post-53052-0-93913600-1496825767_thumb.jpg

Edited by Nezgar

Antonia is a 65816 with 4MB RAM and flashable OSes, but at the normal Atari 1.79 MHz and with no SDX. PM Simius about availability.

 

If you want something faster with built-in SDX, I think the best thing to look at is the Rapidus / Ultimate 1MB combo. Not sure if those fit in a 600XL.

Edited by Nezgar

  • 1 month later...

 

Being without SDX is kinda crippling. Looking forward to getting that going so I can play with stuff like 4MB Axlon and the linear ramdisk driver.

 

Please clarify for me - are you without SDX because of SIDE2 not working on older versions of the upgrade? or because of some other incompatibility between this upgrade and SIDE2?

 

(If I ordered/bought one of these upgrades 'now', it would have the updated firmware?)

Correct... apparently I accidentally received one without the latest update. Still waiting for the USB Blaster coming on the slow boat from China to update it. I would hope that if you ordered one, this would be addressed before it is sent. Can't hurt to mention it when ordering, I guess :)

Edited by Nezgar

  • 2 weeks later...

No speed boost. Same ol' 1.79 MHz. But RAMDisk performance is potentially slightly faster with a linear ramdisk driver, because the 65816 doesn't actually need to bank switch to access the extra memory the way a standard 6502 ramdisk does. OS flashing and switching by software is also 'faster' than burning EPROMs and swapping out ROM chips. :)

 

Send a PM to Simius (originator of this thread) to inquire about availability.
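
To make the bank-switching point concrete, here is a rough Python sketch of the address arithmetic only. It assumes the usual 130XE-style scheme, where extended RAM appears 16KB at a time in a window at $4000-$7FFF. The linear base address `LINEAR_BASE` and the function names are hypothetical and not taken from any actual driver or from the Antonia documentation.

```python
WINDOW_BASE = 0x4000    # start of the 16KB bank window (130XE-style scheme, assumed)
WINDOW_SIZE = 0x4000    # 16KB per bank
LINEAR_BASE = 0x010000  # hypothetical start of the extra RAM in the 24-bit address space


def banked_location(offset):
    """Return (bank number, CPU address inside the window) for a ramdisk byte offset."""
    bank, within = divmod(offset, WINDOW_SIZE)
    return bank, WINDOW_BASE + within


def linear_location(offset):
    """Return the 24-bit address of the same byte; no bank select step is involved."""
    return LINEAR_BASE + offset


offset = 0x12345
bank, addr = banked_location(offset)
print(f"banked: select bank {bank}, then access ${addr:04X}")
print(f"linear: access ${linear_location(offset):06X} directly")
```

In the banked case the driver has to select the bank (and usually restore the old one) around every access, which is the overhead a linear driver can skip.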


Hmm.. I was always under the impression that the 65816 also lowered latency on instructions?


The 65C816 belongs to the 65xx series with its synchronous bus, where, basically, one clock cycle equals one memory access, so no great improvement over the 6502 can be expected. When the CPU is clocked faster than the bus, the 65C816 can skip the memory accesses on internal operation cycles (which are superfluous on the 6502), and this gains about 20% in speed on average.

 

The main advantage is better use of the bus in native mode. For example, on the 6502, to write a word to memory, you need to do something like this:

 

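  lda #$aa     ; low byte of the word (2 cycles)
  sta $xxxx    ; 3 cycles fetch + 1 cycle write
  lda #$bb     ; high byte of the word (2 cycles)
  sta $xxxx+1  ; 3 cycles fetch + 1 cycle write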
That is 10 cycles of instruction fetching to perform 2 cycles of data writes. Now the 65C816 in native mode:

 

  lda #$bbaa   ; 16-bit immediate load (3 cycles)
  sta $xxxx    ; 3 cycles fetch + 2 cycles write
That is 6 cycles of instruction fetching to perform 2 cycles of data writes. It is even possible to get that down to 5 cycles of instruction fetch per 2 cycles of data write (in the first 64k of the address space). So it is roughly a 100% improvement.

 

But of course, this requires code written for the 65C816. Ordinary 6502 code runs just as it does on the 6502 (the 65C816 timings are actually more compatible with the 6502 than those of the 65C02).
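
A quick sanity check of the cycle arithmetic above, as a minimal Python sketch. The per-instruction cycle counts are the standard documented ones for the 6502 and for the 65C816 with a 16-bit accumulator. The table and function names are made up for illustration, and the 5-cycle case is assumed here to be a direct-page store with a page-aligned direct page register, which the post does not spell out.

```python
# Each entry: (instruction, total cycles, cycles that are data writes).
SEQ_6502 = [
    ("lda #imm8", 2, 0),
    ("sta abs", 4, 1),
    ("lda #imm8", 2, 0),
    ("sta abs", 4, 1),
]
SEQ_65816_ABS = [
    ("lda #imm16", 3, 0),
    ("sta abs", 5, 2),
]
# Assumption: the "5 cycles" variant is a direct-page store.
SEQ_65816_DP = [
    ("lda #imm16", 3, 0),
    ("sta dp", 4, 2),
]


def fetch_and_write(seq):
    """Split a sequence's total cycles into (instruction fetch, data write)."""
    total = sum(cycles for _, cycles, _ in seq)
    writes = sum(w for _, _, w in seq)
    return total - writes, writes


for name, seq in [
    ("6502", SEQ_6502),
    ("65C816 absolute", SEQ_65816_ABS),
    ("65C816 direct page", SEQ_65816_DP),
]:
    fetch, write = fetch_and_write(seq)
    print(f"{name}: {fetch} fetch cycles for {write} write cycles")
```

Going from 10 fetch cycles to 5 for the same two bytes written is where the "roughly 100%" figure comes from.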

Edited by drac030

Very interesting - Thank you!


Hi folks - what kind of speed boost does this provide? Is the 65816 on average something like 50% lower latency for instructions at the same clock as 6502 or something like that?

 

...

A lot of the changes aren't directly speed related.

 

If you have to do something like drac030's example in a loop using indexed addressing, you cut the number of branches in half.

It's like unrolling the loop once, only better due to the 16 bit instructions.

You still need to increment or decrement the index register the same number of times though.

 

Since index registers are larger, you don't need to worry about crossing 256 byte boundaries, which can make code smaller and faster.

Index registers can address a full 64K without manipulating anything but the index register.

 

You have data bank registers which let index registers address different banks of 64K, and there is a program bank register so that you can execute programs outside of the first 64K of RAM.

 

Code can be relocatable due to 16 bit relative branches and returns. There's more to it than just that, but it opens the door.

Multitasking also becomes a lot easier even if you only have 64K.

 

There are several new transfer instructions to move data between registers.

You can directly push addresses to the stack.

It has some of the 65c02 enhancements.

There are instructions to push/pull X and Y registers to/from the stack without going through A.

Jump tables are easier to implement.

There is a BRA (branch always) instruction.

 

The direct page (formerly known as page 0) can be relocated to any 256 byte block in the first 64K.

This means that you can have more than 256 bytes available for indexed addressing. This could make some code smaller and faster.

 

You can use a larger hardware stack and it supports stack relative addressing.

 

Compilers don't need to maintain a program stack manually in the code, they can just use the hardware stack which makes the code smaller as well as faster.

And I think it adds another addressing mode.

 

etc...

 

I think the improved compiler support due to a combination of those things would certainly lead to noticeably smaller and faster code, and it makes implementing a compiler easier.
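
The bank registers and the relocatable direct page described above boil down to simple address arithmetic. Here is a small Python sketch of it for native mode. The function names are invented for illustration, and indexed modes are left out to keep it short.

```python
def absolute_data_address(dbr, addr16):
    """Absolute addressing for data: the data bank register supplies bits 16-23."""
    return ((dbr & 0xFF) << 16) | (addr16 & 0xFFFF)


def program_address(pbr, pc16):
    """Instruction fetch: the program bank register supplies bits 16-23."""
    return ((pbr & 0xFF) << 16) | (pc16 & 0xFFFF)


def direct_page_address(d, offset8):
    """Direct addressing: D register plus 8-bit offset, always in bank 0, wrapping at 64K."""
    return (d + (offset8 & 0xFF)) & 0xFFFF


print(f"${absolute_data_address(0x02, 0x1234):06X}")  # data access in bank 2
print(f"${program_address(0x01, 0x8000):06X}")        # code running in bank 1
print(f"${direct_page_address(0x0600, 0x10):04X}")    # direct page moved to $0600
```

Moving D around gives each routine or task its own 256-byte window for direct addressing, which is part of why multitasking and compiler-generated code get easier.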

 


A lot of the changes aren't directly speed related.

 

...

The direct page (formerly known as page 0) can be relocated to any 256 byte block in the first 64K.

This means that you can have more than 256 bytes available for indexed addressing. This could make some code smaller and faster.

...

 

That should say direct addressing instead of indexed addressing.


Very interesting information I've not considered! I'm curious drac030, I would presume you have used optimizations like this in various functions of your custom OS? If so, it's further incentive for me to get it up and running on my system.

Edited by Nezgar

I have done what I was able to do. The OS must unfortunately be able to run in the emulation mode too and be backwards compatible, so not everything can be used so freely. But the primary purpose is to allow the applications to switch safely into the native mode and use all the CPU features at will.

 

The ramd816l.sys is slow because it copies data byte by byte in a (rather slow) loop, with an extra penalty for using a 24-bit address once in each iteration. The advantage is that it works in emulation mode and does not require a special OS to run.

 

The attached binary is a version of the same ramdisk which uses the 65C816 block moves to transfer data. It should be a little bit faster in comparison.

ramd816n.zip
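
For a rough feel of why block moves can help, here is a back-of-the-envelope estimate in Python. The 7 cycles per byte for MVN/MVP is the documented figure. The byte-by-byte loop below (lda long,x / sta abs,x / inx / bne, with 8-bit registers) is only a hypothetical stand-in, not the actual loop in ramd816l.sys, and the 128-byte sector size is an assumption too. Setup code and page-crossing penalties are ignored.

```python
MVN_CYCLES_PER_BYTE = 7  # documented cost of MVN/MVP per byte moved

# Hypothetical byte-copy loop with 8-bit registers (NOT the real driver code).
HYPOTHETICAL_LOOP = {
    "lda long,x": 5,   # 24-bit source address, the "penalty" instruction
    "sta abs,x": 5,
    "inx": 2,
    "bne (taken)": 3,
}

SECTOR_BYTES = 128  # assumed sector size


def copy_cycles(byte_count, cycles_per_byte):
    """Cycles spent in the copy itself, ignoring setup and exit overhead."""
    return byte_count * cycles_per_byte


loop_per_byte = sum(HYPOTHETICAL_LOOP.values())
print("byte loop :", copy_cycles(SECTOR_BYTES, loop_per_byte), "cycles per sector")
print("block move:", copy_cycles(SECTOR_BYTES, MVN_CYCLES_PER_BYTE), "cycles per sector")
```

The copy is only one part of handling a sector request, so the gain seen in practice will be smaller than the raw per-byte numbers suggest.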


I am convinced now that Mr. Carden buys two of absolutely everything.

Hi Panther!

Yes, in most cases when I purchase Atari hardware upgrades I will, 99% of the time, get at least two of them, and here is why I do this. I try to test RealDos with every hardware configuration that is around. If there is only a single run of a product and I have only one and it becomes damaged, I can no longer test that setup. I do not know why things work this way, but if I have two of a bit of hardware, the first one always works. The other thing that is most useful is that I have been able to loan or give hardware to friends who, for whatever reason, did not have that hardware and really needed it. Now, if the hardware is something really good, like the Mega Speedy, Ultimate 1 Meg, or IDE Plus 2.0 rev D or S, I will often purchase five to ten of a hardware upgrade. Then, when all that hardware is sold out and the designer no longer wants to work on that project and wants to go on to the next hardware design and build, I have hardware I can trade, give, or barter with. I have lost count of how many Ultimate 1 Megs, Stereo Pokeys, U-Switches, VBXE 2.0s, and Multiplexers. Because I gave a Basic Multiplexer to the programming team that did the SpartaDos X Upgrade, I was able to get a functioning SDX cart that works for those people who have purchased Multiplexers from us.

 

Take Care and Have a Great Day!

Stephen J. Carden


  • 3 weeks later...

...

The direct page (formerly known as page 0) can be relocated to any 256 byte block in the first 64K.

This means that you can have more than 256 bytes available for indexed addressing. This could make some code smaller and faster.

...

Actually, the stack isn't restricted to a 256 byte page anymore. It's a 16 bit pointer that can point anywhere in the first 64K.

Not sure why I said it that way.
