Everything posted by Crispy

  1. And since I've already worked out a solution, I'll go ahead and post it.
  2. For the first in what will hopefully be a series of challenges, I've chosen a favorite topic of mine. As we all know, the 6502 lacks multiply and divide instructions, so we are required to roll our own when we need to perform these operations. I've seen a few threads here about dividing by specific values, but I don't recall seeing a discussion of a general purpose divide routine. So, for this week's challenge that is exactly what we are going to do.

     Challenge: Write a routine that will perform a divide operation given a dividend and divisor, and then return the quotient along with the remainder.

     Requirements: The routine is to perform an integer division, and then return the quotient and remainder. There are no requirements for error checking; you may assume that the routine will always be called with valid parameters. Please post your solution as a spoiler.

     Input: A = the divisor, where 255 >= divisor >= 1. X = the dividend, where 255 >= dividend >= 0.

     Return: A = the quotient. X = the remainder.
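For reference, the challenge above can be met with the textbook shift-and-subtract binary long division. This is a minimal C sketch of that algorithm, not the spoiler solution posted in the thread (which isn't preserved here); a 6502 routine would typically implement the same loop with ROL/CMP/SBC. The struct stands in for the A (quotient) and X (remainder) return registers.

```c
#include <stdint.h>

/* Shift-and-subtract division of two 8-bit unsigned values. */
typedef struct { uint8_t quotient; uint8_t remainder; } divmod8_t;

divmod8_t divmod8(uint8_t dividend, uint8_t divisor) {
    divmod8_t r = { 0, 0 };
    uint16_t rem = 0;  /* working remainder, shifted in bit by bit */
    for (int bit = 7; bit >= 0; bit--) {
        /* bring down the next dividend bit, like ROL through carry */
        rem = (uint16_t)((rem << 1) | ((dividend >> bit) & 1));
        r.quotient = (uint8_t)(r.quotient << 1);
        if (rem >= divisor) {      /* 6502: CMP sets carry, then SBC */
            rem -= divisor;
            r.quotient |= 1;
        }
    }
    r.remainder = (uint8_t)rem;
    return r;
}
```

For example, divmod8(255, 7) yields quotient 36 and remainder 3.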
  3. It's been a week, so I posted the solution. Reading through the posts, I get the impression that I'm the only one here who didn't already know this trick. I was hoping to see more people jump in and share their ideas, but it still generated some good discussion. In addition, I also like the idea of a weekly 6502 assembly challenge. I'll put my brain to work, and try to come up with a couple.
  4. While working on one of my FPGA designs, I came up with a trick that I thought was very cool. It turns out that I'm not the first one to think of it though, because when I googled it I found that it's already somewhat well known. I probably should have saved time by googling it in the first place, but where's the fun in that? I was thinking that some of you might appreciate this trick as well, but instead of giving it away I'm going to pose it as a brain teaser. The problem that needs to be solved is this: you are receiving data from a producer that creates 8 bit values. You need to determine if there is more than one bit set in each byte that you receive. How would you do this? For example, you receive a byte that contains the value 0x28. This byte has two bits set, so your program/logic will return a true condition to indicate that more than one bit is set in the byte. You can post your solution using C style syntax, or 6502 assembly, or actually anything that you like. There's no doubt that some of you already know this, so if you do then please wait a few days before giving it away. Also, no cheating and googling it before giving it a go yourself. And the solution is:
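The spoiler content isn't preserved in this archive. For reference, the widely known trick this kind of teaser usually points to is that x & (x - 1) clears the lowest set bit, so the result is nonzero exactly when more than one bit was set. No loops, no lookup table. A C sketch:

```c
#include <stdbool.h>
#include <stdint.h>

/* True when more than one bit is set in x: clearing the lowest set
 * bit with x & (x - 1) leaves something behind only if another bit
 * was also set. */
bool more_than_one_bit_set(uint8_t x) {
    return (uint8_t)(x & (x - 1)) != 0;
}
```

With the example byte from the post, more_than_one_bit_set(0x28) returns true, while 0x20 or 0x00 return false.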
  5. I posted video on YouTube of the two test programs running on my hardware. Star field test: HMOVE test: In general, the TIA is a purely digital device, and is not supposed to be affected by analog effects such as the phase between two signals. That's the whole point of synchronous logic. Logic states are sampled at clock edges, and then the logic updates and settles to a new state in between the clock edges. Exactly how this happens, and exactly how long it takes is not important, so long as the logic has reached a stable and correct state before the next clock edge. (Actually there is some analog circuitry in the chip for the audio levels and color clock phase, but that circuitry doesn't really pertain to this discussion.) Unfortunately, due to size and cost constraints, not to mention the state of technology at the time, you can't always get what you want. The chip designers at Atari were obviously well aware of this. Some compromises had to be made, and one of them led to the famous Cosmic Ark star field effect. There's no doubt that the designers took some shortcuts with the extra-clocks counter in order to save space on the die, and they knew that these shortcuts could lead to potential software issues. The evidence can be found in the Stella Programmer's manual where it says, "WARNING : These motion registers should not be modified during the 24 computer cycles immediately following an HMOVE command. Unpredictable motion values may result." It is hardware shortcuts like these that create grey areas when attempting to reconstruct the original intent of a design. The schematic alone might not tell the whole story, and someone who is attempting to recreate the original design will be forced to read between the lines. A good example of this can be found in the CPU register decoding. There are two clock domains in the TIA: the 1.19 MHz CPU clock and the 3.58 MHz system/pixel/color clock. 
When the CPU writes a value to a register that is used by logic driven by the system clock, there is no clear indication in the schematic that says exactly when the value in that register should be updated. Should it be updated before the next system clock edge, or after the next system clock edge? There are, however, clues that answer this question, and it is obvious after careful examination what that answer is. The original chip designer realized that, due to the speed of the logic and the tolerances of the manufacturing process, there wasn't enough margin to safely let the register update before the next system clock edge. So he added some extra delay to the decoding logic to ensure that the register will be updated after the next system clock edge. I did the same thing in my FPGA, and found that the resulting behavior matched that of the original Atari chip. So the point of all this is, yes, it will be interesting to see how the results of these tests compare between the original hardware and my hardware. Mr SQL, I also posted video of your Gate Crasher game. I ran the game on my FPGA hardware, on Stella, and on my NTSC 2600 Jr. To my ear they all sounded virtually the same. I also tried it with the analog audio output from my FPGA hardware connected directly to my Sansui receiver, and while it may have had a bit more depth than the HDMI audio, it sounded mostly the same. In each case, I ran the game from my Harmony cartridge. Do you think that might have something to do with it? I'm not sure if you're relying on something in the Supercharger that isn't exactly the same in the Harmony.
  6. OK, so let me pose a couple of questions, so that I may better understand your point. Are the TIA chips that Atari produced in the late 80s emulations of the chips produced in 1977? They aren't transistor exact implementations, and they also don't behave the same in corner cases. Also, if Atari was still producing the TIA chip today, it would probably be manufactured on a modern process, let's say a 45 nm process. The small transistors would remove the size constraints, and allow Atari to fix some of the bugs such as the bug in the extra-clocks counter that is responsible for the Cosmic Ark star field effect. Along with that, the fast switching times and tight tolerances would almost certainly guarantee that the anomalies seen in the old chips would be a thing of the past. Would this then be an emulation of the original TIA design? It takes just over one second for the FPGA to load its configuration data from FLASH, and then configure itself. It takes another 750 ms for the FPGA to configure the PLL that generates the core and HDMI clocks. And then finally it takes another 250 ms to configure the HDMI transmitter chip. Once all that is done, it waits another 100 ms, and then pulls the 6507 and RIOT out of reset. I've been very careful to ensure that all of the logic is held in reset long enough, and then pulled out of reset in the correct sequence so that the system comes up in a valid and stable condition every time. So, you wouldn't be able to "fry" carts on my system. Well, our motivations may not be the same. From what I have seen, a design goal for most emulators is to have it behave exactly like the old hardware. A lot of work goes into mimicking odd behavior, corner cases, etc. My goal, on the other hand, is to implement the original design, as it was intended, with modern hardware. At this point, I'm confident that it's correct, and that my hardware is doing exactly what the schematic says it should be doing. 
I'm basing this assertion on many hours of verifying my FPGA hardware against the logic simulations of the schematics. Because of this, there will most certainly be cases where my hardware doesn't exhibit the same anomalous behavior seen in a lot of the old hardware, and it may not be a good reference to use. It really comes down to what you are trying to achieve. I downloaded the test programs. I'll dust off the Elgato capture box, and get some video for you, probably this weekend.
  7. The schematic is labeled REV E, and is dated Oct. 4, 1982. I don't know how that corresponds to the silicon revision. I expect that tracking down the revision number of the chips manufactured in late 1982 or early 1983 would give us the answer. Without knowing the silicon revision, I am unable to test it against the original chip. The point of my design, though, is to implement the exact original circuit using contemporary technology. Since the tolerances of modern CMOS logic are so tight, there should be no corner cases related to thermal conditions. In my opinion, if a temperature change affects the behavior of a chip, then that chip is defective, so I really wouldn't want my implementation to exhibit that behavior. Agreed. But that can be said of any of the TIA chips that Atari produced. The tolerances of the NMOS processes in the late 70s and early 80s were so loose that even identically specified transistors on the same die would exhibit different transconductance characteristics. And then in the later 80s when the chips were manufactured on more modern processes, the transistor characteristics changed radically from those that were in the chips manufactured years earlier. So that raises the question, is there such a thing as a transistor-exact implementation? A more important point though, is that the TIA is designed with synchronous logic. This means that all of the logic state changes are synchronized to a clock. These state changes only occur when the clock transitions, either from low to high, or from high to low, depending on the circuit designer's choice. Exactly how the logic updates in between these clock edges is irrelevant. The only thing that matters is that the logic settles to a stable and correct condition before the next clock edge. 
And because of this the analog effects of the actual transistors such as propagation delay, rise time, etc., are irrelevant, or more correctly, are relevant only to the chip designer since it's his job to take these analog effects into account, and make sure that the logic is implemented in silicon in such a way that it behaves as intended. Yes, it works with bus stuffing. You can see it in action here: https://www.youtube.com/watch?v=qrrEQTTcX7U. This was captured directly from the HDMI output using an Elgato HD60 capture box. Be sure to set your browser/player to 1080p @60, or else it won't look correct. I'm still curious as to the reasoning behind calling my hardware implementation a simulation. That was the whole point of my original post, to gain some perspective.
  8. I've seen a lot of comments about how an FPGA is simulation or emulation. I'm interested in hearing the reasoning behind these statements, because I don't agree with them. Well, let me be more specific, and say that in general I don't agree with these statements. In some cases I may be willing to concede the point. Take the example of an FPGA design that is solely the result of reverse engineering and observation of a device's behavior in the presence of controlled stimuli. In this case, the device is a black box, and the FPGA designer can't see the actual circuitry inside of it. All he can do is observe how it behaves. Even though the FPGA may behave identically to the original device, there's no way of telling if the FPGA design is identical to the circuit design inside the device. For this particular FPGA design, I would be willing to call it circuit emulation. Now take for example the FPGA design of the TIA chip in my Atari 2600 device. Since I was working from the actual Atari schematics for the TIA chip, I was able to implement the exact circuits in the FPGA. In this case, it is not circuit emulation. It is in fact a recreation of the TIA chip implemented with contemporary CMOS digital logic, and I will go as far as to say that it is as much a real TIA chip as any of the TIA chips that Atari produced over the years. Anyway, to make a point here, an FPGA design that aims to recreate an existing device is not by default emulation or simulation. It really depends on how the FPGA designer implemented it.
  9. I had a thought about how to improve on bus stuffing using a simple and cheap hardware mod. It will allow the Harmony to act as a DMA controller without the bus contention that bus stuffing causes. It also has the added benefit of allowing the Harmony to write a TIA or RIOT register in a single CPU cycle. The idea is to put a tri-state buffer between the CPU address bus and the main board address bus so that after a write to WSYNC, when the READY line goes low, the CPU address lines will be disconnected from the main board address bus. At the same time, the R/_W line will be forced low, and at this point the Harmony is free to control the address and data buses. Since the R/_W line is now in the write state, a register will be written every CPU cycle. For any cycle that doesn't require a register write, the Harmony can set the address to some non-existent register. There is no data bus contention during DMA because when the READY line goes low after the write to WSYNC, the internal CPU data bus drivers are disconnected since the CPU is now in the state of fetching the next instruction. The hardware mod is very simple. It will be a small circuit board with a 28 pin socket and a few discrete logic chips. You install it by unplugging the 6507 CPU from its socket, plugging the small board into the CPU socket on the main board, and then plugging the 6507 CPU in the socket on the small board. Unfortunately, installation would only be this simple on six switch and four switch consoles. Installation on the Jr. would require soldering since the CPU is soldered to the main board on those machines. Software interfacing is very straightforward as well. The small board will have a register that will be used to enable or disable DMA. When you want to use DMA you first write a 1 into the DMA enable register. Once DMA is enabled, you can write to WSYNC to activate DMA. 
It will remain active until the beginning of the next scan line when the READY line goes high, at which point the CPU is given back control of the address bus and R/_W line. Disabling DMA is done by writing a 0 to the register on the small board. Doing this will cause the hardware to behave as if the mod isn't even there. I realize that this is not a "pure" solution since it involves a hardware mod, but I would like to make a couple of arguments in favor of its validity. First, this is a cheap and easy hardware mod that was feasible back in the early 1980s when the 2600 was popular. It uses parts that were cheap and readily available back then. Yes, it is only one half of the solution. The other half is the Harmony cartridge, which didn't exist in the heyday of the 2600. However, knowing what David Crane did with the DPC chip, it would have been possible to make an inexpensive DMA controller chip for Atari 2600 cartridges back then. And finally, games that set hardware requirements are nothing new. Using the Amiga as an example, when the Fat Agnus chip became available, a lot of games hit the market that required the 1 Meg Agnus. If you had an A500 or A2000 that had the 512K Agnus, then you had to upgrade your hardware with the Fat Agnus in order to play those games. Comments? Suggestions?
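The software interface described above can be sketched in C against a toy bus array so it runs on a PC. WSYNC ($02) is the real TIA strobe address, but DMA_ENABLE is a placeholder of my own; the post doesn't say where the mod board's enable register would actually be mapped.

```c
#include <stdint.h>

#define WSYNC      0x0002u  /* real TIA strobe: halts CPU until next line */
#define DMA_ENABLE 0x1FF0u  /* hypothetical register on the mod board */

static uint8_t bus[0x2000]; /* toy model of the 13-bit 6507 address space */

static void poke(uint16_t addr, uint8_t val) { bus[addr] = val; }

/* Arm the mod, then strobe WSYNC. When READY drops, the mod takes over
 * the address bus and forces R/_W low, driving one register write per
 * CPU cycle until READY rises at the start of the next scan line. */
void start_dma_line(void) {
    poke(DMA_ENABLE, 1);
    poke(WSYNC, 0);
}

/* Make the hardware behave as if the mod weren't installed. */
void stop_dma(void) {
    poke(DMA_ENABLE, 0);
}
```

On real hardware these pokes would be single STA instructions; the toy bus exists only so the control flow can be exercised off-target.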
  10. Yes, I should have been more descriptive. What I should have said is that the active sprite time is always eight pixel clocks, because there are always one or more sprite bits that are skipped, and one or more sprite bits that are displayed as double wide pixels. So in all cases when the lower three bits of the NUSIZx register are set to anything other than 5 or 7, the sprite is always 8 clocks wide. I haven't looked at the cases for double size or quad size players yet.
  11. I ran a simulation of the player 0 sprite with extra clocks always enabled, and these are the points I took away from my analysis. The sprite position follows the same progression from line to line as the missile position. The sprite is always eight pixels wide during active video. There is always at least one bit that is not displayed followed by the next bit displayed as a double wide pixel. Two double wide pixels are separated by four pixel clocks. The bit to the right of a double wide bit will be displayed as a double wide pixel on the next line. For example, if bits 6 and 2 are double wide this line, then bits 5 and 1 will be double wide next line. Here we see that bits 5 and 1 are not displayed, and bits 4 and 0 are displayed as double wide pixels. On the next line bits 4 and 0 are not displayed, and bit 3 is double wide. I'll get back to looking at the missile at its 2x, 4x, and 8x sizes in a few days. Until then, here's my simulation data for the player 0 sprite. starfield_sim_p0.zip
  12. Another solution is this: https://www.amazon.com/CAIG-DeOxit-Cleaning-Solution-Spray/dp/B0002BBV4G It works very well, but has the drawback of being pretty expensive.
  13. There's always the option of adapting the Wii sensor bar and Wiimote to work with the 2600. Maybe even refit a light gun with the PixArt image sensor so that it uses the Wii sensor bar to detect position.
  14. I was able to collect some data using the test program that SpiceWare supplied and found that bus stuffing breaks when the Harmony is driving a logic low on a data pin, and the resulting voltage at the pin is higher than 1.39 V. This happens when the resistance between the Harmony cartridge and the data bus is greater than 39 ohms. When the resistance is less than 1 ohm there is a comfortable margin of 0.47 V, at least on my four switch console. I realize that a sample size of one really doesn't provide enough data points to support a conclusion, but having a margin of nearly 0.5 V leads me to believe that bus stuffing will work on most 2600 consoles. Noting that it takes only 39 ohms of resistance between the Harmony cartridge and the data bus to break bus stuffing, it's important to have the contacts perfectly clean on both the Harmony cartridge and on the 2600 cartridge connector. I've seen oxidation on connector contacts introduce 50 ohms or more of resistance. It's also possible, as Kosmic Stardust and ZackAttack pointed out, that some TIA chips and/or CPUs may have odd timing issues that prevent bus stuffing from working. However from what I'm seeing on the scope, the timing margins are good, and bus stuffing should work fine in most cases.
  15. I ran the 128bus_20170120.bin file on my 2600 Jr., and this is what I see. After reading about some of the issues reported here I was curious about how the data bus was behaving during the data contention that bus stuffing creates. I looked at the data bus on my 2600 four switch, and here's what I saw on the scope. Unfortunately I can't hook my four switch up to a TV to see how the image looks since I currently have a whole bunch of parts unsoldered. The voltage level at the TIA data pin is being pulled down to 0.92 V by the Harmony cartridge. Following the specs for the 6502, that exceeds the maximum voltage for a valid logic low by 0.12 V. However, we're really interested in how the TIA handles this. Unfortunately I can't find any published specs related to the low level voltage for the TIA. Obviously we're seeing bus stuffing working on most machines, but I get the feeling that things are right on the edge. I'd like to do some more testing. Would you be able to create a bus stuffing demo that puts a single black pixel somewhere in the middle of each scan line? In other words, write $FF to GRPx except for one write near the middle of the line. Here you would write $FE. This would have the effect of creating a single pixel wide vertical black line positioned roughly in the middle of the screen, and would provide a single event to trigger my scope on. With this I can figure out how much margin there is before bus stuffing breaks. Also, I had a thought about those who have machines that can't handle bus stuffing. It's possible that cleaning the cartridge connector may fix the issue. Oxidation build-up on the cartridge contacts will introduce additional resistance between the cartridge and the data bus, and could be enough to keep the voltage from dropping low enough to count as a logic low.
  16. I've had one remaining issue lurking in my FPGA TIA core for some time now, and I finally got around to working on it. Of course I'm talking about the famous Cosmic Ark star field. I've read through many posts on the subject, and unless I missed some critical information somewhere, it appears that the behavior and the underlying cause of the effect are not fully understood. So, I decided to pull out my scope, and dig into it myself. There are two parts to the star field effect. The first is well understood. You can trick the extra clock logic into generating extra clocks indefinitely after an HMOVE by changing the value in the HMMx (or HMPx, or HMBL) register so that the comparison that resets the extra clocks enable line never happens. Once the logic is in this state, extra clocks are continuously generated once every four system clocks until HMOVE is written again. The second part of the effect is where the mystery lies, and I was completely baffled by it when I first saw on the scope what the TIA was doing. Instead of always producing one clock wide pixels, the effect occasionally produces two clock wide and zero clock wide pixels. In the posts that I've read on the subject, the pattern is described as two single wide pixels followed by a double wide pixel and then no pixel, each shifted 15 pixels left from the position on the previous line. However, the actual pattern is more complex than this, as we can see in the following table.

      LINE  PIXEL  WIDTH  DELTA  BLANKING
      -----------------------------------
         5     54      4     --      *
         6     10      4    -44      *
         6    214      0   +204
         7    197      1    -17
         8    180      1    -17
         9    162      2    -18
        10    146      0    -16
        11    129      1    -17
        12    112      1    -17
        13     94      2    -18
        14     78      0    -16
        15     38      4    -40      *
        15    221      1   +183
        16    204      1    -17
        17    186      2    -18
        18    170      0    -16
        19    153      1    -17
        20    136      1    -17
        21    118      2    -18
        22    102      0    -16
        23     85      1    -17
        24     66      3    -19      *
        25      0      2    -66      *
        25    210      2   +210
        26    194      0    -16
        27    177      1    -17
        28    160      1    -17
        29    142      2    -18
        30    126      0    -16

First off, the NTSC TIA produces 68 pixel clocks of horizontal blanking. 
This translates to 17 extra clocks per line, and so the shift is actually 17 pixels. But more interestingly, the progression of the pattern changes when a pixel straddles blanking and active video. An example of this behavior can be seen on line 24 at pixel 66. If we were to follow the pattern of 1-1-2-0 from line 19 pixel 153, then the pixel on line 25 at clock 210 should have zero width, but it is actually double wide. So what actually causes the pixels to sometimes be normal width and at other times be double or zero width? If we look at the schematic we see that the clock for the missile logic is formed by ORing the motion clock with an inverted copy of the extra clocks. Under normal circumstances these two clock lines are not active at the same time, but that's not the case here. When the inverted extra clocks line goes high, the motion clock line goes low. Do these two events happen at the exact same time? If not then there will be a glitch on the missile clock line every time an extra clock is generated, and if this is the case then does the missile logic see this glitch as a valid clock edge? I found the answer to these questions by looking at the TIA READY and XTAL pins with my scope, and using the measured time difference to arrive at a rough estimate of the average propagation delay through a single gate. Based on this information, I calculated that the motion clock line goes low 21.5 ns before the inverted extra clocks line goes high, and the resulting glitch has a pulse width long enough to create a valid clock edge for the missile logic. Now that my FPGA design was correct, I ran a simulation of it to see exactly what the TIA is doing when extra clocks are present during active video. Under normal conditions the missile clock and the pixel clock transition from low to high at the same time. 
When this happens a pixel at the input of the serial graphics latch will be latched at its output, and the missile logic updates causing the pixel to be removed from the latch input. The result is one pixel into the latch and one pixel out. Things get more interesting when the missile logic is about to produce a pixel, and an extra clock occurs. In this case the glitch on the missile clock line causes the missile logic to update early, and put a pixel at the input of the latch. I marked this event in red. Then at the time when the pixel clock does switch, there is already a pixel at the latch input. The missile logic isn't updated at this time because the missile clock line has already transitioned from low to high when the extra clock occurred. After this when the next pixel clock edge comes along, things are back to normal. The pixel on the latch input is latched to its output, and the missile logic updates and removes the pixel from the latch input. The result of all this is a double wide pixel that occurs one clock early. A similar chain of events occurs when there is a pixel at the latch input at the time when an extra clock occurs. In this case the missile logic updates early, and removes the pixel from the latch input. Then when the pixel clock transitions from low to high, a logic low is latched at its output. The resulting effect is that no pixel is produced from the latch. So there it is. The star field pattern is the result of two different clock timings beating against each other. I've included my simulation data in a zip file for anyone interested. You will need GTKWave to view it. http://gtkwave.sourceforge.net starfield_sim.zip
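The single-wide, double-wide, and zero-wide cases described above can be boiled down to a toy C model. This is my own simplification of the mechanism, not anything derived from the TIA netlist: 'in' is the latch input driven by the missile logic, and each pixel-clock edge copies in to out.

```c
/* Normal case: the missile logic puts a pixel at the latch input, and
 * on the next pixel-clock edge it is latched out while the missile
 * update clears the input: a single-wide pixel. */
int width_normal(void) {
    int in = 1, out, width = 0;       /* missile logic set the input */
    out = in; in = 0; width += out;   /* edge: latch + missile update */
    out = in;         width += out;   /* next edge: input already empty */
    return width;
}

/* Glitch case A: an extra clock makes the missile logic update early
 * and set the input one pixel clock ahead of time. At the following
 * edge the missile clock has already gone high, so no update occurs
 * and the input survives a second edge: a double-wide pixel. */
int width_double(void) {
    int in, out, width = 0;
    in = 1;                           /* set early by the glitched clock */
    out = in;         width += out;   /* edge 1: latched, no missile update */
    out = in; in = 0; width += out;   /* edge 2: latched again, then cleared */
    return width;
}

/* Glitch case B: the extra clock fires while a pixel already sits at
 * the input, and the early missile update removes it before it is
 * ever latched: a zero-wide (missing) pixel. */
int width_zero(void) {
    int in = 1, out;
    in = 0;                           /* glitch: early update clears input */
    out = in;                         /* edge: nothing to latch */
    return out;
}
```

Running the three cases gives widths 1, 2, and 0, matching the WIDTH column in the table above.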
  17. I had thought that maybe going the aluminum route would be cheaper, but a mechanical engineer friend of mine told me that it would probably be more expensive than plastic. In any case, I sure would like to find a solution that would allow me to produce an enclosure for a cost of under $20.
  18. I have a couple of spare fully assembled boards, but no blanks. I had the PCB fab house assemble the boards after fabrication since I didn't really want to have to solder on the one hundred plus parts myself. Besides, I don't have the equipment to solder on the two BGA parts. I've read about how to use a toaster oven to solder BGAs, but have never been inclined to try it. I just finished adding a post to my blog that covers some of the technical aspects of my design. Take a look at it. http://thehippiecampus.com/blog/
  19. I think I know why. In my haste I only included writes to COLUP0 and COLUP1 as triggers for bus stuffing. Are you bus stuffing COLUBK as well?
  20. I changed my cartridge port MUXing so that it does the correct thing for bus stuffing, and this is what I see when I run the demo.
  21. I do have a Harmony cart. In fact, I have three now. My current FPGA code will not handle bus stuffing. It's a relatively simple addition though. I'll make the changes and try out the demos probably either this weekend or next. I'll let you know how it goes.
  22. Definitely. We'll have to do it again next year. In the meantime I can't wait until November when Al puts your games up for sale in the store.
  23. Unfortunately the manufacturing cost for the plastic enclosure is really what determines the price. I've been toying with the idea of designing an FPGA based drop-in replacement for the Atari 2600 circuit board. Something like that could probably be sold for about $150 or so, and it retains the Atari 2600 styling since you are using the original 2600 enclosure. I started this project not to sell a product and make money, but rather to create something that I wanted. I think that the Walkman motif works very well. It's compact, and it's something I can set on my coffee table, and play Atari 2600 games on my big screen TV from the comfort of my couch. Plus I think that the "PLAY" button is a cute pun. The connector placement is not an issue since the game console is right in front of me. The only long cable I need is the HDMI cable that connects to the TV. To answer your other questions, I used some styling cues from my Sony Walkman Pro, and that's where the tiny switches came from. The PLAY and STOP buttons simply turn the power on and off. I had thought about adding an eject function to the STOP button, but that made the mechanical design a bit too complex. And finally, I still want to add some kind of logo, and some lettering around the buttons that indicate their functions. I'm leaning towards dry transfer, but I'm not sure how well that will work on 3D printed material. It's on my to-do list. The 2600 core timing is always running at the exact rate of the original Atari hardware. I'm using a triple buffer to frame sync the 2600 video timing to the output video timing, and the horizontal scaling is always an integer multiple of the 2600 horizontal resolution. The vertical scaling, on the other hand, uses what's called a polyphase interpolating filter to scale from 208 or 224 lines, user selectable, up to the output resolution which can be 480, 600, 720, 768, 1080, or 1200 lines. 
One day I might experiment with trying to V lock the output timing to the 2600 timing, but for right now the delay through the triple buffer doesn't bother me.
  24. Thanks! I haven't tested the lag with a monitor, but my simulations show that my framebuffer delay varies between a minimum of 5 scanlines and a maximum of 1 frame depending on how the input timing is beating against the output timing. So, with a good monitor the total lag should be less than two frames.
  25. Here's a brochure I made for the AtariAge display at PRGE. It lists the button and switch functions. MMDC Retro Player (cropped).pdf