speccery

Everything posted by speccery

  1. I did a small hardware modification to the picocart. This enables reading address line 0 of the cartridge port at any time through one of the I/O pins. That in turn greatly simplifies handling of read cycles. The result is that Extended Basic works and I was even able to run Megademo a few times, although the latter is not supposed to work. More about that later.

Above can be seen the modification. A0 is routed to one of the unused buffers in one of the LVC245 chips. This chip also buffers the other control signals; its output enable is always on and its direction is hardwired from the TI to the pico. The buffered signal, at 3.3V level, is connected to GPIO28, a previously unused pin.

The cartridge port bus has a nasty behaviour pattern. I added labels above to explain the four signals seen on the oscilloscope screen capture:

  • Write handler - a debug output driven by software on the pico. High when the pico is handling a write signal.
  • Read handler - a debug output driven by software on the pico. High when the pico is driving the first byte of a 16-bit read on the bus.
  • #CS - this is the chip select for the cartridge port, buffered to 3.3V level.
  • #WE - this is the write signal for the cartridge port, buffered to 3.3V level.

In this screen capture I have a software version from before the modification running. In that version of the software the choice of which byte to present to the TI was based on a software delay loop. It will first present the odd byte (LSB=1) and then after a while the even byte (LSB=0). This worked fine for non-banked ROM-only cartridges: Invaders, Parsec. But for Extended Basic and many other banked cartridges, writes are also done to select the cartridge bank. In this capture we're looking at reads and writes in the cartridge space whenever #CS is low. By now you are probably wondering what the nasty behaviour I mentioned is. Well, if you look at the capture, the chip select signal #CS does not go high just after the write cycle!
The write cycle is when we see #WE going low twice in rapid succession. Instead of temporarily going high, #CS remains low as we proceed from the write cycle to the following read cycle! As can also be seen from the capture, this does not seem to be true for consecutive read cycles before the write cycle. With those #CS does go high between the read cycles. Thus for read cycle sequences a simple logic of detecting the chip select signal going low is enough to detect the beginning of a read cycle. Remember that with the picocart the address low byte, address high byte and data are all multiplexed. When the picocart is driving data for the TI to read, it cannot monitor changes of the address bus, since that would mean temporarily stopping to drive the data byte to the TI while it checks what's going on with the address bus. To describe the behaviour in a bit more detail, it's probably the case that if there are only consecutive reads from the cartridge space, #CS would stay low from one read cycle to the next. But in practice nearly all instructions need to touch RAM (mostly scratchpad) between reads, so #CS does go high while non-cartridge areas are being accessed. So all of this is a very long way of saying that we need to be able to detect changes on the address bus to know when we go from one read cycle to the next, or as in the case of this screenshot, when we go from a write cycle to a read cycle. The simplest way of doing this is to monitor the LSB of the address bus, as that is high in the beginning of every memory cycle and goes low halfway through a memory cycle. The LSB of the address bus (byte select) is created by the infamous 16-to-8 bit multiplexer on the TI-99/4A; it's not a signal the CPU offers. So I patched that signal, and now the Pico has a better idea of what is going on during bus cycles. I was surprised to see that megademo worked at all since it cannot be served from the on-chip SRAM.
In order to stabilise timing, I copy the cartridge ROM contents to a 32K SRAM buffer in the pico's on-chip SRAM. But this cannot be done with the megademo, since the pico has "only" just over 256k of SRAM and megademo is bigger than that. So for a cartridge of this size I instead keep it on the off-chip flash on the pico board. But this off-chip flash is a QPI flash chip, meaning it is connected via a 4-bit bus to the pico's CPU. In order to read something from this flash chip, multiple 4-bit bus cycles are needed. Several cycles are first needed to send the address (24 bits) to the flash chip before the data can be returned, I assume in 32-bit chunks. Flash accesses are cached by a hardware cache on the RP2040 chip and the 4-bit bus runs at a very high frequency; I haven't checked but it might be something like 66MHz or 133MHz. But despite these great hardware features, there is a long latency when reading a new piece of information from the flash chip. I indirectly measured this in the beginning of working with the picocart, and it appears we might be talking about something like 500 ns access latency for data which is not in the cache. If the latency is this high, it is significant enough that the TI's bus cannot be served fast enough. To make understanding of the timing even more complex, both CPU cores can access the off-chip flash. If the other core has started to read something from the external flash just before we need another byte for the megademo, the core serving the TI's bus will probably have to wait for two cache line fills from the off-chip flash: one for the first core to satisfy whatever it's doing, and then one to read the cache line containing the data we want. Well, this became quite a long story; hopefully it was of interest to the hardware design inclined people here.
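The A0-based logic described in this post can be modelled in a few lines. This is only a sketch in Python for illustration, not the actual picocart firmware (which is native code on the RP2040); the function names and the sampled-signal representation are assumptions made up for the example. It shows the two ideas from the text: A0 selects which byte of the 16-bit word to present (odd byte first while A0 is high), and a low-to-high transition of A0 marks the start of a new memory cycle even when #CS stays low.

```python
def byte_for_a0(rom: bytes, word_offset: int, a0: int) -> int:
    """Return the byte to drive for the current half of a 16-bit read.

    word_offset is the (hypothetical) even byte offset of the word inside
    the ROM image. a0 = 1 selects the odd byte, which is presented first;
    a0 = 0 selects the even byte.
    """
    return rom[(word_offset & ~1) | (a0 & 1)]


def count_memory_cycles(a0_samples) -> int:
    """Count memory cycles in a stream of sampled A0 levels.

    A0 is high at the start of a memory cycle and low in its second half,
    so every low-to-high transition marks a new cycle, whether or not #CS
    went high in between.
    """
    cycles = 0
    previous = 0
    for level in a0_samples:
        if level and not previous:
            cycles += 1
        previous = level
    return cycles


rom = bytes([0x12, 0x34, 0x56, 0x78])
print(hex(byte_for_a0(rom, 0, 1)), hex(byte_for_a0(rom, 0, 0)))  # 0x34 0x12
print(count_memory_cycles([1, 1, 0, 0, 1, 0, 1, 1, 0]))          # 3
```

In real firmware the samples would come from reading GPIO28 in a tight loop rather than from a list, and timing matters far more than this model suggests.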
  2. Unfortunately not, a Rigol DS1054Z, purely digital beast. Affordable.
  3. Sorry to advertise my own project - but this is something which probably can be fixed with a TI-GROMmy board.
  4. Built some more GROMmy 2 boards. Half of them have been tested in a console, but all seem to work. I have a simple test bench to validate that power regulation is working, i.e. producing 3.3V. I do this test before soldering the processor chips. After soldering the processors, without firmware (i.e. running on-chip boot ROM), these three consumed about 11mA in total. After soldering the LED and programming the actual firmware, the power consumption seems to be 7.6mA when the LED is lit for a single GROMmy2. The firmware is running at the maximum speed of the chip, 64MHz. I don't know how this compares power consumption wise to an actual GROM chip - the GROMmy2 replaces three of them with the current firmware, it could easily facilitate 64K of GROM space, but that's not really what the board is for.
  5. I created brief instructions how to reprogram the flash of the TI-GROMmy2 boards. I don't have a new firmware available yet, but since I will send out several boards in the coming days I thought it would be useful to have these instructions available. TI-GROMmy flashing.pdf
  6. Thanks @Artoj interesting to hear what people are using. I used to use eagle but have recently moved over to Kicad. I am still very much in the learning curve, but I have already created 6 PCB designs with it. Every single one has worked. Still kicking myself for not making the move earlier. But as you know, there's always a learning curve and each one of these programs has its own kinks. The nice thing about Kicad is that it's now so widely used that there's a lot of information available, as well as plenty of reference designs.
  7. Looks nice! What PCB design software do you use?
  8. I had a rather embarrassing bug - a leftover from an earlier test which restricted GROM space to 8k... After removing that, Parsec immediately started to work properly. I also tested Atari's Defender, which is an 8K ROM cartridge (no GROM). I really enjoyed playing Defender after a while. So while I was trying to find a problem in the new ROM handling code, the bug actually was in the GROM handling code... Typical. Of course, one thing normally leads to another: in order to test Defender, I needed to have a joystick. Defender does not support keyboard play. I wanted to continue to use my setup where I have a microcontroller connected to the keyboard connector, and the actual keyboard data is coming from my Mac. Thus I needed to extend both the microcontroller software and the Mac application for key forwarding to support a joystick as well. Same way as @Tursi did with Classic99: the tab key is the fire button, and the arrow keys are the directional buttons. I then proceeded to testing with Extended Basic, but didn't get that to work yet. Extended Basic requires ROM paging. It's simple conceptually, but requires support for writes to the ROM area. This is something I have done many times over with StrangeCart and other projects. I have some kind of a timing issue. I am trying to avoid going to the basement storage locker to dig up my logic analyser, and resorted to various ad hoc tests, which have not worked yet. At least my four channel scope is readily available; I need to continue with that. The root issue probably is that the ROM handling loop becomes a bit more complex when it has to differentiate between reads and writes. The conditionals to test for writes probably delay the handling of reads just enough to skew things off. Still, I am happy about this progress - at least I have simultaneous GROM space with multiple GROM "chips" and ROM space reads working for non-banked cartridges.
  9. I think the ROM emulation will be fast enough. After all, the Pico runs at 133MHz, which is faster than the 96MHz initial clock speed of the StrangeCart. The strangest thing happened with the strangecart (pun intended): when I initially started to play around with the LPC54114 chip, its max clock speed was 100MHz. Then I noticed something remarkable - the maximum clock speed of the chip had been updated to 150 MHz. Indeed, when comparing the data sheets, the older data sheets say it goes at 100 MHz, and the newer ones talk about 150 MHz. Also the official APIs with a new SDK changed to support 150 MHz clock speed, and indeed it runs at that speed just fine. The RP2040 also overclocks extremely well; people routinely run it at 250 MHz. There are YouTube videos showing it running much faster than that as well. So if it comes to it, the clock speed can just be increased. I have already noticed with 133 MHz that when the '245 buffer is enabled, I have to insert NOP instructions before reading the pins for the data to be stable. In other words, this clock speed already is in the same class as the propagation delays of the buffer chips! Comparing the StrangeCart and Picocart - once I get this working - the biggest difference will probably be the type of the second processor core. One of the cores (in both cases an M0 core) will be completely busy serving the bus (and handling the multiplexing in the case of the Pico). The "free" core with the Pico is another M0 core, while on the strangecart it is an M4F core. Not only is this core faster on a per-clock basis, but it also has a hardware floating point unit and other enhancements. My understanding is that the M4F can do a single cycle floating point multiply accumulate, putting the peak performance at 150MHz to 300 MFLOPS. Of course in practice you rarely have code which can optimally use the FPU. Compared to the M0, the M4F is more than an order of magnitude faster when it comes to floating point operations.
Yes, the same LI instruction streaming approach will work. If you want to participate in the bring up / software development, I can send you a board, just PM me.
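The clock figures in this post are easy to turn into numbers. The snippet below is plain arithmetic in Python, not firmware; the helper names are made up for the example, and the "2 floating point operations per cycle" figure simply counts a single-cycle multiply-accumulate as one multiply plus one add, which is how the 300 MFLOPS peak at 150 MHz comes about.

```python
def cycle_time_ns(clock_mhz: float) -> float:
    """Length of one CPU clock cycle in nanoseconds."""
    return 1000.0 / clock_mhz


def peak_mflops(clock_mhz: float, flops_per_cycle: int = 2) -> float:
    """Peak MFLOPS, counting a multiply-accumulate as two operations."""
    return clock_mhz * flops_per_cycle


# Clock speeds mentioned in the post.
for mhz in (96, 100, 133, 150, 250):
    print(f"{mhz} MHz -> {cycle_time_ns(mhz):.2f} ns per cycle")

print(peak_mflops(150))  # 300 MFLOPS peak for the M4F at 150 MHz
```

At 133 MHz a cycle is about 7.5 ns, so each NOP inserted after enabling a '245 buffer buys roughly that much settling time before the pins are read.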
  10. You don't need more than one core to handle GROMs, regardless of the amount. The limiting factor is memory space. In fact, due to the existence of the GREADY signal in the cartridge bus, what happens is that the TI-99/4A effectively stops when accessing the GROM. The pico can then at its leisure respond to the bus access, and pulse GREADY suitably (in a time critical section of the code). In practice this happens a lot faster than what a normal GROM could do. The same is not true for cartridge ROM accesses: they cannot be delayed. That's what actually keeps one of the cores busy. There is also something I didn't write in the previous message - I did modify the code already so that it runs on both cores. One core is busy serving ROM and GROM accesses, while the other at the moment is not doing much at all.
  11. Ran out of time for today, but I have now been able to play TI Invaders with the picocart. So the ROM emulation works there, at least enough for this one game. The time critical code for ROM emulation must be run from RAM, something I saw today. It's not a surprise though, the same is done with the StrangeCart. I've also tested with Parsec, but I get some weird errors with the landscape graphics (the landscape sometimes is flat). It's hardly surprising, since there are so many timing details which might be off. This might relate to the fact that Parsec has more than one GROM, perhaps something is off there. Or someplace else...
  12. It works! At least to the extent that it emulates GROMs properly. I received the PCBs earlier this week and assembled one of these prototypes. In terms of the prototyping process for my latest ideas, I have realised it is faster to just sketch an initial design and get a board produced instead of creating manually wired designs like I used to do. The logic is that this way I get not only prototype boards, but also a schematic to go with them. And once there is a board, making revisions is fast - and this one was no exception. I did not use copper pours here, since cutting traces is easier if there are only traces rather than large chunks of copper. Having said that, nothing needed cutting so far. I adapted my TI-GROMmy2 source files, and got GROM functionality working today on the picocart. Pictures below, and the story continues after those. The biggest difference here when comparing to my GROMmy boards is that since the RP2040 chip is not 5V tolerant, level conversion between 3.3V and 5V is required. The 74LVC245 chips take care of that. As anyone who has played around with the '245 would know, these chips have 8 bidirectional buffers which share two control lines: DIR direction (port A to B, or port B to A) and the #OE output enable signal. These need to be handled by software; also, since the pico doesn't have enough I/O pins to handle all of the necessary signals, I created a shared 8 bit bus which is multiplexed by software. This bus can 1) read the low 8 bits of the address bus, 2) read the high 8 bits of the address bus (only 5 available from the edge connector, the rest brought to a header), 3) read the data bus or 4) write the data bus. The output enable signals need to be handled in a proper way for this to work. One of the '245 chips just buffers and handles level conversion of the #WE, #GS, #ROMS and DBIN signals. The buffered signals connect directly to four GPIO lines of the RP2040. For this buffer, the direction is fixed and output enable always on.
For the RP2040 chip these four signals are permanently inputs and can be read at any time. One '245 connects the TI's data bus to 8 GPIO pins (the board's internal data bus). This buffer needs to be controlled both for direction and enable under software control. One '245 connects the low 8 bits of the address bus to the board's internal data bus. The direction is fixed, but the enable signal is under software control. The last '245 connects the high 8 bits of the address bus to the board's internal data bus. Also here the direction is fixed, but the enable signal is under software control. Thus in total there are three output enable pins to manage, and one direction pin (for the data bus buffer). In addition to controlling the external buffers, the RP2040 chip's GPIO direction for the 8-bit bus also needs to be controlled. And this needs to be done in the right order, so that there are no bus collisions between the '245 buffers, the TI's data bus, or between the buffers and the RP2040 chip. It is not hard but care is needed. As I have only so far implemented GROM support, I don't need to worry about the address bus in its entirety yet. In the next step, adding ROM support, I need to manage the address bus fully too.

As an example, for GROM read cycles to the data array, the code roughly does this:

  • See if #GS has gone low (GROM cycle). The steps below only apply if #GS has gone low.
  • Check that #WE is high (inactive), i.e. we are dealing with a read cycle.
  • Check the GROM address counter (maintained by software) so that it matches a GROM address the picocart is serving.
  • Turn RP2040 port direction to inputs for the 8 bit board bus.
  • Enable the low address bus '245 buffer.
  • Read the low address. Here we need the A1 address bit (2nd to LSB) to distinguish between address counter and GROM data array reads.
  • Disable the low address bus '245 buffer.
  • Enable the data bus '245 buffer, and set its direction towards the TI's data bus.
  • Turn RP2040 port direction to output for the 8 bit board bus. After this step any data presented on the GPIO pins by the RP2040 will be readable by the TMS9900.
  • Set the GPIO pins appropriately for the data being read. After this, increment the GROM address counter managed by software.
  • Raise GREADY so that the GROM read cycle can end.
  • Wait for #GS to go high, i.e. end of read cycle.
  • Very quickly deactivate the '245 driving the cartridge data bus, so that it does not collide with anything else.
  • Turn the RP2040 data bus back into input mode.
  • Return GREADY back to low in anticipation of the next GROM cycle.

So as can be seen here, there are quite a few opportunities for bugs even if this is a simple sequence... But it works now.
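The ordering constraints in the GROM read sequence can be checked with a toy model. The following Python sketch is not the picocart firmware; the `Board` class, its field names and `grom_data_read` are all hypothetical, the wait for #GS is omitted, and the address counter uses a simplified 16-bit wrap. It also assumes that A1 = 1 means an address counter read (matching the usual >9800 data / >9802 address ports). The point is only to show how each step changes exactly one thing and how a collision check can catch a wrong order.

```python
class Board:
    """Toy model of the shared 8-bit board bus (all names hypothetical)."""

    def __init__(self, addr_low: int):
        self.addr_low = addr_low       # low address byte presented by the TI
        self.gpio_is_output = False    # RP2040 direction on the shared bus
        self.addr_lo_enabled = False   # low address '245 output enable
        self.data_enabled = False      # data bus '245 output enable
        self.gready = 0
        self.driven = None             # byte the RP2040 is driving, if any
        self.log = []

    def step(self, name: str, **changes):
        for key, value in changes.items():
            setattr(self, key, value)
        # The address buffer drives the shared bus towards the Pico, so the
        # Pico must not drive it, and the data buffer must not be enabled.
        assert not (self.addr_lo_enabled and self.gpio_is_output), "collision"
        assert not (self.addr_lo_enabled and self.data_enabled), "collision"
        self.log.append(name)


def grom_data_read(board: Board, grom: bytes, counter: int):
    """Serve one GROM read; returns (byte driven or None, new counter)."""
    board.step("gpio to input", gpio_is_output=False)
    board.step("enable low address buffer", addr_lo_enabled=True)
    low = board.addr_low
    board.step("disable low address buffer", addr_lo_enabled=False)
    if low & 0x02:                     # assumption: A1 set = counter read
        return None, counter
    board.step("enable data buffer towards TI", data_enabled=True)
    board.step("gpio to output", gpio_is_output=True)
    board.step("drive data", driven=grom[counter])
    counter = (counter + 1) & 0xFFFF   # simplified wrap
    board.step("raise GREADY", gready=1)
    # Real code would wait here for #GS to go high.
    board.step("disable data buffer", data_enabled=False)
    board.step("gpio to input", gpio_is_output=False)
    board.step("lower GREADY", gready=0)
    return board.driven, counter


board = Board(addr_low=0x00)
value, counter = grom_data_read(board, bytes(range(16)), 5)
print(value, counter)      # 5 6
print(len(board.log))      # 10 steps logged
```

Swapping two steps, for example setting the GPIO to output before disabling the address buffer, trips the collision assertion, which is the kind of bug the post warns about.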
  13. The same limits that we have with the StrangeCart would apply - a cartridge cannot directly access the VDP without asking the TMS9900 to do it, which seriously limits the bandwidth.
  14. That certainly could be an option. The choice to make the board big at the prototype phase stems from the desire to use through hole buffer chips and only a 2 layer PCB. Those buffer chips are available in much smaller SMD form factors (I have them also in the TSSOP-20 packaging), but I think that would limit the number of people inclined to build these themselves, and at the prototype phase it would be nice if others can participate.
  15. Great! I'd be happy to send you a board, either bare PCB or with SMD components fitted. I need to get one up and running first though (and receive the PCBs, they're still in manufacturing) so that we know these actually work.
  16. The handling of the TI's bus is likely going to take 100% of the attention of one of the CPU cores in the Pico. That code also has to be native code; python code is almost certainly not going to be fast enough to handle the buffer control signals. In the similar StrangeCart the M0 core is spending all its time babysitting the TMS9900 bus, and that is a simpler design which requires neither buffers nor bus multiplexing, as is needed with the fewer pins of the Pico. Having said that, Python or CircuitPython could still be used, if a native library doing the bus handling is used. I don't know how CircuitPython handles multiple CPU cores; one would need to be reserved to serve the cartridge bus.
  17. Updated 2022-12-30: The board works! Only GROM support tested so far. Original message: I have several Raspberry Pi Pico microcontroller boards which I find quite adorable as they are very affordable, in some respects very powerful and in other respects not too much so. Nevertheless I have been playing around with them for a while and trying various things, I built a prototype eurorack synthesiser sequencer with one for example. I've also used them with some boards I bought from Pimoroni to drive VGA screens and DVI monitors. In the last pandemic call which I participated (I guess around a week and a half ago) we also talked about them. Anyway it was high time for me to get one connected to a TI-99/4A. I know that other folks such as @arcadeshopper and @jedimatt42 have also contemplated the same idea. Perhaps someone has already done it, but if so I'm not aware of it and anyway making boards is fun when there is time for it. So I designed a prototype board and just submitted an order for them. I did not quite have enough time to really do this well, so I decided what the heck, let's see what happens with a quick design. It's a bit bulky, and will not fit into an ordinary cartridge case. This of course also means that I have no idea if this is going to work for sure, but of course I have reason to believe it will. It is not too different design-wise from the StrangeCart, except that the microcontroller is different. The RP2040 chip which powers the Raspberry Pi Pico is not 5V tolerant, so I threw in four 74LVC245 buffer chips and some other stuff. Since the board is large, I used some of the extra space to add a prototyping area, and I added breakouts for three SOT-23-5 footprints and two extra SOIC-8 footprints as well. One SOIC-8 footprint, U5, is fully connected to a SPI port and can serve as flash memory expansion or alternatively house a PSRAM (pseudo static RAM chip). 
As can be seen from the pictures, the buffers are through hole components, the idea being that if I get it to work and others are interested in these, it would be quite easy for people to build these. I don't like through hole resistors or capacitors, so I used 0805 sized surface mount parts for those. In my opinion they are easy to work with. The only slightly challenging part to solder for some could be the 74LVC1G125 buffer I put in to drive the GREADY signal. It can be replaced with a transistor circuit in the prototype area. To get quickly up to speed, I used FlashGROM99 board as the basis. I removed everything else except the outline, regulator and the edge connector and started from there, to have a basis for the design in Kicad. After placing the parts it became obvious it would be hard to make this fit inside a cartridge housing, so I just gave the board more space which made it easy to route the signals to the Raspberry Pi Pico. Now I just need to wait to get the boards and hope for the best during bringup...
  18. @arcadeshopper @Ksarul I'd also be interested in those prototype boards, please keep me posted when they're available. On the topic of SID chips, there are things like the ARMSID which are plug-in replacements for the SID chips. I wonder if anyone has tried those on the TI board? Something perhaps similar is what I did a few years ago, with the ET-PEB board capturing writes to the TMS9919 chip and converting them to MIDI information in real time, as presented in this video. Now that I look at the video date, it was four years ago... I still think it is a very cool concept. I have acquired many more synths since then, perhaps I should revisit this project myself for fun. To @Asmusr's point, this approach is totally compatible with existing code, and you get whatever sounds you configure your synths to do. It could even be something based on a Teensy as @Tuxon86 was thinking.
  19. Nice! Could also double as a generative game map creation algorithm.
  20. Thanks for sharing this, that's interesting! And quite dense, looks like only 5 words of code space would be needed. I have been wanting to test BIND and BLSK, need to dig up my TMS99105 boards
  21. Sorry to go off on a tangent, but I think I need to write this, since I deeply care about execution performance. I haven't professionally written software for the past seven or eight years or so, but in the preceding 25 years I spent a lot of my career writing very time critical software. Mostly on x86 platforms, but also some on TMS320C30 and TMS320C51 DSPs. I was first in the video conferencing industry and then in the game streaming industry before moving on. In those companies the performance of the software - both execution speed and reliability - was paramount, not only for technical reasons: it was a business enabler and driver too. In the late nineties we created the world's first H.320 standard compliant software-only video conferencing solution for Windows. I wrote both H.261 standard compliant video encoders and decoders and spent a lot of time optimising them; it wasn't easy to get them to run in real-time with the PCs at the time. A lot of assembler coding using MMX technology, which later on changed to SSE. But I'm especially proud of the MPEG-4 encoder I wrote later on for game streaming. I think I spent at least a year optimising it, using SSE2 and SSSE3 (supplemental streaming extensions 3 - what an acronym) assembler code. The MPEG-4 encoder was/is very heavily multithreaded and uses dynamic programming algorithms to achieve minimum run time per frame with consistent image quality. In the process of all of this I also co-invented a very nifty motion estimation algorithm for 3D graphics, which is described in the U.S. patent https://patents.google.com/patent/US20040095999A1/en, but that's another story. (Just now noticed that they misspelled my first name in that patent...) Over the years I have hired a lot of programmers and one of the things I've tried to teach them is to indeed care about performance and reliability. Keep it simple and make it good enough, but not too good, so that we can get it done and actually ship it...
  22. My opinion is: never. To be more precise, in my opinion BLWP type activities are fine if you're calling an operating system function (although an XOP would be better, but that requires proper ROM vectors), or doing a context switch (perhaps for an interrupt but also for other things). Otherwise I'd prefer BL. In fact the only cases where I've used BLWP have been for debug or I/O purposes, to output a value in a register without messing up any of the other registers. For this BLWP is very convenient. In those cases the workspace might even be in external slow RAM, since for debug purposes you typically don't care about performance. When you use BL and "manually" stack the registers you need, you create reentrant code, which can support recursion. It doesn't have to be immediate recursion, i.e. the function calling itself, but can also be indirect: you're in function A, and you call function B, which then calls back function A. Or however deep this might be. The point is that with reentrant code you don't need to know if a routine you're calling eventually calls the very same routine again before returning. This topic is also closely related to calling conventions. The C language calling convention could use R10 as a stack pointer (and could use the same register for the frame pointer). In a C calling convention you'd typically pass the first four arguments to the function in R0-R3, permit the called function to destroy the values in R0-R3 but preserve other registers, and provide the return value in R0, for example. If this is standardised it becomes easy to know how a function is supposed to behave.
  23. Thanks - I forgot those in the code, my own emulator uses those for cycle counting. Of course they can just be commented out. Attached a fixed version. MULTICOLOR.bin
  24. I continued my small multicolour mode programming project - this is actually the first time I'm programming anything game-like in assembler for the TI. I added joystick 1 control for the viewport position. So now you can scroll the viewport with the joystick. I also added a particle system: when fire is pressed, particles are generated in the top middle of the viewport area (actually Y is fixed to the top of the framebuffer but X follows the viewport position). The particles fall down and have "random" colors and fall directions. I am using the console ROM as a random number generator: I index into it with the frame counter and read bytes from there. The bytes are then massaged a little to produce meaningful downwards motion. The particle system uses simple fractional arithmetic, with a fixed point 8.8 bit format. Up to 30 particles can be in flight at the same time. The particles don't really do anything other than fall, so it's not much of a game yet. The source code and binary are attached here if someone is interested. I also removed the stupid test lines from the beginning of the program. For joystick reading I used code from Thierry's excellent pages, but discovered the joystick reading code was bogus: his example code tested the bits in the wrong order. When the bits are read from CRU they land in R1 in reverse order to what his code assumes. Basically the STCR instruction stores the bits read from CRU space so that they are filled from the least significant bit upwards. The same example also had another problem: to read the fire button when R12 is >0024 you need to use TB -15 and not TB -11 like the example code shows.

       STCR R1,5       Read joystick position and fire button

Here the most significant byte of R1 actually has these bits (I'm using standard, not TI, bit numbers: R1.8 is bit 8 of R1, which is the least significant bit of the high byte of R1):

  • R1.8 = Fire
  • R1.9 = Left
  • R1.10 = Right
  • R1.11 = Down
  • R1.12 = Up

multicolor.asm MULTICOLOR.bin
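As an illustration of the bit layout and the 8.8 fixed point format mentioned in this post, here is a small Python model. It is not taken from multicolor.asm; the function names are invented for the example, and the `active_low` default assumes that a pressed direction or button reads as 0, which should be checked against the real hardware. The 8.8 format keeps the whole pixel position in the high byte and the fraction in the low byte of a 16-bit word.

```python
JOYSTICK_BITS = ("fire", "left", "right", "down", "up")  # R1.8 .. R1.12


def decode_joystick(r1: int, active_low: bool = True) -> dict:
    """Decode the high byte of R1 after STCR R1,5 into named flags."""
    bits = (r1 >> 8) & 0x1F
    if active_low:                    # assumption: 0 means pressed
        bits ^= 0x1F
    return {name: bool((bits >> i) & 1) for i, name in enumerate(JOYSTICK_BITS)}


def step_particle(pos: int, vel: int) -> int:
    """Advance one 8.8 fixed point coordinate by one frame (16-bit wrap)."""
    return (pos + vel) & 0xFFFF


def to_pixel(pos: int) -> int:
    """Whole-pixel part of an 8.8 fixed point coordinate."""
    return pos >> 8


print(decode_joystick(0x1E00)["fire"])    # True: bit R1.8 is low
y = 0x0100                                # pixel row 1, fraction 0
y = step_particle(y, 0x0080)              # fall half a pixel per frame
print(hex(y), to_pixel(y))                # 0x180 1
y = step_particle(y, 0x0080)
print(hex(y), to_pixel(y))                # 0x200 2
```

With a velocity of 0x0080 a particle moves one pixel every two frames, which is the kind of slow, smooth motion whole-pixel arithmetic cannot express.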
  25. Not half bad. Doing something like this on the 99/4A could require a split screen mode, if one wants to combine regular text in GM1 or 2 with multicolour mode. The Don't mess with Texas demo has a scene which looks like that.