Everything posted by speccery

  1. That is great progress! I haven't had enough time to advance my own project, even though I have made some progress; hopefully I'll get back on track in the coming days. For PSRAM, I have been using these chips, which are readily available from Mouser. I have purchased most of the components for my projects from them.
  2. Congratulations! It must have been a great feeling to get this far after all the hard work. I have been out of network reach for a while. I ordered new PCBs before going out of town for a few days; those should be arriving tomorrow. I'm looking forward to seeing how they turned out. I know I ordered them in haste, without testing everything on the still brand new previous boards, but since I am on vacation I wanted to maximise the use of whatever spare time I might have for this project. Anyway, great to see that you have gotten this far!
  3. Thanks for the link and comments! I think I will continue my software PSRAM adventure a bit more. I will quite likely be out of town for a few days, so advances in the project will have to wait. But I have already made an updated PCB design, and I am tempted to send it to manufacturing so that I could perhaps get it back next week. I followed your "tip" and changed the 74LVC1G125 driver to a MOSFET to simplify things. I was planning to use the BS170 MOSFET I had used years ago, but then I realised it is almost the same as the 2N7000 you used, so I went with that instead.
  4. Jumped into surgery and some code optimisation: I changed the wiring of the PSRAM chip to reduce the amount of bit mangling needed in the code. I first removed the chip, bent some legs and added Kapton tape to isolate the pins I was going to rewire. Then it was time to resolder the chip, and as the final step, add wiring better suited to QPI mode. Positive result: this worked just fine. Performance of the code improved by 60%, and is now close to 6 megabytes per second. I am still not sure if this is enough to handle reading the first byte in real time. To test that, the PSRAM handling code needs to be interleaved with the code handling the cartridge bus. It will be a mess and almost certainly not fast enough; currently, reading a single byte as a standalone operation takes almost one microsecond. That's due to the address setup needed, but the function call overhead in my benchmark also has an impact. I think it should be faster when the code is embedded into the real thing.
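For illustration, this is the kind of difference the rewiring makes in the code. A hedged sketch, assuming hypothetical GPIO numbers (not the actual board): with the IO lines scattered across GPIOs each nibble must be assembled bit by bit, while with IO0..IO3 on consecutive GPIOs a single shift and mask is enough.

    #include "pico/stdlib.h"
    #include "hardware/gpio.h"

    // Scattered IO pins (hypothetical GPIOs 2, 5, 9 and 12): each input
    // nibble is assembled bit by bit with masks and shifts.
    static inline uint8_t nibble_scattered(void) {
        uint32_t g = gpio_get_all();
        return (uint8_t)(((g >> 2) & 1) | (((g >> 5) & 1) << 1) |
                         (((g >> 9) & 1) << 2) | (((g >> 12) & 1) << 3));
    }

    // Consecutive IO pins (hypothetical GPIOs 8..11): one shift, one mask.
    static inline uint8_t nibble_contiguous(void) {
        return (uint8_t)((gpio_get_all() >> 8) & 0xF);
    }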
  5. Today I got the board working as an Extended Basic cartridge. Rather embarrassingly, I wasted at least an hour wondering why the heck I could get neither address nor data reads working properly - as I worked my way through eliminating various possible reasons, I realised I had forgotten to solder a good part of the pins of the Raspberry Pi Pico. Let's not tell anyone that, such a stupid mistake. It does explain why the data pins were not really working...

Once through with that, I started to play around with the PSRAM. I have wired it to the SPI bus, chained to the micro-SD card slot. I have also connected it in a way that allows QPI mode (quad bit, i.e. four bits wide) to be used. I have been curious to see how this works in practice, so instead of using hardware SPI I decided to write a simple bit-banged SPI implementation and then extended it to also work in QPI mode.

As background, PSRAM stands for pseudo-static RAM. It is a dynamic RAM chip, but in self-refresh mode, so it looks like a static RAM. The benefits are low cost and high density. It comes with some restrictions - you cannot leave chip select active for extended periods of time, as that prevents the self refresh from working. PSRAMs come in many different packages, but this 8-pin package is very attractive for these hobby projects. The package is small and, in my opinion, easy to solder.

I wrote routines with bit-banged SPI to read the chip ID, read a data block, write a data block and change mode to QPI. I also wrote a routine to read in QPI mode and to transfer back from QPI mode to SPI. I got all of the aforementioned working. QPI mode works similarly to SPI mode: you issue a command such as read (an 8-bit command code), followed by a 24-bit address. These are sent 4 bits at a time. Once done, the direction of the 4-bit wide data bus is changed, and four more clocks are issued to give the PSRAM time to read its memory cells. After this, bytes are read 4 bits at a time, so two clocks per byte. The maximum clock rate is 66 MHz (or even 133 MHz using the fast read command). I am still very far away from that: after sending the command and address I am currently able to read a byte in about 224 nanoseconds, so the current data rate is just over four megabytes per second. This is much faster than the TI's cartridge bus, but the issue is the latency: in order to read a single byte, we pull chip select low, issue eight cycles delivering four bits at a time (8-bit command + 24-bit address), then another four dummy clocks for the memory to do its thing, and then start to receive data. For the first data byte this means another two clocks, so in total it's 14 clock cycles. If more bytes are needed we can just keep clocking and the chip provides data bytes from the following addresses (there are some constraints, such as page boundaries, but otherwise it's a bit like reading GROM or the VDP). If I were able to run this at something like 66MHz this would go very fast, but currently, as I mentioned, I get a byte in about 220 nanoseconds, so the actual clock is something like 10MHz per nibble.

I realise I did not choose optimal pins for this operation; it would have been better not to connect the PSRAM to the SPI lines at all if I am going to go with the quad mode. Oh well. With the current pins I need to do a fair amount of masking and ORing to build nibbles and bytes. Anyway, it's a start. I will continue to experiment with how fast I can make the software QPI go with the Pi Pico. I am still running the Pico at the stock clock of 133MHz. If I am not able to make this work, I don't plan to give up on the idea of using the PSRAM chip, but rather will try overclocking to 250MHz, or using yet another processor or a small FPGA to help drive the PSRAM. It's a fun little exercise, hopefully fast enough at the end.
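To make the sequence concrete, here is a minimal sketch of a bit-banged QPI read on the RP2040 with the Pico SDK. The pin assignment, command byte and dummy-clock count are assumptions for illustration (they vary by PSRAM part - check the datasheet), GPIO setup is omitted, and this is not the actual firmware.

    #include "pico/stdlib.h"
    #include "hardware/gpio.h"

    // Hypothetical pins: CS and CLK on their own GPIOs, IO0..IO3 on four
    // consecutive GPIOs starting at PSRAM_IO0.
    #define PSRAM_CS   5
    #define PSRAM_CLK  6
    #define PSRAM_IO0  8
    #define IO_MASK    (0xFu << PSRAM_IO0)

    static inline void qpi_clock(void) {
        gpio_put(PSRAM_CLK, 1);
        gpio_put(PSRAM_CLK, 0);
    }

    // Drive one nibble onto IO0..IO3 and clock it out.
    static inline void qpi_out_nibble(uint8_t n) {
        gpio_put_masked(IO_MASK, (uint32_t)(n & 0xF) << PSRAM_IO0);
        qpi_clock();
    }

    // Clock once and sample IO0..IO3 while the clock is high.
    static inline uint8_t qpi_in_nibble(void) {
        gpio_put(PSRAM_CLK, 1);
        uint8_t n = (uint8_t)((gpio_get_all() >> PSRAM_IO0) & 0xF);
        gpio_put(PSRAM_CLK, 0);
        return n;
    }

    // Read len bytes from addr: 8-bit command + 24-bit address as eight
    // nibbles, four dummy clocks while the chip fetches, two clocks per byte.
    void qpi_read(uint32_t addr, uint8_t *buf, size_t len) {
        uint32_t word = (0x0Bu << 24) | (addr & 0xFFFFFFu);  // assumed read command
        gpio_put(PSRAM_CS, 0);
        gpio_set_dir_masked(IO_MASK, IO_MASK);      // IO pins as outputs
        for (int i = 28; i >= 0; i -= 4)
            qpi_out_nibble((uint8_t)((word >> i) & 0xF));
        gpio_set_dir_masked(IO_MASK, 0);            // turn the bus around
        for (int i = 0; i < 4; i++)
            qpi_clock();                            // dummy clocks (read latency)
        for (size_t i = 0; i < len; i++) {
            uint8_t hi = qpi_in_nibble();           // high nibble first
            buf[i] = (uint8_t)((hi << 4) | qpi_in_nibble());
        }
        gpio_put(PSRAM_CS, 1);                      // deselect; required anyway so
    }                                               // self-refresh keeps working

Counting the clocks in qpi_read shows the 14-cycle first-byte latency described above: 8 for command and address, 4 dummy, 2 for the first byte.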
  6. I assembled one of the new PCBs. I haven't had enough time to do meaningful testing yet. I did port the firmware to accommodate some of the pinout differences. But I just had to see if I could get the neopixels running - and I did. These are the two multicolour LEDs just above the micro SD connector. The board has not been plugged into the TI yet; I am powering the logic with the two wires on the left, bringing 5V from a lab power supply to the board. My plan is to test this tomorrow in the actual TI - fingers crossed it won't blow anything up. That testing will be with the old firmware features, i.e. I will first see if this can do what the previous version did and emulate a cart with both ROM and GROM. Extended Basic would be a good test drive, as it is a banked ROM cartridge. If that works I will try to bring up the SD card support and then start to wrestle with the PSRAM. Or perhaps in the other order, PSRAM first as that is more interesting.
  7. @JasonACT I will keep you posted on how the new PCBs and updated firmware work out. I have 64Mbit PSRAM chips, i.e. 8 megabytes. Unlike on the side port, there is no *READY line on the cartridge port, so whatever I end up doing, it needs to be very fast. My plan is to use fast QPI mode transfers, but I know it is going to be a challenge to do them in real time. The critical path will be the ability to respond to read requests with the first byte in time; once in burst mode, data will move fast. On the StrangeCart I have used a flash ROM in SPI mode with a 48MHz clock to stream data from the ROM. This has worked well for cartridges with known access patterns, such as the cool Bad Apple demo (5 megs), but random access is a challenge.
  8. New PCBs arrived from manufacturing. I haven't had time to do anything with them except look, and they don't look bad at first glance. Looking forward to having the time and energy to build a couple and adapt the firmware.
  9. True, it's something I realised not too long ago. It makes a microcontroller implementation easy, since by default the code can keep GREADY low. This automatically stalls the processor when it accesses a GROM, and thus gives the code as much time as it needs. Or more to the point, the code can prioritise normal memory cycles and monitor GROM accesses less often. This is how it works on the StrangeCart. Even with this kind of prioritisation, at least in my case the MCU will be ready much, much faster than ordinary GROM chips.
  10. This is useful information for the bus control stuff. I have implemented GROM support many times already (with an FPGA and several different microcontrollers) but I don't think I have encountered the issue you mention. Perhaps I have been lucky. I wonder if this is some kind of glitching or something. In your circuit you mention you are relying on the Pico tolerating 4.9V. Is the data bus buffered or directly connected to the Pico? I don't think this is the problem, I'm just curious.
  11. @JasonACT thanks for your comments and sharing your findings! In my StrangeCart project I came across similar issues. The LPC54114 chip I use in that project doesn't have a cache. It doesn't require it as badly as the RP2040, since it's using on-chip Flash memory over a 128-bit bus. But similarly to what you have experienced, when both CPU cores are accessing the same memory block, one of the cores has to wait. In the case of the LPC54114 it has four SRAM memory blocks, two 64k blocks and two 32k blocks, plus the flash. There's a cool on-chip crossbar switch which enables concurrent access to different memory blocks. I have structured the firmware so that the CM0+ core is always running from its own SRAM block, while the CM4 core is free to roam around. This way the CM0+ performance is consistent. It does occasionally access other memory blocks, but typically only to fetch ROM/GROM data. These accesses occur rarely compared to handling the TI-99/4A bus, so in practice the timing is predictable.
  12. @HOME AUTOMATION good one! I noticed that KiCad has another rendering mode. I also tested component sizes on printouts of the top & bottom copper, and realised the footprints I chose were too wide.
  13. Just completed the revision of the Picocart; this is version 1.1. It fixes the bugs I knew about and adds a micro SD card. I also added a PSRAM chip (on the underside) to be able to support large cartridges (assuming the software can run fast enough - not sure). It will be interesting to see how this turns out, and whether the new components will fit...
  14. To clarify my understanding - you are deliberately invoking a LOAD between TB and JEQ, is that it? To override the read of the ALPHA LOCK with your own data? Nice one!
  15. Thanks for the feedback. I've not used the Arduino distribution; I went with the Raspberry Pi Pico SDK directly. I didn't realise you're already running at 250MHz - I'm currently still at 133MHz for ROM and GROM services.
  16. You have a great project going for sure, congrats! Are you using both cores of the Pico? In my code one core serves the TI bus and does nothing else, while the other core is free to do whatever (see the sketch below). I guess running from RAM should be as fast as running from the cache, although I haven't studied this part yet. The Pico apparently overclocks really well; I have a PicoSystem from Pimoroni and that runs at 250 MHz stock - their firmware always keeps it overclocked. This would nearly double the performance. I wrote a TI-99/4A emulator for the PicoSystem using their libraries, so that is running at 250MHz. I have not played with clock settings myself though.
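As a minimal sketch of that two-core split using the Pico SDK's multicore support (the handler name and loop bodies here are placeholders, not my actual firmware):

    #include "pico/stdlib.h"
    #include "pico/multicore.h"

    // Placeholder bus-service loop: this core polls and serves the TI bus
    // and does nothing else, so its timing stays deterministic.
    static void ti_bus_service(void) {
        for (;;) {
            // decode address/GROM cycles and answer with ROM/GROM data here
        }
    }

    int main(void) {
        stdio_init_all();
        multicore_launch_core1(ti_bus_service);  // dedicate core 1 to the TI bus
        for (;;) {
            // core 0 stays free for everything else: SD card, housekeeping, ...
        }
    }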
  17. I've worked a little on optimising Basic execution on the StrangeCart, and I've been thinking about token formats for Basic. This message is going to be a bit technical, hopefully it makes sense.

The tokenizer in TI BASIC is a bit weird. Consider this line:

10 ABC=123

The TI BASIC tokenizer - and my tokenizer by default - create this (a screenshot from js99er.net VDP memory): there is the line number table at >37C9, which has a single entry: line number >000A (10 decimal) and the pointer to the line, >37CE. In there we have (everything in hex):

37CE: 41 42 43 (ABC)
37D1: BE (token for assignment =)
37D2: C8 03 31 32 33 (unquoted string, length 3, contents 123)

Thus at >37CE we find the string ABC, in ASCII. Before it, the byte at >37CD is >0A, which is the length of the tokenized line. The pointer in the line number table never points to the length byte; it points to the first actual character. Anyway, the thing is that the variable name ABC is represented just like that, ABC, while the number 123 is tokenized as an unquoted string, which conveniently includes a length byte.

As I've been focused on performance, the small issue with ABC being stored just like that is that since there is no string length, the interpreter needs to count the length every time so that it can search the symbol table with that length. On the other hand, the constant 123 is stored as a string with a length. From a performance point of view, the interpreter could run faster if the variable name length was precomputed, i.e. if it was stored as an unquoted string. I have already implemented this as an optional feature, and it does improve performance if the variable names are a bit longer.

For the constant 123, it would be better if numeric constants were stored with their own token, in a binary format not requiring any run-time conversions. For example, if there was a token for 16-bit integers, 123 could be encoded with that token followed by two bytes. This could then be interpreted in fixed time, very fast, without all the checks normally needed when converting from ASCII to a binary number. In a simple scenario all numbers could be handled with two tokens: a token for 16-bit numbers, and another token for floating point constants to handle all non-integers and numbers not fitting into 16 bits.

For variable references, it's time consuming and complex to have to search the symbol table all the time. I'm thinking about creating a new token for variable references, let's call it VAR, and having a separate table which would contain the name and a runtime pointer to the variable entry in the runtime symbol table. That would mean that the name "ABC" would be copied into a variable name table, let's say as entry 0 since it's the first variable in the program. In the tokenized program line there would be the token VAR, followed by an 8-bit index into the variable name table. This way all references to ABC would become the two bytes VAR >00, and the program size would become smaller if ABC was used a lot (ABC uses 3 bytes, VAR+index two bytes). The variable name table would need to contain the length of 3, the string ABC, and a pointer to the runtime symbol table. In this setup the variable name table would become an integral part of a Basic program, as important as the tokenized lines. However, it would be possible to convert it back to normal TI Basic format for saving, and listing would also be simple.

When a program is run, the symbol table is cleared at start. [The symbol table in the StrangeCart Basic contains all variable values, their dimensions if they are arrays, their type (floating point or string) etc.] With this new token format the pointers in the variable name table would also need to be set to zero on start. When VAR 0 is encountered for the first time, the variable would be created in the runtime symbol table normally, the same way variables are created as they are encountered during interpretation. Once that's done, the address of the variable in the symbol table would be stored into this new variable name table. The net result would be that variables would never have to be searched; instead they could be directly referenced through the pointers in the variable name table.

Sorry if this was a bit confusing, there are quite a few tables involved, but the benefit of this type of arrangement is that all variable references could be done in fixed time, regardless of program size. The program could still be listed normally. When saving, a simple conversion would have to be done to get back to TI BASIC format. The interpreter would not need to worry about variable names during runtime.

If you got this far you might wonder why not store the addresses of variables directly in the token stream. This could be done, but it would expand the size of the tokenized code quite a lot. It also might cause complications when editing the code - removing lines or adding new ones. The other observation one might have is what happens if a program has more than 256 variable names, since that's the maximum that a single byte after the VAR token could reference. I think that rarely happens with TI Basic programs - if ever. This could be mitigated, for example, with an escape into two bytes after the VAR token: a simple way would be to store indices 0-127 as a single byte, with the most significant bit set meaning there is another index byte, thus creating 15-bit index values.
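As a hedged C++ sketch of these two proposals (token values and names are made up for illustration, not the actual StrangeCart source):

    #include <cstddef>
    #include <cstdint>
    #include <string>
    #include <vector>

    struct SymbolEntry;                  // runtime value, type, dimensions, etc.

    // One entry per distinct variable name in the program.
    struct VarNameEntry {
        std::string name;                // e.g. "ABC", kept for LIST and SAVE
        SymbolEntry *runtime = nullptr;  // set on first reference, zeroed when RUN starts
    };

    std::vector<VarNameEntry> varTable;  // the variable name table

    // Hypothetical token values.
    enum : uint8_t { TOK_VAR = 0xE0, TOK_INT16 = 0xE1 };

    // Encode an index after TOK_VAR: 0-127 in one byte; larger indices use two
    // bytes with the MSB of the first byte set, giving 15-bit values.
    static size_t encodeVarIndex(uint16_t index, uint8_t *out) {
        if (index < 0x80) { out[0] = (uint8_t)index; return 1; }
        out[0] = (uint8_t)(0x80 | (index >> 8));    // escape flag + high 7 bits
        out[1] = (uint8_t)(index & 0xFF);           // low 8 bits
        return 2;
    }

    // Decode the index following TOK_VAR; returns the number of bytes consumed.
    static size_t decodeVarIndex(const uint8_t *p, uint16_t &index) {
        if ((p[0] & 0x80) == 0) { index = p[0]; return 1; }
        index = (uint16_t)(((p[0] & 0x7F) << 8) | p[1]);
        return 2;
    }

    // A 16-bit constant after TOK_INT16 is just two big-endian bytes, read in
    // fixed time with no ASCII-to-number conversion.
    static int16_t decodeInt16(const uint8_t *p) {
        return (int16_t)((p[0] << 8) | p[1]);
    }

With this, every occurrence of ABC in the token stream becomes TOK_VAR plus an index, and the interpreter resolves it through varTable[index].runtime in fixed time.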
  18. Nice work! I am curious, did you use my firmware or roll your own? I am also interested in the source you used for the expansion port edge connector. I have been thinking about creating a board similar to the ET-PEB I did earlier, but using the Raspberry Pi Pico instead of the LPC1347; that would enable creating it without the CPLD, making it a simple board, pretty much like you did. Have you made the schematics available?
  19. Thanks for the comments.

1. I am very much aware that cartridge ROM is slower than the 16-bit OS ROM memory. But the conversion to machine code results in such a performance boost that the 8-bit accesses won't be a problem performance-wise. Very true that it results in a big increase in memory footprint.

3. I think the GPL interpreter can be optimised, but like you say, it would be very easy to screw it up. I have first-hand experience of this: when I built the FPGA version of the TMS9900 which had extra instructions to speed up GPL execution, I did modify the OS ROM to support the new instructions. I got it to work, but I only touched a very small section of the GPL interpreter.

I very much like compilers, and they do an amazing job at compiling very expressive modern languages into very efficient machine code. I have in the past written a lot of heavily optimised code professionally over many decades. I often had to resort to assembly programming to get the performance (or actually cost effectiveness) where it needed to be. I would often develop the algorithm in C or C++ using the great debuggers available for them, then compile the code to assembler to get the boilerplate in place, and finally hand optimise key sections of the code. Hand-written code was for a long time the only way to get good access to the SIMD instruction sets in modern processors. I wrote code for some TI DSPs, but mostly for the Intel MMX, SSE, SSE2, SSE3 and SSSE3 instruction sets. The performance gains were phenomenal, certainly the difference between a proof of concept and a commercially viable product. I found it very interesting to see how compilers generated code and what the compiled code looked like; it is also a very useful skill to have when debugging crash dumps, where you often just have a binary blob and the address of the instruction which blew things up.
  20. At least I don't see it that way. When we think about speeding up GPL in the broad sense, it appears there would be three ways to go:

1. Convert GPL to TMS9900 machine code, which is what you have been doing. The beauty of this is that all you would need in addition to the bare computer is a ROM/GROM cartridge (I guess ROM only, if everything was converted from GPL to assembly). Also, this approach is "era correct" since you're using the original CPU etc.; aside from potentially using higher density memory chips, RXB could have existed back in the day with the same good performance.

2. Use a coprocessor, which is what the StrangeCart effectively is. From a software perspective that replaces the problem of execution speed with a software integration issue, as I wrote above. Definitely not an "era correct" approach, since these chips - had they existed back in the day - would have been in supercomputer territory in 1979.

3. Enhance the GPL execution speed without using a coprocessor. One "simple" way of doing it would be to replace the system ROMs with a higher performance implementation of the GPL interpreter - of course not simple in practice, since system ROM replacement is not a plug-and-play job but requires a pretty involved soldering job, plus the development of such an optimised GPL interpreter, which would almost certainly need much more ROM memory than the 8K we have.

As a variation of 1 and 3, have you considered creating a compiler from GPL to TMS9900 machine code? I have sometimes wondered whether this would be doable. I wonder if something like that has already been created, perhaps even back in the day?
  21. Thanks for the comments, good questions. This reply became rather long...

From a motivational perspective, I have never written a language interpreter before, so I found it interesting to do an interpreter which would be compatible with TI BASIC. Well, I have written Forth implementations, but they are not BASIC, so perhaps they don't count. Writing a GPL interpreter would be far simpler than this Basic project, but the challenge would be how to integrate it with the rest of the TI software.

You know, I really have to restrain myself from going crazy with Basic extensions. So far I have only implemented one extra command called VARS - it shows all currently declared variables, their dimensions, and values for scalars - so it is primarily useful for debugging both the interpreter and BASIC programs. I have also declared two new tokens for functions: ADDR, to find the address of a variable, and FRE, to return the amount of free memory in various areas; but I haven't implemented these two yet. Anyway, with the basis I now have in place it should not be hard to incorporate Extended Basic functionality, but I will probably still first work on TI Basic to try out some things I am interested in: improved debugging support and programming support in general, token formats, JIT compilation to an intermediate format, and perhaps compilation from TI Basic to TMS9900 machine code directly on the StrangeCart. I am also wondering whether it would make sense to have a built-in TMS9900 assembler available, similar to how BBC BASIC works for the 6502 and Z80. This would make it possible to mix Basic and assembly in the same source file.

I have allowed myself to spend some time making a few simple optimisations, which resulted in a 20% speed increase in Noel's BASIC program (I don't remember his handle here). The benchmark, which has 10000 iterations in total with one assignment in the inner loop, now runs in about 0.166 seconds on the StrangeCart. According to the chart I have, this takes 77 seconds on the TI-99/4A; I don't remember anymore if that is with Extended Basic or regular Basic. Anyway, the interpreted Basic performance with floating point variables is pretty much on par with the speed of machine code instructions on the TMS9900. The current interpreted performance is limited by the token format TI used, which is really stupid from a performance point of view: a lot of time is spent just finding out how many characters long each variable name is. I may replace this with my own tokens used internally by the Basic, so that variable references would be fixed length and lead directly to the symbol table entries. If I do that, I still want to be able to load and save normal TI Basic programs.

Coming back to the question of GPL vs TI Basic, a GPL interpreter running on the StrangeCart would certainly be massively faster than what the ROM implements, but it would be a different kind of project; the challenge would be integrating that processing with the TI-99/4A software infrastructure so that it's compatible with existing GPL code. I have studied speeding up GPL interpretation in my icy99 FPGA project. In that project I added a couple of new instructions to my TMS9900 compatible CPU core to handle GPL address mode decoding in hardware. That was interesting but added a lot of complexity to the CPU core; it did condense many TMS9900 instructions into one and immediately boosted GPL performance.

A final point is that I am able to compile and run my Basic interpreter on multiple platforms: I develop and debug it as a macOS command line application, but I have also used it on a Raspberry Pi, and I have ported it to the STM32G431 microcontroller in addition to the StrangeCart. But its usefulness on platforms other than the StrangeCart in a TI-99/4A is limited, since most existing I/O and UI operations (like CALL HCHAR) only make sense on a TI. But adding some extensions would make it possible to use this Basic in microcontroller projects.
  22. This is a very nice idea, and something the StrangeCart could "easily" do. I will keep that in mind. It would be cool to have both the ability to execute code from GROM cartridges as well as to save programs to GROM cartridges. This should mean that GROM spaces 3-7 could be used for BASIC, and if the whole 8K GROM size was used for each GROM slot, TI BASIC programs could actually be larger than TI Extended BASIC programs... 40K of BASIC code and the entire VDP memory for variables. @pixelpedant this could be interesting for your amazing TI BASIC projects - of course then we are no longer talking about something running on the bare console, but at least it would still be compatible with the console BASIC and a GROM-only cartridge.
  23. Thanks @HOME AUTOMATION, that's great. I'm looking forward to studying this when I have a bit more time, but I checked it quickly by loading it in js99er.net, launching it, and looking up a few things in the TI Intern book. When launched, the line number table pointers at @>8330 and @>8332 are set to >C03C and >D24B respectively. Looking briefly at those addresses in the file (hexdump -C file | less), and noting the >6000 offset as this GROM stuff is loaded at address >6000 in GROM space, there are valid looking line number table entries in there. The highest numbered lines there are 26140, 26040 and 26030, with respective line pointers to >8830, >C038 and >8163. These addresses are clearly not in VDP space, but would be valid GROM pointers. I looked at the tokens on those lines as well; the last line was a RETURN, which makes sense, and the 2nd to last an assignment, which also makes sense.
  24. Thanks @Ksarul, I remember this being mentioned by someone (you?) in some call, or at least in the forums. Is there an example cartridge somewhere I could take a look at? I would be interested in studying one. It would be a great match for this project - and a nice way to compare performance, as the StrangeCart can both emulate GROMs, allowing it to run in cartridge mode, and - if I can extract the BASIC program - run it using its BASIC interpreter.
  25. I have continued to work on the StrangeCart BASIC over the past several weeks. I haven't recently had the time to work on the hobby for extended periods in one go; with these constraints, working on software seems to be somewhat easier than hardware (I have several things to do there, such as the new Picocart PCB layout). Anyway, I realised that what I was missing was the thing BASIC is known for - immediacy, i.e. a command mode.

Until now, what you could do with the StrangeCart was load an existing TI BASIC program, save it to the memory of the StrangeCart, and use its CALL RUN to run the BASIC program very fast. In other words, BASIC program editing would be done with regular TI BASIC; access to the StrangeCart has been a bit indirect, as saving to the flash memory of the board has the side effect of making the program available to the StrangeCart BASIC interpreter. The firmware has thus treated the BASIC program as a ROM: even though my interpreter could run the BASIC program, it could not modify it.

I am now in the process of adding a tokeniser and building blocks for editing the loaded program. The revised BASIC now has a command interpreter, which can be entered and runs entirely on the StrangeCart. Within the interpreter a bunch of standard BASIC features are already working:

  • RUN runs the program (nothing new)
  • LIST lists the program (now supports ranges of lines)
  • OLD loads a BASIC program from the SCD1 drive
  • BYE exits the command interpreter

Those commands were already there in various forms, but now:

  • Entering NEW clears the BASIC program and variables (yes, progress comes in small steps)
  • Entering a line number alone deletes that line from memory
  • Entering a line number followed by code tokenises the code and inserts the line into the right place in the program

These sound like trivial additions, but the size of the BASIC and other parts of the firmware expanded by almost 1000 lines of C++ to support these features. Tokenising TI BASIC lines takes a fair amount of code, and the memory management of altering BASIC programs in memory is also a bit tedious. I maintain the BASIC program in the StrangeCart's memory in the normal TI BASIC save format: there is an 8-byte header, followed by the line number table (backwards, as in VDP memory), followed by the program code (also backwards, with the highest numbered line of code at the lowest memory address).

The only change I implemented is that while in VDP memory the BASIC program size is restricted to 16K (in practice less) and the program is loaded near the top of the 16K space, now after loading I modify the 16-bit line pointers so that they are zero-based relative to the memory address where the BASIC program is stored. This is reversible - it simply means subtracting a fixed offset from every line number table entry (see the sketch at the end of this post). The offset can be added back, and if subsequently saved, the program can be loaded back into normal TI BASIC (I haven't tested it yet, but I am sure I can make it work since the format is the same). The plus side is that the tokenised program size can extend up to 64K. At the moment, the use of 16-bit line offsets means that 64K is the limit for tokenised code - data areas are on top of that.

When the onboard command mode is finalised, these changes mean that I can support different video modes with ease. My intention is to do this so that when one enters the StrangeCart BASIC command interpreter, the 16K VDP memory contents are copied to the StrangeCart's memory for backup, and then I am free to mess around with the VDP memory to support any video mode - including the F18A 80-column mode - for TI BASIC programs. When returning from the StrangeCart BASIC back to TI BASIC, I intend to restore the state of the VDP.

Even with these limited features I have had a lot of fun playing with the command interpreter while working on the code. The command mode makes it self-contained. In fact, I am testing the interpreter without even plugging the board into the TI, since the command interpreter also supports operation over a serial line.
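A hedged sketch of that pointer rebasing (the helper and layout details are assumptions based on the save format described above, not the actual firmware):

    #include <cstddef>
    #include <cstdint>

    // Add delta to every line pointer in a TI BASIC program image laid out as
    // an 8-byte header followed by the line number table, where each entry is
    // 4 bytes: big-endian 16-bit line number, big-endian 16-bit line pointer.
    // Loading with a negative delta makes the pointers zero-based; saving with
    // the opposite delta restores the original VDP addresses.
    void rebaseLinePointers(uint8_t *image, size_t tableBytes, int32_t delta) {
        uint8_t *entry = image + 8;                  // skip the 8-byte header
        for (size_t i = 0; i + 4 <= tableBytes; i += 4) {
            uint16_t ptr = (uint16_t)((entry[i + 2] << 8) | entry[i + 3]);
            ptr = (uint16_t)(ptr + delta);
            entry[i + 2] = (uint8_t)(ptr >> 8);
            entry[i + 3] = (uint8_t)(ptr & 0xFF);
        }
    }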