pnr

Members
  • Posts

    159

Everything posted by pnr

  1. Thanks for finding this. How do I learn more about this, is there a link for this document?
  2. Here are some links about Marinchip Systems from John Walker's website:
     General intro to Marinchip: https://www.fourmilab.ch/documents/marinchip/
     Photos of the S-100 / 9900 boards: https://www.fourmilab.ch/documents/marinchip/boards/
     Marinchip at the crossroads: https://www.fourmilab.ch/autofile/e5/chapter2_110.html
     The 9900 based ancestor of AutoCAD: https://www.3dcadworld.com/autocads-ancestor/
     Marinchip morphs into AutoCAD company: https://www.fourmilab.ch/autofile/e5/chapter2_2.html
     It must have been a fascinating journey. Most of the Marinchip software runs on the Powertran Cortex incl. its emulator. If we ever find the source code for Interact, it would be cool to make it run on the Powertran. If we find the binary, I think doing an FPGA system with the appropriate S-100 graphics board might be possible.
  3. I seem to recall that Al Beard's C compiler also supports overlays (on the Geneve). Al is still around, maybe he can clarify. EDIT: I looked up the source. The manual says this about TIC: "The GENEVE MDOS version of the compiler utilizes sophisticated memory swapping to gain an 85k workspace for the compiler, even though the compiler is over 50k in length. This allows compilation of fairly sophisticated C programs. The total memory required to run the TIC compiler is 144k." However, this appears to be achieved via custom memory management in the compiler source code (grep for "swap" in the source code), where specific compiler tables are swapped in and out of the actual memory space. So, in the general sense the TIC compiler does not support overlays. Sorry for the red herring.
  4. True, but there is something close. The C compiler used for Mini-Cortex Unix (which is both a cross and native compiler) almost has support for that. The source is here: https://www.jslite.net/cgi-bin/9995/dir?ci=tip (ccom, cc and c2 directories). It is K&R C though, so if you want to compile recent C code you need to do some tweaking. This C compiler is derived from the original C compiler as developed by Dennis Ritchie for the PDP-11 mini. As many people on this list will be aware, the instruction sets of the PDP-11 and the 9900 are quite close. In the 1980s, this compiler was modified to generate overlays for programs that could not fit in a 16-bit address space (this work was originally done at Berkeley for 2.10BSD). The overlay system is quite clever and does not require modification to the C source: all the work is done by the linker. In essence, functions that call across overlays do so via a small (automatically generated) thunk that adjusts the memory mapping as needed. The process is described here (nroff document): https://www.tuhs.org/cgi-bin/utree.pl?file=2.11BSD/doc/2.10/ovpap The Mini-Cortex compiler has the code in it to support this feature; however, I've never written the bits of support code that it needs. Hence, it *almost* supports this.
  5. The unroll also reduces the cost of the loop counter and loop jump. In my Unix code (for an 8 bit CF card) I first used this: https://1587660.websites.xs4all.nl/cgi-bin/9995/artifact/c22c09b80a674a44?ln=75,78 but soon switched to: https://1587660.websites.xs4all.nl/cgi-bin/9995/artifact/ad1f9336c3316aa1?ln=86,92 This made it much faster. In your case the loop overhead is less in relative terms, so it isn't as critical. Another lesson was dealing with interrupts. Your disk access code may be interrupted, and the interrupt may cause another disk access to happen before returning. Leaving interrupts off for long periods is not a good idea (e.g. your 9902's need servicing), so you have to think about the time it takes to read a sector and whether you can afford to leave interrupts off for that long, or you have to make sure the disk code is not re-entered while a transfer is in progress. The third lesson was that using the CPU to read the disk is a hog. Once you start running parallel jobs (not sure MDOS supports this) you really start to notice this, although the short seek times on flash disks offset some of this. I am planning to add DMA capability to my next board.
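As a rough illustration of what the unroll buys, the sketch below reads a 512 byte sector from a simulated 8 bit data port in two ways and counts loop iterations. It is not the linked 9995 assembler code: `make_port` is a hypothetical stand-in for the CF card data register, and the unroll factor of 8 is an assumption for the example.

```python
SECTOR_SIZE = 512


def make_port(data):
    """Hypothetical stand-in for the CF card's 8 bit data register."""
    it = iter(data)
    return lambda: next(it)


def read_sector_simple(read_port):
    buf = bytearray(SECTOR_SIZE)
    iterations = 0
    for i in range(SECTOR_SIZE):         # counter update and jump per byte
        buf[i] = read_port()
        iterations += 1
    return bytes(buf), iterations


def read_sector_unrolled(read_port):
    buf = bytearray(SECTOR_SIZE)
    iterations = 0
    for i in range(0, SECTOR_SIZE, 8):   # counter update and jump per 8 bytes
        buf[i] = read_port()
        buf[i + 1] = read_port()
        buf[i + 2] = read_port()
        buf[i + 3] = read_port()
        buf[i + 4] = read_port()
        buf[i + 5] = read_port()
        buf[i + 6] = read_port()
        buf[i + 7] = read_port()
        iterations += 1
    return bytes(buf), iterations
```

Both versions move the same 512 bytes, but the unrolled one pays the loop overhead 64 times instead of 512 times.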
  6. I think you have found the fastest form, maybe unrolling the loop 4 times would gain a few percent but that is it. Slower but more compact options could be: - If a 256 byte table is too much space, you could consider using a nibble table with 16 entries and do it in two steps. - There is this hack for bit reversing a byte using MPY: https://graphics.stanford.edu/~seander/bithacks.html#ReverseByteWith32Bits If you place the hypothetical shift register in memory instead of parallel CRU space, your last example would not need the R12 adjustments and could be a bit faster still. For Unix on the Cortex I've found that disk access speed does matter a lot (but early Unix was quite disk intensive, maybe more so than MDOS).
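The nibble-table option could look roughly like the sketch below. It is written in Python only to show the two-step lookup; the names are made up for illustration, and on a 9900 it would of course be a 16 byte table plus a few shifts.

```python
# NIBBLE_REV[n] is the 4-bit value n with its bits in reverse order.
NIBBLE_REV = [0x0, 0x8, 0x4, 0xC, 0x2, 0xA, 0x6, 0xE,
              0x1, 0x9, 0x5, 0xD, 0x3, 0xB, 0x7, 0xF]


def reverse_byte(b):
    """Reverse a byte in two steps using the 16-entry table."""
    # The reversed low nibble becomes the high nibble, and vice versa.
    return (NIBBLE_REV[b & 0x0F] << 4) | NIBBLE_REV[b >> 4]


def reverse_byte_reference(b):
    """Slow bit-by-bit version, only used to check the table."""
    out = 0
    for _ in range(8):
        out = (out << 1) | (b & 1)
        b >>= 1
    return out
```

For example, `reverse_byte(0x12)` gives `0x48`. The trade-off is two lookups and a shift per byte against a table that is 16 times smaller than the 256 byte one.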
  7. Please read the full post for some context. The 8087 is actually much faster than the others on plain arithmetic (add/sub, mul, div). I did not make a comparison for the transcendental functions at all. If one is looking for simple & fast mathematical functions, consider using lookup tables. The original Forth needed that (it was created for software controlling telescopes); if I remember correctly, the arithmetic was done using scaled 32 bit integers, with each function looked up in a 64 entry table and using interpolation.
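A minimal sketch of the table-plus-interpolation idea, using sine as the example function. The details are my assumptions, not a reconstruction of the original Forth code: a 16.16 fixed-point scale, 64 intervals over the first quadrant (so 65 table points), and plain linear interpolation. The table is built with `math.sin` here; on a small machine it would be precomputed.

```python
import math

SCALE = 1 << 16     # fixed point: 1.0 is represented as 65536
ENTRIES = 64        # table intervals over 0..90 degrees
# ENTRIES + 1 points, so the last interval has an upper end point.
SIN_TABLE = [round(math.sin(i * (math.pi / 2) / ENTRIES) * SCALE)
             for i in range(ENTRIES + 1)]


def sin_fixed(angle):
    """Sine of a first-quadrant angle, integers only.

    angle: scaled integer, 0..SCALE maps to 0..90 degrees.
    Returns sin(angle) multiplied by SCALE.
    """
    if not 0 <= angle <= SCALE:
        raise ValueError("angle outside first quadrant")
    step = SCALE // ENTRIES              # angle units per table interval
    index, frac = divmod(angle, step)
    if index == ENTRIES:                 # exactly 90 degrees
        return SIN_TABLE[ENTRIES]
    lo, hi = SIN_TABLE[index], SIN_TABLE[index + 1]
    return lo + (hi - lo) * frac // step   # linear interpolation
```

With 64 intervals the interpolation error stays within a few units of the last fixed-point digit pair, which is often good enough for pointing or graphics work.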
  8. Actually, it is 99000 code. The difference is not big, but the code uses such things as 32 bit addition and shifts. So it is not cut and paste, but the amount of effort needed to make it true 9900 code would not be big. Also, John Walker (of AutoCAD fame) wrote some single & double precision routines for the 9900 that were fast for their time: https://www.fourmilab.ch/fbench/fbench.html I don't have source code for this, but the object code libraries can be reverse engineered of course. Happy to post the object code if anybody is interested. Note that the RADIX 100 code has the benefit of being exact for decimal fractions, i.e. it is much better suited to writing financial program code than IBM 370 or IEEE formats (adding 0.01 to an amount one hundred times adds exactly 1.00, rather than ending up slightly off due to rounding issues).
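The decimal-fraction point is easy to try out. In the sketch below Python's `decimal` module stands in for a radix 100 format (an assumption for illustration: both store 0.01 exactly, which a binary format cannot), and an ordinary float plays the role of IEEE binary arithmetic.

```python
from decimal import Decimal


def add_repeatedly(step, times, zero):
    """Add `step` to a running total `times` times, starting from `zero`."""
    total = zero
    for _ in range(times):
        total += step
    return total


# Binary floating point: 0.01 has no exact representation.
binary_total = add_repeatedly(0.01, 100, 0.0)

# Decimal arithmetic: 0.01 is stored exactly, so the sum is exact too.
decimal_total = add_repeatedly(Decimal("0.01"), 100, Decimal("0"))
```

`decimal_total` compares equal to `Decimal("1.00")`. The binary total is only very close to 1.0, and on typical IEEE double hardware it differs in the last few bits, which is exactly the kind of thing that upsets accountants.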
  9. Well, the analog shows that the signals are as I would expect them to be. Had a quick look at the spec sheet for the VB-8012: https://www.ni.com/pdf/manuals/371527d.pdf It says that the input threshold on the digital inputs can be adjusted between 0V and 2V. For TTL signals I think low is 0-0.7V and high is 2.4-5V. Maybe the threshold is currently set to quite a low or high value, where overshoots are detected as a reverse signal for one sample. Mathematically, 1.5V should be ideal, but in a circuit that mixes (LS-)TTL, HCT, NMOS, etc. some experimentation may be in order.
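To show what I mean by the threshold picking up overshoot, here is a small simulation. The voltage trace is invented for the example (it is not a measurement from the VB-8012 or from the board in question): a falling edge that rings back up to 0.5V and a rising edge that rings back down to 1.9V.

```python
def digitise(samples_v, threshold_v):
    """Turn sampled voltages into logic levels, as a logic analyser input would."""
    return [1 if v > threshold_v else 0 for v in samples_v]


def count_edges(bits):
    """Number of level changes seen in the digitised trace."""
    return sum(1 for a, b in zip(bits, bits[1:]) if a != b)


# Made-up trace in volts: high, falling edge with ringing, low,
# rising edge with ringing, high. The real signal has two edges.
trace = [3.3, 3.4, -0.4, 0.5, 0.1, 0.2, 0.1, 3.9, 1.9, 3.3, 3.2]

edges_low = count_edges(digitise(trace, 0.3))    # threshold set too low
edges_mid = count_edges(digitise(trace, 1.5))    # threshold mid-way
edges_high = count_edges(digitise(trace, 2.0))   # threshold at the top of the range
```

With the mid-way threshold the two real edges are seen; with the threshold near either end of the range, one ringing sample shows up as an extra pulse and the edge count doubles.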
  10. Again, congrats on getting it to work! This by itself is not certain. My understanding (after experimentation) is that reset is only sampled by the CPU on the rising edge of CLKOUT. Although the datasheet says that it must be asserted for at least 3 machine cycles (clocks), actually one is enough for the processor to reset. If the glitches occur outside the setup/hold time around the clock edge, the CPU would not notice. It may be interesting to do an analogue measurement for CLKOUT and RESET and see how that relates to the digital measurements.
  11. Maybe it is not a DC but an AC issue: maybe the bus line is ringing? Have you tried a 100R series resistor to dampen reflections (as in the firehose interface)?
  12. I am not sure that is a good idea. Initially my thoughts were like yours, and I was aiming for the PLA to be in block RAM. Two things changed my mind: (i) The "ROM" has lots of duplication in it and it turns out that generating signals from the state vector does not take all that many LUTs. Probably this is the reason that CPUs from that era often used PLAs for microcode in the real silicon. (ii) The LUT version is faster than the "ROM" version. This was the case on the ICE40 chips and perhaps even more so on the ECP5 chips. Maybe the second reason drops away when the microcode lookup is more pipelined than in my design. A now obsolete reason was that I wanted to conserve block RAM on the limited ICE40 chip. Yes. This design choice was driven by a wish to stay close to the original silicon (see here and figure 3 in the 99105/99110 data manual). This too uses a constant table. Trying to eliminate multiplexers is a good idea, I think. In the NMOS silicon of the era, it was almost free to have a tri-state bus on the chip. On an FPGA this translates to multiplexers. The natural multiplexer seems to be a 4 bit 2:1 multiplexer in a single logic block and an 8-way multiplexer takes 3 layers of LUT. Including all the wire routing, the actual layout quickly becomes hard to predict/understand. Selecting ALU inputs and ALU function, and generating flag bits, is a critical timing path for me. The 99000 microcode is 152 bits wide. Mine is much narrower, but in part that is optical. Fields have often been constrained to 4 bits, so that 1 LUT can derive single signals. I've never counted how many bits I have after such expansion. For another take on microcode organisation, take a look at the microcode word of the 990/12. It is described briefly in one of the assembler manuals, but I cannot find the right link at the moment. It is 64 bits wide.
  13. Happy to hear that you found the problem. Yes, with AS I meant ALATCH; I was working with an M68K recently and got the signal names confused. Wow, that VB-8012 is a serious bit of kit. Does it have an input mode that adds some hysteresis to the 32 inputs? If so, it could maybe help with the cross-talk. Maybe @Jimhearne and @Stuart have suggestions -- they are much better with hardware issues than I am.
  14. This is a very interesting avenue of development! Just throwing out some thoughts: 1. I heard (read) the GPL processor thing as well, but I am not sure it is correct. As I understood, the original plan was for a 99xx CPU with an 8 bit data path but this project did not materialise in time and the 16-bit 9900 was shoehorned in at a late stage. I also think I remember reading that the designers did not mind the "double interpreter" because they expected that a dedicated CPU would be used for a next gen system. I am not sure how the two things relate, if at all. 2. For a microcoded design, have a look at my 99000 version. It has ~200 states for the 9995 instruction set. 3. Another route could be to use the co-processor design of the 99xxx series. I am not implementing that, but it could help to keep complexity down, by separating the GPL part in a co-processor. That co-processor could have a data path optimised for GPL, with maybe a separate address ALU etc. The co-processor interface has facilities to transfer the WP, PC and ST registers between the CPU and the co-processor, so integration could be quite seamless.
  15. When I look at the scope output picture(s) I am surprised by some of the signals. It is not clear why CLKOUT should not show a nice regular square wave, and I don't think that the BST lines should change state when the AS signal is low. Is it possible that the scope / analyser is not grounded and hence mis-measuring the signals? If your system is multi-board, is it possible that ground does not feed through? Or a ground loop perhaps?
  16. Maybe this works for ya: https://www.reichelt.nl/gb/en/sr-32kx8-28p-62256-80-p2673.html?r=1
  17. It is not about the line count so much, it is about maximum simplicity. When using internal ram (the smallest version of the ULX3S has 112KB internal ram/rom capacity), doing a 9918 that just supports basic 256x192 VGA DVI output is very simple indeed, hardly more complex than the video circuit in the 99/2. The complexity is in the sprites, which are done with comparators/counters in 9918 silicon, 4 blocks of that. I'm thinking of duplicating that design in the FPGA; hopefully it leads to very simple & readable code. However, writing that takes time, which I currently don't have. Just the other day I learned that Yosys currently cannot infer true two-port RAM anyway. It is limited to one R/W port and one R port -- this bit of the Yosys code is currently being rewritten, so hopefully this limitation will be gone soon. For true 2-port one currently has to use a library block (Emard has that in his repo). Yes, it does not use GPL, and it does not need to as the RAM is connected to the CPU. When debugging the TI99/2 I disassembled some parts of the 32KB ROM and it has a table driven parser that compiles into a token byte code ("IF", "NEXT", etc.). This token byte code is then interpreted by calling a subroutine for each token. I did not manage to fully understand the parser, but I think it is a bottom-up parser with separate left and right priorities for each token - I did not get to the bottom of it. At another time, yes please. At the moment work projects are keeping me away from hobby stuff and I'd like to complete three other hobby projects first: - A 4-way write-back cache, to make sdram access fast. I have that working for Oberon, but I'm not happy with it yet. - True HDMI video (as opposed to DVI). This means implementing data islands and sound encoding. - Clean up TCP/IP for the Cortex. So, we're talking mid-2021 at the earliest, maybe 2022. Maybe it is a cool project for a Tomy Tutor enthusiast...
  18. Actually, the original Cortex did that, using a technique called "write under". Initially, reads are from ROM, but writes to the same addresses are sent to RAM. Once the copy is complete, the ROM is switched off (using a CRU bit) and both reads and writes go to RAM. There are wait states for slow ROM access. Have you considered the TMS9911 DMA controller?
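A small model of the "write under" scheme, in case the description is too terse. This is a behavioural sketch only: the class and method names are made up, and the `rom_enabled` flag stands in for the CRU bit that switches the ROM off.

```python
class WriteUnderMemory:
    """ROM and RAM sharing one address range, Cortex style."""

    def __init__(self, rom):
        self.rom = bytes(rom)
        self.ram = bytearray(len(rom))
        self.rom_enabled = True          # models the CRU bit

    def read(self, addr):
        # Reads come from ROM until it is switched off.
        return self.rom[addr] if self.rom_enabled else self.ram[addr]

    def write(self, addr, value):
        # Writes always land in the RAM "under" the ROM.
        self.ram[addr] = value & 0xFF

    def shadow_rom(self):
        """Copy ROM to the RAM at the same addresses, then disable the ROM."""
        for addr in range(len(self.rom)):
            self.write(addr, self.read(addr))
        self.rom_enabled = False
```

After `shadow_rom()` the program runs from (faster) RAM at unchanged addresses, so the copy loop needs no address translation at all.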
  19. The TI99/2 is here: https://gitlab.com/pnru/ti99/-/tree/master/ti99_2 The Mini-Cortex is here: https://gitlab.com/pnru/cortex I've focused on the Unix side of it. Its main claim to fame is that it hosts a 99-native C compiler and tool chain, and hence it can re-compile itself. I have a native TCP/IP stack working, but the experience is not smooth yet. It uses the ESP32 as an ISP, and connects to it using a PPP serial line. Yes, I've been thinking about that as well. The CPU in the Mini-Cortex is my best approximation of the 9995 yet, implementing the extra 4 instructions. It is almost cycle accurate and the bus interface is that of the 99105. It also has code to emulate the 9995's interrupt lines, for the internal timer and CRU bits, etc. Tongue-in-cheek, I'm calling it the 99095. What I had in the back of my mind was to do a version of the 9918 that mimicked the data paths of the real vintage silicon. I think it should fit in some 500-700 lines of Verilog and would of course have the same limits (4 sprites on a line, no 80 column text mode, etc.). Never got around to doing that code. Together with the 99095 it would allow for a very compact implementation of the Tomy Tutor. Probably using your 9918 is a quicker route to success.
  20. Just for info to the interested: after the first production run of the ULX3S board of almost 1000 pieces sold out in days, there is now a second production batch available on Mouser: https://eu.mouser.com/Search/Refine?Keyword=ulx3s If interested in this stuff, get one whilst supplies last. There is now the Icy99 implementation of the 99/4A + extensions, and there is also a TI99/2 implementation, and an implementation of the Mini-Cortex. Maybe an implementation of the TI99/8 will emerge over time. ULX3S development is discussed at: https://gitter.im/ulx3s/Lobby The complete open source Verilog tool chain can be downloaded from https://github.com/open-tool-forge/fpga-toolchain . It is a big install (~700MB installed), but still much, much better than the multi-gigabyte installs that the vendor tool chains require.
  21. For future readers: I think you mean page 80. When developing the Verilog model for the 99000 I came across this optimisation of MOV. It is not only the DOP fetch, but also the destination WS fetch that is skipped. When debugging the FPGA version of the TI99/2 I disabled this optimisation for a while. Much to my surprise the TI99/2 ran some 10% slower without the optimisation. There is some interesting analysis by Karl Guttag (the 9995 and 99000 chip designer) here (note that the code name for the 99000 was "Alpha"): https://hansotten.file-hunter.com/uploads/files/99000 (Alpha) Misc Documents.pdf It says that MOV is 25% of instructions, together with MOVB even 30%. Throw JMP in the mix and it is 50%. Do you still have the problem for MOVB? On the 9995 interim solution it is not an issue and on the 99105 you can ignore the read-before-write using the BST outputs, as you already described above. What other current scenario is still problematic?
  22. Ehmm.... the above is the total of what I know about them.
  23. No, I did not mean any word length. I meant "general purpose" as opposed to hardwired to be an accounting machine.
  24. By coincidence I was looking at RTC chips a few weeks ago. Looking at some old boards, I arrived at the MM58174 as a chip that was period correct and has direct links to the very first RTC chips to appear on the market in the 1970's (the OKI M5832). For an 8 bit variant have a look at the MM58167. A third choice could be an early serial chip, the NEC uPD4990. This can maybe be coerced to act like a CRU interface chip. I think all are still available on eBay, but I have no direct buying experiences for any of the above.
  25. As I understand it, Kienzle got started as a manufacturer of mechanical taximeters in the 1920's and from there expanded into electro-mechanical tabulating/accounting machines in the 1950/60's. From there they moved to electronic accounting machines (what in Germany they used to call "Mittlere Datentechnik", mid-range information technology). This was the 6000 series. Some sources say it used the 9900, but considering it was launched in 1968 that is probably wrong. Perhaps the later models did. Kienzle got stuck in that technology and was late converting to true mini-computers. They launched the 9000 series in 1979 and this used the tms9900 for sure. As they were entering the market late without clear differentiation, it was not a commercial success. I visited the Kienzle factory in 1984 (I think) as a student on a group field trip and came across a 9000 series machine in passing. They mainly wanted to showcase their automobile technologies but I got a few questions in. What I remember of that is that it was based on the tms9900 and that it ran MTOS. They also claimed that it could "run Unix as a sub-system under MTOS". I think what that meant was that it had a C compiler and a C library that worked under MTOS. I've never found any other reference to that; maybe it was a research skunkworks project. I think the main workhorse in the 9000 series was the Kienzle 9066. I am not sure what MTOS was. It could have been an in-house development; it is also possible that it was a translated version of DX10, or something like that. Later on they had the 9100, 9200 etc. series, which I believe to have been tms99000 based (99105 most likely). In view of the timeline it is possible that the later 9x00 series used Ten-X technology, but I am speculating here.
It is quite likely that (Mannesmann-) Kienzle made similar steps as TI in the late 80's, switching to x86 and 68K based unix systems, with software support to run the 990 base of Cobol programs, before giving up altogether. There is a list of Kienzle models here: http://www.computer-archiv.de (go to section K, select Kienzle, for the specific page). As Ksarul already observed, there are also MIPS machines in the list, the 2800 series.