
pnr

Members
  • Posts: 159
  • Joined

  • Last visited

Recent Profile Visitors

4,094 profile views

pnr's Achievements

Chopper Commander (4/9)

Reputation: 162

  1. Thanks for finding this. How do I learn more about this? Is there a link to the document?
  2. Here are some links about Marinchip Systems from John Walker's website:
     - General intro to Marinchip: https://www.fourmilab.ch/documents/marinchip/
     - Photos of the S-100 / 9900 boards: https://www.fourmilab.ch/documents/marinchip/boards/
     - Marinchip at the crossroads: https://www.fourmilab.ch/autofile/e5/chapter2_110.html
     - The 9900-based ancestor of AutoCAD: https://www.3dcadworld.com/autocads-ancestor/
     - Marinchip morphs into the AutoCAD company: https://www.fourmilab.ch/autofile/e5/chapter2_2.html
     It must have been a fascinating journey. Most of the Marinchip software runs on the Powertran Cortex, including its emulator. If we ever find the source code for Interact, it would be cool to make it run on the Powertran. If we find the binary, I think doing an FPGA system with the appropriate S-100 graphics board might be possible.
  3. I seem to recall that Al Beard's C compiler also supports overlays (on the Geneve). Al is still around; maybe he can clarify.
     EDIT: I looked up the source. The manual says this about TIC: "The GENEVE MDOS version of the compiler utilizes sophisticated memory swapping to gain an 85k workspace for the compiler, even though the compiler is over 50k in length. This allows compilation of fairly sophisticated C programs. The total memory required to run the TIC compiler is 144k." However, this appears to be achieved via custom memory management in the compiler source code (grep for "swap" in the source), where specific compiler tables are swapped in and out of the actual memory space (see the first sketch after this list). So, in the general sense, the TIC compiler does not support overlays. Sorry for the red herring.
  4. True, but there is something close. The C compiler used for Mini-Cortex Unix (which is both a cross and a native compiler) almost has support for that. The source is here: https://www.jslite.net/cgi-bin/9995/dir?ci=tip (ccom, cc and c2 directories). It is K&R C though, so if you want to compile recent C code you need to do some tweaking. This C compiler is derived from the original C compiler as developed by Dennis Ritchie for the PDP-11 mini. As many people on this list will be aware, the instruction sets of the PDP-11 and the 9900 are quite close. In the late 70s, this compiler was modified to generate overlays for programs that could not fit in a 16-bit address space (this work was originally done at Berkeley for 2.10BSD). The overlay system is quite clever and does not require modification to the C source: all the work is done by the linker. In essence, functions that call across overlays do so via a small (automatically generated) thunk that adjusts the memory mapping as needed (a sketch of such a thunk follows after this list). The process is described here (nroff document): https://www.tuhs.org/cgi-bin/utree.pl?file=2.11BSD/doc/2.10/ovpap The Mini-Cortex compiler has the code in it to support this feature; however, I've never written the bits of support code that it needs. Hence, it *almost* supports this.
  5. The unroll also reduces the cost of the loop counter and loop jump. In my Unix code (for an 8-bit CF card) I first used this: https://1587660.websites.xs4all.nl/cgi-bin/9995/artifact/c22c09b80a674a44?ln=75,78 but soon switched to this: https://1587660.websites.xs4all.nl/cgi-bin/9995/artifact/ad1f9336c3316aa1?ln=86,92 That made it much faster (a rough C equivalent is sketched after this list). In your case the loop overhead is smaller in relative terms, so it isn't as critical. Another learning was dealing with interrupts. Your disk access code may be interrupted, and the interrupt may cause another disk access to happen before returning. Leaving interrupts off for long periods is not a good idea (e.g. your 9902s need servicing), so you have to think about the time it takes to read a sector and whether you can afford to leave interrupts off for that long, or you have to make sure the disk code is not re-entered in parallel. The third learning was that using the CPU to read the disk is a hog. Once you start running parallel jobs (not sure MDOS supports this) you really start to notice this, although the short seek times on flash disks offset some of it. I am planning to add DMA capability to my next board.
  6. I think you have found the fastest form; maybe unrolling the loop 4 times would gain a few percent, but that is it. Slower but more compact options could be:
     - If a 256-byte table is too much space, you could consider using a nibble table with 16 entries and do it in two steps (a sketch follows after this list).
     - There is this hack for bit-reversing a byte using MPY: https://graphics.stanford.edu/~seander/bithacks.html#ReverseByteWith32Bits
     If you place the hypothetical shift register in memory instead of in parallel CRU space, your last example would not need the R12 adjustments and could be a bit faster still. For Unix on the Cortex I've found that disk access speed does matter a lot (but early Unix was quite disk intensive, maybe more so than MDOS).
  7. Please read the full post for some context. The 8087 is actually much faster than the others on plain arithmetic (add/sub, mul, div). I did not compare the transcendental functions at all. If one is looking for simple & fast mathematical functions, consider using lookup tables. The original Forth needed that (it was created for software controlling telescopes); if I remember well, the arithmetic was done using scaled 32-bit integers, with each function looked up in a 64-entry table and interpolated (a sketch follows after this list).
  8. Actually, it is 99000 code. The difference is not big, but the code uses such things as 32-bit addition and shifts. So it is not cut and paste, but the amount of effort needed to make it true 9900 code would not be big. Also, John Walker (of AutoCAD fame) wrote some single & double precision routines for the 9900 that were fast for their time: https://www.fourmilab.ch/fbench/fbench.html I don't have source code for this, but the object code libraries can be reverse engineered of course. Happy to post the object code if anybody is interested. Note that the RADIX 100 code has the benefit of being exact for decimal fractions, i.e. it is much better suited to writing financial program code than IBM370 or IEEE (adding 0.01 to an amount one hundred times gives exactly 1.00, rather than 0.99-something due to rounding issues; see the small demonstration after this list).
  9. Well, the analog capture shows that the signals are as I would expect them to be. Had a quick look at the spec sheet for the VB-8012: https://www.ni.com/pdf/manuals/371527d.pdf It says that the input threshold on the digital inputs can be adjusted between 0V and 2V. For TTL signals I think low is 0-0.7V and high is 2.4-5V. Maybe the threshold is currently set to quite a low or high value, where overshoots are detected as a reverse signal for one sample. Mathematically, 1.5V should be ideal, but in a circuit that mixes (LS-)TTL, HCT, NMOS, etc. some experimentation may be in order.
  10. Again, congrats on getting it to work! This by itself is not certain. My understanding (after experimentation) is that reset is only sampled by the CPU on the rising edge of CLKOUT. Although the datasheet says that it must be asserted for at least 3 machine cycles (clocks), actually one is enough for the processor to reset. If the glitches occur outside the setup/hold time around the clock edge, the CPU would not notice. It may be interesting to do an analogue measurement of CLKOUT and RESET and see how that relates to the digital measurements.
  11. Maybe it is not a DC but an AC issue: maybe the bus line is ringing? Have you tried a 100R series resistor to dampen reflections (as in the firehose interface)?
  12. I am not sure that is a good idea. Initially my thoughts were like yours, and I was aiming for the PLA to be in block RAM. Two things changed my mind: (i) the "ROM" has lots of duplication in it, and it turns out that generating signals from the state vector does not take all that many LUTs. Probably this is the reason that CPUs from that era often used PLAs for microcode in the real silicon. (ii) The LUT version is faster than the "ROM" version. This was the case on the ICE40 chips and perhaps even more so on the ECP5 chips. Maybe the second reason drops away when the microcode lookup is more pipelined than in my design. A now obsolete reason was that I wanted to conserve block RAM on the limited ICE40 chip.
      Yes. This design choice was driven by a wish to stay close to the original silicon (see here and figure 3 in the 99105/99110 data manual). This too uses a constant table.
      Trying to eliminate multiplexers is a good idea, I think. In the NMOS silicon of the era, it was almost free to have a tri-state bus on the chip. On an FPGA this translates to multiplexers. The natural multiplexer seems to be a 4-bit 2:1 multiplexer in a single logic block, and an 8-way multiplexer takes 3 layers of LUT. Including all the wire routing, the actual layout quickly becomes hard to predict/understand. Selecting ALU inputs and ALU function, and generating flag bits, is a critical timing path for me.
      The 99000 microcode is 152 bits wide. Mine is much narrower, but in part that is only apparent: fields have often been constrained to 4 bits, so that one LUT can derive single signals. I've never counted how many bits I have after such expansion. For another take on microcode organisation, take a look at the microcode word of the 990/12. It is described briefly in one of the assembler manuals, but I cannot find the right link at the moment. It is 64 bits wide.
  13. Happy to hear that you found the problem. Yes, with AS I meant ALATCH; I was working with an M68K recently and got the signal names confused. Wow, that VB-8012 is a serious bit of kit. Does it have an input mode that adds some hysteresis to the 32 inputs? If so, it could maybe help with the cross-talk. Maybe @Jimhearne and @Stuart have suggestions -- they are much better with hardware issues than I am.
  14. This is a very interesting avenue of development! Just throwing out some thoughts:
      1. I heard (read) the GPL processor thing as well, but I am not sure it is correct. As I understood it, the original plan was for a 99xx CPU with an 8-bit data path, but this project did not materialise in time and the 16-bit 9900 was shoehorned in at a late stage. I also think I remember reading that the designers did not mind the "double interpreter" because they expected that a dedicated CPU would be used for a next-gen system. I am not sure how the two things relate, if at all.
      2. For a microcoded design, have a look at my 99000 version. It has ~200 states for the 9995 instruction set.
      3. Another route could be to use the co-processor design of the 99xxx series. I am not implementing that, but it could help to keep complexity down by separating the GPL part into a co-processor. That co-processor could have a data path optimised for GPL, with maybe a separate address ALU, etc. The co-processor interface has facilities to transfer the WP, PC and ST registers between the CPU and the co-processor, so integration could be quite seamless.
  15. When I look at the scope output picture(s) I am surprised by some of the signals. It is not clear why CLKOUT should not show a nice regular square wave, and I don't think that the BST lines should change state when the AS signal is low. Is it possible that the scope / analyser is not grounded and hence mis-measuring the signals? If your system is multi-board, is it possible that ground does not feed through? Or a ground loop perhaps?
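Regarding the table swapping mentioned in post 3: a minimal sketch of the general idea, keeping one large table resident and parking the rest in a scratch file so the same address space can be reused. The names, the table size, and the file layout are made up for illustration and are not taken from the TIC source.

```c
#include <stdio.h>

/* Hypothetical sketch: several large tables share one resident buffer;
 * the inactive ones live in a scratch file (e.g. opened with tmpfile()). */
#define TABLE_BYTES 8192

static char table_buf[TABLE_BYTES];   /* the single resident copy */
static FILE *scratch;                 /* scratch file holding swapped-out tables */

static void swap_out(int slot)        /* write the resident table to its file slot */
{
    fseek(scratch, (long)slot * TABLE_BYTES, SEEK_SET);
    fwrite(table_buf, 1, TABLE_BYTES, scratch);
}

static void swap_in(int slot)         /* load another table into the same buffer */
{
    fseek(scratch, (long)slot * TABLE_BYTES, SEEK_SET);
    fread(table_buf, 1, TABLE_BYTES, scratch);
}
```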
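For the overlay mechanism described in post 4: in 2.10BSD the thunk is emitted automatically by the linker in assembly; this hand-written C version only illustrates the idea. set_overlay(), current_overlay and the overlay numbers are assumptions.

```c
/* Sketch of a cross-overlay call thunk. Callers call the thunk instead of
 * the real function; the thunk maps in the callee's overlay and maps the
 * caller's overlay back in afterwards. */

static int current_overlay = 0;           /* which overlay is mapped in now */

static void set_overlay(int ov)           /* stand-in for the real mapper code */
{
    current_overlay = ov;
    /* ... program the memory mapping hardware here ... */
}

extern int big_function(int arg);         /* lives in overlay 2 (assumption) */

int big_function_thunk(int arg)
{
    int caller_ov = current_overlay;
    int result;

    set_overlay(2);                       /* map in the callee's overlay */
    result = big_function(arg);
    set_overlay(caller_ov);               /* restore the caller's mapping */
    return result;
}
```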
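For the unrolled disk-read loop in post 5, here is a rough C equivalent of the idea (the linked code itself is 9900 assembly). The port address and sector size are assumptions.

```c
#include <stdint.h>

/* Hypothetical 8-bit CF card data port; the address is made up. */
#define CF_DATA (*(volatile uint8_t *)0xFC00)

/* Read one 512-byte sector. Unrolling 4x means one counter decrement and
 * one branch per four byte transfers instead of one per transfer. */
void read_sector(uint8_t *buf)
{
    int i;
    for (i = 0; i < 512 / 4; i++) {
        *buf++ = CF_DATA;
        *buf++ = CF_DATA;
        *buf++ = CF_DATA;
        *buf++ = CF_DATA;
    }
}
```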
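The two-step nibble-table byte reversal suggested in post 6 looks roughly like this in C (a 9900 version would use the same 16-entry table with indexed addressing):

```c
#include <stdint.h>

/* Bit-reverse of each 4-bit value: 16 entries instead of 256. */
static const uint8_t rev4[16] = {
    0x0, 0x8, 0x4, 0xC, 0x2, 0xA, 0x6, 0xE,
    0x1, 0x9, 0x5, 0xD, 0x3, 0xB, 0x7, 0xF
};

/* Reverse a byte: reverse each nibble, then swap the two nibbles. */
uint8_t reverse_byte(uint8_t b)
{
    return (uint8_t)((rev4[b & 0x0F] << 4) | rev4[b >> 4]);
}
```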
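To illustrate the table-plus-interpolation approach mentioned in post 7: a sketch of a sine lookup using scaled 32-bit integers, a 64-interval table and linear interpolation. The table size, scaling and angle encoding are assumptions, not the original Forth code.

```c
#include <stdint.h>

/* Quarter-wave sine table: 64 intervals, values in 16.16 fixed point.
 * 65 entries so that entry idx+1 always exists for interpolation.
 * Filling the table (precomputed values) is left out of the sketch. */
static int32_t sin_tab[65];

/* angle: 0..65535 spans the quarter wave. */
int32_t sin_lookup(uint32_t angle)
{
    uint32_t idx  = angle >> 10;            /* top 6 bits select the interval   */
    uint32_t frac = angle & 0x3FF;          /* low 10 bits interpolate within it */
    int32_t  a = sin_tab[idx];
    int32_t  b = sin_tab[idx + 1];

    return a + (int32_t)(((int64_t)(b - a) * frac) >> 10);
}
```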
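The rounding point at the end of post 8 is easy to demonstrate with binary floating point; a radix-100 (or any decimal) representation keeps the same sum exact:

```c
#include <stdio.h>

/* Accumulate 0.01 one hundred times in binary floating point.
 * 0.01 has no exact binary representation, so the total drifts slightly
 * away from 1.0; a decimal (radix-100) format would stay exact. */
int main(void)
{
    float sum = 0.0f;
    int i;

    for (i = 0; i < 100; i++)
        sum += 0.01f;

    printf("%.9f\n", sum);   /* typically close to, but not exactly, 1.0 */
    return 0;
}
```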