Posts posted by mizapf


  1. Hi all,

     

    I just benchmarked the SWPB. If someone is interested in the tool, just tell me. Or maybe I should upload it here.

     

    I'm using the RTC in the Geneve and run a loop with 0x400000 iterations. The empty loop takes 8.4 seconds; the loop with a SWPB inside takes 26.6 seconds, i.e. a delta of 18.2 seconds. Dividing by the iteration count gives a time difference of 4.34 µs, which is the execution time of a single SWPB. With a cycle period of 0.333 µs that is exactly 13 cycles. No error in the specs.

     

    This is all measured with code and registers in on-chip RAM. You can change parameters in the source code to put code and/or registers in SRAM or DRAM.
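    For clarity, the arithmetic above can be written out like this (just the numbers from this post, nothing more):

```python
# Sketch of the cycle-time calculation from the measurement above.
ITERATIONS = 0x400000          # 4,194,304 loop passes
empty_loop = 8.4               # seconds, loop overhead only
swpb_loop = 26.6               # seconds, same loop with one SWPB inside
CLKOUT_PERIOD = 1 / 3.0e6      # 3 MHz external clock -> 0.333 us per cycle

per_op = (swpb_loop - empty_loop) / ITERATIONS   # seconds per SWPB
cycles = per_op / CLKOUT_PERIOD                  # CLKOUT cycles per SWPB
print(round(per_op * 1e6, 2), round(cycles))     # -> 4.34 13
```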


  2. Hi Matthew,

     

    one thing I would be really interested in, since you said you did an HDL implementation: can you tell me how DIV and DIVS are implemented? In particular, I'm interested in the overflow detection of DIVS, because right now I have an ugly piece of code in MESS to predict an overflow, and this *must* be easier to achieve.

     

    The problem is as follows: overflow detection is pretty simple for unsigned division. For DIV R2,R0 (dividend in the register pair R0:R1, divisor in R2) we get

     

    R0 = ((R0<<16)+R1) / R2

    R1 = ((R0<<16)+R1) % R2

     

    and we have an overflow iff R0 >= R2 (comparing the original values). So this can be checked quickly before the division algorithm even starts. The fact that DIV requires far fewer cycles in this case seems to prove that. In MESS I let the code pretend to run a division procedure by calculating the result directly and then consuming the appropriate number of clock cycles. I'd prefer to have the real division procedure, though.

     

    However, the overflow check for the signed version gives me real headaches, which has to do with the sign bit. I had to go through all combinations of even/odd/positive/negative divisors and positive/negative dividends. It works as expected, but as I said, this must be easier somehow. Maybe the real hardware can detect it earlier during the execution of the DIVS microprogram.
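    To illustrate, here is a rough functional model of both checks in Python (my reading of the data manuals; the hardware microprogram certainly does not work this way, and treating a quotient of exactly -8000 as still fitting is an assumption on my part):

```python
def div_overflow(r0, r1, r2):
    """Unsigned DIV: overflow iff high word of the dividend >= divisor."""
    return r0 >= r2

def to_signed(x):
    """Interpret a 16-bit value as two's complement."""
    return x - 0x10000 if x & 0x8000 else x

def divs_overflow(r0, r1, r2):
    """Signed DIVS: overflow iff the quotient does not fit in a signed
    16-bit register (divisor 0 also overflows).  Functional prediction
    only, not the hardware algorithm."""
    divisor = to_signed(r2)
    if divisor == 0:
        return True
    dividend = to_signed(r0) * 0x10000 + r1   # 32-bit signed dividend
    q = abs(dividend) // abs(divisor)         # truncate toward zero
    if (dividend < 0) != (divisor < 0):
        q = -q
    return not (-0x8000 <= q <= 0x7FFF)       # assumption: -8000 fits
```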


  3. <disclaimer>None of my comments should be understood as conveying a feeling of being insulted or similar unless I say so.</disclaimer>

     

    Don't worry, and likewise, I hope you are not feeling offended by my contributions. I just say what I found out, and I'm happy to learn more. After all, this is not a religion but a scientific hobby.


    Also, I don't have experience with real hardware, that is right. This sharing of data paths may indeed be easier in hardware than in software, where you try hard to keep things apart for the sake of readability and code structure. What I did was to try to simulate each piece of the microprogram, but, as you said, from a functional perspective - at least as far as the manuals disclosed it to me.


    OK, admitted, DIV is not a good comparison for SWPB (as you said, it stops quickly in some cases), but don't you think SWPB is still surprisingly slow compared to other operations (in particular when looking at the 9900 cycle counts)?


    In my opinion the differences between the 9995 and the 9900 are not negligible. For one thing, the 9995 has an 8-bit data bus (but a 16-bit architecture), so the whole operation of the 16/8 data bus converter is inside the chip. The 9995 makes good use of this by dropping the time-consuming read-before-write known from the 9900. Then you have the prefetch feature, which is not present on the 9900. Also, the interrupt handling is quite different (see the flow charts); you no longer have 16 interrupt levels.

     

    In fact, when you compare execution speeds of the 9980A with the 9900, you see that there is no real difference, apart from the fact that the 9980A has to split each memory access into two 8-bit accesses; otherwise it behaves just like the 9900. Both, as far as I could find out, share the same microprograms.

     

    In MESS I could exploit this by subclassing the 9980A from the 9900. For the 9995 I had to start almost from scratch.

     

    As for SWPB, I think I remember that I actually measured it. I once wrote a benchmarking program (just running a loop with the operation and taking the time difference using the RTC in the Geneve), but I cannot say for sure now that I actually tested SWPB. From the specs you can see that memory access is not the real problem - you can run the operations in the on-chip RAM, so you have no wait states and full 16-bit access. And still the processing time is surprisingly high. (If I find some time I'll do a check, but right now I'm a bit busy preparing slides for my lectures next week... :) )


    I also thought about some internal SRC when I tried to figure out how the respective cycle counts can be explained for various commands. It seems plausible for the TMS9900, but as you see, it takes even more cycles on the TMS9995.

     

    In the document "9900-FamilySystemDesign-04-HardwareDesign.pdf", page 4-89 ff. you can find the microprograms for the TMS9900 (at least as far as needed to explain the cycle count per command); I used it extensively for the re-implementation in MESS. The SWPB command is handled together with CLR, SETO, INV, NEG, INC(T), and DEC(T); the shifts look very different. Still, it could be that some portion of the ALU is re-used in both cases.

     

    Unfortunately there is no such document for the 9995, so I had to guess what the microprograms might look like (with all that prefetching and so on). And that is exactly where I ran into the issue with SWPB. The 9995 is a huge improvement over the 9900, which becomes clear when you see how few (external) cycles it actually uses - but surprisingly not in this case.

     

    In fact, the 9995 is driven by a 12 MHz clock but outputs a 3 MHz external clock line. At first I thought the 12 MHz is simply divided by 4, but when I saw how far the cycle counts have been reduced compared to the 9900, I guessed that the 12 MHz is well needed inside the CPU for driving the microprograms. While with the 9900 you can plausibly imagine that one step happens on this clock tick and the next step on the following one, with the 9995 you simply cannot believe that most operations complete in the few clock ticks visible on CLKOUT. There is much more happening inside.


    You are possibly right about the *Rx+ cycles; I did not pull out the docs to check. I may have done it right in MESS, as I closely followed the specs with all the flow charts and so on, but I just don't remember right now :) . So the claim is that MOVB R0,*R1+ is faster than MOV R0,*R1+ ... I'll check whether I did that correctly in the emulation.

     

    There is another funny thing with these delays. If you have a look at the specs of the TMS9995 (used in the Geneve and TI-99/8) you will notice that the SWPB instruction is extremely slow compared to the others. For instance, on the TMS9900, A takes 14 cycles and SWPB 10 (plus additional memory cycles). On the 9995, while A takes 4 cycles, SWPB requires 13 in the same setting (all operands on-chip) and is thus twice as slow as DIV. I had to account for that when I re-implemented the CPUs in MESS and had to explicitly add dummy cycles to get that delay.

     

    I cannot really imagine why SWPB (a particularly simple command) should take that many cycles. My guess is that this is an intentional delay, related to the known problems with accessing slow devices like the VDP: with this slowed-down execution you could keep the common MOVB/SWPB/MOVB sequence for setting the address without changing the source code. (Just a wild guess, yes.)
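    Functionally, SWPB is trivial, which is exactly what makes the cycle count so puzzling - a one-line model:

```python
def swpb(word):
    """Functional model of SWPB: exchange the two bytes of a 16-bit word."""
    return ((word << 8) | (word >> 8)) & 0xFFFF
```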


  9. 1. The VDP read routine provided by Matthew in this thread does not have the delay prescribed in the E/A manual between sending low and high byte. Is this a mistake or is it not really necessary? I know the F18A does not need the delay, and since I have one of those great boards installed in my console it's difficult to test. How about emulators, do they care about the delay? Is there a window after vsync where the delay is not needed?

     

    It also depends on the location of the code. For instance, in 16-bit memory (ROM, scratch pad, an internal 32K, or TMS9995-internal RAM) the instructions require fewer cycles, so they may overrun the VDP. I know that on the Geneve you have to use delays.

     

    2. Is there a clever way to read a VDP byte, modify it (e.g. OR with a bit mask), and write it back without having to set the write address twice? (like temporarily disabling auto-increment)

     

    No, there isn't; each read automatically increments the memory pointer of the VDP.

     

    3. There don't seem to be any bit instructions in the instruction set. If I want to set bit n (variable) in a register I can set it to >8000 and shift right (n-1) times. Is there a faster way?

     

    As already said, you should use SOC for this purpose. BTW, SZC can be used to clear the bits that were set with SOC.
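    To illustrate the difference in Python (the function names are mine; SOC/SZC only appear in the comments; bit 0 is the leftmost bit, following TI notation):

```python
def set_bit_shift(reg, n):
    """Set bit n (0 = MSB, TI convention) by shifting >8000 right n
    times and OR-ing it in - the shift-loop approach from the question."""
    mask = 0x8000
    for _ in range(n):
        mask >>= 1
    return reg | mask          # SOC performs this OR in one instruction

def clear_bit(reg, n):
    """SZC clears exactly the bits that are set in the mask."""
    mask = 0x8000 >> n
    return reg & ~mask & 0xFFFF
```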

     

    4. Is immediate addressing, e.g. LI R1,>0001, faster than memory addressing, e.g. MOV @ONE,R1? What if ONE is in scratch pad?

     

    Immediate values require their own memory cycle to be read. For the special values 0 and >FFFF you should use CLR or SETO, which are faster. Being in scratch pad only means that the time-multiplexed data bus operation is suspended, so there are fewer cycles.

     

    5. Are byte instructions, e.g. MOVB, faster than word instructions, e.g. MOV?

     

    No. Byte operations always require the CPU to load the full word first. For instance, if you write >01 to address >A000, you expect the byte at >A001 to remain the same. However, the CPU is 16-bit; it cannot move half a word out of the ALU. Accordingly, what it does is load the complete word at >A000 first, keep the byte at >A001 in internal storage, modify the byte at >A000, and write the complete word back.

     

    The TMS9995 (in the Geneve) is more flexible here because it has only 8 data bus lines. It can indeed address single bytes and therefore never needs to read before the write. (In this case, fewer data bus lines actually make the architecture faster!)
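    The 9900's read-before-write can be sketched like this (`mem` is a hypothetical word-wide memory keyed by even addresses; the model is mine, not the actual bus logic):

```python
def write_byte_9900(mem, addr, value):
    """Model of a 9900 byte write: the CPU reads the full word,
    replaces one byte, and writes the whole word back."""
    word = mem[addr & ~1]                 # read-before-write
    if addr & 1:                          # odd address: low byte
        word = (word & 0xFF00) | value
    else:                                 # even address: high byte
        word = (word & 0x00FF) | (value << 8)
    mem[addr & ~1] = word                 # full 16-bit write-back
```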

     

    7. Can anyone point me to a good alternative to the KSCAN routine? The one in the ROM is pretty lousy as far as I understand.

     

    Why? I'd recommend staying with the standard key scan routine as long as you don't have tight timing constraints. Using the standard routine ensures that systems with a different keyboard (like the Geneve) will run your program. If you start working with bare CRU lines, this compatibility is gone.

     

    8. What is the easiest way to place a small routine in scratch pad? I don't suppose you can use AORG?

     

    I never tried it, but you are likely to interfere with the loader. As long as the Editor/Assembler cartridge is running (and GPL is executing), the scratch pad is heavily in use. AORG is only useful for tagged object code, and once you switch to memory-image code you don't have it anymore. The standard approach is to copy the routine from some location in your program into the scratch pad.


    What is the "right location" for bad opcode detection? :) If you run into a bad opcode, you just continue.

     

    (And the TMS9995 raises a MID interrupt, of course; I once wrote a tool I named Guru Meditation, as it did something similar to the Amiga.)

     

    @Tursi: I just say that this is a very likely issue (in particular in MESS, as there is no clean memory at startup), because the code checks locations it has not previously loaded. In case the above compare instruction is not EQ, the code continues and (maybe; I did not check so far) loads the support code at the right memory locations. I suppose that when LK99 was loaded before, execution correctly lands in some expected code. It is still not clear why this always seems to work on the hardware console.

     

    (OK, forget about that, you already gave the answer...)


  11. Analysis:

     

    WURM loads at >A000

    WURN loads at >FD40

     

    a000: 0460 ff52        B @>ff52

    ff52: 02e0 fd78        LWPI >fd78
    ff56: 8820 2000 311e   C @>2000,@>311e
    ff5c: 1339             JEQ >ffd0
    ...
    ffd0: c820 fd40 2008   MOV @>fd40,@>2008
    ffd6: c820 fd42 200a   MOV @>fd42,@>200a
    ffdc: 0460 200c        B @>200c

     

    OK, now this is definitely bad style; at least it should not be that way. >2000 is cleared on entry to the Editor/Assembler, but >311e is left untouched. So the result is not really predictable and depends on what was previously in that location.

     

    My guess would be that the real console always starts with blank memory, while almost all emulators just get their memory from the operating system, possibly filled with random bytes. This is something I can fix easily; I can add a memset at device startup in MESS - of course not at reset time, because the contents must survive a console reset.

     

    Michael


    I have not found a sector dump disk image (v9t9 style) of TurboPasc'99 in that ZIP file, so I've created one just now - is there anywhere I should upload it to? (I had the chance to find some import errors in TIImageTool on this occasion.)

     

    I'm quite happy with the original German version :) , anyway, I have the printed manual in a cardboard box somewhere.

     

    As for the issue with starting EA5 programs, it should be possible to find out. If LK99 needs to be loaded first, this sounds to me as if the executable file was not properly linked, and during execution it jumps into the remainders of the previously loaded LK99.

     

    Thanks for the link!


    Er ... do I still have a phone-a-friend lifeline? :) Sorry, I cannot give you more information about how to work with the HSGPL. I just did some elementary tests to make sure the emulation is good. Of course, if things don't work as on the real hardware, please tell me.

     

    Michael


  14. Hi Matthew,

     

    no, there's no chance to run such an Evolutionary Algorithm at 3 MHz. Our EA framework is written in Java, and while Java is certainly slower than machine language on a PC, it is many times faster than TMS9900 machine code (estimated; don't ask me for numbers). And we let it run for weeks.

     

    Michael

     

    P.S.: As a non-native speaker I just checked "musings" to make sure, but it indeed means "thoughts/ponderings", doesn't it? Sorry if my text is a bit theoretical, just wanted to throw in some of my thoughts from science. :)

     


    The interesting thing about the discussion on intelligence is that we tend to call every mental process intelligent that we cannot describe as a "mechanical" sequence of elementary decisions. Or, in other words, intelligence retreats the further our understanding of mental processes advances. :-)

     

    I'm not sure how far we are from an acceptable level of mental power in machines. There is still a romantic idea among people who believe in a phenomenon called "free will" (which I believe is a Fata Morgana), which has already led some renowned scientist (I just forgot the name) to invoke quantum processes in the mind to achieve this "free will". In my view, free will is nothing but the ability to conceive, evaluate, and select options for the next action, plus perhaps the self-awareness of this fact (i.e. we are aware that the next action is up to our own choice and not exclusively determined by external influence).

     

    Accordingly, free will is within good reach for machine intelligence.

     

    Intelligence is also about deducing knowledge from information - beyond the simple pattern recognition. This addition seems important to me to justify the initial axiom that intelligence is any non-understood mental process. :-) Animals can be trained to react to spoken commands. Is that intelligent? Is learning as such an expression of intelligence or just a mental capability? Which of our own habits is learned behavior, which is intelligent?

     

    Will algorithmic computing lead to intelligent behavior? By algorithmic computing I refer to our "classic" way of programming, i.e. we have a problem and find an algorithm to solve that problem. It is very unlikely (probability close to 0) that this solution will solve anything else. Accordingly, no problem will be solved by a computer this way that we have not treated before.

     

    During my scientific work at Kassel University, we explored Genetic Programming as a recent way of programming computers non-algorithmically. GP is a kind of Evolutionary Algorithm, which in turn tries to mimic natural evolution: hundreds of thousands of sample programs are applied to a problem, and their effect is evaluated in terms of suitability. The "better" programs are kept, while the inferior ones are excluded from the gene pool. Genetic Programming optimizes not just the parameters of programs this way but whole programs. That is, we start with a simple program and modify it until it fits. This may take some weeks on a computer cluster.
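    The selection/mutation loop described above can be sketched as a toy example (evolving plain bit strings toward a target, which is of course far simpler than our actual GP framework; all names here are made up):

```python
import random

def evolve(target, pop_size=50, generations=300):
    """Toy evolutionary algorithm: keep the fitter half of the
    population, replace the rest with mutated copies."""
    length = len(target)
    fitness = lambda ind: sum(a == b for a, b in zip(ind, target))
    # start from a random population
    pop = [[random.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]        # selection
        children = []
        for parent in survivors:
            child = parent[:]
            i = random.randrange(length)       # point mutation
            child[i] ^= 1
            children.append(child)
        pop = survivors + children
        if fitness(pop[0]) == length:          # perfect individual bred
            break
    return max(pop, key=fitness)
```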

     

    What we learned from that is that evolution just does not care about obvious advantages (obvious in our view) if other solutions are likewise helpful but in a non-obvious way. In our concrete scenario we modeled an environment in which we wanted our programs to distribute themselves as evenly as possible. What was "bred" over time was about 10 lines of code in a simple rule-based programming language that we had designed beforehand. However, the lines seemed to make no sense at all - which is, by the way, exactly the point: there is no "sense" in such programs because no one actually wrote them.

     

    Interestingly, the programs fulfilled their job, but it took me some time to find out why. They were difficult to understand because the rules are executed iteratively, which means you cannot simply follow a trace of execution. (Nothing else happens inside our brains, where neural signals circle round and round between different areas.) Eventually I got the point of why these lines worked the way they did. The small "agents" simply "abused" some of their memory locations - which we had originally intended for storing specific location information - to expand their internal state space. I was pretty excited to see that the system had indeed worked around some limitations that we unintentionally left in it. This was the moment I saw that we could make a system behave in some way *without* a prior concept.

     

    Could continue for hours, sorry. :-) If you are interested in these experiments I can give you some links.

     

    Michael

     


    Seems to be. I just had a look at the CALL KEY routine at G@>3708. Here's a part of the disassembled GPL (from my TIImageTool, next release :) ):

     

    3724: DST >4001,@>834a    prepare the radix-100 number
    3728: CEQ >ff,@>8375      -1 means no new key
    372b: BS G@>3742
    372d: CHE >64,@>8375
    3730: BR G@>373d
    3732: INC @>834a
    3734: SUB >64,@>8375
    3737: ST @>8375,@>834c
    373a: B G@>3740
    373d: ST @>8375,@>834b
    3740: BR G@>3744
    3742: DNEG @>834a         negate 4001 yields -1 (indicates no new key)
    3744: XML >15
    3746: BR G@>3620

     

    This works as long as the value is not 0. The 0 is represented as 00 00 xx xx xx xx xx xx (see Editor/Assembler manual, page 279), but this routine returns >4000 for 0. The problem is that the floating-point routines in the Monitor ROM check whether the first word is 0000 on multiplication, and very likely also when comparing. The funny thing is that adding a value turns the invalid number into a valid one, while multiplying fails. :)
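    For illustration, here is my reading of the radix-100 integer encoding in Python (simplified to non-negative integers below 10000; the point is that the correct encoding of 0 is the all-zero word, not >4000):

```python
def encode_radix100(n):
    """Encode a non-negative integer below 10000 in the TI radix-100
    floating-point format: exponent byte (>40 + number of radix-100
    digits minus one), then the digits.  Zero is the all-zero word."""
    if n == 0:
        return [0x00, 0x00]          # 0 is >0000, NOT >4000
    digits = []
    while n:
        digits.insert(0, n % 100)    # collect radix-100 digits
        n //= 100
    return [0x40 + len(digits) - 1] + digits

print(encode_radix100(1))   # [64, 1] = >40 >01, the word >4001 from the listing
```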

     

    Michael


    You can also press plain E, which will bring you to "PRESS CASSETTE STOP"; then you get an I/O ERROR 56. And the program is gone unless you have a 32K memory expansion.

     

    However, in TI BASIC, the program is still there. So again, an issue in Extended Basic without 32K?

     

    BTW, with 32K you get a nonsensical error 03 here, so that's another issue. But I remember the I/O error codes got somewhat mixed up in Extended Basic.

     

    Michael


    I never had a real problem with FCTN-=. I cannot remember ever hitting it by accident. What happened to me instead was that after a whole afternoon of typing in a BASIC program, I typed OLD CS1 instead of SAVE CS1. :D This was before I had a memory expansion, so it wiped the memory instantly - no chance to exit. I don't remember whether I ever typed that program in a second time.

     

    Michael
