Posts posted by BillG
-
-
This is a bit of a tangent, but...
I have been working on an 8080 simulator to allow running CP/M programs on the 680x and 65[x]02. One of the posters on the forum at 6502.org said he is interested in what I am doing because he was "collecting languages" on his Apple II. Turbo Pascal was one of the compilers he mentioned; he did not say he had a Z80 card.
That reminded me of my discovery long ago that Turbo Pascal for CP/M required a Z80 processor, both to run the compiler and to run the programs it generated. I went back to MT+ or assembly language for things needing to run on an 8080 or 8085. I also briefly considered trying to modify Turbo Pascal to generate 8080 programs; that effort did not go very far because programming on the then-new IBM PC took up more of my time.
I took a closer look at the CP/M version of Turbo Pascal after that discussion. I have since disassembled most of the run-time library and commented much of it. I still believe that creating a version of the compiler to run on an 8080 system is not feasible, but it now seems very doable to:
* write an 8080 version of the run-time library
* write a post-compiler to translate Z80 code generated by the compiler to 8080 assembly language
* write a tool to stitch the two together
If that can be done, it is not a huge stretch to:
* write enough of a Z80 simulator to run the Turbo Pascal compiler on processor X
* write a version of the run-time library for processor X
* write a version of a converter to translate Z80 code to processor X
I believe this can be done with substantially less effort than writing a new Turbo-compatible compiler to run natively on processor X.
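To make the post-compiler idea more concrete, here is a minimal Python sketch of translating a few Z80 source lines into 8080 mnemonics. It is not taken from Turbo Pascal or from any existing tool; the function names (`z80_to_8080`, `reg_operand`, `direct_address`) are hypothetical, and only a handful of instructions with well-known 8080 equivalents are covered. One caveat is an assumption worth checking against real compiler output: `DJNZ` does not change the flags on a Z80, while the `DCR B` used to replace it does.

```python
import re

REGS = {"A", "B", "C", "D", "E", "H", "L"}
INDIRECT = {"HL", "BC", "DE", "SP", "IX", "IY"}

def reg_operand(text):
    """Return the 8080 name of an 8-bit register operand, or None."""
    t = text.upper()
    if t in REGS:
        return t
    if t == "(HL)":
        return "M"  # the 8080 calls the byte addressed by HL "M"
    return None

def direct_address(text):
    """Return the expression inside (...) when it looks like a plain address."""
    if text.startswith("(") and text.endswith(")"):
        inner = text[1:-1].strip()
        # Crude filter: reject register-indirect and IX/IY indexed forms.
        # (This also rejects labels starting with IX or IY; fine for a sketch.)
        if inner and inner.upper() not in INDIRECT and inner[:2].upper() not in {"IX", "IY"}:
            return inner
    return None

def z80_to_8080(line):
    """Translate one Z80 source line into a list of 8080 source lines.

    Anything not explicitly handled raises ValueError, so unsupported
    code is never passed through silently.
    """
    match = re.match(r"\s*(\w+)\s*(.*?)\s*$", line)
    if not match:
        raise ValueError(f"cannot parse: {line!r}")
    op = match.group(1).upper()
    rest = match.group(2)
    args = [a.strip() for a in rest.split(",")] if rest else []

    if op == "LD" and len(args) == 2:
        dst, src = args
        d, s = reg_operand(dst), reg_operand(src)
        if d and s and (d, s) != ("M", "M"):
            return [f"MOV {d},{s}"]
        if dst.upper() == "A" and direct_address(src):
            return [f"LDA {direct_address(src)}"]
        if src.upper() == "A" and direct_address(dst):
            return [f"STA {direct_address(dst)}"]
        if dst.upper() == "HL" and direct_address(src):
            return [f"LHLD {direct_address(src)}"]
        if src.upper() == "HL" and direct_address(dst):
            return [f"SHLD {direct_address(dst)}"]
    elif op in ("JP", "JR") and len(args) == 1 and not args[0].startswith("("):
        # The 8080 has no relative jump, so JR becomes an absolute JMP.
        return [f"JMP {args[0]}"]
    elif op == "DJNZ" and len(args) == 1:
        # Caveat: DCR B alters flags, DJNZ does not.
        return ["DCR B", f"JNZ {args[0]}"]
    raise ValueError(f"unsupported Z80 instruction: {line.strip()!r}")
```

Emitting assembly language rather than patched binary matters here: replacing a two-byte `JR` with a three-byte `JMP` moves every later address, and letting an assembler resolve the labels again avoids having to relocate code by hand.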
-
-
Version 6 of Microsoft C was able to generate that compact interpreted code. The feature did not last long; I think version 8 could no longer do it.
Parts of Word 1.1 used it.
-
On 7/29/2020 at 2:56 PM, Mehridian Sanders said: On top of that... does anyone feel like taking a swing at a "Double Wide Speech Synth Case" for modelling? (To accommodate the TIPI/32k and Speech Synth?) I really don't have the time to learn 3D modelling as fast as I would like to.
Tinkercad is a very approachable way to get started in 3D design: https://www.tinkercad.com/
You add or subtract geometric shapes. Definitely give it a try.
Somewhat more advanced is OpenSCAD, a parametric design tool. It is somewhat like programming. https://www.openscad.org/
Both of these are free to use.
At the other end of the spectrum, more powerful and difficult to learn, are the full-featured "professional" CAD programs.
-
In the new year, I hope to have my compiler technology far enough along to compile Pascal and Python solutions for the 7's problem. And hopefully, they will be competitive speed-wise.
-
-
-
On 9/15/2020 at 11:55 PM, alanbeard said: I can post this source code for the TI-99 and 9640 compilers/linkers/demo programs/tic compiler, etc. but not sure where to post it. Any ideas?
Something like GitHub may be good in case somebody wants to make improvements to it.
-
On 9/11/2020 at 10:21 PM, JamesD said: Microsoft BASIC might be easier to port since it was designed for that. Source is available
I am steering away from Microsoft BASIC because it is still under copyright protection. I do not want to risk putting effort into something and then be told I cannot use it.
-
Thanks.
That's good to know.
-
My head is spinning and this is how I plan to make it stop.
1. I choose this as the authoritative document. Appendix A of:
2. It speaks in terms of machine cycles. A machine cycle consists of two clock cycles. This was a source of much of my confusion as some numbers appeared to be off by half or double. They were.
3. My assembler will distinguish only between workspace accesses and all other memory accesses (including instruction opcode and immediate operand fetches) and allow specifying whether each is in fast or slow memory. It will assume a 99/4A platform in which all slow memory accesses incur a penalty of four additional clock cycles. The default is fast workspace and slow other memory.
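The counting rule in step 3 can be sketched in a few lines of Python. This is only an illustration of the plan as described, not the assembler's actual code: the function name `clock_cycles`, the parameter names, and the split of each instruction's memory accesses into "workspace" and "other" are assumptions made for the example. The base counts are taken to be in clock cycles already; a figure given in machine cycles would first be doubled, per step 2.

```python
SLOW_PENALTY = 4  # extra clock cycles per slow-memory access (99/4A assumption)

def machine_to_clock(machine_cycles):
    """Convert machine cycles to clock cycles (two clocks per machine cycle)."""
    return 2 * machine_cycles

def clock_cycles(base, workspace_accesses, other_accesses,
                 workspace_fast=True, other_fast=False):
    """Estimate clock cycles for one instruction.

    base               -- clock cycles from the timing table, including any
                          addressing-mode additions
    workspace_accesses -- accesses to workspace registers
    other_accesses     -- all other accesses (opcode fetch, immediate or
                          address words, memory operands)
    The defaults model fast workspace and slow everything else.
    """
    penalty = 0
    if not workspace_fast:
        penalty += SLOW_PENALTY * workspace_accesses
    if not other_fast:
        penalty += SLOW_PENALTY * other_accesses
    return base + penalty

# clr R0: 10 base clocks, one non-workspace access (the opcode fetch),
# and an assumed two workspace accesses.
print(clock_cycles(10, workspace_accesses=2, other_accesses=1))
```

With the default settings only the opcode fetch is penalized, so `clr R0` comes out as 10 + 4 clock cycles. Moving the workspace to slow memory as well (`workspace_fast=False`) adds four more for each assumed workspace access.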
-
-
6 hours ago, FarmerPotato said: The J1 is a stack-based machine, similar to the Novix NC4016, with an instruction set designed to run FORTH. Read about it here: https://www.excamera.com/files/j1.pdf or https://excamera.com/files/svfig-2015-aug.pdf
Do you think this will be effective for writing emulators for other processors?
My understanding is that the fastest non-JIT approach is some form of threaded code instead of a central decoder/dispatcher.
-
11 hours ago, TheBF said: Classic99 debugger shows cycles in the dis-assembler.
In your opinion, is it accurate?
-
I have been playing with code generation on different processors.
This line of code
W0 := S1;
sign-extends a byte into a two-byte word.
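As a reference for what every listing in this post has to compute, here is a small Python model of three common ways to sign-extend a byte to 16 bits. The function names are made up for illustration; the three variants loosely mirror the branch-based style (6502, 6800), the carry trick (8080, AVR), and the arithmetic-shift style (9900).

```python
def sign_extend_branch(byte):
    """Start with a zero high byte and make it 0xFF when bit 7 is set."""
    high = 0x00
    if byte & 0x80:
        high = 0xFF
    return (high << 8) | byte

def sign_extend_carry(byte):
    """Move bit 7 into a carry, then compute 0 - carry to get 0x00 or 0xFF."""
    carry = (byte >> 7) & 1
    high = (0 - carry) & 0xFF
    return (high << 8) | byte

def sign_extend_shift(byte):
    """Place the byte in the high half, then shift right arithmetically by 8."""
    word = byte << 8
    if word & 0x8000:
        word -= 0x10000          # reinterpret as a signed 16-bit value
    return (word >> 8) & 0xFFFF  # Python's >> is arithmetic on negative ints

# All three agree for every possible byte value.
assert all(
    sign_extend_branch(b) == sign_extend_carry(b) == sign_extend_shift(b)
    for b in range(256)
)
```

The results are returned as unsigned 16-bit patterns, so 0x80 becomes 0xFF80 and 0x7F stays 0x007F.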
For the 6502:
                            ; 1 L v S1
0029 A0 00        [2]           ldy #0
002B A6 16        [3]           ldx S1
002D 10 01 (0030) [2/3]         bpl 2f
002F 88           [2]           dey
0030                        2:
0030 98           [2]           tya
                            ; 0 := v W0 -> 1
0031 86 0D        [3]           stx W0
0033 85 0E        [3]           sta W0+1
For the 6800:
                            * 1 L v S1
0029 4F           [2]           clra
002A D6 16        [3]           ldab S1
002C 2A 01 (002F) [4]           bpl 2f
002E 4A           [2]           deca
002F                        2:
                            * 0 := v W0 -> 1
002F 97 0D        [4]           staa W0
0031 D7 0E        [4]           stab W0+1
For the 8080:
                            ; 1 L v S0
0100 3A 0015      [13]          lda S0
0103 6F           [5]           mov L,A
0104 17           [4]           ral
0105 9F           [4]           sbb A
0106 67           [5]           mov H,A
                            ; 0 := v W0 -> 1
0107 22 000D      [16]          shld W0
For the 9900:
                            * 1 L v S1
0052 D020 0037                  movb @S1,R0
0056 0880                       sra R0,8
                            * 0 := v W0 -> 1
0058 C800 002E                  mov R0,@W0
For the 68000:
                            ; 1 L v S1
00000400 1038 0421              move.b S1,D0
00000404 4880                   ext.w D0
                            ; 0 := v W0 -> 1
00000406 31C0 0418              move.w D0,W0
That almost feels like cheating.
It is interesting that the last four examples have each been ten bytes long.
Real cheating would be the 80386:
    movsx AX,[S1]
    mov [W0],AX
For the AVR:
                            ; 1 L v S1
000060 9160 0116  [2]           lds R22,S1
000062 2F76       [1]           mov R23,R22
000063 0F77       [1]           lsl R23
000064 0B77       [1]           sbc R23,R23
                            ; 0 := v W0 -> 1
000065 9360 010D  [2]           sts W0,R22
000067 9370 010E  [2]           sts W0+1,R23
And finally, for the 6809:
                            * 1 L v S1
0029 D6 16        [4]           ldb S1
002B 1D           [2]           sex
                            * 0 := v W0 -> 1
002C DD 0D        [5]           std W0
which brings up one of my favorite programming jokes:
The Motorola 6809, where SEX is sometimes followed by STD.
Thank you very much. Drive safely...
-
-
On 9/13/2020 at 10:38 AM, TheBF said: What language are your compilers for?
I forgot that I also started PL/M compilers for the 6800 and 6502.
-
-
Even though it is a challenge and I love a challenge, I am getting somewhat discouraged about trying to generate good code for the 9900.
Are any of the emulators accurate in counting machine cycles?
-
This shows just how un-RISC the 9900 is.
This reminds me of programming the 8088: the specialized "string" instructions were by far the fastest way to do some things. That advantage eroded as all of the instructions were made more efficient with each new generation of the architecture. By the time of the 486, a sequence of simple instructions could beat most of the string instructions.
-
-
Consider the case of
inc @I
The documentation says 10 base cycles and 3 memory accesses, and Table A adds 8 cycles and one memory access for symbolic mode. The operand is listed as the source instead of the destination. None of the memory accesses is a workspace access.
-
I thought that CZC is just fancy 9900-ese for "do an AND, set the EQ flag appropriately, and throw away the result."
If not, then you are right that the order matters.
-
If the size of the block is large enough, it may be worthwhile to try to copy words at a time.
                 *
                 * R0 = source address
                 * R1 = destination address
                 * R2 = count
                 *
0000             Move
0000 0282 0006       ci R2,6 ; Enough to get clever?
0004 1404 (000E)     jhe TryWord

0006             ByteLoop
0006 DC70            movb *R0+,*R1+ ; Copy a byte at a time
0008 0602            dec R2 ; More?
000A 16FD (0006)     jne ByteLoop

000C             MoveDone
000C 045B            b *R11 ; Return

000E             TryWord
000E 0203 0001       li R3,1 ; Load mask for LSB

0012 C100            mov R0,R4 ; Are they aligned?
0014 2901            xor R1,R4
0016 2503            czc R3,R4
0018 16F6 (0006)     jne ByteLoop ; No, do a byte at a time

001A 24C0            czc R0,R3 ; Word aligned?
001C 1302 (0022)     jeq Aligned

001E DC70            movb *R0+,*R1+ ; Copy the first byte
0020 0602            dec R2

0022             Aligned
0022 C102            mov R2,R4 ; Save count

0024 0912            srl R2,1 ; Convert to number of words

0026             WordLoop
0026 CC70            mov *R0+,*R1+ ; Copy a word
0028 0602            dec R2
002A 16FD (0026)     jne WordLoop

002C 24C4            czc R4,R3 ; One byte left over?
002E 13EE (000C)     jeq MoveDone ; No

0030 DC70            movb *R0+,*R1+ ; Copy the last byte

0032 045B            b *R11 ; Return

                 *
                 * Just to see the overhead of combining bytes into words
                 *
0034             NotAligned
0034 24C0            czc R0,R3 ; Is source aligned?
0036 160F (0056)     jne DestAligned

0038 C102            mov R2,R4 ; Save count

003A 0912            srl R2,1 ; Convert to number of words

003C D170            movb *R0+,R5 ; Get unaligned byte

003E             SrcLoop
003E C1B0            mov *R0+,R6 ; Get next word

0040 06C6            swpb R6 ; Assemble dest word
0042 C1C6            mov R6,R7
0044 D185            movb R5,R6

0046 CC46            mov R6,*R1+ ; Store word

0048 D147            movb R7,R5 ; Ready for next

004A 0602            dec R2
004C 16F8 (003E)     jne SrcLoop

004E 24C4            czc R4,R3 ; One byte left over?
0050 13DD (000C)     jeq MoveDone ; No

0052 DC45            movb R5,*R1+ ; Store the last byte

0054 045B            b *R11 ; Return

0056             DestAligned
                 *
                 * Why bother? The overhead is too bad.
                 *
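The decision logic of the routine can be modelled in Python to check the corner cases (odd start address, odd byte left over) without an emulator. This is a rough behavioural model under stated assumptions, not a translation of the listing: `block_move` and its return value are invented for the example, memory is a plain `bytearray`, and source and destination are assumed not to overlap.

```python
def block_move(mem, src, dst, count):
    """Copy count bytes inside mem, using word copies when that is possible.

    Returns "word" or "byte" to show which path was taken.
    """
    # Word copies only pay off for larger blocks, and only work when
    # source and destination are both even or both odd.
    if count >= 6 and ((src ^ dst) & 1) == 0:
        if src & 1:
            # Both addresses are odd: copy one byte to reach a word boundary.
            mem[dst] = mem[src]
            src += 1
            dst += 1
            count -= 1
        words, leftover = divmod(count, 2)
        for _ in range(words):
            mem[dst:dst + 2] = mem[src:src + 2]  # copy one 16-bit word
            src += 2
            dst += 2
        if leftover:
            mem[dst] = mem[src]                  # copy the final odd byte
        return "word"

    for i in range(count):                       # plain byte-at-a-time copy
        mem[dst + i] = mem[src + i]
    return "byte"

mem = bytearray(range(64)) + bytearray(64)
print(block_move(mem, 1, 65, 9), mem[65:74] == mem[1:10])
```

Running the model over a range of start addresses and counts, and comparing the destination against a slice of the source, is a cheap way to confirm that each path copies exactly `count` bytes.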
-
-
If my calculations are correct, shifting is faster (assuming the workspace is in fast memory and everything else is in slow memory.)
0052 04C0            clr R0 ; 14 : 10 + 4 (fetch)
0054 D020 0037       movb @S1,R0 ; 30 : 14 + 4 (fetch) + 8 + 4
0058 1502 (005E)     jgt 2f ; 12/14 : 8/10 + 4 (fetch)
005A 0260 00FF       ori R0,>FF ; 22 : 14 + 2 * 4 (fetch)
005E             2
005E 06C0            swpb R0 ; 14 : 10 + 4 (fetch)
vs
0052 D020 0037       movb @S1,R0 ; 30 : 14 + 4 (fetch) + 8 + 4
0056 0880            sra R0,8 ; 32 : 12 + 4 (fetch) + 16
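Adding up the per-instruction figures from the two listings makes the comparison explicit. The short Python sketch below only sums the cycle annotations shown above; it reads "12/14" for `jgt` as 12 clock cycles when the jump falls through and 14 when it is taken, which is an interpretation of the annotation rather than something stated outright.

```python
# (instruction, clocks when jgt falls through, clocks when jgt is taken)
BRANCH_VERSION = [
    ("clr R0",      14, 14),
    ("movb @S1,R0", 30, 30),
    ("jgt 2f",      12, 14),
    ("ori R0,>FF",  22,  0),   # skipped entirely when the jump is taken
    ("swpb R0",     14, 14),
]

# (instruction, clocks)
SHIFT_VERSION = [
    ("movb @S1,R0", 30),
    ("sra R0,8",    32),
]

def totals():
    """Return (branch fall-through, branch taken, shift) clock totals."""
    fall_through = sum(row[1] for row in BRANCH_VERSION)
    taken = sum(row[2] for row in BRANCH_VERSION)
    shift = sum(clocks for _, clocks in SHIFT_VERSION)
    return fall_through, taken, shift

print(totals())
```

Under these figures the shift version costs 62 clock cycles, against 72 or 92 for the branch version depending on which way the jump goes, so shifting wins on both paths.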
-
3 hours ago, senior_falcon said: I think you are in the 9940 section. On page 8-23 of that book we find: "TMS 9900 INSTRUCTION EXECUTION TIMES"
Your jaw will drop when you see the number of clock cycles each instruction uses.
You are right.
Jaw dropped...
-
-
3 hours ago, apersson850 said: The TMS 9900 does a memory access in two cycles. But if it accesses the memory expansion, or rather anything outside the 256 bytes RAM or the 8 K monitor ROM in the console, it accesses that memory byte by byte, adding a wait state for each byte. So instead of two cycles, a memory access is six cycles.
Thus an assembler can't really know the cycle count, since it doesn't know where the code and the workspace are located. The same software will also behave differently, depending on the machine. My main console has 16-bit wide (two cycles per access) memory for the memory expansion, so it runs faster than a standard TI 99/4A.
There are also different additions depending on the addressing mode used, but they can be calculated, as they are consistent.
A simpler way is to assume the workspace is in fast memory and everything else is in slow memory.
Unfortunately, the tables in the manual do not separate workspace accesses from everything else, so I will have to figure those out.
-
Thanks.
I was going to do that when optimizing for size.
The 9900 does not have a barrel shifter, but shifts a bit at a time. The documentation seems to say one cycle per bit position; if that is true, shifting will be the fastest.
-
As mentioned in another thread, the following code to add a small constant to a variable:
                 * W0 := W0 + 2;

                 * * 0 := v W0 -> 1
                 * * 1 L r 2

                 * * 2 L v W0 -> 3
                 * * 3 + c 2


                 * 1 L r 2
                 * 2 L v W0 -> 3
                 * 3 + c 2
0052 C020 002E       mov @W0,R0
0056 05C0            inct R0
                 * 0 := v W0 -> 1
0058 C800 002E       mov R0,@W0
can be optimized to this:
                 * W0 := W0 + 2;

                 * * 0 := v W0 -> 1
                 * * 1 L r 2

                 * * 2 L v W0 -> 3
                 * * 3 + c 2


                 * 1 L r 2
                 * 2 L v W0 -> 3
                 * 3 + c 2
0052 05E0 002E       inct @W0
                 * 0 := v W0 -> 1
This is the code to add two signed bytes resulting in a 16-bit number. Is there a better way to do sign extension?
                 * W0 := S1 + S2;

                 * * 0 := v W0 -> 1
                 * * 1 L r 2

                 * * 2 L v S1 -> 3
                 * * 3 + v S2


                 * 1 L r 2
                 * 2 L v S1 -> 3
                 * 3 + v S2
0052 04C0            clr R0
0054 D020 0037       movb @S1,R0
0058 1502 (005E)     jgt 2f
005A 0260 00FF       ori R0,>FF
005E             2
005E 06C0            swpb R0
0060 04C1            clr R1
0062 D060 0038       movb @S2,R1
0066 1502 (006C)     jgt 2f
0068 0261 00FF       ori R1,>FF
006C             2
006C 06C1            swpb R1
006E A001            a R1,R0
                 * 0 := v W0 -> 1
0070 C800 002E       mov R0,@W0
-
1 hour ago, TheBF said: Ya totally sucks doesn't it. If you are using the TI-99 "O/S" as it were you are best to treat the 256 bytes of 16 bit RAM with kid gloves. There are about 120 ish bytes at the top that you can use.
For example in most Forth systems the workspace is at >8300. So at least the primary registers are in fast ram.
My cross assemblers for most other processors offer an option to display the number of machine cycles each instruction uses.
My 9900 assembler does not attempt this because the calculation is so complex, and that was before even taking the 8-bit bus delays into account. The feature is so helpful that I may still attempt it.
-

Anyone up to the challenge?
in TI-99/4A Development
Posted
Surprisingly, there is apparently no entry for the TMS9900 or 99/4A...
http://www.99-bottles-of-beer.net/t.html