+mizapf Posted December 1, 2019
If you have a Geneve and a way to transfer these files (on the attached DSK) to that machine, it would be nice if you ran these test programs and noted their results, or took photos of the screens and sent them to me. The SPEED program takes about 10 minutes, each VWAIT program about 6 minutes. Each program runs several tests with loops containing accesses to SRAM, DRAM, or video memory.

I am currently working on a reimplementation of the Geneve emulation, with the Gate Array and the PAL as separate components, in particular making use of the PAL equations that were published here recently. Still, I see some differences from the timings on my real machine: the video write operations take one cycle too long in emulation, and the "System clock speed" CRU bit 23 has absolutely no effect on my real machine, although there should be visible effects according to the PAL equations. Before I start to fake things to force them to work like my machine, it may be worth checking whether my real machine is actually working correctly.

Attachment: timing.dsk

+9640News Posted December 1, 2019
Here are my results:

SPEED
Result for test 01: 175
Result for test 02: 306
Result for test 03: 306
Result for test 04: 305
Result for test 05: 305
Result for test 06: 305
Result for test 07: 328
Result for test 08: 306
Result for test 09: 327
Result for test 10: 327
Result for test 11: 371
Result for test 12: 327
Result for test 13: 371
Result for test 14: 218
Result for test 15: 587
Result for test 16: 670
Result for test 17: 328
Result for test 18: 328

VWAIT1A
Result for test 01: 154
Result for test 02: 154
Result for test 03: 364
Result for test 04: 364
Result for test 05: 406
Result for test 06: 392
Result for test 07: 406
Result for test 08: 364
Result for test 09: 335
Result for test 10: 363

VWAIT1B
Result for test 01: 154
Result for test 02: 154
Result for test 03: 210
Result for test 04: 364
Result for test 05: 349
Result for test 06: 349
Result for test 07: 391
Result for test 08: 210
Result for test 09: 210
Result for test 10: 252

VWAIT1C
Result for test 01: 154
Result for test 02: 153
Result for test 03: 363
Result for test 04: 364
Result for test 05: 405
Result for test 06: 391
Result for test 07: 405
Result for test 08: 364
Result for test 09: 336
Result for test 10: 364

+mizapf (Author) Posted December 1, 2019
Thanks Beery, this fully agrees with my measurements, and if someone else (maybe Tim) checks and gets the same values, we can assume these are the correct and intended ones.

The values are tenths of seconds for all iterations (154 = 15.4 seconds) and may differ by ±1, depending on when the time was taken within a second.

SPEED tests, in order:
1. Null loop
2. Read byte from on-chip RAM
3. Read word from on-chip RAM
4. Write byte on-chip
5. Write word on-chip
6. Read byte SRAM
7. Read word SRAM
8. Write byte SRAM
9. Write word SRAM
10. Read byte DRAM
11. Read word DRAM
12. Write byte DRAM
13. Write word DRAM
14. Set video address
15. Video read
16. Video write
17. Read byte from peripheral bus
18. Read byte from peripheral bus with CRU bit 23=0

The iteration count is 6553600. You can see that a difference of 2.18 seconds (22 units here) over all iterations means 333 ns per iteration, which is one CPU cycle. Video tests have 8192000 iterations.

VWAIT1A: Various video read and write tests with "hidden" wait states (see https://www.ninerpedia.org/wiki/Geneve_video_wait_states ). Here I use >400000 iterations (hexadecimal, i.e. 4194304).

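The conversion from result units to CPU cycles described above can be sketched as follows. This is only an illustration of the arithmetic; the function names are made up for this example, and the 3 MHz cycle rate is taken from the 333 ns cycle time mentioned in the post.

```python
# Sketch: convert a difference between two test results (in tenths of a
# second) into CPU cycles per loop iteration. Names are illustrative only.

CPU_CYCLES_PER_SECOND = 3_000_000   # one cycle = 333 ns, as stated in the post
SPEED_ITERATIONS = 6_553_600        # iteration count of the SPEED tests


def cycles_per_iteration(delta_units, iterations):
    """Extra CPU cycles per iteration for a result difference in 0.1 s units."""
    delta_seconds = delta_units / 10.0
    return delta_seconds / iterations * CPU_CYCLES_PER_SECOND


def units_for_one_cycle(iterations):
    """Result difference (in 0.1 s units) caused by one extra cycle per iteration."""
    return iterations / CPU_CYCLES_PER_SECOND * 10.0


if __name__ == "__main__":
    # 22 units over 6553600 iterations is very close to one cycle per iteration.
    print(cycles_per_iteration(22, SPEED_ITERATIONS))
    print(units_for_one_cycle(SPEED_ITERATIONS))
```

Because the results are only accurate to ±1 unit, the computed cycle count should be rounded to the nearest integer before it is interpreted.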
+InsaneMultitasker Posted December 1, 2019
I downloaded the image and will transfer it to my Geneve today. Are your measurements presuming a standard, unmodified Geneve? One of my Geneves has a WHT turbo video PAL - I may be able to test both.

+mizapf (Author) Posted December 1, 2019
This is all standard Geneve (except that I have a 32K SRAM upgrade, which is not relevant here).

+InsaneMultitasker Posted December 1, 2019
I ran the four tests - my results mirrored Beery's results with at most +1/-1 delta. I could not find my WHT turbo video PAL; I hope I didn't lose it or mistakenly ship it somewhere.

+mizapf (Author) Posted December 1, 2019
Here is a comparison of the current state of the Geneve emulation in MAME (left) vs. the real Geneve (right), concerning execution speed. Note that TI BASIC runs at GPL5 speed, which is a lot faster than the TI console. The seemingly simple counting in BASIC involves a lot of different aspects, including correct timing for DRAM/SRAM and the GROM emulation inside the Gate Array, and also video read and write operations.

As you can see (maybe fast-forward if you don't want to watch all 120 seconds, or press pause at some points in time), the real Geneve is a tiny bit faster, which may be due to the problem with the wait states for video write operations. Altogether, however, I think this is a pretty good result.

+mizapf (Author) Posted December 2, 2019
One more check, but pretty short. Really short, in fact, and completely wrong in MAME.

This test (SWAIT) writes the byte 9F to F120 (the sound port in native mode, corresponding to 8400 in the TI console) in a loop. I am running 380480 loops. No need to turn off the speakers; 9F means muting channel 1.

On my real Geneve, this test finishes after 5.7 seconds. On MAME, however, this test currently runs for 1950.5 seconds (more than 30 min)! There is clearly something wrong with the READY handling of the sound chip. The reason why this does not affect the rest of the emulation is that writes to the sound chip are extremely rare compared to all other accesses.

Here again, I'd appreciate it if some of you with a real Geneve could verify that your machine behaves the same as mine.

Attachment: timing1.dsk

+9640News Posted December 3, 2019
SWAIT
Result for test 01: 51

+mizapf (Author) Posted December 3, 2019
Looks good, 51 means 5.1 seconds. At least it is much lower than 19505. I'll have to investigate a bit more in MAME. My real system seems to be in perfect health, which is also good news.

The sound chip in fact produces wait states by lowering READY; the low time is 32 cycles of its input clock, which is the clock output of the V9938, so it is actually driven by the video processor. The clock output of the 9938 is 1/6 of its XTAL frequency input, which is about 21 MHz. Hence, it should produce a READY low time of 27 CPU cycles (333 ns each), i.e. 27 wait states.

I am not only trying to maximize the precision for MAME's sake here. I'm trying to emulate the Gate Array and the PAL in order to develop a theory of how the Gate Array is actually implemented. We do not have detailed descriptions of the Gate Array, but if I manage to get it running at the exact timings, this is a good indication of what is actually happening inside.

Interestingly, the emulation of the Gate Array has become simpler than before. I remember that Matthew or someone else once said that the actual hardware realizations are much simpler than what you typically meet in emulation, and I think this is going in the right direction now.

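The wait state estimate above can be reproduced with a few lines of arithmetic. This is a sketch under assumptions: the post only says "about 21 MHz", and the value 21.47727 MHz used below is the common NTSC crystal frequency for the V9938, not a figure from this thread. Rounding a partial CPU cycle up to a full wait state is also an assumption.

```python
import math

# Sketch: how long the sound chip keeps READY low, in CPU wait states.
V9938_XTAL_HZ = 21_477_270        # assumption; the post says "about 21 MHz"
V9938_CLOCK_DIVIDER = 6           # clock output is 1/6 of XTAL (from the post)
READY_LOW_SOUND_CLOCKS = 32       # READY low time in sound clock cycles (from the post)
CPU_HZ = 3_000_000                # 333 ns CPU cycle (from the post)


def ready_low_ns(xtal_hz=V9938_XTAL_HZ):
    """READY low time in nanoseconds."""
    sound_clock_hz = xtal_hz / V9938_CLOCK_DIVIDER
    return READY_LOW_SOUND_CLOCKS / sound_clock_hz * 1e9


def wait_states(xtal_hz=V9938_XTAL_HZ):
    """CPU cycles spent waiting, assuming a partial cycle costs a whole one."""
    cpu_cycle_ns = 1e9 / CPU_HZ
    return math.ceil(ready_low_ns(xtal_hz) / cpu_cycle_ns)


if __name__ == "__main__":
    print(f"READY low for {ready_low_ns():.1f} ns = {wait_states()} wait states")
```

With these inputs the low time is a little under 9 microseconds, which rounds up to the 27 wait states quoted in the post.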
+mizapf (Author) Posted December 3, 2019
I'm currently away from my Geneve for some days. It would be really kind if one of you could give it a try.

The JWAIT test produces two results after about 16 seconds each. The difference between both tests is that in one test, a conditional jump is taken, while in the other one, the jump is not taken; apart from that, the same operations are performed.

I'd like to know how that affects the 9995 prefetch. Normally, the prefetch gets the next instruction while the ALU performs the last operation. While the results are written, the new command is decoded. But how does that work with conditional jumps? The decision whether to jump or not should be a matter of the ALU operation, which runs in parallel with the prefetch, but how does the CPU know which operation to fetch?

T01  MOVB @SRAM,R3
     DEC  R1
     JNE  T01
     DEC  R2
     JNE  T01

In this example, assume that right now, the DEC R1 command fetches the R1 value. The destination address need not be calculated, so the next step would be to fetch the next instruction, JNE. While this is done, the ALU performs the DEC operation. Next, the JNE is decoded, while the results of DEC are written to R1. Then, the program counter is fetched. On the next cycle, the JNE operation is evaluated (check the EQ bit in the status register and add the displacement to the PC or not), while in parallel, the next operation is prefetched. But is it MOVB or DEC R2?

This detail is not explained in the 9995 specifications. They merely say how many cycles may be saved in the optimal case.

The scenario described above is a typical issue in pipelined systems called "bubbles in the pipeline": if the wrong path is followed by the prefetch, the prefetched command is discarded, and this "pipeline flush" reduces the efficiency. Accordingly, we should notice a different cycle count for the two executions.

In more advanced systems there is branch prediction, or both options are executed speculatively in parallel until one of them turns out to be the "actual one" and the other is discarded. In MAME, I went for the simple way, finishing the evaluation of the jump condition in zero time, and then 100% correctly "predicting" the next operation; and, of course, I'd like to fix that. (Also, it may help with the still unsolved video wait state issue.)

(I hope this is not too boring for the rest of you, but I think it shows some fascinating details of our machines that most are not aware of.)

Attachment: timing2.dsk

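To get a feeling for whether a discarded prefetch would be visible in the two JWAIT results, one can estimate how many result units (tenths of a second) a per-jump penalty would add. This is only a sketch: the JWAIT iteration count is not stated in this thread, so the value used below is purely hypothetical, as is the assumed penalty of one cycle.

```python
# Sketch: expected difference between the "jump taken" and "jump not taken"
# results if every wrongly prefetched instruction cost extra CPU cycles.

CPU_HZ = 3_000_000   # 333 ns CPU cycle, as used elsewhere in this thread


def extra_result_units(iterations, penalty_cycles):
    """Additional run time in 0.1 s units caused by the penalty."""
    extra_seconds = iterations * penalty_cycles / CPU_HZ
    return extra_seconds * 10.0


if __name__ == "__main__":
    hypothetical_iterations = 3_000_000   # NOT from the thread, illustration only
    hypothetical_penalty = 1              # assumed cycles lost per discarded prefetch
    print(extra_result_units(hypothetical_iterations, hypothetical_penalty))
```

Since the measurements are accurate to about ±1 unit, a penalty only becomes clearly visible once it adds a few units, that is, once the loop count is in the range of several hundred thousand iterations or more.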
+9640News Posted December 4, 2019
JWAIT
Result for test 01: 160
Result for test 02: 160

Tursi Posted December 4, 2019 (edited)
14 hours ago, mizapf said: "The sound chip in fact produces wait states by lowering READY; the low time is 32 cycles of its input clock, which is the output of the V9938 - it is actually driven by the video processor. The clock output of the 9938 is 1/6 of its XTAL frequency input, which is about 21 MHz. Hence, it should produce a READY low time of 27 CPU cycles (333ns), i.e. 27 wait states."

Is this the same sound chip as in the TI? The sound chip holds the CPU in the 4A because if it doesn't, the write cycles are actually too fast for it to be able to process the data. It's actually slower than the GROMs in that respect. So the hold is mandatory for the chip to work.

It should be easy to calculate what the real machine /should/ take, and then you can determine whether emulation has extra wait states or hardware has fewer.

You run 380480 loops. Just in terms of sound chip holds only, then, let's calculate. (Edit: sorry, I had to redo my math here.)

Per your math above, that's 27 CPU cycles * 380480 = 10,272,960 cycles * (1/3,000,000) = 3.42 seconds.

Trying the math on the input clock (21/6?) gives us 32 * (1/3,500,000) * 380480 = 3.48 seconds.

This second more closely matches reality (remember that's hold time on the sound chip only, and none of the CPU cycles), so I clearly have a mistake in my math on the first line. I probably misread something there. They actually both look pretty close when I stop being dumb on my units.

But that suggests that there is no magic involved in the hardware - reality seems to match the expected time on the sound chip holds alone. Maybe my confusion can help offer some insight into where the MAME code is confused.

Edited December 4, 2019 by Tursi

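The two estimates in the post above can be checked with a short script. It is only a sketch of the arithmetic; the function names are made up, and the 3.5 MHz sound clock is the rounded value (21 MHz / 6) used in the post, not a measured figure.

```python
# Sketch: total time the SWAIT loop spends in sound chip holds, computed
# two ways as in the post. Names are illustrative only.

LOOPS = 380_480
CPU_HZ = 3_000_000            # 333 ns CPU cycle
SOUND_CLOCK_HZ = 3_500_000    # rounded: about 21 MHz / 6


def hold_seconds_from_wait_states(loops, wait_states, cpu_hz=CPU_HZ):
    """Hold time when counting whole CPU wait states per write."""
    return loops * wait_states / cpu_hz


def hold_seconds_from_sound_clock(loops, sound_clocks, sound_hz=SOUND_CLOCK_HZ):
    """Hold time when counting sound chip input clock cycles per write."""
    return loops * sound_clocks / sound_hz


if __name__ == "__main__":
    by_cpu = hold_seconds_from_wait_states(LOOPS, 27)
    by_sound = hold_seconds_from_sound_clock(LOOPS, 32)
    print(f"{by_cpu:.2f} s via CPU wait states, {by_sound:.2f} s via sound clock")
```

Both figures cover the hold time only; the rest of the measured run time is the CPU work of the loop itself.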
+mizapf (Author) Posted December 4, 2019
Thank you for your calculations ... but I am starting to believe this is a simple programming error in MAME (in the sn76496.cpp file). I noticed that there are long pauses where the READY line is low (20 ms). The point is that the MAME core requests a number of sound samples from audio components, and here, if that number equals the clock divider exactly, the loop is left without raising READY. That is clearly wrong.

Tursi Posted December 4, 2019
Classic99 supports correct timing on the SN sound chip.

+mizapf (Author) Posted December 7, 2019
For now, I decided to put aside the video write issue and continue with the Genmod. But there I also found a problem, or let's say, something to clarify. I do not have a Genmod; I hope we have some owner here who is also able to run the attached program GMWAIT.

The point is that the Genmod intercepts the READY line from the Gate Array and feeds its own READY into the PAL. This is important, since the Gate Array produces a wait state for external accesses, and the Genmod promises by its TURBO switch to turn off these 1-ws accesses. The daughter board is soldered to the back of the Gate Array, so we can assume it gets the same inputs, and it can watch its outputs.

The Gate Array can be set to create an additional wait state for all accesses (using a CRU bit). My problem is: How does the Genmod daughter board know what the actual cause of the wait state is? It cannot look inside the Gate Array, so it cannot see the mapper and the additional wait state flag. In particular, when accessing the Pbox, the Gate Array pulls down READY, so the Genmod may possibly ignore this, but if there is an additional wait state, it should probably also pull down READY.

I could imagine a solution, but it looks a bit too complicated for two GALs on that daughter board. First, it would be wiser to check on a real Genmod that these additional wait states actually occur. It could also be that with TIMODE set to off and TURBO set to on, no additional wait states are supported.

So my request would be to run the GMWAIT program on a Genmod once for each combination of settings of the switch box, i.e. test with
TURBO=0 / TIMODE=0
TURBO=0 / TIMODE=1
TURBO=1 / TIMODE=0
TURBO=1 / TIMODE=1

Each run should take only a few minutes. Thanks in advance.

Attachment: timing3.dsk

+9640News Posted December 8, 2019
13 hours ago, mizapf said: "... I do not have a Genmod; I hope we have some owner here, who is also able to run the attached program GMWAIT. ..."

Unfortunately, I am not able to help you out on this one. While I have a GenMod Geneve, it has been acting a bit flaky since October, when the PEBox fan went out and the card got VERY VERY hot. In addition, I do not have a working switch box.

Beery

+mizapf (Author) Posted December 8, 2019
Hi Beery, not a big issue. Maybe someone else can help. Anyway, without the switch box, the results may not be helpful. I'll go for a reasonable guess in the meantime.

If there is someone else who intends to do the test on his Genmod now or in the next days, I'd appreciate a short note so that I know it's worth waiting.

+InsaneMultitasker Posted December 8, 2019
I no longer have a Genmod, so I cannot run the test. Certainly they are a rarity; I wonder how many were sold and subsequently put into service. I've only repaired/upgraded 2-3 such cards in 25+ years; not sure how many have come across Richard's bench. (Most of the upgrades would have been performed with the original regulators/caps still in place and would probably have needed a refresh by now.)

+jedimatt42 Posted December 10, 2019
On 12/8/2019 at 7:14 AM, mizapf said: "... If there is someone else who intends to do the test on his Genmod now or in the next days, I'd appreciate to get a short notice so that I know it's worth waiting."

I can run this tonight on my genmod.

-M@

+mizapf (Author) Posted December 10, 2019
In the meantime (thanks in advance, Matt), I collected some theoretical questions that may be answered without running a program.

1. If you fully populate the Memex (2M), can (must) you remove the on-board DRAM? (No DRAM on the Geneve means no GPL mode anymore, of course, since the GROM handling requires the DRAM.) I know that the Memex can be configured by switches to ignore some page areas. However, it cannot be configured to mask the 00-3F area for the board DRAM. When driven with 0 wait states, the Memex may be able to respond, but at the same time, the board DRAM may get garbled. Since the Genmod cannot turn off the DRAM on the board (no traces are cut for that purpose), how can it handle this issue? That is, in TURBO mode (otherwise, there is 1 WS).

2. Can (must) you remove the on-board SRAM at pages EC-EF when you use the full Memex?

3. Peripheral cards should decode AMA/B/C. However, I do not see any requirement to also decode AMD/E. Without this, the DSR page BA also appears at three more mirror pages, i.e. in total as page 00-111-010 (3A), 01-111-010 (7A), 10-111-010 (BA), 11-111-010 (FA). That is, are these pages also blocked? (With the stock Geneve, this is not a problem, as only the BA page is in the external space, where the bus drivers are activated.) Running MEMTEST might reveal this.

4. Concerning pages F0-FF, which are the on-board boot ROM, a full 2M Memex could allow storing values at these addresses. However, if we have the EPROM still in place, this would mean a bus conflict when those values differ. The Memex has switches to mask these pages, but I cannot really imagine a use case (unless the Memex has an EPROM at that location).

The Memex seems to have been invented before the Genmod. The question arises whether all this actually worked as expected.

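The mirroring described in question 3 can be illustrated with a small sketch. It assumes, based on the bit patterns written out above, that AMD/AME correspond to the top two bits of the 8-bit page number; the function name is made up for this example.

```python
# Sketch: page numbers that alias a given page when the two most
# significant page bits (assumed to be AMD/AME) are not decoded by a card.

def mirror_pages(page):
    """Return all 8-bit page numbers sharing the lower six bits of `page`."""
    low_bits = page & 0x3F                      # bits a card does decode
    return [(high << 6) | low_bits for high in range(4)]


if __name__ == "__main__":
    # DSR page BA shows up at four page numbers if AMD/AME are ignored.
    print([f"{p:02X}" for p in mirror_pages(0xBA)])   # ['3A', '7A', 'BA', 'FA']
```

The same function applied to any other page in the external space gives the candidate pages a memory test would have to report as "not RAM" if that page is mirrored in the same way.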
+jedimatt42 Posted December 11, 2019
On 12/7/2019 at 3:54 AM, mizapf said: "... So my request would be to run the GMWAIT program on a Genmod once for each combination of settings of the switch box, i.e. test with TURBO=0 / TIMODE=0; TURBO=0 / TIMODE=1; TURBO=1 / TIMODE=0; TURBO=1 / TIMODE=1."

My GMWAIT results for genmod w/2M memex

TURBO=0 / TIMODE=0
1, 60
2, 60
3, 100
4, 120
5, 110
6, 120
7, 110
8, 120

TURBO=0 / TIMODE=1
1, 60
2, 60
3, 99
4, 120
5, 110
6, 120
7, 110
8, 119

TURBO=1 / TIMODE=0
1, 60
2, 60
3, 100
4, 120
5, 100
6, 100
7, 110
8, 120

TURBO=1 / TIMODE=1
1, 60
2, 60
3, 100
4, 119
5, 110
6, 120
7, 110
8, 120

+jedimatt42 Posted December 11, 2019
Here are also my memtest results, 2M memex card. You see that 3A, 7A, BA, and FA are treated as not RAM. Also the system ROM, which I believe is F0-FF, is all blocked out. The main Geneve board still has all the DRAM on board, physically.

-M@

+mizapf (Author) Posted December 11, 2019
Thanks, Matt. This is pretty interesting!

- Even though you have on-board DRAM, the pages 00-3F are used with 0 ws (TURBO=1, TIMODE=0). I really wonder how the DRAM feels when it is treated with these 0-ws accesses. As I said, the traces to the DRAM are not cut (and cannot be cut), so it will react to accesses to 00-3F. As it seems, it must stay in sync with the Memex RAM ... somehow.
- The BA page access is not sped up (1 ws).
- As I guessed, with TURBO=1, TIMODE=0, the "additional wait state" feature is ineffective, at least for the 00-3F access. The even-numbered tests were using the additional wait state setting. As I said, the Genmod cannot at the same time suppress wait states and allow for additional wait states, as it cannot peek into the Gate Array.
- The mirroring of BA is as expected.

I will provide one more (hopefully final) test to check the access to different page areas.

+mizapf (Author) Posted December 14, 2019
Hello Matt, would you please run the GMWAIT1 program from this image on your Genmod with all four switch combinations, as you did above? You may certainly abbreviate the result by writing like "values are all 110 (±1) except for pages xx,xx,xx..." (plus the switch combination). 110 means 1 wait state, 100 means 0 wait states. Thanks!

Attachment: timing4.dsk

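The rule of thumb in this post (100 means 0 wait states, 110 means 1 wait state, with a tolerance of ±1) can be sketched as a small helper for reading result lists. The extension to higher values such as 120 for 2 wait states is an assumption made for this example, as are the function and parameter names.

```python
# Sketch: map a GMWAIT-style result to a number of wait states.
# Only 100 -> 0 ws and 110 -> 1 ws are stated in the thread; treating each
# further 10 units as one more wait state is an assumption.

def wait_states_from_result(result, base=100, step=10, tolerance=1):
    """Return the wait state count, or None if the value does not fit the grid."""
    offset = result - base
    count = round(offset / step)
    if count < 0 or abs(offset - count * step) > tolerance:
        return None
    return count


if __name__ == "__main__":
    for value in (100, 99, 110, 119, 105):
        print(value, wait_states_from_result(value))
```

Values that fall between two grid points by more than the tolerance are reported as None rather than being forced onto the nearest wait state count.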