mizapf #1 Posted December 1

If you have a Geneve and a way to transfer the files on the attached DSK to that machine, it would be nice if you ran these test programs and noted their results, or took photos of the screens, and sent them to me. The SPEED program takes about 10 minutes, each VWAIT program about 6 minutes. Each program runs several tests with loops containing accesses to SRAM, DRAM, or video memory.

I am currently working on a reimplementation of the Geneve emulation, with the Gate Array and the PAL as separate components, in particular making use of the PAL equations that were published here recently. Still, I see some differences between the emulated timings and those of my real machine: the video write operations take a cycle too long in emulation, and the "System clock speed" CRU bit 23 has absolutely no effect on my real machine, although there should be visible effects according to the PAL equations. Before I start to fake things to force them to work like my machine, it would be worth checking whether my real machine is actually working correctly.

timing.dsk
BeeryMiller #2 Posted December 1

Here are my results:

SPEED
Result for test 01: 175
Result for test 02: 306
Result for test 03: 306
Result for test 04: 305
Result for test 05: 305
Result for test 06: 305
Result for test 07: 328
Result for test 08: 306
Result for test 09: 327
Result for test 10: 327
Result for test 11: 371
Result for test 12: 327
Result for test 13: 371
Result for test 14: 218
Result for test 15: 587
Result for test 16: 670
Result for test 17: 328
Result for test 18: 328

VWAIT1A
Result for test 01: 154
Result for test 02: 154
Result for test 03: 364
Result for test 04: 364
Result for test 05: 406
Result for test 06: 392
Result for test 07: 406
Result for test 08: 364
Result for test 09: 335
Result for test 10: 363

VWAIT1B
Result for test 01: 154
Result for test 02: 154
Result for test 03: 210
Result for test 04: 364
Result for test 05: 349
Result for test 06: 349
Result for test 07: 391
Result for test 08: 210
Result for test 09: 210
Result for test 10: 252

VWAIT1C
Result for test 01: 154
Result for test 02: 153
Result for test 03: 363
Result for test 04: 364
Result for test 05: 405
Result for test 06: 391
Result for test 07: 405
Result for test 08: 364
Result for test 09: 336
Result for test 10: 364
mizapf #3 Posted December 1

Thanks Beery, this fully agrees with my measurements, and if someone else (maybe Tim) checks and gets the same values, we can assume these are the correct and intended ones.

The values are in tenths of a second for all iterations (154 = 15.4 seconds) and may differ by ±1, depending on when the time was taken within a second.

SPEED: Null loop, read byte from on-chip RAM, read word from on-chip RAM, write byte on-chip, write word on-chip, read byte SRAM, read word SRAM, write byte SRAM, write word SRAM, read byte DRAM, read word DRAM, write byte DRAM, write word DRAM, set video address, video read, video write, read byte from peripheral bus, read byte from peripheral bus with CRU bit 23=0. The iteration count is 6553600. With this count, a difference of 2.18 seconds (22 units here) corresponds to 333 ns per iteration, which is one cycle. Video tests have 8192000 iterations.

VWAIT1A: Various video read and write tests with "hidden" wait states (see https://www.ninerpedia.org/wiki/Geneve_video_wait_states ). Here I use >400000 iterations.
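The relation between result units and CPU cycles described above can be checked with a short Python sketch. It only uses the numbers given in the post (tenths of a second, 6553600 iterations, 333 ns per cycle); the function and constant names are made up for illustration and are not part of the test programs.

```python
# Sketch: convert SPEED result differences into CPU cycles per iteration.
# Names are hypothetical; the numbers come from the post above.

CYCLE_NS = 1000 / 3            # one CPU cycle, about 333 ns
SPEED_ITERATIONS = 6_553_600   # iteration count stated for the SPEED tests


def seconds(result_units: int) -> float:
    """Convert a displayed result (tenths of a second) to seconds."""
    return result_units / 10


def cycles_per_iteration(delta_units: int, iterations: int) -> float:
    """Extra CPU cycles per loop iteration implied by a result difference."""
    ns_per_iteration = seconds(delta_units) * 1e9 / iterations
    return ns_per_iteration / CYCLE_NS


# Total time added by one extra cycle in every iteration, in seconds.
one_cycle_total = SPEED_ITERATIONS * CYCLE_NS * 1e-9
print(f"one cycle per iteration adds {one_cycle_total:.2f} s")

# A difference of 22 units should come out very close to one cycle.
print(f"22 units = {cycles_per_iteration(22, SPEED_ITERATIONS):.2f} cycles")
```

Because each reading may be off by ±1 unit, a difference between two results is only reliable to within about 2 units, which is roughly a tenth of a cycle at this iteration count.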
+InsaneMultitasker #4 Posted December 1

I downloaded the image and will transfer it to my Geneve today. Are your measurements presuming a standard, unmodified Geneve? One of my Geneves has a WHT turbo video PAL; I may be able to test both.
mizapf #5 Posted December 1

This is all standard Geneve (except that I have a 32K SRAM upgrade, which is not relevant here).
+InsaneMultitasker #6 Posted December 1

I ran the four tests. My results mirrored Beery's, with a delta of at most ±1. I could not find my WHT turbo video PAL; I hope I didn't lose it or mistakenly ship it somewhere.
mizapf #7 Posted December 1

Here is a comparison of the current state of the Geneve emulation in MAME (left) vs. the real Geneve (right), concerning execution speed. Note that TI BASIC is run at GPL5 speed, which is a lot faster than on the TI console. The seemingly simple counting in BASIC involves a lot of different aspects, including correct timing for DRAM/SRAM and the GROM emulation inside the Gate Array, and also video read and write operations.

As you can see (maybe fast-forward if you don't want to watch all 120 seconds, or pause at some points), the real Geneve is a tiny bit faster, which may be due to the problem with the wait states for video write operations. Altogether, however, I think this is a pretty good result.
mizapf #8 Posted December 2

One more check, but pretty short. Really short, in fact, and completely wrong in MAME.

This test (SWAIT) writes the byte 9F to F120 (the sound port in native mode, corresponding to 8400 in the TI console) in a loop. I am running 380480 loops. No need to turn off the speakers; 9F means muting channel 1.

On my real Geneve, this test finishes after 5.7 seconds. On MAME, however, this test currently runs for 1950.5 seconds (more than 30 minutes)! There is clearly something wrong with the READY handling of the sound chip. The reason why this does not affect the rest of the emulation is that writes to the sound chip are extremely rare compared to all other accesses.

Here again, I'd appreciate it if some of you with a real Geneve could verify that your machine behaves the same as mine.

timing1.dsk
BeeryMiller #9 Posted December 3

SWAIT
Result for test 01: 51
mizapf #10 Posted December 3

Looks good, 51 means 5.1 seconds. At least it is much lower than 19505. I'll have to investigate a bit more in MAME. My real system seems to be in perfect health, which is also good news.

The sound chip in fact produces wait states by lowering READY; the low time is 32 cycles of its input clock, which is the clock output of the V9938 - it is actually driven by the video processor. The clock output of the 9938 is 1/6 of its XTAL input frequency, which is about 21 MHz. Hence, it should produce a READY low time of 27 CPU cycles (333 ns each), i.e. 27 wait states.

I am not only trying to maximize the precision for MAME's sake here. I'm trying to emulate the Gate Array and the PAL in order to develop a theory of how the Gate Array is actually implemented. We do not have detailed descriptions of the Gate Array, but if I manage to get it running at the exact timings, this is a good indication of what is actually happening inside.

Interestingly, the emulation of the Gate Array has become simpler than before. I remember that Matthew or someone else once said that the actual hardware realizations are much simpler than what you typically meet in emulation, and I think this is going in the right direction now.
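The figure of 27 wait states can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: the post says "about 21 MHz", and the value 21.47727 MHz used below is an assumed NTSC-style crystal frequency, not something confirmed in this thread. Rounding up to whole CPU cycles is also a simplification of how READY is actually sampled.

```python
import math

# Assumption: exact crystal frequency; the post only says "about 21 MHz".
XTAL_HZ = 21_477_270
SOUND_CLOCK_HZ = XTAL_HZ / 6      # V9938 clock output feeding the sound chip
READY_LOW_CLOCKS = 32             # READY low time in sound chip input clocks
CPU_CYCLE_S = 1 / 3_000_000       # one CPU cycle, about 333 ns

# Duration of the READY low phase in seconds.
ready_low_s = READY_LOW_CLOCKS / SOUND_CLOCK_HZ

# Simplification: the CPU can only continue on a whole cycle boundary.
wait_states = math.ceil(ready_low_s / CPU_CYCLE_S)

print(f"READY low for {ready_low_s * 1e6:.2f} us")
print(f"wait states: {wait_states}")
```

With a crystal of exactly 21 MHz the quotient would land slightly above 27, so the result depends on the precise frequency assumed.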
mizapf #11 Posted December 3

I'm currently away from my Geneve for some days. It would be really kind if one of you could give it a try.

The JWAIT test produces two results after about 16 seconds each. The difference between both tests is that in one test, a conditional jump is taken, while in the other one, the jump is not taken; apart from that, the same operations are performed.

I'd like to know how that affects the 9995 prefetch. Normally, the prefetch gets the next instruction while the ALU performs the last operation. While the results are written, the new instruction is decoded. But how does that work with conditional jumps? The decision whether to jump or not should be a matter of the ALU operation, which runs in parallel with the prefetch, but how does the CPU know which instruction to fetch?

T01    MOVB @SRAM,R3
       DEC  R1
       JNE  T01
       DEC  R2
       JNE  T01

In this example, assume that right now, the DEC R1 instruction fetches the R1 value. The destination address need not be calculated, so the next step would be to fetch the next instruction, JNE. While this is done, the ALU performs the DEC operation. Next, the JNE is decoded, while the results of DEC are written to R1. Then, the program counter is fetched. On the next cycle, the JNE operation is evaluated (check the EQ bit in the status register and add the displacement to the PC or not), while in parallel, the next instruction is prefetched. But is it MOVB or DEC R2?

This detail is not explained in the 9995 specifications. They merely say how many cycles may be saved in the optimal case. The scenario described above is a typical issue in pipelined systems called "bubbles in the pipeline": If the wrong path is followed by the prefetch, the prefetched instruction is discarded, and this "pipeline flush" reduces the efficiency. Accordingly, we should notice a different cycle count for the two executions.

In more advanced systems there is branch prediction, and both options are executed in parallel until one of them turns out to be the "actual one", and the other is discarded. In MAME, I went for the simple way, finishing the evaluation of the jump condition in zero time, and then 100% correctly "predicting" the next operation; and, of course, I'd like to fix that. (Also, it may help with the still unsolved video wait state issue.)

(I hope this is not too boring for the rest of you, but I think it shows some fascinating details of our machines that most are not aware of.)

timing2.dsk
BeeryMiller #12 Posted December 4

JWAIT
Result for test 01: 160
Result for test 02: 160
Tursi #13 Posted December 4 (edited)

14 hours ago, mizapf said:
Looks good, 51 means 5.1 seconds. At least it is much lower than 19505. I'll have to investigate a bit more in MAME. My real system seems to be in perfect health, that is also good news. The sound chip in fact produces wait states by lowering READY; the low time is 32 cycles of its input clock, which is the output of the V9938 - it is actually driven by the video processor. The clock output of the 9938 is 1/6 of its XTAL frequency input, which is about 21 MHz. Hence, it should produce a READY low time of 27 CPU cycles (333ns), i.e. 27 wait states.

Is this the same sound chip as in the TI?

The sound chip holds the CPU in the 4A because if it doesn't, the write cycles are actually too fast for it to be able to process the data. It's actually slower than the GROMs in that respect. So the hold is mandatory for the chip to work.

It should be easy to calculate what the real machine /should/ take, and then you can determine whether emulation has extra wait states or hardware has fewer.

You run 380480 loops. Just in terms of sound chip holds only, then, let's calculate. (Edit: sorry, I had to redo my math here.)

Per your math above, that's 27 CPU cycles * 380480 = 10,272,960 cycles * (1/3,000,000) = 3.42 seconds.

Trying the math on the input clock (21/6?) gives us 32 * (1/3,500,000) * 380480 = 3.48 seconds.

This second one more closely matches reality (remember that's hold time on the sound chip only, and none of the CPU cycles), so I clearly have a mistake in my math on the first line. I probably misread something there. They actually both look pretty close when I stop being dumb on my units. But that suggests that there is no magic involved in the hardware - reality seems to match the expected time on the sound chip holds alone.

Maybe my confusion can help offer some insight into where the MAME code is confused.

Edited December 4 by Tursi
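The two estimates in the post above can be put side by side in a small Python sketch. It uses only figures quoted in the thread (380480 loops, 27 wait states at 3 MHz, 32 clocks at a rounded 3.5 MHz, and the measured 5.1 s and 1950.5 s); the variable names are hypothetical, and the "remaining time per loop" is just the measured time minus the hold estimate, not a verified instruction timing.

```python
# Sketch: expected sound chip hold time for the SWAIT loop.
# All numbers are taken from the thread; names are made up for illustration.

LOOPS = 380_480

# Estimate 1: 27 CPU wait states per write at 3 MHz.
hold_from_cpu_cycles = LOOPS * 27 / 3_000_000

# Estimate 2: 32 sound chip clocks per write at a rounded 3.5 MHz (21/6).
hold_from_sound_clock = LOOPS * 32 / 3_500_000

print(f"hold time via CPU cycles:  {hold_from_cpu_cycles:.2f} s")
print(f"hold time via sound clock: {hold_from_sound_clock:.2f} s")

# Time per loop left for the CPU instructions on real hardware (5.1 s run).
measured_real_s = 5.1
remaining_us = (measured_real_s - hold_from_cpu_cycles) / LOOPS * 1e6
print(f"remaining per loop on hardware: {remaining_us:.1f} us")

# Average time per loop in the faulty MAME run (1950.5 s).
measured_mame_s = 1950.5
print(f"per loop in MAME: {measured_mame_s / LOOPS * 1e3:.2f} ms")
```

Both hold estimates differ by less than 0.1 s, so the choice of clock reference does not explain the gap between the real machine and the emulation.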
mizapf #14 Posted Wednesday at 09:16 PM

Thank you for your calculations ... but I am starting to believe this is a simple programming error in MAME (in the sn76496.cpp file). I noticed that there are long pauses (20 ms) where the READY line is low. The point is that the MAME core requests a number of sound samples from audio components, and here, if that number equals the clock divider exactly, the loop is left without raising READY. That is clearly wrong.
Tursi #15 Posted Wednesday at 11:07 PM

Classic99 supports correct timing on the SN sound chip.
mizapf #16 Posted Saturday at 11:54 AM

For now, I decided to put aside the video write issue and continue with the Genmod. But there I also found a problem, or let's say, something to clarify. I do not have a Genmod; I hope we have an owner here who is also able to run the attached program GMWAIT.

The point is that the Genmod intercepts the READY line from the Gate Array and feeds its own READY into the PAL. This is important, since the Gate Array produces a wait state for external accesses, and the Genmod promises by its TURBO switch to turn off these 1-ws accesses. The daughter board is soldered to the back of the Gate Array, so we can assume it gets the same inputs, and it can watch its outputs.

The Gate Array can be set to create an additional wait state for all accesses (using a CRU bit). My problem is: How does the Genmod daughter board know the actual cause of the wait state? It cannot look inside the Gate Array, so it cannot see the mapper and the additional wait state flag. In particular, when accessing the Pbox, the Gate Array pulls down READY, so the Genmod may possibly ignore this, but if there is an additional wait state, it should probably also pull down READY.

I could imagine a solution, but it looks a bit too complicated for two GALs on that daughter board. First, it would be wiser to check on a real Genmod that these additional wait states actually occur. It could also be that with TIMODE set to off and TURBO set to on, no additional wait states are supported.

So my request would be to run the GMWAIT program on a Genmod once for each combination of settings of the switch box, i.e. test with TURBO=0 / TIMODE=0; TURBO=0 / TIMODE=1; TURBO=1 / TIMODE=0; TURBO=1 / TIMODE=1. Each run should take only a few minutes. Thanks in advance.

timing3.dsk
BeeryMiller #17 Posted Sunday at 01:09 AM

Unfortunately, I am not able to help you out on this one. While I have a GenMod Geneve, it has been acting a bit flaky since October, when the PEBox fan went out and the card got VERY VERY hot. In addition, I do not have a working switch box.

Beery
mizapf #18 Posted Sunday at 03:14 PM

Hi Beery, not a big issue. Maybe someone else can help. Anyway, without the switch box, the results may not be helpful. I'll go for a reasonable guess in the meantime.

If there is someone else who intends to do the test on their Genmod now or in the next few days, I'd appreciate a short note so that I know it's worth waiting.
+InsaneMultitasker #19 Posted Sunday at 05:52 PM

I no longer have a Genmod, so I cannot run the test. Certainly they are a rarity; I wonder how many were sold and subsequently put into service. I've only repaired/upgraded 2-3 such cards in 25+ years; not sure how many have come across Richard's bench. (Most of the upgrades would have been performed with original regulators/caps still in place and would probably have needed a refresh by now.)