
Geneve wait state timing


mizapf


If you have a Geneve and a way to transfer the files on the attached DSK image to that machine, it would be nice if you ran these test programs and sent me the results (or photos of the screens). The SPEED program takes about 10 minutes, each VWAIT program about 6 minutes. Each program runs several tests with loops that access SRAM, DRAM, or video memory.

 

I am currently working on a reimplementation of the Geneve emulation with the Gate Array and the PAL as separate components, in particular making use of the PAL equations that were published here recently. Still, some timings differ from those of my real machine: video write operations take one cycle too long in the emulation, and the "System clock speed" CRU bit 23 has absolutely no effect on my real machine, although according to the PAL equations there should be a visible effect.

 

Before I start faking things to make the emulation behave like my machine, it may be worth checking whether my real machine is actually working correctly.

timing.dsk


Here are my results:

 

SPEED

Result for test 01: 175

Result for test 02: 306

Result for test 03: 306

Result for test 04: 305

Result for test 05: 305

Result for test 06: 305

Result for test 07: 328

Result for test 08: 306

Result for test 09: 327

Result for test 10: 327

Result for test 11: 371

Result for test 12: 327

Result for test 13: 371

Result for test 14: 218

Result for test 15: 587

Result for test 16: 670

Result for test 17: 328

Result for test 18: 328

 

VWAIT1A

Result for test 01: 154

Result for test 02: 154

Result for test 03: 364

Result for test 04: 364

Result for test 05: 406

Result for test 06: 392

Result for test 07: 406

Result for test 08: 364

Result for test 09: 335

Result for test 10: 363

 

VWAIT1B

Result for test 01: 154

Result for test 02: 154

Result for test 03: 210

Result for test 04: 364

Result for test 05: 349

Result for test 06: 349

Result for test 07: 391

Result for test 08: 210

Result for test 09: 210

Result for test 10: 252

 

VWAIT1C

Result for test 01: 154

Result for test 02: 153

Result for test 03: 363

Result for test 04: 364

Result for test 05: 405

Result for test 06: 391

Result for test 07: 405

Result for test 08: 364

Result for test 09: 336

Result for test 10: 364

 

 

 


Thanks Beery, this matches my measurements exactly, and if someone else (maybe Tim) checks and gets the same values, we can assume these are the correct and intended ones. The values are tenths of seconds for all iterations (154 = 15.4 seconds) and may differ by ±1, depending on when within a second the time was taken.

 

SPEED: (01) null loop, (02) read byte from on-chip RAM, (03) read word from on-chip RAM, (04) write byte on-chip, (05) write word on-chip, (06) read byte SRAM, (07) read word SRAM, (08) write byte SRAM, (09) write word SRAM, (10) read byte DRAM, (11) read word DRAM, (12) write byte DRAM, (13) write word DRAM, (14) set video address, (15) video read, (16) video write, (17) read byte from peripheral bus, (18) read byte from peripheral bus with CRU bit 23=0. The iteration count is 6553600, so a difference of 2.18 seconds (22 units here) corresponds to 333 ns per iteration, which is one cycle. The video tests have 8192000 iterations.

 

VWAIT1A: Various video read and write tests with "hidden" wait states (see https://www.ninerpedia.org/wiki/Geneve_video_wait_states ). Here I use >400000 iterations.
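To relate the results to CPU cycles, here is a small sketch that converts a result (tenths of a second for the whole run) into cycles per loop iteration, assuming a 3 MHz TMS9995 (333 ns per cycle, as above). It also assumes that ">400000" is TI hexadecimal notation, i.e. 4,194,304 iterations; with that reading the VWAIT results land close to whole cycles, whereas 400,000 decimal would give fractional values such as 115.5 cycles for a result of 154.

```python
CPU_HZ = 3_000_000  # assumed TMS9995 clock: one cycle = 333 ns

SPEED_ITERATIONS = 6_553_600  # SPEED loop count (non-video tests)
VWAIT_ITERATIONS = 0x400000   # assumption: ">400000" is TI hex notation = 4,194,304


def cycles_per_iteration(result_tenths: int, iterations: int) -> float:
    """Convert a result (tenths of a second for all iterations) to CPU cycles per loop."""
    seconds = result_tenths / 10
    return seconds / iterations * CPU_HZ


speed_examples = {
    "01 null loop": 175,
    "02 read byte on-chip": 306,
    "07 read word SRAM": 328,
    "11 read word DRAM": 371,
}
for name, result in speed_examples.items():
    print(f"SPEED {name}: {cycles_per_iteration(result, SPEED_ITERATIONS):.2f} cycles")

for result in (154, 210, 364, 406):  # sample VWAIT1A/1B results
    print(f"VWAIT {result}: {cycles_per_iteration(result, VWAIT_ITERATIONS):.2f} cycles")

# One result unit (the +/-1 uncertainty) expressed in cycles per iteration:
print(f"1 unit = {cycles_per_iteration(1, SPEED_ITERATIONS):.3f} cycles (SPEED)")
```

Under these assumptions the SPEED examples come out at about 8, 14, 15 and 17 cycles and the VWAIT examples at about 11, 15, 26 and 29 cycles, while the ±1 uncertainty of a result amounts to less than a tenth of a cycle per iteration.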


Here is a comparison of the current state of the Geneve emulation in MAME (left) vs. the real Geneve (right), concerning execution speed.

 

Note that TI BASIC runs at GPL5 speed here, which is a lot faster than on the TI console. The seemingly simple counting in BASIC exercises many different aspects, including correct timing for DRAM/SRAM, the GROM emulation inside the Gate Array, and video read and write operations. As you can see (you may want to fast-forward rather than watch all 120 seconds, or pause at some points in time), the real Geneve is a tiny bit faster, which may be due to the problem with the wait states for video write operations.

 

Altogether, however, I think this is a pretty good result.

 

 


One more check, but pretty short. Really short, in fact, and completely wrong in MAME. This test (SWAIT) writes the byte 9F to F120 (the sound port in native mode, corresponding to 8400 in the TI console) in a loop. I am running 380480 loops. No need to turn off the speakers; 9F means muting channel 1.

 

On my real Geneve, this test finishes after 5.7 seconds. In MAME, however, it currently runs for 1950.5 seconds (more than 30 minutes)!

There is clearly something wrong with the READY handling of the sound chip. The reason why this does not affect the rest of the emulation is that writes to the sound chip are extremely rare, compared to all other accesses.

 

Here again, I'd appreciate it if some of you with a real Geneve could verify that your machines behave the same as mine.

timing1.dsk


Looks good, 51 means 5.1 seconds. At least it is much lower than 19505.

 

I'll have to investigate a bit more in MAME. My real system seems to be in perfect health, that is also good news.

 

The sound chip does in fact produce wait states by lowering READY; the low time is 32 cycles of its input clock, which is the clock output of the V9938, so the sound chip is actually clocked by the video processor. The clock output of the 9938 is 1/6 of its XTAL input frequency, which is about 21 MHz. Hence, it should produce a READY low time of about 27 CPU cycles (333 ns each), i.e. 27 wait states.
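The arithmetic behind the 27 wait states can be checked with a few lines. The exact crystal value of 21.47727 MHz is an assumption (the text only says "about 21 MHz"); with either that value or exactly 21 MHz the result rounds to 27 CPU cycles.

```python
CPU_HZ = 3_000_000                 # TMS9995 cycle time: 333 ns
VDP_XTAL_HZ = 21_477_270           # assumed V9938 crystal ("about 21 MHz")
SOUND_CLOCK_HZ = VDP_XTAL_HZ / 6   # V9938 clock output that drives the sound chip
READY_LOW_SOUND_CLOCKS = 32        # READY low time, in sound chip input clocks


def ready_low_seconds(sound_clock_hz: float = SOUND_CLOCK_HZ) -> float:
    """Time the sound chip holds READY low after a write."""
    return READY_LOW_SOUND_CLOCKS / sound_clock_hz


def ready_low_cpu_cycles(sound_clock_hz: float = SOUND_CLOCK_HZ) -> float:
    """The same time expressed in 333 ns CPU cycles, i.e. wait states."""
    return ready_low_seconds(sound_clock_hz) * CPU_HZ


print(f"READY low: {ready_low_seconds() * 1e6:.2f} us = {ready_low_cpu_cycles():.1f} CPU cycles")
print(f"with exactly 21 MHz: {ready_low_cpu_cycles(21_000_000 / 6):.1f} CPU cycles")
```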

 

--

 

I am not only trying to maximize the precision for MAME's sake here. I'm emulating the Gate Array and the PAL in order to develop a theory of how the Gate Array is actually implemented. We do not have detailed descriptions of the Gate Array, but if I manage to get the emulation running with exact timings, this is a good indication of what is actually happening inside.

 

Interestingly, the emulation of the Gate Array has become simpler than before. I remember that Matthew or someone else once said that actual hardware implementations are much simpler than what you typically find in emulation, and I think this is heading in the right direction now.


I'm currently away from my Geneve for a few days. It would be really kind if one of you could give it a try.

 

The JWAIT test produces two results, each after about 16 seconds. The only difference between the two tests is that in one of them the conditional jump is taken, while in the other it is not; apart from that, the same operations are performed. I'd like to know how this affects the 9995 prefetch. Normally, the prefetch fetches the next instruction while the ALU performs the current operation, and the new instruction is decoded while the results are written. But how does that work with conditional jumps? The decision whether to jump or not depends on the ALU operation, which runs in parallel with the prefetch, so how does the CPU know which instruction to fetch?

 


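T01  MOVB @SRAM,R3     Read a byte from SRAM (the measured access)
     DEC  R1           Inner loop counter
     JNE  T01          Taken while R1 is not zero
     DEC  R2           Outer loop counter
     JNE  T01          Taken while R2 is not zero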

 

In this example, assume that right now, the DEC R1 command fetches the R1 value. The destination address need not be calculated, so the next step would be to fetch the next instruction, JNE. While this is done, the ALU performs the DEC operation. Next, the JNE is decoded, while the results of DEC are written to R1. Then, the program counter is fetched. On the next cycle, the JNE operation is evaluated (check the EQ bit in the status register and add the displacement to the PC or not), while in parallel, the next operation is prefetched. But is it MOVB or DEC R2?

 

This detail is not explained in the 9995 specifications, which only state how many cycles may be saved in the optimal case. The scenario above is a typical issue in pipelined systems, known as "bubbles in the pipeline": if the wrong path has been prefetched, the prefetched instruction is discarded, and this "pipeline flush" reduces efficiency. Accordingly, we should notice different cycle counts for the two executions.

 

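More advanced systems use branch prediction: the CPU guesses which way the jump will go and continues along that path, discarding the work if the guess turns out to be wrong. Some designs even follow both paths in parallel until one of them turns out to be the "actual one", and discard the other.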

 

In MAME, I went for the simple way, finishing the evaluation of the jump condition in zero time, and then 100% correctly "predicting" the next operation; and, of course, I'd like to fix that. (Also, it may help with the still unsolved video wait state issue.)
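A toy cost model makes the question concrete: if the prefetch always fetches the next sequential word, a taken jump has to discard it, and the two JWAIT runs should differ by the flush penalty times the number of iterations. The cycle counts below are hypothetical parameters, not values from the 9995 specification; with `flush_penalty=0` the model behaves like the current MAME approach, where the condition is resolved in zero time.

```python
from dataclasses import dataclass

CPU_HZ = 3_000_000  # one TMS9995 cycle = 333 ns


@dataclass
class JumpModel:
    """Toy cost model for a conditional jump with a one-word prefetch.

    Hypothetical parameters, not taken from the TMS9995 data manual:
      base_cycles   -- cost of the jump when the prefetched word can be used
      flush_penalty -- extra cycles when the prefetched word must be discarded
    The prefetch is assumed to always fetch the next sequential word.
    """
    base_cycles: int = 3
    flush_penalty: int = 1

    def cycles(self, taken: bool) -> int:
        # A taken jump invalidates the sequentially prefetched word (pipeline flush).
        return self.base_cycles + (self.flush_penalty if taken else 0)


def taken_minus_not_taken_seconds(model: JumpModel, iterations: int) -> float:
    """Extra run time of the 'jump taken' variant over the 'not taken' one."""
    delta_cycles = model.cycles(True) - model.cycles(False)
    return delta_cycles * iterations / CPU_HZ


flush = JumpModel(flush_penalty=1)
zero_time = JumpModel(flush_penalty=0)  # like evaluating the jump "in zero time"
print(taken_minus_not_taken_seconds(flush, 3_000_000))      # 1.0
print(taken_minus_not_taken_seconds(zero_time, 3_000_000))  # 0.0
```

If the two JWAIT results come out equal on real hardware, that would suggest no flush penalty in this case; a consistent difference would give the penalty per jump directly.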

 

(I hope this is not too boring for the rest of you, but I think it shows some fascinating details of our machines that most are not aware of.)

timing2.dsk



 

Is this the same sound chip as in the TI? 

 

The sound chip holds the CPU in the 4A because otherwise the write cycles would be too fast for it to process the data. In that respect it's actually slower than the GROMs. So the hold is mandatory for the chip to work.

 

It should be easy to calculate what the real machine /should/ take, and then you can determine whether emulation has extra wait states or hardware has fewer. You run 380480 loops. Just in terms of sound chip holds only, then, let's calculate.

 

(Edit: sorry, I had to redo my math here.) Per your math above, that's 27 CPU cycles * 380480 = 10,272,960 cycles * (1/3,000,000) = 3.42 seconds.

 

Trying the math on the input clock (21/6?) gives us 32 * (1/3,500,000) * 380480 = 3.48 seconds.

 

This second more closely matches reality (remember that's hold time on the sound chip only, and none of the CPU cycles), so I clearly have a mistake in my math on the first line. I probably misread something there.

 

They actually both look pretty close when I stop being dumb on my units. ;) 

 

But that suggests that there is no magic involved in the hardware - reality seems to match the expected time on the sound chip holds alone. Maybe my confusion can help offer some insight into where the MAME code is confused. :)

 

Edited by Tursi
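Both estimates from the post above can be reproduced with a short script. Note that it only counts the READY hold time of the sound writes, not the CPU cycles of the loop itself, so the measured total run time is expected to be larger.

```python
LOOPS = 380_480      # SWAIT iterations
CPU_HZ = 3_000_000   # TMS9995 cycle: 333 ns


def hold_time_from_wait_states(wait_states: int = 27, loops: int = LOOPS) -> float:
    """Total READY hold time if every sound write costs `wait_states` CPU cycles."""
    return wait_states * loops / CPU_HZ


def hold_time_from_sound_clock(sound_clock_hz: float, clocks: int = 32,
                               loops: int = LOOPS) -> float:
    """Total READY hold time computed directly from the sound chip input clock."""
    return clocks * loops / sound_clock_hz


print(f"27 wait states:      {hold_time_from_wait_states():.2f} s")
print(f"32 clocks @ 3.5 MHz: {hold_time_from_sound_clock(3_500_000):.2f} s")
```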

Thank you for your calculations :) ... but I'm starting to believe this is a simple programming error in MAME (in the sn76496.cpp file). I noticed long pauses (20 ms) during which the READY line stays low. The MAME core requests a number of sound samples from the audio components, and if that number equals the clock divider exactly, the loop exits without raising READY. That is clearly wrong.
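The kind of edge case described here can be illustrated with a toy countdown loop. The function names and the countdown are hypothetical and deliberately simplified; this is not the code in sn76496.cpp. When the number of requested samples exactly matches the remaining count, the buggy variant leaves the loop with the counter at zero but READY still low, so READY is only raised on the next stream update.

```python
from typing import Tuple


def update_buggy(wait_remaining: int, samples: int) -> Tuple[int, bool]:
    """Hypothetical stream update: READY release is only checked before each step."""
    ready = wait_remaining == 0
    for _ in range(samples):
        if wait_remaining == 0:
            ready = True          # only reached on a *later* iteration
        else:
            wait_remaining -= 1   # may reach 0 on the last iteration, unnoticed
    return wait_remaining, ready


def update_fixed(wait_remaining: int, samples: int) -> Tuple[int, bool]:
    """Same loop, but READY is raised the moment the countdown reaches 0."""
    ready = wait_remaining == 0
    for _ in range(samples):
        if wait_remaining > 0:
            wait_remaining -= 1
            if wait_remaining == 0:
                ready = True
    return wait_remaining, ready


# Requesting exactly as many samples as remain in the countdown:
print(update_buggy(32, 32))  # (0, False): READY stays low until the next update
print(update_fixed(32, 32))  # (0, True)
```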


For now, I decided to put the video write issue aside and continue with the GenMod. But there I also found a problem, or let's say, something to clarify. I do not have a Genmod; I hope there is an owner here who is able to run the attached program GMWAIT.

 

The point is that the Genmod intercepts the READY line from the Gate Array and feeds its own READY into the PAL. This is important, since the Gate Array produces a wait state for external accesses, and the Genmod's TURBO switch promises to turn off these 1-wait-state accesses.

 

The daughter board is soldered to the back of the Gate Array, so we can assume it gets the same inputs, and it can watch its outputs. The Gate Array can be set to create an additional wait state for all accesses (using a CRU bit).

 

My problem is: How does the Genmod daughter board know what is the actual cause of the wait state? It cannot look inside the Gate Array, so it cannot see the mapper and the additional wait state flag. In particular, when accessing the Pbox, the Gate Array pulls down READY, so the Genmod may possibly ignore this, but if there is an additional wait state, it should probably also pull down READY. 

 

I can imagine a solution, but it looks a bit too complicated for the two GALs on that daughter board. Before going further, it would be wiser to check on a real Genmod whether these additional wait states actually occur. It could also be that with TIMODE off and TURBO on, no additional wait states are supported.

 

So my request would be to run the GMWAIT program on a Genmod once for each combination of settings of the switch box, i.e. test with

TURBO=0 / TIMODE=0;

TURBO=0 / TIMODE=1;

TURBO=1 / TIMODE=0;

TURBO=1 / TIMODE=1.

 

Each run should take only a few minutes. Thanks in advance.

 

 

timing3.dsk



Unfortunately, I'm not able to help you out on this one. While I have a GenMod Geneve, it has been acting a bit flaky since back in October, when the PEBox fan went out and the card got VERY, VERY hot. In addition, I do not have a working switch box.

 

Beery


Hi Beery, not a big issue. Maybe someone else can help. Anyway, without the switch box, the results may not be helpful.

 

I'll go for a reasonable guess in the meantime.

 

If there is someone else who intends to do the test on their Genmod now or in the next few days, I'd appreciate a short note so that I know it's worth waiting.


I no longer have a Genmod so I cannot run the test. Certainly they are a rarity; I wonder how many were sold and subsequently put into service.  I've only repaired/upgraded 2-3 such cards in 25+ years; not sure how many have come across Richard's bench.  (Most of the upgrades would have been performed with original regulators/caps still in place and would probably have needed a refresh by now)


On 12/8/2019 at 7:14 AM, mizapf said:


If there is someone else who intends to do the test on their Genmod now or in the next few days, I'd appreciate a short note so that I know it's worth waiting.

I can run this tonight on my genmod. 

 

-M@


In the meantime (thanks in advance, Matt), I collected some theoretical questions that may be answered without running a program.

 

1. If you fully populate the Memex (2M), can (must) you remove the on-board DRAM?

(No DRAM on the Geneve means no GPL mode anymore, of course, since the GROM handling requires the DRAM.)

 

I know that the Memex can be configured by switches to ignore some page areas. However, it cannot be configured to mask the 00-3F area for the board DRAM. With 0 wait states, the Memex may be able to respond, but at the same time, the board DRAM may get garbled. Since the Genmod cannot turn off the DRAM on the board (no traces are cut for that purpose), how does it handle this issue? This applies to TURBO mode; otherwise there is 1 wait state.

 

2. Can (must) you remove the on-board SRAM at pages EC-EF when you use the full Memex?

 

3. Peripheral cards should decode AMA/B/C. However, I do not see any requirement to also decode AMD/E. Without this, the DSR page BA also appears at three mirror pages, i.e. the card responds to pages 00-111-010 (3A), 01-111-010 (7A), 10-111-010 (BA), and 11-111-010 (FA). Are these other pages also blocked? (With the stock Geneve, this is not a problem, as only the BA page is in the external space, where the bus drivers are activated.) Running MEMTEST might reveal this.

 

4. Concerning pages F0-FF, which are the on-board boot ROM, a full 2M Memex could allow storing values at these addresses. However, if we have the EPROM still in place, this would mean a bus conflict when those values differ. The Memex has switches to mask these pages, but I cannot really imagine a use case (unless the Memex has an EPROM at that location).


The Memex seems to have been designed before the Genmod. The question is whether all this actually worked as expected.
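The mirror pages from question 3 can be enumerated mechanically. This sketch assumes, as in the bit grouping used above (e.g. 10-111-010 = BA), that the top two bits of the 8-bit page number are the ones carried by AMD/AME and that a card only decodes the lower six bits; the function name is hypothetical.

```python
from typing import List


def mirror_pages(page: int) -> List[int]:
    """Pages that alias `page` if the top two page bits (AMD/AME) are not decoded."""
    low_bits = page & 0x3F  # bits the card does decode (AMA/B/C plus A0-A2)
    return [(top << 6) | low_bits for top in range(4)]


print(" ".join(f"{p:02X}" for p in mirror_pages(0xBA)))  # 3A 7A BA FA
```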


On 12/7/2019 at 3:54 AM, mizapf said:

...

So my request would be to run the GMWAIT program on a Genmod once for each combination of settings of the switch box, i.e. test with

TURBO=0 / TIMODE=0;

TURBO=0 / TIMODE=1;

TURBO=1 / TIMODE=0;

TURBO=1 / TIMODE=1.


My GMWAIT results for genmod w/2M memex

 

TURBO=0 / TIMODE=0

1, 60

2, 60

3, 100

4, 120

5, 110

6, 120

7, 110

8, 120

 

TURBO=0 / TIMODE=1

1, 60

2, 60

3, 99

4, 120

5, 110

6, 120

7, 110

8, 119

 

TURBO=1 / TIMODE=0

1, 60

2, 60

3, 100

4, 120

5, 100

6, 100

7, 110

8, 120

 

TURBO=1 / TIMODE=1  

1, 60

2, 60

3, 100

4, 119

5, 110

6, 120

7, 110

8, 120

 


Thanks, Matt. This is pretty interesting!

 

- Even though you have on-board DRAM, the pages 00-3F are used with 0 ws (TURBO=1, TIMODE=0). I really wonder how the DRAM copes with these 0-ws accesses. As I said, the traces to the DRAM are not cut (and cannot be cut), so it will respond to accesses to 00-3F. Apparently it must stay in sync with the Memex RAM ... somehow.

- The BA page access is not sped up (1 ws)

- As I guessed, with TURBO=1, TIMODE=0, the "additional wait state" feature is ineffective, at least for accesses to 00-3F. The even-numbered tests use the additional wait state setting. As I said, the Genmod cannot suppress wait states and allow additional wait states at the same time, since it cannot peek into the Gate Array.

- The mirroring of BA is as expected.

 

I will provide one more (hopefully final) test to check the access to different page areas.


Hello Matt,

 

would you please run the GMWAIT1 program from this image on your Genmod with all four switch combinations, as you did above? Feel free to abbreviate the results, e.g. "values are all 110 (±1) except for pages xx, xx, xx..." (plus the switch combination). 110 means 1 wait state, 100 means 0 wait states.

 

Thanks!

timing4.dsk
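Sorting results into wait-state classes can be done mechanically. This sketch uses the convention stated above (100 = 0 wait states, 110 = 1 wait state, ±1), extrapolates to 10 units per additional wait state (an assumption), and, as an example, applies it to Matt's GMWAIT results for tests 3 to 8 with TURBO=1 / TIMODE=0. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional

ZERO_WS = 100   # "100 means 0 wait states"
PER_WS = 10     # "110 means 1 wait state"; assumed 10 units per further wait state
TOLERANCE = 1   # results may be off by +/-1


def wait_states(result: int) -> Optional[int]:
    """Map a result to a wait-state count, or None if it fits no step."""
    steps, rem = divmod(result - ZERO_WS + TOLERANCE, PER_WS)
    if steps < 0 or rem > 2 * TOLERANCE:
        return None
    return steps


def summarize(results: Dict[int, int]) -> Dict[Optional[int], List[int]]:
    """Group test numbers (or pages) by wait-state count, for a short report."""
    groups: Dict[Optional[int], List[int]] = defaultdict(list)
    for key in sorted(results):
        groups[wait_states(results[key])].append(key)
    return dict(groups)


# Matt's GMWAIT results, TURBO=1 / TIMODE=0, tests 3 to 8:
print(summarize({3: 100, 4: 120, 5: 100, 6: 100, 7: 110, 8: 120}))
# {0: [3, 5, 6], 2: [4, 8], 1: [7]}
```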
