The BASIC sandbox is no more....


41 replies to this topic

#26 OLD CS1 OFFLINE  

OLD CS1

    >OLD CS1█

  • 6,092 posts
  • Technology Samurai
  • Location:Tallahassee, FL

Posted Thu Nov 14, 2013 9:10 AM

Meanwhile, I am still curious as to whether those who can run it on a real TI see any anomalies in the lower case characters.

 

I am hurriedly working on a new server today.  I will have a second look after work today.



#27 senior_falcon ONLINE  

senior_falcon

    Stargunner

  • Topic Starter
  • 1,485 posts
  • Location:Lansing, NY, USA

Posted Thu Nov 14, 2013 10:52 PM

Alright, I finally figured out what was going on.  One section of the demo program loads a table of lower case character definitions.  There are no utilities such as VMBW, VSBW, etc. to help you out, so you have to do it the long way:

LI R3,>4608 address in VDP of "a" plus >4000 for write
SWPB R3
MOVB R3,*R15   (R15 has >8C02)
SWPB R3
MOVB R3,*R15

I had forgotten to include the >4000 to indicate that it was to be a write operation.  Oddly enough, Classic99 and probably real iron can handle this mistake just fine, but for some reason the bytes are shifted by one byte.  Win994a compensated for my mistake and wrote the bytes the way I intended.  Now the demo program works the same in either emulator.  I have attached the revised version.  You only need this if the original did not work properly for you.

Attached Files



#28 OLD CS1 OFFLINE  

Posted Thu Nov 14, 2013 11:40 PM

Tursi, are your consoles F18A-enhanced?  The program failed on your iron but not on mine.  I wonder if that could be the difference?



#29 Tursi ONLINE  

Tursi

    Quadrunner

  • 5,636 posts
  • HarmlessLion
  • Location:BUR

Posted Fri Nov 15, 2013 12:22 AM

ooooh.. interesting. I did /not/ test on my F18A console!!

In the case of forgetting to include the >4000 bit when setting a write address, the behavior of the system is well defined, but not as well understood. I spent a long time on this (and actually made use of it in some programs).

To understand what happens, it is best to change your thinking of that bit. TI called it a "write mode" bit. On the 9918, it is actually a prefetch inhibit bit, and has nothing (specifically) to do with write mode.

The way that the 9918A works is that it must manage access to the video memory in a predictable way, because it is continuously fetching data to display on the screen. However, a request from the CPU can arrive at any time. In order that the VDP can have data ready for the CPU no matter what else it may be doing, it has a single byte register that contains data to send to the CPU, which we can call the prefetch buffer.

At powerup, the contents of this buffer are undefined; any value can be in it.

When you write a normal address, as soon as you finish writing that address the VDP uses the next free moment to go get that byte of data from the video RAM (fetching it before you ask for it, therefore, 'prefetch'). The VDP address is then internally incremented, ready for the next byte. When the CPU reads from the VDP, it gets the value in that register, and the currently pointed-to address is fetched into the register (with the address then incremented again). This allows for a relatively fast data transfer without the CPU needing to synchronize with an access window to VDP memory.

When the CPU /writes/ to video memory, the data goes to the address currently in the VDP address counter, and the counter is then incremented.

When you set the "write" bit, it actually inhibits this prefetch operation. This means that when you finish writing the address, the VDP does NOT fetch data and does NOT increment the address counter. This makes sense -- you usually want to write to the address that you asked for. If the prefetch happened, the address counter would be incremented before your write, and would go to an address one higher than you expected.

One trick I have done to save program space, is to set a normal read address one less than I want, and then write. This works just fine on hardware and any emulator that handles the prefetch correctly. (I think I made Matthew make it work right on the F18A too ;) ). It probably does NOT work on chips like the 9938 or 9958, which have separate read and write registers. (I can't recall if they both do, or just one of them).
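The prefetch behaviour described above can be sketched as a toy model. This is a Python simulation written purely for illustration, not code from Classic99 or any other emulator: the class name `VdpModel` and its method names are hypothetical, register writes and timing are ignored, and the address >0608 for the "a" pattern is taken from the demo discussed earlier in the thread.

```python
class VdpModel:
    """Toy model of the 9918A address counter and one-byte prefetch buffer."""

    def __init__(self):
        self.vram = bytearray(0x4000)   # 16K of video RAM
        self.addr = 0                   # 14-bit address counter
        self.buffer = 0                 # prefetch buffer (undefined on real hardware)
        self._low = None                # first byte of a pending address write

    def _advance(self):
        self.addr = (self.addr + 1) & 0x3FFF

    def write_address_port(self, byte):
        """One byte written to the address port (>8C02 on the TI)."""
        if self._low is None:
            self._low = byte            # low byte arrives first
            return
        high, low = byte, self._low
        self._low = None
        if high & 0x80:
            return                      # register write: not modelled here
        self.addr = ((high & 0x3F) << 8) | low
        if not (high & 0x40):           # >4000 clear: prefetch is NOT inhibited
            self.buffer = self.vram[self.addr]
            self._advance()

    def set_address(self, word):
        """Send a 16-bit address word, low byte then high byte."""
        self.write_address_port(word & 0xFF)
        self.write_address_port(word >> 8)

    def write_data(self, byte):
        """Data write: store at the counter, then increment (buffer left stale)."""
        self.vram[self.addr] = byte
        self._advance()

    def read_data(self):
        """Data read: return the buffer, then refill it and increment."""
        value = self.buffer
        self.buffer = self.vram[self.addr]
        self._advance()
        return value


pattern = bytes(range(1, 9))            # stand-in for one 8-byte character pattern

good = VdpModel()
good.set_address(0x4608)                # write address with the >4000 bit
bad = VdpModel()
bad.set_address(0x0608)                 # the >4000 bit forgotten
trick = VdpModel()
trick.set_address(0x0607)               # read address one less than wanted

for model in (good, bad, trick):
    for b in pattern:
        model.write_data(b)

print("good :", good.vram[0x608:0x611].hex())
print("bad  :", bad.vram[0x608:0x611].hex())
print("trick:", trick.vram[0x608:0x611].hex())
```

In this model the forgotten bit shifts the whole pattern up by one byte, while the "one less" read address lands the data exactly where the proper write address would have.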

I'm not sure if this explains the failures I saw... nor why it behaves differently on the F18A. I still suspect timing may be at fault, Harry and I chatted about that. (The F18A has no timing restrictions). I'll give this one a quick try on all three consoles. ('quick', he says...)

#30 OLD CS1 OFFLINE  

Posted Fri Nov 15, 2013 12:48 AM

Tursi wrote: "One trick I have done to save program space, is to set a normal read address one less than I want, and then write."

 

You have my attention: can you show how this saves space?



#31 Tursi ONLINE  

Posted Fri Nov 15, 2013 3:11 AM

First! I finally got the darn thing to load. Last time I loaded by holding the wires to a serial cable, tonight I actually built a monitor->cassette cable, and had all kinds of issues with it (for such a simple task... ;) )

So I can confirm that on both of my non-F18A consoles, the app crashes, and on my F18A console, it runs fine!

I added some code to Classic99 to check for the one case that we /know/ can cause an overrun, in hopes that's it. Our best efforts at working out which conditions might overrun the VDP concluded that the most likely one is a read immediately after setting the VDP address. Classic99 definitely comes up with the warning on this program!

   8384  D7EB  movb @>0001(R11),*R15       (34)
         0001
   8388  C7DB  mov  *R11,*R15              (30)
   838A  D2DE  movb *R14,R11               (22)
This is the offending code (although I'm not sure what bank it's in ;) ) - R15 contains 8C02 and R14 contains 8800. Our timing experiments back in the day say that this sequence is very likely to be too fast for the VDP (also that it should be the only common one that is...!)

If there is enough memory to fix that up, we should be good to go!

#32 Tursi ONLINE  

Posted Fri Nov 15, 2013 3:23 AM

I notice the code continues like this:
 
8388 C7DB mov *R11,*R15
838A D2DE movb *R14,R11
838C 098B srl R11,8
838E 020C li R12,>8300
If this is a fixed block, and you can move the LI R12,>8300 up after the MOV *R11,*R15, it will take the same number of bytes and provide the delay needed before you read VDP...? I watched that memory in the debugger, and it seemed to be fixed!
 
8388 C7DB mov *R11,*R15
838A 020C li R12,>8300
838C 8300
838E D2DE movb *R14,R11
8390 098B srl R11,8
I tried a hex edit and only saw that in one place, so I patched it up and tried it. This removed the warning in Classic99, so I tried it on hardware...

Success! On all three systems (F18A, version 1, and version 2.2)!

Edited by Tursi, Fri Nov 15, 2013 3:45 AM.


#33 Tursi ONLINE  

Posted Fri Nov 15, 2013 3:25 AM

My patched version of the demo, as above:

Attached File  DEMOTEST2.zip   1.75KB   19 downloads

Attached File  20131115_012120.jpg   941.41KB   1 downloads

#34 Tursi ONLINE  

Posted Fri Nov 15, 2013 3:44 AM

OLD CS1 wrote: "You have my attention: can you show how this saves space?"


It depends on your program structure, of course -- there are probably only a few special cases where this helps. In the cases I've done it, I've been in very tight memory space, and using a register to track both VDP reads and writes. To add the 'write' bit to an address in a register, you would normally use SOCB or ORI - ORI will always take 2 words, SOCB can take just one word if you have a spare register. In cases where I have not had a spare register or a spare word of program memory (such as programming in scratchpad), you can use DEC to decrease the address by 1, set as a read address, and the VDP will auto-increment before you write, so the data goes to the right place. (This is only useful if you do not have to INC the register back up again! Often in cases like this, the code adds a value to the register later, such as AI R0,>0100 - in this case I would just take that old DEC into account and AI R0,>0101 ;) If you never need the /real/ address, well, you can just keep it with the >4000 bit set, but if for some reason you still can't, you can always just keep the address minus one...)

It can work the other way, too - if you have a write address but want to read, set the write address and do a dummy read first. The dummy read will return garbage, but trigger the prefetch and autoincrement. There are probably even fewer cases where this saves you memory, but I've used it. ;)

The above tricks, and they ARE tricks, should really be used sparingly. They break compatibility with the later, upward-compatible chips, and maybe there are not a lot of those in use anymore, but that's no reason to break compatibility deliberately. ;)

That's worth noting, too... if you need to skip a byte of memory when reading from VDP... just read it. It's faster than setting a new address, by a lot. (One read versus two writes, each with a read-before-write - just in wait states the one read is 4, and the address write is 16! To that end, it's probably faster to just read to skip up to 3 bytes!)
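The break-even point quoted above follows from simple arithmetic, sketched here in Python. Only the wait-state figures from the post are counted (4 per data read, 16 for setting a new address); instruction cycles are ignored, and the constant and function names are hypothetical.

```python
READ_WAIT_STATES = 4           # one VDP data read, figure from the post
SET_ADDRESS_WAIT_STATES = 16   # two address-port writes, figure from the post


def cheaper_to_skip_by_reading(bytes_to_skip):
    """True if dummy reads cost fewer wait states than setting a new address."""
    return bytes_to_skip * READ_WAIT_STATES < SET_ADDRESS_WAIT_STATES


# Largest skip for which dummy reads still win on wait states alone.
max_skip = max(n for n in range(1, 16) if cheaper_to_skip_by_reading(n))
print("dummy reads win for skips of up to", max_skip, "bytes")
```

At four bytes the two approaches tie on wait states, which is consistent with the "up to 3 bytes" estimate.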

The last trick I use for VDP access is more common, and doesn't break any rules - if you're going to load data to set an address or a register (as opposed to calculating it), you might as well load it in the format you need and save the instructions. For instance, it's common to do something like this:
 
 LI R0,>0300
 BL @VDPRAD
...
 LI R0,>0300
 BL @VDPWAD
...
 LI R0,>01E0
 BL @VDPWTR
...
* VDP Set Read Address in R0
VDPRAD
 SWPB R0
 MOVB R0,@VDPWA
 SWPB R0
 MOVB R0,@VDPWA
 RT
* VDP set Write address in R0
VDPWAD
 ORI R0,>4000
 SWPB R0
 MOVB R0,@VDPWA
 SWPB R0
 MOVB R0,@VDPWA
 ANDI R0,>3FFF
 RT
* VDP set register in R0 (MSB=index, LSB=data)
VDPWTR
 ORI R0,>8000
 SWPB R0
 MOVB R0,@VDPWA
 SWPB R0
 MOVB R0,@VDPWA
 ANDI R0,>3FFF
 RT
These are all similar and pretty much the same code. If you can afford to throw away the register afterwards, you can just do this:
 
 LI R0,>0003
 BL @VDPADR
...
 LI R0,>0043
 BL @VDPADR
...
 LI R0,>E081
 BL @VDPADR
...
* VDP write R0 (byte-swapped) to VDPWA
VDPADR
 MOVB R0,@VDPWA
 SWPB R0
 MOVB R0,@VDPWA
 RT
Just pre-byte-swap and set the bits in the immediate load - much less code. More tracking on your side, though. :)
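The immediate values in the shortened version can be worked out by hand, or with a small helper like the Python sketch below. The function names are hypothetical; the bit layout (>4000 for a write address, >8000 plus the register index for a register write, low byte sent first) is taken from the routines shown above.

```python
def swap_bytes(word):
    """Swap the two bytes of a 16-bit word, as SWPB would."""
    return ((word & 0xFF) << 8) | ((word >> 8) & 0xFF)


def read_address(addr):
    """Pre-swapped immediate for setting a VDP read address."""
    return swap_bytes(addr & 0x3FFF)


def write_address(addr):
    """Pre-swapped immediate for setting a VDP write address (>4000 set)."""
    return swap_bytes((addr & 0x3FFF) | 0x4000)


def register_write(index, data):
    """Pre-swapped immediate for writing `data` to VDP register `index`."""
    return swap_bytes(0x8000 | ((index & 0x07) << 8) | (data & 0xFF))


for label, value in (
    ("read  >0300", read_address(0x0300)),
    ("write >0300", write_address(0x0300)),
    ("VR1 = >E0  ", register_write(1, 0xE0)),
):
    print(f"{label}: LI R0,>{value:04X}")
```

The three values match the LI operands in the shortened listing (>0003, >0043 and >E081).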

Anyway, that's way off topic, sorry for derailing!

Edited by Tursi, Fri Nov 15, 2013 3:47 AM.


#35 sometimes99er OFFLINE  

sometimes99er

    River Patroller

  • 4,225 posts
  • Location:Denmark

Posted Fri Nov 15, 2013 4:06 AM

Tursi wrote:

"In the case of forgetting to include the >4000 bit when setting a write address, the behavior of the system is well defined, but not as well understood. I spent a long time on this (and actually made use of it in some programs).

To understand what happens, it is best to change your thinking of that bit. TI called it a "write mode" bit. On the 9918, it is actually a prefetch inhibit bit, and has nothing (specifically) to do with write mode."


Ah yes. So after setting the address, you could interleave practically any number of reads and writes (without setting the address) at some clever and useful level?

 

:) 
 



#36 Tursi ONLINE  

Posted Fri Nov 15, 2013 4:24 AM

sometimes99er wrote: "Ah yes. So after setting the address, you could interleave practically any number of reads and writes (without setting the address) at some clever and useful level?"


Should work! You just have to remember that a write will increment the address, but will NOT prefetch the data, so the next read will get garbage but will increment the address again. Not sure where that might be useful, but it will work.

edit: I wrote a summary years ago here. It has gotten very hard to search Yahoo groups ;) http://groups.yahoo....s/messages/1442

Edited by Tursi, Fri Nov 15, 2013 4:31 AM.


#37 Tursi ONLINE  

Posted Fri Nov 15, 2013 4:38 AM

As an aside, I retested the patched program on the CF7, and that worked correctly too. :)

#38 sometimes99er OFFLINE  

Posted Fri Nov 15, 2013 4:45 AM

Thanks for a quick reply. And thanks for reminding me of how to use it. I might have a few ideas, but frankly, all of them could easily be solved in the old fashioned ways without any practical impact.

It's interesting how some old and new software sometimes fails in emulation and/or on old or newer hardware. At least we seem to learn something.

 

:) 
 



#39 senior_falcon ONLINE  

Posted Fri Nov 15, 2013 10:26 AM

Tursi, thanks for the information and the testing.  I will test my method of handling subroutines later today and this change will be included.  This brings up yet another puzzling question.  The offending code that reads too soon after writing to >8C02 is part of the page loader which is used every time a page of code is moved from VDP into the scratchpad.  You were able to load a number of pages using that code before it crashed.  I can understand that with differences in consoles one console might be OK with that code and another might not be.  But why would a console be able to run the same code sometimes and not at other times?  It must be right on the edge between working and not working.



#40 Tursi ONLINE  

Posted Fri Nov 15, 2013 5:34 PM

senior_falcon wrote: "Tursi, thanks for the information and the testing.  I will test my method of handling subroutines later today and this change will be included.  This brings up yet another puzzling question.  The offending code that reads too soon after writing to >8C02 is part of the page loader which is used every time a page of code is moved from VDP into the scratchpad.  You were able to load a number of pages using that code before it crashed.  I can understand that with differences in consoles one console might be OK with that code and another might not be.  But why would a console be able to run the same code sometimes and not at other times?  It must be right on the edge between working and not working."

 

It more than likely is. I can't remember the exact timing numbers we came up with.

 

Basically, the VDP operates with the concept of memory access "slots" - specific times during the generation of the picture during which it can access the VRAM. Depending on the current screen mode, certain of these slots are available for external CPU memory access. In most modes, these access slots for the CPU occur every 8 µs.

 

Now, the access slots happen at predictable times, but there is no external synchronization, so the CPU access essentially happens at random. You don't have to wait 8 µs between every access from the CPU, you just need a CPU memory access slot to have occurred between them. The only way to /guarantee/ this is to wait 8 µs between accesses, but you can get lucky -- and you can get lucky more often than you might expect! Since the instruction timing is probably very close to 8 µs already, the program probably got lucky for a few passes.
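The "getting lucky" effect can be put into rough numbers with a simple model, sketched here in Python. It assumes CPU access slots exactly 8 µs apart, a uniformly random phase between CPU and VDP, and independent attempts; none of that is measured, the gap values are made-up examples rather than real instruction timings, and the function names are hypothetical.

```python
SLOT_PERIOD_US = 8.0   # spacing of CPU access slots, per the post above


def chance_slot_in_gap(gap_us, period_us=SLOT_PERIOD_US):
    """Chance that at least one access slot falls inside a gap of gap_us,
    assuming the slot phase is uniformly random."""
    return min(1.0, max(0.0, gap_us) / period_us)


def chance_all_succeed(gap_us, attempts, period_us=SLOT_PERIOD_US):
    """Chance that every one of `attempts` independent accesses gets a slot."""
    return chance_slot_in_gap(gap_us, period_us) ** attempts


# Hypothetical gaps between the address write and the following read.
for gap in (6.0, 7.0, 7.5, 8.0):
    once = chance_slot_in_gap(gap)
    many = chance_all_succeed(gap, 20)
    print(f"gap {gap:4.1f} us: one access {once:.3f}, twenty in a row {many:.3f}")
```

Under these assumptions, a gap just short of 8 µs succeeds most of the time on any single access but rarely survives a long run of them, which would fit a loader that works for a few pages and then fails at a different point each time.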

 

It's worth noting that the failure was not at any reliable spot - I tried three times and it failed at different points - one of the boots didn't get anywhere at all.



#41 Tursi ONLINE  

Posted Fri Nov 15, 2013 5:42 PM

Just to extend on the above - during vertical blank (and if the screen is disabled with the disable bit), /all/ memory access slots are CPU memory access slots, so at those times there are no timing restrictions at all. So how often that code works really comes down to how lucky you are. If you synchronize with the vertical blank, then you have a fixed amount of time during which you can run at full speed. But in our system, we believe that this case (set address then read) is the only case where you can actually overrun the VDP on a stock system. This is mostly because of the extra wait states added by the multiplexer and the fact that it triggers for the read-before-write pattern. The access patterns were determined using a logic analyzer and cycle counts from the 9900 datasheet used to do the math.



#42 OLD CS1 OFFLINE  

Posted Wed Nov 27, 2013 10:44 PM

I am posting this here since I demonstrated it earlier in this thread and really did not have any other place to put it (nor could I justify a new thread for one picture which has nothing to do with programming).

 

This is my contraption for connecting my phone to the TI cassette port.  I have the splitter/combining cable there as I found that the output of both channels was necessary for the TI to "hear" the output of the phone.  It works perfectly without having to rig stuff up as I had done before.

Attached Files





