
kenjennings' Blog - Part 10 of 11 -- Simple Assembly for Atari BASIC



Binary File I/O (Part 1 of 2)


Sidebar: This section turned out to be considerably more difficult and time consuming to write than anticipated. No two sources agree completely on this subject. The only consistency I found is the list of CIO and XIO command numbers. Everything else I encountered documents this subject with varying degrees of accuracy. Descriptions of the CIO and XIO commands sometimes differ in just the names, but in ways that imply different expectations for results. Detailed descriptions of the commands vary from the strangely incomplete to the outright wrong. One guide for machine language on this subject described CIO features using the BASIC XIO limitations. Another tutorial declared that only 155 bytes could be read in one operation. In the end it took writing test programs in BASIC and Assembly to understand precisely how the XIO commands work versus how the corresponding CIO commands actually work.

If anyone cares, the stuff that worked is derived from reading De Re Atari (50%, which was mostly correct), Compute!'s Mapping the Atari (20%, and it has a couple of mistakes), Atari's BASIC Reference Manual (15%, which was painful and incomplete), and the rest from several 6502 programming manuals that were altogether horrific.


Stating that programs work with data would be borderline silly. Everything about programs is about working with data – they calculate data, manipulate data, and copy data. A fundamental concern of programming is how to introduce data to the program and get it back out. Many programs have their data built into them or receive data by computing values, reading from storage, or by other input/output device or communications. A file contains data. A serial port sends and receives data. A joystick provides data. Numbers are data, text is data, graphic information is data. Data, Data, Data.


Atari BASIC programs have access to several data acquisition methods. Data may be stored in a program using DATA statements. Data may be read from a file, or from user input. Although all data is ultimately represented as bytes in memory, BASIC languages provide a higher abstraction where the data it accepts is usually expressed as text and numbers, and even in the case of number values the input and output data is expressed as the text equivalent of the number value. This means data presented to BASIC is typically not in its final, usable form for the computer. The BASIC language reads the text data then converts it into a format the computer understands.


Although an Atari BASIC program can contain its own data, it cannot have the data built into it in a form that is immediately usable. For instance, variables and array elements must be specifically initialized. The program must assign the values as it runs. There is not even a default behavior to clear numeric array values to zero. Data contained in DATA statements is not immediately usable by the Atari's unique features. The Atari's custom hardware features often use blocks of binary data – Graphics data, character set data, Player/Missile images, etc. The Atari BASIC DATA statement cannot present arbitrary binary data. It can only present decimal numbers and strings with limitations on the characters that can be included in the string.

Like most BASIC languages, Atari BASIC has little provision for dealing with data in the computer's terms other than PEEK and POKE. Most Atari BASIC programs creating data in memory for Atari graphics features will read DATA statements – the text equivalent of number values – and then POKE those values as bytes into memory. This consumes a lot of storage space in the BASIC program. The byte value 192 in a DATA statement is presented as the text characters “1”, “9”, and “2” and then if there is another value the comma separator between the values also occupies memory. This one value to POKE into one byte of memory requires four supporting bytes in DATA. And then after the program stores the value in memory the DATA statement continues to occupy memory, wasting space. Wasted space means reduced space for code and reduced features in large programs.
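To put a rough number on that overhead, here is a small sketch in Python (not Atari BASIC) that counts the characters needed to list byte values in a DATA statement. The function name and the sample values are made up for illustration, and the count ignores BASIC's own per-line overhead (line number, statement token), so treat it as a lower bound.

```python
def data_text_size(values):
    """Characters needed to list the values in a DATA statement."""
    # Each value is stored as its decimal digits, with commas between values.
    return len(",".join(str(v) for v in values))


values = [192] * 8  # hypothetical payload: eight copies of the byte value 192
text_bytes = data_text_size(values)
print(text_bytes, "bytes of DATA text for", len(values), "bytes of payload")
```

Eight three-digit values cost 31 bytes of DATA text (24 digits plus 7 commas) for 8 bytes of payload, which is close to the four-to-one ratio described above.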

In addition to DATA's memory space issue the other problem with reading DATA statements is BASIC's slow execution time. BASIC must loop for every data value; reading the data, storing it into memory, and then reading more data. Any significant amount of data provides a long, boring wait for the user. Many BASIC games share this “feature” of making the user wait for a long program initialization. The second test program for the BITS utilities illustrates this problem. The time to store several machine language utilities totaling only about 300 bytes in memory was long enough that the program's loading section was modified to output helpful progress messages to assure the user the program had not crashed. Now consider that one complete character set is 1,024 bytes, and a complicated program may need to set up several thousand bytes of data.

Assembly language and some compiled languages do not have these same issues with data storage space. These languages can declare where data will occupy memory and define the initial values. This is saved with the assembled/compiled machine language program, so the same mechanism that loads the machine language program into memory also establishes the organized, initialized data in memory.

So, what to do about BASIC's problems? Eliminating the space-wasting behavior means eliminating DATA statements filled with numbers, or somehow optimizing the contents. Strings could be used to represent a lot of data. One character of a string is conveniently one byte of information. But strings still have a few problems:
  • ATASCII characters are not always the same as binary byte values, so some translation is needed.
  • Representing cursor control characters can be problematic when a program is LIST'ed and then ENTER'ed.
  • There are two characters that cannot be expressed at all in a quoted string – the ATASCII End of Line character and the quote character itself.

If the data includes either of these unrepresentable characters then the string as DATA must replace them with a different value that can be included, and then after assigning the string it must go back and insert the problem characters into their correct positions in the string. This means adding more data to identify the locations of these special characters. Furthermore, the DATA statements filled with strings still occupy memory after the data is assigned to a string variable, though the waste factor for string characters is closer to 1:1 rather than roughly 4:1 for numeric (byte) data.
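The substitute-and-patch idea can be sketched like this. It is a Python model, not BASIC, and the names `encode_for_data` and `restore` and the choice of a space as the stand-in character are my own assumptions. The point is only that every unrepresentable byte costs an extra (position, value) entry somewhere in the program.

```python
EOL = 0x9B    # ATASCII End of Line
QUOTE = 0x22  # ATASCII double quote


def encode_for_data(raw, placeholder=0x20):
    """Swap out bytes a quoted string cannot hold and record where they go."""
    safe = bytearray(raw)
    fixups = []  # (position, original value) pairs to patch back in later
    for i, b in enumerate(raw):
        if b in (EOL, QUOTE):
            fixups.append((i, b))
            safe[i] = placeholder
    return bytes(safe), fixups


def restore(safe, fixups):
    """Re-insert the recorded bytes, as the program would after the assignment."""
    out = bytearray(safe)
    for pos, value in fixups:
        out[pos] = value
    return bytes(out)


raw = bytes([0x00, 0x9B, 0x41, 0x22, 0xFF])  # hypothetical binary data
safe, fixups = encode_for_data(raw)
assert restore(safe, fixups) == raw
print(fixups)  # prints [(1, 155), (3, 34)]
```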

If a BASIC program completely eliminates the DATA statements then where else can a program get its data? Here is an idea – This program loading values into memory from DATA:
   
has the same end result as this program loading values into memory from a file:
 
The difference is that the file reading method leaves behind no redundant data occupying valuable code and memory space (aside from this code to load the data.) Whether reading eight bytes or 800 bytes the amount of code to read from a file is the same.

So, how does the data get into the file? A little one-time work is required to write the data into the file. Here is the same original code, but instead of POKE'ing the data into memory it writes the data out to the file. Then, the original program can be changed to read the data from the file and POKE that into memory, and so eliminate the DATA statements.
  
There is a question that is not obvious at this point, but will be wedged in here now, because the answer determines the way the code should write data to and read data from the file, and ultimately the direction of an assembly language utility. The question is, “What data is in the file?”

Most of the time I work with the Atari800 or Atari++ emulators in Linux to extend the lifespan of the real 8-bit hardware, so here I will detour into Linux for the benefit of tools that show exactly what is in the data file. First of all, the file itself containing the data written for the 8 bytes:
  
This file is intended to contain data for 8 bytes, so then why does the directory listing report the file contains 28 bytes? A hex dump utility shows the file contains the following information:
   
The right side of the hex dump provides the explanation. The file contains the text representation (ASCII/ATASCII) of the numeric values. The byte $9B is the ATASCII CHR$(155), the Atari's End Of Line character, which appears after each value.
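A quick way to see why the file grows is to model what a PRINT-style write produces: the decimal digits of each value followed by the End of Line byte. This is a Python sketch. The eight values are placeholders of my own, not the ones from the listing above, so the total will not necessarily match the 28 bytes in the directory listing.

```python
EOL = 0x9B  # ATASCII End of Line, CHR$(155)


def print_style(values):
    """Bytes written when each value goes out as text followed by End of Line."""
    out = bytearray()
    for v in values:
        out += str(v).encode("ascii")  # the digits as characters
        out.append(EOL)                # one terminator per value
    return bytes(out)


values = [255, 129, 0, 66, 24, 126, 3, 192]  # hypothetical byte values
data = print_style(values)
print(len(data), "bytes in the file for", len(values), "values")
print(data.hex(" "))
```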

Recall that BASIC prefers to read and write data using text representation of values. The example program uses PRINT (or ?) and INPUT which quietly translate between an internal representation and the text representation. This is BASIC's data abstraction versus the programmer's desire to have BASIC do what is meant, not what is written. The program's (poorly expressed) intent is to store and read bytes of memory. However, BASIC treats the numbers as bytes only within the context of PEEK and POKE instructions.

So, then how to get data into the file that is just discrete byte values? Single characters are bytes, so using the CHR$() function will output numeric values as single characters (aka bytes), so that's potentially workable for output. But, then how about reading the bytes? A program using INPUT will still read the file contents as a string which means it will try to read bytes until it reaches an End of Line ($9B) character. So, this is also not a workable solution.

Atari BASIC provides a method to write and read a file one byte at a time with the commands PUT and GET. The same example program using PUT instead of PRINT (?):
  
Results in this file only 8 bytes long:
 
that contains the eight values as individual bytes (Hex dump):
  
This program uses GET instead of INPUT to retrieve the data to store in memory:
  
So, now we know how using data files can save valuable memory space in BASIC, and how to optimize the file content to the actual bytes as they would be stored in memory. However, a fundamental problem with BASIC remains – the program must read the file data byte by byte during a loop, and BASIC's slow execution speed will turn any significant amount of data into a long and inconvenient wait. In fact, doing device I/O byte by byte in BASIC is slower than reading from DATA statements in memory, so this memory saving solution penalizes the program with even slower data loading. If only there was some kind of machine language solution that could read the bytes from a file as fast as possible. What to do, Toto?!?
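For comparison, here is the PUT/GET approach modeled in Python with an in-memory file. The helper names and sample values are assumptions for illustration. Each value occupies exactly one byte in the file, but the reader still has to make one call per byte, and that per-byte loop is the part that hurts in BASIC.

```python
import io


def put_bytes(stream, values):
    """Write each value as a single byte, the way PUT does."""
    for v in values:
        stream.write(bytes([v]))


def get_bytes(stream, count):
    """Read the values back one byte per call, the way a GET loop does."""
    return [stream.read(1)[0] for _ in range(count)]


values = [255, 129, 0, 66, 24, 126, 3, 192]  # hypothetical byte values
f = io.BytesIO()
put_bytes(f, values)
print(f.tell(), "bytes written")  # one byte per value
f.seek(0)
assert get_bytes(f, len(values)) == values
```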

OSS's BASIC XL provides a precedent with the Bput and Bget commands that write and read arbitrary length blocks of memory directly to and from a file as fast as the device can transfer bytes which is far faster than BASIC's ability to loop for individual bytes. How does BASIC XL manage this? Is it simply reading individual characters in a loop at machine language speed? Or is it doing another trick?

It turns out that bulk input and output of bytes is a feature provided by the Atari OS's Centralized I/O (CIO) routines, but the problem is that Atari BASIC does not support all the commands that CIO provides.



Gaming Atari's CIO (or not)


Many systems of the 8-bit era require the programmer use unique calls to read and write information to each kind of device. Each act of reading a disk file, a cassette file, and user input typed from a keyboard may require calling different entry points in the OS. Likewise, writing to a disk file, a cassette, the screen, or a printer may also require calling different OS routines. Even using two disk drives could require different OS calls. Input/Output programming on these systems can be tedious affairs of device-specific coding for every possible input/output option which deters programmers from even providing choices to users.

But the Atari system is different. The Atari includes a standard, modular, reusable input/output model. Simply put, the programmer fills out a common data structure describing the input/output operation and makes a call to one address in the OS. This is a high level abstraction for device input/output. There are no sectors to consider, no serial communications to worry about, no fixed buffers in the system. Everything is a stream of data in or out, read or written on demand. In a very general way this is similar to the Unix world's “everything-is-a-file” philosophy. Changing disk file output to screen or printer output requires only a different device/file name. The setup and the call to the OS are the same for all. Considering the Atari's Central I/O (CIO) was written in the late 1970s this is nearly magical, and very under-appreciated behavior in an 8-bit computer.



Atari CIO

The Atari CIO defines a basic set of commands that every handler must be prepared to accept. (Listed in the chart below.) This doesn't necessarily mean every device must do every I/O command. A keyboard cannot do output, and a printer cannot do input. However, the device handler is responsible for sanely managing commands and safely replying that an incompatible command request is not implemented.
 
Command numbers above this range are handler-specific operations. For example, commands 17 and 18 are specific to the screen device (“S:”) to perform graphics DRAWTO and Fill, respectively. Commands from 32 to 38 (and higher) perform various DOS file management functions for the “D:” device.
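For reference while reading on, here are the command numbers this article mentions by name, collected as a Python enum. Only the numbers come from the text. The identifier spellings are my own, and this is not a complete list of what CIO or any particular handler accepts.

```python
from enum import IntEnum


class CIOCommand(IntEnum):
    """CIO command numbers named in this article (not an exhaustive list)."""
    OPEN = 3
    GET_TEXT_RECORD = 5
    GET_CHARACTERS = 7
    PUT_TEXT_RECORD = 9
    PUT_CHARACTERS = 11
    CLOSE = 12
    # Handler-specific commands for the "S:" device:
    DRAWTO = 17
    FILL = 18


print(CIOCommand(7).name)
print(int(CIOCommand.PUT_CHARACTERS))
```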

Per the list above, everything provided by the base CIO commands appears to correspond to a BASIC language I/O command. Well, almost, but not quite – “Similar” is not the same as “Same”. There is a bit of a disconnect between how BASIC uses these CIO actions, and what the CIO actions can really accomplish.

Atari BASIC does not actually use the 5/Get Text Record and 9/Put Text Record commands. These commands read and write a stream of text characters ending with the Atari End Of Line character, which is the Atari OS/CIO's general definition of “Text Record”. Atari BASIC's PRINT and INPUT behaviors are more complicated than the “Text Record” model, because BASIC I/O activity occurs incrementally rather than in a complete record. INPUT can handle multiple variables in one line. PRINT outputs variables and strings as BASIC interprets the values. PRINT also uses semicolons to suppress the end of line, and commas cause tabs/columnar output, which are not inherent abilities in the CIO 9/Put Text Record command. So, BASIC is not using the Text Record I/O commands and instead manages the I/O by other means.

Additionally, notice that the titles for command 7/Get Characters and command 11/Put Characters do not exactly match how BASIC uses those commands. Both commands move bytes – stress the plural: b-y-t-e-S. However, Atari BASIC uses these CIO commands in the most minimal manner to serve as PUT and GET which move only one byte. Since we're looking for a way to read or write an arbitrary number of bytes (plural) these two CIO commands appear to be exactly what we need.

The CIO command list above comes from Atari's “BASIC REFERENCE MANUAL” discussion of BASIC's XIO command. This suggests that XIO is capable of exercising these commands. If this is true, then the XIO command could be used to run these CIO operations as CIO intended. That's the theory.



The IOCB

First, let's learn how CIO operations work. The next section will examine how the XIO command relates to the CIO operations. Earlier it was stated that the programmer fills out a common data structure describing the input/output operation. This data structure is called the Input/Output Control Block (or IOCB). Each of the eight input/output channels has a 16-byte IOCB dedicated to it at a fixed location in memory, starting at $340 (832 decimal) for channel 0, $350 for channel 1, and so on up to $3B0 for channel 7. So, it is easy to find the IOCB for a channel: multiply the channel number by 16 and add that to $340.
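The address arithmetic is simple enough to check in a few lines. This is a Python sketch of the calculation only, and the function and constant names are mine.

```python
IOCB_BASE = 0x340  # IOCB for channel 0
IOCB_SIZE = 16     # bytes per IOCB


def iocb_address(channel):
    """Return the address of the IOCB for channel 0 through 7."""
    if not 0 <= channel <= 7:
        raise ValueError("channel must be 0 through 7")
    return IOCB_BASE + IOCB_SIZE * channel


for ch in range(8):
    print(ch, hex(iocb_address(ch)))
```

Channel 0 comes out at $340 and channel 7 at $3B0, matching the range given above.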

The IOCB tracks the state of I/O operations to the channel. Though the IOCB is 16 bytes long the programmer need only interact with a few of the bytes. Some of the bytes are maintained by the OS, and some are not typically used at all except for special device commands. The values are referred to by their offset from the start of the IOCB:



ICCMD: IOCB + $2 This is the CIO command discussed above.





ICSTA: IOCB + $3 This is the last error/status of the previously completed CIO command.


ICBAL/ICBAH: IOCB + $4 and + $5 This is the 16-bit address (low byte and high byte) of the input or output buffer here.


 

 

  • In the case of 3/Open and CIO commands for DOS operations on disk files this buffer is the address of the string for the device/file name.

ICBLL/ICBLH: IOCB + $8 and + $9 This is the 16-bit length (low byte, high byte) of the data here.


 

 

  • In the case of read and write operations (5, 7, 9, or 11) this is the number of bytes to read in or write out from the buffer. CIO will update this value indicating the number of bytes actually transferred.

  • In the case of 3/Open and commands for DOS operations on disk files this is the length of the device/file name.

ICAX1: IOCB + $A Auxiliary byte for the handler. This commonly indicates the mode for the 3/Open command. CIO will maintain this value here.


For files combine these values:
  • 8 - write
  • 4 - read
  • 1 - append


For the disk/DOS specifically:

  • 2 - open the directory.


For the “S:” device additional values direct how the OS opens graphics mode displays:

  • 16 - Create the text window.

  • 32 - Do not clear screen memory.


Finally, for the “E:” handler the value 1 added to the open state (12 + 1 = 13) enables forced read mode from the screen as if the Return key were held down.

ICAX2: IOCB + $B Auxiliary byte for the handler.

For the “S:” handler this specifies the graphics mode to open.

For the “C:” device value 0 is normal inter-record gaps, and 128 is the faster, short inter-record gaps. Other serial devices may use specific values here.
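Since the ICAX1 values are meant to be combined, it may help to see the sums worked out. The constant names in this Python sketch are my own labels for the numbers listed above. The directory example (read plus directory) is my assumption about how the value 2 is combined.

```python
# ICAX1 values as listed above; the names are illustrative only.
OPEN_APPEND = 1
OPEN_DIRECTORY = 2
OPEN_READ = 4
OPEN_WRITE = 8
SCREEN_TEXT_WINDOW = 16
SCREEN_NO_CLEAR = 32

read_write = OPEN_READ | OPEN_WRITE        # 12: read and write (update)
append = OPEN_WRITE | OPEN_APPEND          # 9: write, adding to the end of the file
directory = OPEN_READ | OPEN_DIRECTORY     # 6: assumed combination for reading a directory
forced_read = read_write + 1               # 13: the "E:" forced read mode described above

print(read_write, append, directory, forced_read)  # prints 12 9 6 13
```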


The programmer need not be concerned with much more in most situations. The other bytes in the IOCB are either the responsibility of CIO, or only used for specific device functions. For the sake of reference:

ICHID: IOCB + $0 Set by the OS to the index into the handler table when the channel is currently open. If the channel is closed the default value is 255/$FF.

ICDNO: IOCB + $1 Set by the OS to the device number when multiple devices of the same kind are in use - e.g. 1 for “D1:”, 2 for “D2:”.

ICPTL/ICPTH: IOCB + $6 and + $7 CIO populates this with the jump address for the handler's single-character PUT routine. Atari BASIC uses this as a shortcut to output characters.

ICAX3, ICAX4, ICAX5, ICAX6: IOCB + $C through + $F Optional auxiliary bytes for the handler. NOTE and POINT DOS operations use these locations.
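Putting all the offsets together, the 16 bytes can be described with a single `struct` format in Python. This is only a model of the layout listed above, and it does not talk to an Atari or an emulator. The helper name `pack_iocb` and the buffer address $4000 are made-up examples.

```python
import struct

# One byte each for ICHID, ICDNO, ICCMD, ICSTA; little-endian 16-bit words for
# ICBAL/H, ICPTL/H, ICBLL/H; one byte each for ICAX1, ICAX2; four bytes for ICAX3-6.
IOCB_FORMAT = "<BBBBHHHBB4s"


def pack_iocb(cmd, buffer_addr, buffer_len, aux1=0, aux2=0):
    """Build a 16-byte IOCB image with the fields a programmer normally sets."""
    return struct.pack(
        IOCB_FORMAT,
        0xFF,         # ICHID: $FF is the value for a closed channel
        0,            # ICDNO: maintained by the OS
        cmd,          # ICCMD at offset $2
        0,            # ICSTA: filled in by CIO after the call
        buffer_addr,  # ICBAL/ICBAH at offsets $4 and $5
        0,            # ICPTL/ICPTH: maintained by CIO
        buffer_len,   # ICBLL/ICBLH at offsets $8 and $9
        aux1,         # ICAX1 at offset $A
        aux2,         # ICAX2 at offset $B
        bytes(4),     # ICAX3 through ICAX6
    )


iocb = pack_iocb(cmd=7, buffer_addr=0x4000, buffer_len=512, aux1=4)
assert len(iocb) == 16
assert iocb[0x2] == 7
assert (iocb[0x4], iocb[0x5]) == (0x00, 0x40)  # low byte first
assert (iocb[0x8], iocb[0x9]) == (0x00, 0x02)  # 512 = $0200
print(iocb.hex(" "))
```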



Atari BASIC's XIO

BASIC's XIO command provides a generic interface to access CIO functions that BASIC does not directly implement with a built-in command. The diagram below shows how the XIO values relate to the CIO's IOCB fields:
   
The XIO arguments conveniently match the list of CIO's IOCB values that a programmer should manage. This should mean the XIO command can substitute for any other I/O command.

An important distinction concerning XIO behavior vs the usual CIO expectations is that the 3/Open command uses ICAX1 and ICAX2 values where most other CIO commands don't need to specify those values. Depending on the device and the CIO command the original values set by the 3/Open could be significant and should not be disturbed. However, XIO accepts ICAX1 and ICAX2 values on every command which potentially could allow a later command to overwrite the values set by 3/Open. If ICAX1 and ICAX2 are important for later commands then for safety's sake they should be remembered and restated on subsequent use of XIO.

A variety of documentation describing XIO shares a common mistake by describing the last argument as always a filespec (the device identifier, and filename if applicable). In reality, the last argument simply provides the data buffer and length for the CIO command. This buffer has variable purpose. It is expected to contain the filespec for 3/Open and DOS-specific file manipulation commands. For most other commands this argument provides the input or output buffer for data. Atari BASIC can use an explicit string or a DIMensioned string for the buffer argument to XIO. The contents of this string do not even matter for some commands – for example, the 18/Fill and 17/Drawto CIO commands for the “S:” device do not use the buffer, so they do not care what value is passed.

Below is an example of XIO performing the 18/Fill command for the non-existent device “Q:”. This works, because the real device identifier, “S:”, is only needed when the screen device is opened and the 18/Fill command does not use the buffer.
   


XIO and CIO 11/Put Bytes

Assuming the XIO command works as advertised it should give access to the CIO commands 7/Get Bytes and 11/Put Bytes to manage a stream of bytes of arbitrary length. Below is a program using XIO to write a binary file using the 11/Put Bytes command. For the sake of demonstrating the global nature of XIO the other BASIC commands to OPEN and CLOSE the file are also replaced by XIO performing CIO commands 3/Open and 12/Close:
   
Hey! That seemed to work! So, let's take a look at the MEMORY2.BIN file...
  
What on earth?!? The string is 8 bytes, but the file contains 255 bytes? How weird. Let's see what hexdump in Linux says about the contents...
  
The 8 bytes that the program intended to write are there at the beginning of the file where they should be, so that's one good thing. But, the rest of the file appears to be junk out to the length of 255 bytes. It looks like the write started with the designated data string and then kept on going through whatever memory followed the string. Make the following additions to test this theory...
   
After running the program again and putting the file through hexdump:
  
So, that confirms it. The eight characters from the D$ string were output and then followed by whatever happened to be in memory after that string, to a total length of 255 bytes. This would suggest a simple bug in the way Atari BASIC passes the output length to CIO, but there's something else in the output that tells us the problem is worse than that. The byte that follows the first eight bytes is $9B, the Atari End of Line character, and it occupies the place in the output where the first character of E$ should appear.

That End of Line character is not part of the real data of either of the strings. This must mean that Atari BASIC intentionally output the End of Line after the D$ content as if it were performing PRINT. The odd part is that Atari BASIC accounted for this End of Line by skipping the next actual byte in memory which is the first character of E$. Then Atari BASIC figured out how many bytes remained until it reached 255 bytes and directed output of those bytes from memory. In the Olympics of buggy behavior this is a Track and Field gold medalist. What could have possessed them to do this?

Perhaps this could be salvageable. The good part is the intended bytes were output intact at the beginning of the file. If the behavior is simply that Atari BASIC pads additional content, then this is a usable and (barely) tolerable situation for binary file output.

So, let's see what happens with more data – such as half a kilobyte simulating an Antic mode 6 or 7 character set:
   
And after running the program the actual file length is... disappointment again...
   
One point of good news is the hexdump utility shows the file does contain only the real bytes of the intended string up to 255 characters and that the last byte in the file is not substituted with the End Of Line:
    
That nails the coffin shut. Atari BASIC's use of the CIO 11/Put Bytes command is almost completely broken. For some reason Atari BASIC believes the 11/Put Bytes must output a 255 byte, fixed-length block and Atari BASIC doctors the I/O activity to meet this expectation.
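To keep the observed write behavior straight, here it is as a small Python model. This is my reading of the test results above, not a description of what the BASIC interpreter does internally. The function name and the sample memory contents are made up.

```python
EOL = 0x9B   # ATASCII End of Line
BLOCK = 255  # length of every file written in the tests above


def observed_xio_put(memory, start, length):
    """Bytes that ended up in the file for a string of `length` bytes at `start`."""
    # Assumes memory holds at least 255 bytes from `start` onward.
    out = bytearray(memory[start:start + BLOCK])  # always 255 bytes from memory
    if length < BLOCK:
        out[length] = EOL  # End of Line replaces the byte just past the string
    return bytes(out)


# Hypothetical layout: an 8-byte D$ followed directly by a long E$ of "!".
memory = bytearray(b"ABCDEFGH" + b"!" * 300)
out = observed_xio_put(memory, 0, 8)
assert len(out) == 255
assert out[:8] == b"ABCDEFGH"
assert out[8] == EOL              # where the first character of E$ would be
assert out[9:] == b"!" * 246      # the rest of memory, untouched

# A 512-byte string: only the first 255 bytes go out, with no End of Line.
big = b"0123456789ABCDEF" * 32
assert observed_xio_put(big, 0, 512) == big[:255]
```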

Next we will look at CIO's 7/Get Bytes command and see if Atari BASIC's support is also broken for this command.



XIO and CIO 7/Get Bytes

First, let's verify that 7/Get Bytes can at least read the first 8 bytes from the 255 byte binary file:
$(=" "
15 ICAX1=4:ICAX2=0
19 REM OPEN
20 XIO 3,#1,ICAX1,ICAX2,"H1:MEMORY2.BIN"
24 REM GET BYTES
25 XIO 7,#1,ICAX1,ICAX2,D$
29 REM CLOSE
30 XIO 12,#1,ICAX1,ICAX2,"H:"
35 FOR I=1 TO 8
40 ? ASC(D$(I,I)),
45 NEXT I:?

When it runs BASIC reports:
   
It looks like it is reading the data into the string correctly. But, is there anything else going on? Remember Atari BASIC thinks the write has to output 255 bytes. Maybe it is also doing something strange during the read. Let's see if it is reading more than directed:
$(=" "
11 DIM E$(255):E$="!":E$(255)="!":E$(2)=E$
15 ICAX1=4:ICAX2=0
19 REM OPEN
20 XIO 3,#1,ICAX1,ICAX2,"H1:MEMORY2.BIN"
24 REM GET BYTES
25 XIO 7,#1,ICAX1,ICAX2,D$
29 REM CLOSE
30 XIO 12,#1,ICAX1,ICAX2,"H:"
35 FOR I=1 TO 8
40 ? ASC(D$(I,I)),
45 NEXT I:?
50 ? E$

This is basically the same program with the addition of a large E$ declared and populated entirely with exclamation points, "!". If Atari BASIC reads only the 8 bytes it needs to populate D$ then E$ should be undisturbed and the program will end by displaying 255 uninterrupted exclamation points. But, this is the actual result:



Unfortunately, this did not do what is expected or needed. BASIC overwrote E$ with the remaining data from the file. (An interesting bit of trivia is that the first character of E$ is preserved. This character corresponds to the End Of Line that BASIC had inserted in the output.) So, now we know that the CIO 7/Get Bytes command works well enough from Atari BASIC to retrieve short sequences of binary data as long as a buffer of 255 bytes is provided.

Now to figure out if it can load longer data. Given the 255 character limit when using 11/Put Bytes the next program will test that limit for reading. First, we need a file that has more than 255 bytes. Recall that the earlier attempt to write 512 bytes actually failed, so here is a (much slower) program to make the 512 byte file as intended:
$="0123456789ABCDEF"
15 OPEN #1,8,0,"H1:MEM512.BIN"
20 FOR I=1 TO 32
25 FOR J=1 TO 16
30 D=ASC(D$(J))
35 PUT #1,D
40 NEXT J
45 NEXT I
50 CLOSE #1

Next is the program to test if Atari BASIC has a 255 byte limit for the 7/Get Bytes command similar to the way it limits 11/Put Bytes:
 
D$ is declared 512 bytes long and filled with exclamation points. Then it attempts to read the 512 bytes from the file into D$. If this works properly the entire contents of D$ should show the "0" to "F" pattern read from the file all the way to the end. But this is what happens, instead:



No joy for Atari BASIC's XIO. The result shows the first 255 characters of D$ are populated with the values from the file and the remainder of D$ is the exclamation points set during the program initialization.

But, the horror is not over. Recall the MEMORYT0.BIN file created earlier that contains only 8 bytes? This program attempts to read just those 8 bytes from the file:
 
And this is what happens when the program runs:
   
The program gets an End Of File error during the XIO performing the 7/Get Bytes request. BASIC expects to read 255 bytes from the file no matter the size of the string buffer supplied, even if the buffer is defined shorter.

So, the bottom line is that Atari BASIC has mangled use of the CIO 7/Get Bytes command similar to the 11/Put Bytes command. Here's a summary of what we have left of these CIO commands as Atari BASIC has abused them:
  • XIO for 7/Get Bytes is usable for reading binary data from 1 to 255 bytes. If the intended data is shorter than actual file size (of 255 bytes) then Atari BASIC will continue retrieving data from the file and storing it into memory up to that 255 byte maximum length. In order to complete this activity the file must have at least 255 bytes, and in the interest of safety the string buffer accepting the data should be 255 bytes.
  • XIO for 11/Put Bytes is usable to write 1 to 255 bytes to a file with the understanding that anything shorter than 255 bytes will be padded to the length of 255 bytes with junk.
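The read side of that summary can be modeled the same way. This is a Python sketch of the behavior seen in the tests above, including the detail that the byte just past the string survived. It is not a claim about how BASIC implements it, and the names and sample data are my own.

```python
BLOCK = 255  # number of bytes BASIC's XIO appeared to request in every test


def observed_xio_get(file_bytes, memory, start, length):
    """Model XIO 7/Get Bytes into a string of `length` bytes stored at `start`."""
    if len(file_bytes) < BLOCK:
        raise EOFError("file ends before 255 bytes could be read")
    if start + BLOCK > len(memory):
        raise ValueError("this model needs 255 bytes of memory after start")
    # Remember the byte just past the string; the tests showed it was preserved.
    saved = memory[start + length] if length < BLOCK else None
    memory[start:start + BLOCK] = file_bytes[:BLOCK]
    if saved is not None:
        memory[start + length] = saved
    return memory


# An 8-byte D$ followed by a 255-byte E$ of "!", reading a 255-byte file.
file_255 = b"ABCDEFGH" + b"\x9b" + bytes(246)
memory = bytearray(b"????????" + b"!" * 255)
observed_xio_get(file_255, memory, 0, 8)
assert memory[:8] == b"ABCDEFGH"    # D$ holds the intended data
assert memory[8] == ord("!")        # first character of E$ survives
assert memory[9:255] == bytes(246)  # the rest of E$ is overwritten from the file

# An 8-byte file fails with End Of File, as in the last test above.
try:
    observed_xio_get(b"12345678", bytearray(300), 0, 8)
except EOFError as err:
    print("error:", err)
```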

In addition, Atari BASIC's capability is further restricted, because it only performs input and output to a string variable. Saving and loading data to any arbitrary memory location is problematic. This could be circumvented with more involved coding to manipulate a string table entry to point to a specific memory address before the read or write. But, again, since BASIC insists on moving 255 bytes at a time, even this method isn't a great solution. We're not going to even bother trying that.

In the next episode we'll look at a machine language replacement for XIO and its test program.











http://atariage.com/forums/blog/576/entry-13185-part-10-of-11-simple-assembly-for-atari-basic/