Andrew Davie Posted May 26, 2003

We've had a brief introduction to DASM, and in particular mnemonics (6502 instructions, written in human-readable format) and symbols (other words in our program which are converted by DASM into a numeric form in the binary). Now we're going to have a brief look at how DASM uses the symbols (and in particular the values it calculates for them and stores in its internal symbol table) to build up the binary ROM image.

Each symbol the assembler finds in our source code must be defined (ie: given an actual value) in at least one place in the code. A value is given to a symbol when it appears in our code starting in the very first column of a line. Symbols typically cannot be redefined (given another value).

In an earlier session we examined how the code "sta WSYNC" appeared in our binary file as $85 $02 (remember, we examined the listing file to see what bytes appeared in our binary). At that point, I indicated that the assembler had determined the value of the symbol "WSYNC" was 2 (corresponding to the TIA register's memory address) - through its definition in the standard vcs.h file. But how does the assembler actually determine the value of a symbol? The answer is that the symbol must be defined somewhere in the source code (as opposed to just being referenced).

Definition of a symbol can come in several forms. The most straightforward is to just assign a value...

WSYNC = 2

or...

WSYNC EQU 2

The above examples are equivalent - DASM supports syntax (style) which has become fairly standard over the years. Some people (me!) like to use the = symbol, and some like to use EQU. Note that the symbol in question must start in the very first column when it is being defined. In both cases, the value 2 is being assigned to the symbol WSYNC. Wherever DASM encounters the symbol WSYNC in the code, it knows to use the value 2.

That's fairly straightforward stuff. But symbols can be defined in terms of other symbols!
Also, DASM is quite capable of understanding expressions, so the following is quite valid...

AFTER_WSYNC = WSYNC + 1

In this case, the symbol "AFTER_WSYNC" would have the value 3. Even if the WSYNC label was defined after the above code, the assembler would successfully be able to resolve the AFTER_WSYNC value, as it makes multiple passes through the code until all symbols are resolved.

Symbols can also be given values automatically by the assembler. Consider our sample kernel, where we see the following code near the start (here we're looking at the listing file, so we can see the address information DASM outputs)...

10 0000 ????            SEG
11 f000                 ORG $F000
12 f000
13 f000             Reset
14 f000
15 f000
16 f000
17 f000
18 f000
19 f000
20 f000             StartOfFrame
21 f000
22 f000                 ; Start of vertical blank processing
23 f000
24 f000 a9 00           lda #0
25 f002 85 01           sta VBLANK

"Reset" and "StartOfFrame" are two symbols which are being defined at this point, because they both start at the first column of the lines they are on. The assembler assigns the current ROM address to these symbols as they occur. That is, if we look at these "labels" (=symbols) in the symbol table, we see...

StartOfFrame f000 (R )
Reset f000 (R )

They both have a value of $F000. This form of symbol (which starts at the beginning of a line, but is not explicitly assigned a value) is called a label, and refers to a location in the code (or more particularly an address). How and why did DASM assign the value $F000 to these two labels, in this case?

As the assembler converts your source code to a binary format, it keeps an internal counter telling it where in the address space the next byte is to be placed. This address increments by the appropriate amount for each instruction or piece of data it encounters. For example, if we had a "nop" (a 1-byte instruction), then the address counter that DASM maintains would increment by 1 (the length of the nop instruction).
Whenever a label is encountered, the label is given the value of the current internal address counter at the point in the binary image at which the label occurs. The label itself does not go into the binary - but the value of the label refers to the address in the binary corresponding to the position of the label in the source code.

In the above code snippet, we can see the address in column 2 of the output, and it starts at 0 (with ???? after it, indicating it doesn't actually KNOW the internal counter/address at this point), and (here's the bit I really want you to understand) it is set to $F000 when we get to the "org $F000" line. "Org" stands for origin, and this is the way we (the programmers) indicate to the assembler the starting address of the next section of code in the binary ROM. Just to complicate things slightly, this address is not the actual offset from the start of the ROM (a ROM might, for example, be only 4K but contain code assembled to live at $F000-$FFFF - as in a 4K cartridge). So it's not an offset, it's a conceptual address.

These labels are very useful to programmers to give a name to a point in code, so that that point may be referred to by the label, instead of us having to know the address. If we look at the end of our sample kernel, we see...

70 f3ea 4c 00 f0        jmp StartOfFrame

The "jmp" is the mnemonic for the jump instruction, which transfers flow of control to the address given in the two-byte operand. In other words, it's a GOTO statement. Look carefully at the binary numbers inserted into the ROM (again, the columns are, left to right: line number, address, byte(s), source code). We see $4C, 0, $F0. The opcode for JMP is $4C - whenever the 6502 fetches this instruction, it forms a 16-bit address from the next two bytes (0, $F0) and code continues from that address. Note that the "StartOfFrame" symbol/label has a value of $F000 in our symbol table.
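The address counter and label assignment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how DASM is actually implemented: the function name `assign_labels`, the tuple format for source lines, and the hand-supplied instruction sizes are all made up for the example.

```python
def assign_labels(lines):
    """Walk a simplified program and return (symbol_table, final_address).

    Each entry in `lines` is one of (hypothetical format for this sketch):
      ("org", address)  - set the address counter
      ("label", name)   - define a label at the current address
      ("code", size)    - an instruction or data item occupying `size` bytes
    The sketch assumes an "org" appears before any label or code.
    """
    symbols = {}
    address = None  # unknown until the first ORG, like "????" in the listing
    for kind, value in lines:
        if kind == "org":
            address = value
        elif kind == "label":
            symbols[value] = address  # a label takes no space in the binary
        elif kind == "code":
            address += value  # advance by the size of what was emitted
        else:
            raise ValueError(f"unknown line kind: {kind}")
    return symbols, address


program = [
    ("org", 0xF000),
    ("label", "Reset"),
    ("label", "StartOfFrame"),
    ("code", 2),  # lda #0      (a9 00)
    ("code", 2),  # sta VBLANK  (85 01)
]
symbols, next_address = assign_labels(program)
print({name: hex(value) for name, value in symbols.items()})
print(hex(next_address))
```

Both labels end up with the same value because nothing that occupies space in the binary sits between them.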
It's time to understand how 16-bit numbers are formed from two 8-bit numbers, and how 0, $F0 translates to $F000. The 6502, as noted, can address 2^16 bytes of memory. This requires 16 bits. The 6502 itself is only capable of manipulating 8-bit numbers. So 16-bit numbers are stored as pairs of bytes.

Consider any 16-bit address in hexadecimal - $F000 is convenient enough. The binary value for that is %1111000000000000. Divide it into two 8-bit sections (ie: equivalent to 2 bytes) and you get %11110000 and %00000000 - equivalent to $F0 and 0. Note that any two hex digits make up a byte, as hex digits require 4 bits each (0-15, ie: %0000-%1111). So we could just split any hex address in half to give us two 8-bit bytes.

As noted, the 6502 manipulates 16-bit addresses through the use of two bytes. These bytes are almost always stored in ROM in little-endian format (that is, the least significant byte first, followed by the high byte). So $F000 hex is stored as 0, $F0 (the low byte of $F000 followed by the high byte).

Now the binary of our jmp instruction should make sense: the opcode ($4C), followed by the 16-bit address $F000 in low/high format (0, $F0). When this instruction executes, the program jumps to and continues executing from address $F000 in ROM. And we can see how DASM has used its symbol table - and in particular the value it calculated from the internal address counter when the StartOfFrame label was defined - to "fill in" the correct low/high value in the binary file itself where the label was actually referred to.

This is typical of symbol usage. DASM uses its internal symbol table to give it a value for any symbol it needs. Those values are used to create the correct numbers for the ROM/binary image.

Let's go back to our magical discovery that the "org" instruction is just a command to the assembler (it does not appear in the binary) to let the assembler know the value of the internal address counter at that point in the code.
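As a brief aside, the low/high byte split described above can be checked with a small Python sketch. The helper names (`split_address`, `join_address`, `encode_jmp`) are hypothetical and exist only for this illustration; the opcode $4C and the byte order are taken from the listing.

```python
def split_address(address):
    """Return (low_byte, high_byte) for a 16-bit address."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    return address & 0xFF, address >> 8


def join_address(low, high):
    """Rebuild a 16-bit address from its low and high bytes."""
    return low | (high << 8)


def encode_jmp(address):
    """Encode an absolute JMP: opcode $4C, then the address low byte first."""
    jmp_absolute = 0x4C
    low, high = split_address(address)
    return bytes([jmp_absolute, low, high])


low, high = split_address(0xF000)
print(hex(low), hex(high))
print(encode_jmp(0xF000).hex(" "))  # needs Python 3.8+ for the separator
```

The bytes produced for `encode_jmp(0xF000)` are the same three bytes shown on listing line 70 for "jmp StartOfFrame".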
It is quite legal to have more than one ORG command in our source. In fact, our sample kernel uses this when it defines the interrupt vectors...

70 f3ea 4c 00 f0        jmp StartOfFrame
71 f3ed
72 f3ed
73 fffa                 ORG $FFFA
74 fffa
75 fffa 00 f0           .word.w Reset ; NMI
76 fffc 00 f0           .word.w Reset ; RESET
77 fffe 00 f0           .word.w Reset ; IRQ

Here we can see that after the jmp instruction, the internal address counter is at $F3ED, and we have another ORG which sets the address to $FFFA (the start of the standard 6502 interrupt vector data). Astute readers will notice the use of the label "Reset" in three lines, with the binary value $F000 (if the numbers are interpreted as a low/high byte pair) appearing in the ROM image at addresses $FFFA, $FFFC and $FFFE. We briefly discussed how the 6502 looks at the address $FFFC to give it the address at which it should start running code. Here we see that this address points to the label "Reset". Magic.

It's quite legal to use a symbol as the value for an ORG command. Here's a short snippet of code which should clarify this...

START = $F800 ; start of code - change this if you want
    ORG START
HelloWorld

In the above example, the label HelloWorld would have a value of $F800. If the value of START were to change, so would the value of HelloWorld.

We've seen how the ORG command is used to tell DASM where to place bits of code (in terms of the address of code in our ROM). This command can also be used to define our variables in RAM. We haven't had a play with RAM/variables yet, and it will be a few sessions before we tackle that - but if you want a sneak peek, have a look at vcs.h and see how it defines its variables from an origin defined as "ORG TIA_BASE_ADDRESS". That code is way more complex than our current level of understanding, but it gives some idea of the versatility of the assembler.

We're almost done with the basic commands inserted into our source code to assist DASM's building of the binary image.
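To make the "conceptual address, not an offset" point concrete for the vectors at $FFFA, here is a rough Python sketch. It assumes a 4K image assembled with ORG $F000 in which the gap between the end of the code and $FFFA has been padded, so the file is exactly 4096 bytes; the fill value, the constants and the function names are assumptions for illustration only.

```python
ROM_SIZE = 0x1000    # 4K image (assumption for this sketch)
ROM_ORIGIN = 0xF000  # address the first byte of the image was assembled for


def file_offset(address, origin=ROM_ORIGIN, size=ROM_SIZE):
    """Convert a conceptual address into an offset within the image file."""
    offset = address - origin
    if not 0 <= offset < size:
        raise ValueError(f"address {address:#06x} is outside the image")
    return offset


def build_image(reset_address, fill=0x00):
    """Build a padded image whose three vectors all point at reset_address."""
    image = bytearray([fill]) * ROM_SIZE
    low, high = reset_address & 0xFF, reset_address >> 8
    for vector in (0xFFFA, 0xFFFC, 0xFFFE):  # NMI, RESET, IRQ
        offset = file_offset(vector)
        image[offset] = low       # low byte first
        image[offset + 1] = high  # then the high byte
    return bytes(image)


image = build_image(0xF000)
print(hex(file_offset(0xFFFA)))
print(image[-6:].hex(" "))
```

The vectors sit in the last six bytes of the file even though their addresses are $FFFA-$FFFF, which is why the address shown in the listing and the position in the binary are two different things.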
Now you should understand how symbols are assigned values (either by explicit assignment of a value, or by an implicit address/location value) - and how those values - through the assembler's internal symbol table - are used to put the correct numbers into the ROM binary image. We also understand that DASM converts mnemonics (6502 commands in human-readable form) directly into opcodes. There's not much more to actual assembly - so we shall soon move on to actual 6502 code, and playing with the TIA itself.
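The multi-pass idea mentioned earlier in this post (AFTER_WSYNC being resolved even when WSYNC is defined later) can be sketched as follows. This is a toy model, not DASM's actual algorithm: the `resolve` function, the dictionary format and the pass limit are assumptions made for the example, and expressions are restricted to "other symbol plus constant".

```python
def resolve(definitions, max_passes=10):
    """Resolve symbols that may refer to symbols defined later.

    `definitions` maps a name either to an int (explicit value) or to a
    (other_name, offset) tuple meaning "other_name + offset".
    """
    resolved = {}
    for _ in range(max_passes):
        progress = False
        for name, value in definitions.items():
            if name in resolved:
                continue
            if isinstance(value, int):
                resolved[name] = value
                progress = True
            else:
                base, offset = value
                if base in resolved:  # only usable once the base is known
                    resolved[name] = resolved[base] + offset
                    progress = True
        if len(resolved) == len(definitions):
            return resolved
        if not progress:
            break  # another pass would not change anything
    unresolved = sorted(set(definitions) - set(resolved))
    raise ValueError(f"unresolved symbols: {unresolved}")


# AFTER_WSYNC is deliberately listed before WSYNC, as in the text.
table = resolve({"AFTER_WSYNC": ("WSYNC", 1), "WSYNC": 2})
print(table)
```

In this sketch the first pass can only fix WSYNC, and the second pass then fills in AFTER_WSYNC; a symbol that is referenced but never defined ends in an error instead of looping forever.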
Gateway Posted May 27, 2003

ORG + (D)ASM = Orgasm. Clever. With titles like that you know I'm going to read it! Great stuff here, Andrew! I've taken a break from 2600 programming but I'm getting excited about trying my hand at it again soon thanks to these posts and to J. Redant's "classes" posts. Keep up the good work!
NE146 Posted May 27, 2003

"ORG + (D)ASM = Orgasm. Clever."

I think everyone who's programmed in assembly has one time or another named a file "ORG.ASM".

But back to the subject. A Davie.. this is friggin GREAT STUFF
buta Posted August 25, 2003

First of all, thanks for the great tutorial. It is extremely easy to follow and very well written. I have a question though about the assembler, and I wanted to confirm something about startup.

You say in the examples that the assembler puts 'code' at certain bytes in memory starting with the ORG value. I have looked at a couple of ROM bins in a hex editor though, and I do not see anywhere in the binary file that specifies the memory location of the bytes in the file. It looks like the binary files just start out with a stream of instructions. Am I missing something? I assume a simulator like Stella would store a ROM file in a byte array to simulate memory. If so, it must know somehow what locations in the array to start copying the bytes to.

Also, I understand that the RESET interrupt is used to set the program counter to the start of the executable. Are there exceptions to this on the 2600? Is this common practice as far as CPUs go?
Happy_Dude Posted August 25, 2003

I believe it's all relative; that is, using "ORG $80" for RAM makes all your variables reside in that memory space. When you compile the code, all the variable names are replaced with the address locations, so that when played on a real Atari (or emulator) all reads and writes to RAM are correct. It's the same with the start of ROM, so all memory addresses are correct on the target system.

As for the resets, that's redundant because the 6507 doesn't have an interrupt line, but we still include them anyway.
buta Posted August 25, 2003

Thanks for your reply! I understand that the ORG command sets the relative point from which the addresses of labels and variables will be calculated. I should have asked the question more clearly. I was wondering about the code in a 2600 simulator/emulator that loads a ROM file.

The 2600 doesn't have to do any work to load a cartridge. The cartridge ROM maps onto the 6507's address lines. But for simulators, it's different. It seems to me that the contents of the file would be copied to a byte array, and the byte array would be used to simulate memory. In order to do this, though, you would have to know somehow where in the array to store the bytes from the ROM bin file.

I meant to ask: if I were writing this function to load a ROM file and copy it to an array (in whatever language), where would I look in the ROM file to figure out at what address (array index) I should start copying to the array? I mean to ask about basic 2K and 4K cartridges without any bank-switching stuff. For a 2K cartridge, would it be valid to just count backwards 2K from the top of memory, and count backwards 4K for a 4K cartridge?

It seems from the tutorial above that the programmer can ORG the code segment and other segments anywhere they wish in the address space. So technically someone could put a data segment at 0x1000 and the code segment at 0x1100. That would mean reading the RESET vector to determine the start address of the ROM code is invalid, because it assumes that the code segment is always the first segment. So it must calculate by file size???

I appreciate your help!
Eckhard Stolberg Posted August 25, 2003

It's correct that the 6507 doesn't have an external interrupt connection, but it still has a reset line. Therefore you must point the reset vector to the start of your code. You could ignore the other two interrupt vectors, but it's good practice to make them point to the start of the code too.

On the 6507 these three system vectors are located at fixed addresses at the very end of the processor's address range. So, since the last six bytes in your binary must contain the system vectors, an emulator would copy the binary to the byte array so that the last six bytes end up at the addresses for the system vectors.

The ORG instruction in the source code is necessary, because the assembler doesn't know you are creating a binary for the 2600. Other systems, like certain home computers, used a similar processor too. On these systems the program could be loaded anywhere in the processor's address range. For these cases you can tell DASM to create binaries with two extra bytes at the start of the ROM that contain the address that the binary is supposed to be loaded to. On the 2600 this isn't necessary. But you still have to use the ORG instruction, so that DASM knows where to start assembling your code to, so that it can replace the jump- or load-labels with the correct values.

Ciao, Eckhard Stolberg
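The loading rule described in this reply (place the binary so its last six bytes land on the system vectors) could look roughly like this in Python. It is a simplified sketch, not how Stella or any real emulator does it: it uses a flat 64K array and ignores the 6507's 13 address lines and the resulting mirroring, it only accepts plain 2K and 4K images, and the names `load_rom` and `reset_address` are invented for the example.

```python
ADDRESS_SPACE = 0x10000  # flat 64K view (simplifying assumption)
RESET_VECTOR = 0xFFFC    # low byte here, high byte at $FFFD


def load_rom(rom):
    """Copy a 2K or 4K image so that it ends at the top of memory.

    Returns (memory, start) where start is the array index of the
    first ROM byte.
    """
    if len(rom) not in (2048, 4096):
        raise ValueError("only plain 2K and 4K images are handled here")
    memory = bytearray(ADDRESS_SPACE)
    start = ADDRESS_SPACE - len(rom)  # count backwards from the top
    memory[start:] = rom
    return memory, start


def reset_address(memory):
    """Read the little-endian reset vector to find where execution begins."""
    return memory[RESET_VECTOR] | (memory[RESET_VECTOR + 1] << 8)


# A made-up, otherwise empty 2K image whose reset vector points at $F800.
rom = bytearray(2048)
rom[-4] = 0x00  # ends up at $FFFC (low byte)
rom[-3] = 0xF8  # ends up at $FFFD (high byte)
memory, start = load_rom(bytes(rom))
print(hex(start), hex(reset_address(memory)))
```

The start index comes from the file size alone, while the reset vector only says where execution begins, which need not be the first byte of the image.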
buta Posted August 25, 2003

Thanks for your help! That was exactly what I wanted to know. This is a great forum. I only started reading it yesterday but I am hooked already.