Session 10: Orgasm

+Andrew Davie · May 26, 2003

We've had a brief introduction to DASM, and in particular mnemonics (6502 instructions, written in human-readable format) and symbols (other words in our program which are converted by DASM into a numeric form in the binary).

Now we're going to have a brief look at how DASM uses the symbols (and in particular the value for symbols it calculates and stores in its internal symbol table) to build up the binary ROM image.

Each symbol the assembler finds in our source code must be defined (ie: given an actual value) in at least one place in the code. A value is given to a symbol when it appears in our code starting in the very first column of a line. Symbols typically cannot be redefined (given another value).

In an earlier session we examined how the code "sta WSYNC" appeared in our binary file as $85 $02 (remember, we examined the listing file to see what bytes appeared in our binary). At that point, I indicated that the assembler had determined the value of the symbol "WSYNC" was 2 (corresponding to the TIA register's memory address) - through its definition in the standard vcs.h file.

But how does the assembler actually determine the value of a symbol?

The answer is that the symbol must be defined somewhere in the source code (as opposed to just being referenced). Definition of a symbol can come in several forms. The most straightforward is to just assign a value...


WSYNC = 2

or...


WSYNC EQU 2

The above examples are equivalent - DASM supports syntax (style) which has become fairly standard over the years. Some people (me!) like to use the = symbol, and some like to use EQU. Note that the symbol in question must start in the very first column, when it is being defined. In both cases, the value 2 is being assigned to the symbol WSYNC. Wherever DASM encounters the symbol WSYNC in the code, it knows to use the value 2.

That's fairly straightforward stuff. But symbols can be defined in terms of other symbols! Also, DASM has a quite capable ability to understand expressions, so the following is quite valid...


AFTER_WSYNC = WSYNC + 1

In this case, the symbol "AFTER_WSYNC" would have the value 3. Even if the WSYNC label was defined after the above code, the assembler would successfully be able to resolve the AFTER_WSYNC value, as it does multiple passes through the code until symbols are all resolved.
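To make the "multiple passes" idea concrete, here is a minimal Python sketch of a resolver that keeps sweeping over the definitions until every symbol has a value. It is an illustration only, not DASM's actual algorithm: the function name resolve_symbols is hypothetical, and it only understands decimal numbers and symbols joined by "+".

```python
def resolve_symbols(definitions):
    """Resolve (name, expression) pairs by making repeated passes.

    Hypothetical helper: expressions are sums of decimal numbers and symbols.
    """
    values = {}
    pending = dict(definitions)
    passes = 0
    while pending:
        passes += 1
        progressed = False
        for name, expr in list(pending.items()):
            terms = [t.strip() for t in expr.split("+")]
            # A definition can be evaluated once every symbol it uses is known.
            if all(t.isdigit() or t in values for t in terms):
                values[name] = sum(int(t) if t.isdigit() else values[t] for t in terms)
                del pending[name]
                progressed = True
        if not progressed:
            raise ValueError("unresolved symbols: %s" % sorted(pending))
    return values, passes

# AFTER_WSYNC is deliberately listed before WSYNC, as in the text above.
values, passes = resolve_symbols([("AFTER_WSYNC", "WSYNC + 1"), ("WSYNC", "2")])
print(values["AFTER_WSYNC"], "resolved in", passes, "passes")
```

On the first pass only WSYNC can be given a value; AFTER_WSYNC has to wait for the second pass, which is why the order of definitions in the source does not matter.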

Symbols can also be given values automatically by the assembler. Consider our sample kernel where we see the following code near the start (here we're looking at the listing file, so we can see the address information DASM outputs)...

    10  0000 ????          SEG
    11  f000           ORG	$F000
    12  f000
    13  f000       Reset
    14  f000
    15  f000
    16  f000
    17  f000
    18  f000
    19  f000
    20  f000       StartOfFrame
    21  f000
    22  f000      ; Start of vertical blank processing
    23  f000
    24  f000         a9 00        lda	#0
    25  f002         85 01        sta	VBLANK

"Reset" and "StartOfFrame" are two symbols which are definitions at this point because they both start at the first column of the lines they are on. The assembler assigns the current ROM address to these symbols, as they occur. That is, if we look at these "labels" (=symbols) in the symbol table, we see...

StartOfFrame             f000              (R )
Reset                    f000              (R )

They both have a value of $F000. This form of symbol (which starts at the beginning of a line, but is not explicitly assigned a value) is called a label, and refers to a location in the code (or more particularly an address). How and why did DASM assign the value $F000 to these two labels, in this case?

As the assembler converts your source code to a binary format, it keeps an internal counter telling it where in the address space the next byte is to be placed. This address increments by the appropriate amount for each instruction or piece of data it encounters. For example, if we had a "nop" (a 1-byte instruction), then the address counter that DASM maintains would increment by 1 (the length of the nop instruction). Whenever a label is encountered, the label is given the value of the current internal address counter at the point in the binary image at which the label occurs. The label itself does not go into the binary - but the value of the label refers to the address in the binary corresponding to the position of the label in the source code.
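The address counter can be modelled in a few lines of Python. This is a sketch under simplifying assumptions, not how DASM is written: the source is reduced to (kind, value) pairs, assign_labels is a hypothetical name, and the AfterStore label is made up for illustration (it is not in the sample kernel).

```python
def assign_labels(lines):
    """Walk the source, tracking the address counter the way an assembler would."""
    counter = None   # unknown until the first ORG (the "????" in the listing)
    labels = {}
    for kind, value in lines:
        if kind == "org":
            counter = value            # ORG sets the counter and emits no bytes
        elif kind == "label":
            labels[value] = counter    # a label takes the current counter value
        elif kind == "code":
            counter += value           # value is the instruction's size in bytes
    return labels, counter

source = [
    ("org", 0xF000),
    ("label", "Reset"),
    ("label", "StartOfFrame"),
    ("code", 2),               # lda #0     -> a9 00
    ("code", 2),               # sta VBLANK -> 85 01
    ("label", "AfterStore"),   # hypothetical label, only here to show the counter moving
]
labels, counter = assign_labels(source)
print({name: hex(address) for name, address in labels.items()})
```

Reset and StartOfFrame both come out as $F000 because no bytes are emitted between the ORG and either label, while the made-up AfterStore lands at $F004 after two 2-byte instructions.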

In the above code snippet, we can see the address in column 2 of the output, and it starts at 0 (with ???? after it, indicating it doesn't actually KNOW the internal counter/address at this point), and (here's the bit I really want you to understand) it is set to $F000 when we get the "org $F000" line. "Org" stands for origin, and this is the way we (the programmers) indicate to the assembler the starting address of the next section of code in the binary ROM. Just to complicate things slightly, it is not the actual offset from the start of the ROM (for a ROM might, for example, be only 4K but contain code assembled to live at $F000-$FFFF - as in a 4K cartridge). So it's not an offset, it's a conceptual address.

These labels are very useful to programmers to give a name to a point in code, so that that point may be referred to by the label, instead of us having to know the address. If we look at the end of our sample kernel, we see...


    70  f3ea         4c 00 f0        jmp	StartOfFrame

The "jmp" is the mnemonic for the jump instruction, which transfers flow of control to the address given in the two byte operand. In other words, it's a GOTO statement. Look carefully at the binary numbers inserted into the ROM (again, the columns are left to right, line number, address, byte(s), source code). We see $4C, 0, $f0. The opcode for JMP is $4C - whenever the 6502 fetches this instruction, it forms a 16-bit address from the next two bytes (0,$F0) and code continues from that address. Note that the "StartOfFrame" symbol/label has a value $F000 in our symbol table.

It's time to understand how 16-bit numbers are formed from two 8-bit numbers, and how 0, $F0 translates to $F000. The 6502, as noted, can address 2^16 bytes of memory. This requires 16 bits. The 6502 itself is only capable of manipulating 8-bit numbers. So 16-bit numbers are stored as pairs of bytes. Consider any 16-bit address in hexadecimal - $F000 is convenient enough. The binary value for that is %1111000000000000. Divide it into two 8-bit sections (ie: equivalent to 2 bytes) and you get %11110000 and %00000000 - equivalent to $F0 and 0. Note, any two hex digits make up a byte, as hex digits require 4 bits each (0-15, ie: %0000-%1111). So we could just split any hex address in half to give us two 8-bit bytes. As noted, the 6502 manipulates 16-bit addresses through the use of two bytes. These bytes are generally stored in ROM in little-endian format (that is, the least significant byte first, followed by the high byte). So $F000 hex is stored as 0, $F0 (the low byte of $F000 followed by the high byte).

Now the binary of our jmp instruction should make sense. Opcode ($4C), 16-bit address in low/high format ($F000). When this instruction executes, the program jumps to and continues executing from address $F000 in ROM. And we can see how DASM has used its symbol table - and in particular the value it calculated from the internal address counter when the StartOfFrame label was defined - to "fill in" the correct low/hi value into the binary file itself where the label was actually referred to.
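The low/high split and the resulting jmp bytes can be checked with a short Python sketch. The function names are hypothetical; the only facts used are the ones above (opcode $4C, low byte first).

```python
def split_address(address):
    """Return (low, high) bytes of a 16-bit address, i.e. little-endian order."""
    low = address & 0xFF
    high = (address >> 8) & 0xFF
    return low, high

def join_address(low, high):
    """Rebuild the 16-bit address from its low and high bytes."""
    return (high << 8) | low

def encode_jmp(target):
    """Opcode $4C followed by the target address, low byte first."""
    low, high = split_address(target)
    return bytes([0x4C, low, high])

symbol_table = {"StartOfFrame": 0xF000}
print(encode_jmp(symbol_table["StartOfFrame"]).hex())   # 4c00f0
```

The three bytes match the 4c 00 f0 shown in the listing line for "jmp StartOfFrame".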

This is typical of symbol usage. DASM uses its internal symbol table to give it a value for any symbol it needs. Those values are used to create the correct numbers for the ROM/binary image.

Let's go back to our magical discovery that the "org" instruction is just a command to the assembler (it does not appear in the binary) to let the assembler know the value of the internal address counter at that point in the code. It is quite legal to have more than one ORG command in our source. In fact, our sample kernel uses this when it defines the interrupt vectors...

    70  f3ea         4c 00 f0        jmp	StartOfFrame
    71  f3ed
    72  f3ed
    73  fffa           ORG	$FFFA
    74  fffa
    75  fffa         00 f0        .word.w	Reset; NMI
    76  fffc         00 f0        .word.w	Reset; RESET
    77  fffe         00 f0        .word.w	Reset; IRQ

Here we can see that after the jmp instruction, the internal address counter is at $F3ED, and we have another ORG which sets the address to $FFFA (the start of the standard 6502 interrupt vector data). Astute readers will notice the use of the label "Reset" in three lines, with the binary value $F000 (if the numbers are to be interpreted as a low/high byte pair) appearing in the ROM image at addresses $FFFA, $FFFC and $FFFE. We briefly discussed how the 6502 looks at the address $FFFC to give it the address at which it should start running code. Here we see that this address points to the label "Reset". Magic.
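Here is a rough Python sketch of how two ORGs end up in one 4K image, and of the difference between an address and an offset into the file. It is an illustration only: emit and word are hypothetical helpers, and filling the unused bytes with 0 is an assumption of the sketch, not a statement about DASM's fill value.

```python
ROM_ORIGIN = 0xF000   # address of the first byte of the 4K image
ROM_SIZE = 0x1000

def word(value):
    """16-bit value as two bytes, low byte first."""
    return bytes([value & 0xFF, (value >> 8) & 0xFF])

def emit(image, address, data):
    """Place data at a conceptual address by converting it to a file offset."""
    offset = address - ROM_ORIGIN
    image[offset:offset + len(data)] = data

symbols = {"Reset": 0xF000, "StartOfFrame": 0xF000}
image = bytearray(ROM_SIZE)   # gaps left as 0 (an assumption of this sketch)

emit(image, 0xF3EA, bytes([0x4C]) + word(symbols["StartOfFrame"]))   # jmp StartOfFrame
emit(image, 0xFFFA, word(symbols["Reset"]) * 3)                      # NMI, RESET, IRQ

print(image[-6:].hex())   # 00f000f000f0
```

The vectors sit at offsets $FFA-$FFF in the file, even though the code refers to them as $FFFA-$FFFF.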

It's quite legal to use one symbol as the value for an ORG command. Here's a short snippet of code which should clarify this...

START = $F800; start of code - change this if you want

  ORG START
HelloWorld

In the above example, the label HelloWorld would have a value of $F800. If the value of START were to change, so would the value of HelloWorld.

We've seen how the ORG command is used to tell DASM where to place bits of code (in terms of the address of code in our ROM). This command can also be used to define our variables in RAM. We haven't had a play with RAM/variables yet, and it will be a few sessions before we tackle that - but if you want a sneak peek, have a look at vcs.h and see how it defines its variables from an origin defined as "ORG TIA_BASE_ADDRESS". That code is way more complex than our current level of understanding, but it gives some idea of the versatility of the assembler.

We're almost done with the basic commands inserted into our source code to assist DASM's building of the binary image. Now you should understand how symbols are assigned values (either by their explicit assignation of a value, or by implicit address/location value) - and how those values - through the assembler's internal symbol table - are used to put the correct number into the ROM binary image. We also understand that DASM converts mnemonics (6502 commands in human-readable form) directly into opcodes. There's not much more to actual assembly - so we shall soon move on to actual 6502 code, and playing with the TIA itself.

Gateway · May 27, 2003

:lol: ORG + (D)ASM = Orgasm. Clever.

With titles like that you know I'm going to read it!

Great stuff here, Andrew! I've taken a break from 2600 programming but I'm getting excited about trying my hand at it again soon thanks to these posts and to J. Redant's "classes" posts.

Keep up the good work!

NE146 · May 27, 2003

ORG + (D)ASM = Orgasm. Clever.

I think everyone who's programmed in assembly has one time or another named a file "ORG.ASM"

But back to the subject. A Davie.. this is friggin GREAT STUFF

buta · August 25, 2003

First of all, thanks for the great tutorial. It is extremely easy to follow and very well written.

I have a question though about the assembler and I wanted to confirm something about startup.

You say in the examples that the assembler puts 'code' at certain bytes in memory starting with the ORG value. I have looked at a couple of ROM bins in a hex editor though and I do not see anywhere in the binary file that specifies the memory location of the bytes in the file. It looks like the binary files just start out with a stream of instructions. Am I missing something? I assume a simulator like Stella would store a ROM file in a byte array to simulate memory. If so it must know somehow what locations in the array to start copying the bytes to.

Also, I understand that the RESET interrupt is used to set the program counter to the start of the executable. Are there exceptions to this on the 2600? Is this common practice as far as CPUs go?

Happy_Dude · August 25, 2003

I believe it's all relative; that is, using "ORG $80" for RAM makes all your variables reside in that memory space. When you compile the code, all the variable names are replaced with the address locations, so that when played on a real Atari (or emulator) all reads and writes to RAM are correct. It's the same with the start of ROM, so all memory addresses are correct on the target system.

As for the resets, that's redundant because the 6507 doesn't have an interrupt line, but we still include them anyway.

buta · August 25, 2003

Thanks for your reply!

I understand that the ORG command sets the relative point from which the address of labels and variables will be calculated. I should have asked the question more clearly.

I was wondering about the code in a 2600 simulator/emulator that loads a ROM file. The 2600 doesn't have to do any work to load a cartridge. The cartridge ROM maps onto the 6507's address lines. But for simulators, it's different. It seems to me that the contents of the file would be copied to a byte array, and the byte array would be used to simulate memory. In order to do this though you would have to know somehow where in the array to store the bytes from the ROM bin file.

I meant to ask: if I were writing this function to load a ROM file and copy it to an array (in whatever language), where would I look in the ROM file to figure out at what address (array index) I should start copying to the array?

I mean to ask about basic 2k and 4k cartridges without any bank switching stuff. For a 2k cartridge would it be valid to just count backwards 2k from the top of memory, and count backwards 4k for a 4k cartridge?

It seems from the tutorial above that the programmer can ORG the code segment and other segments anywhere they wish in the address space. So technically someone could put a data segment at 0x1000 and the code segment at 0x1100. Which would mean reading the RESET vector to determine the start address of the ROM code is invalid because it assumes that the code segment is always the first segment.

So it must calculate by file size???

I appreciate your help!

Eckhard Stolberg · August 25, 2003

It's correct that the 6507 doesn't have an external interrupt connection, but it still has a reset line. Therefore you must point the reset vector to the start of your code. You could ignore the other two interrupt vectors, but it's good practice to make them point to the start of the code too.

On the 6507 these three system vectors are located at fixed addresses at the very end of the processor's address range. So, since the last six bytes in your binary must contain the system vectors, an emulator would copy the binary to the byte array, so that the last six bytes will end up on the addresses for the system vectors.

The ORG instruction in the source code is necessary, because the assembler doesn't know you are creating a binary for the 2600. Other systems, like certain home computers, used a similar processor too. On these systems the program could be loaded anywhere in the processor's address range. For these cases you can tell DASM to create binaries with two extra bytes at the start of the ROM that contain the address that the binary is supposed to be loaded to.

On the 2600 this isn't necessary. But you still have to use the ORG instruction, so that DASM knows where to start assembling your code to, so that it can replace the jump- or load-labels with the correct values.

Ciao, Eckhard Stolberg
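For readers who want to see the loading rule from the reply above in code, here is a minimal Python sketch. It is not taken from Stella or any real emulator: load_rom and reset_vector are hypothetical names, the flat 64K byte array is a simplification, and only plain 2K and 4K images without bank switching are handled.

```python
ADDRESS_SPACE = 0x10000   # $0000-$FFFF, the address range used in the tutorial

def load_rom(rom):
    """Copy a 2K or 4K image so that its last byte lands at $FFFF."""
    if len(rom) not in (0x0800, 0x1000):
        raise ValueError("this sketch only handles plain 2K and 4K images")
    memory = bytearray(ADDRESS_SPACE)
    base = ADDRESS_SPACE - len(rom)   # count backwards from the top of memory
    memory[base:] = rom
    return memory, base

def reset_vector(memory):
    """Read the low/high byte pair stored at $FFFC/$FFFD."""
    return memory[0xFFFC] | (memory[0xFFFD] << 8)

# Stand-in for a file read from disk: 4K of zeros with all three vectors set to $F000.
rom = bytearray(0x1000)
rom[-6:] = bytes([0x00, 0xF0] * 3)

memory, base = load_rom(rom)
print(hex(base), hex(reset_vector(memory)))   # 0xf000 0xf000
```

The load address comes from the file size alone; the reset vector is only read afterwards, to find where execution starts.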

buta · August 25, 2003

Thanks for your help! That was exactly what I wanted to know.

This is a great forum. I only started reading it yesterday but I am hooked already.
