The job of an assembler is to convert our source code into a binary image which can be run by the 6502. This conversion process ultimately replaces the mnemonics (the words representing the 6502 instructions we use when writing in assembler) and the symbols (the various names we use for things, such as labels to which we can branch, and various other things like the names of TIA registers, etc) with numerical values.
So ultimately, all the assembler needs to do is figure out a numerical value for all the things which become part of the binary - and place that value in the appropriate place in the binary.
We've already had a brief introduction to a 6502 instruction - the one called "nop". This is the no-operation instruction which simply takes 2 cycles to execute. Whenever we enter "nop" into our source code, the assembler recognises this as a 6502 instruction and inserts into the binary the value $EA. This shows that there can be a simple 1:1 relationship between source-code and the binary.
"nop" is a single-byte instruction - all it requires is the opcode, and the 6502 will happily execute it. Some instructions require additional "parameters" - the "operands". The 6502 microprocessor can use an additional 1 or 2 bytes of operand data for some instructions, so the total number of bytes for a 6502 "instruction" can be 1, 2 or 3.
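To make the 1, 2 or 3 byte idea concrete, here is a minimal Python sketch of how an assembler might turn an instruction into bytes. It is not how DASM is written - the table OPCODES, the function encode and the mode names are made up for illustration. The opcode values are the standard 6502 ones, and a 2-byte operand is stored low byte first.

```python
# Hypothetical mini opcode table: (mnemonic, addressing mode) -> (opcode, total bytes).
# Only a handful of real 6502 opcodes are listed; a real assembler knows them all.
OPCODES = {
    ("nop", "implied"):  (0xEA, 1),   # opcode only
    ("sta", "zeropage"): (0x85, 2),   # opcode + 1 operand byte
    ("sta", "absolute"): (0x8D, 3),   # opcode + 2 operand bytes
    ("jmp", "absolute"): (0x4C, 3),   # opcode + 2 operand bytes
}

def encode(mnemonic, mode, operand=0):
    """Return the bytes for one instruction: the opcode, then any operand bytes."""
    opcode, length = OPCODES[(mnemonic, mode)]
    out = [opcode]
    if length >= 2:
        out.append(operand & 0xFF)          # low byte of the operand
    if length == 3:
        out.append((operand >> 8) & 0xFF)   # high byte of the operand
    return bytes(out)

print(encode("nop", "implied").hex())          # ea
print(encode("sta", "zeropage", 0x02).hex())   # 8502
print(encode("jmp", "absolute", 0xF000).hex()) # 4c00f0
```

The point is simply that the mnemonic (plus the kind of operand it is given) selects the opcode, and the opcode determines how many operand bytes follow it.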
DASM is the assembler used by most (if not all) modern-day '2600 programmers. It is a multi-platform assembler written in 1988 by Matt Dillon (you should all find his email address and send him a "thank-you" sometime). It's a great tool.
DASM isn't just capable of assembling 6502 (and variant) code - it also has inbuilt capability to assemble code for several other microprocessors. Consequently, one of the very first things we need to do in our source code is tell DASM what processor the source code is written for...
        processor 6502
This should be just about the first line in any '2600 program you write. If you don't include it, DASM will probably get confused and spit out errors. That's simply because it is trying to assemble your code as if it were written for another processor.
We've just seen how mnemonics (the standard names for instructions) are converted into numerical values by the assembler. Another job the assembler does is convert labels and symbols into values. We've already encountered both of these in our previous sessions, but you may not be familiar with their names.
Whenever DASM is doing its job assembling, it keeps a list of all the "words" it encounters in a file in an internal structure called a symbol table. Think of a symbol as a name for something. Remember the "sta WSYNC" instruction we used to halt the 6502 and wait for the scanline to be rendered? The "sta" is the instruction, and "WSYNC" is a symbol. When it first encounters this symbol, DASM doesn't know much about it, other than what it's called (ie: "WSYNC"). What DASM needs to do is work out what the *value* of that symbol is, so that it can insert that value into the binary file.
When it's assembling, DASM puts all the symbols it finds into its symbol table - and associated with each of these is a value. If it doesn't "know" the value, that's OK - DASM will keep assembling the rest of the file quite happily. At some point, something in the code might tell DASM what the value for a symbol actually IS - in which case DASM will put that value in its symbol table alongside the symbol. So whenever that symbol is used anywhere, DASM now knows its correct value to put into the binary file.
In fact, it is absolutely necessary for all symbols which go into the binary file to be given values at some point. DASM can't guess values - it's up to you, the programmer, to make sure this happens. A symbol doesn't have to be given a value at any PARTICULAR point in the code, but it does have to be given a value somewhere in the code. DASM will make multiple "passes" - basically going through the code from beginning to end again and again until it manages to resolve all the symbols to correct values.
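Here is a toy Python model of the "multiple passes" idea. It is only a sketch under simple assumptions, not DASM's actual algorithm: the function resolve is hypothetical, and each symbol is defined either as a number or as the name of another symbol. A pass fills in everything it can; we stop when nothing is left unresolved, or give an error when a pass makes no progress at all.

```python
def resolve(definitions):
    """Resolve symbols over repeated passes.

    definitions maps a symbol name to either an int (a known value)
    or a str (the name of another symbol whose value it takes).
    Returns (symbol_table, number_of_passes).
    """
    table = {}
    passes = 0
    while True:
        passes += 1
        progress = False
        unresolved = []
        for name, value in definitions.items():
            if name in table:
                continue                      # already resolved on an earlier pass
            if isinstance(value, int):
                table[name] = value           # value given directly
                progress = True
            elif value in table:
                table[name] = table[value]    # depends on a symbol we now know
                progress = True
            else:
                unresolved.append(name)       # not known yet - try again next pass
        if not unresolved:
            return table, passes
        if not progress:
            raise ValueError("cannot resolve: " + ", ".join(unresolved))

# "Alias" is used before WSYNC is defined, so a second pass is needed.
table, passes = resolve({"Alias": "WSYNC", "WSYNC": 0x02})
print(table["Alias"], passes)   # 2 2
```

Notice that the order of the definitions doesn't matter to the final result - it only affects how many passes are needed to get there.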
We've already seen in some sample code how "sta WSYNC" appears in our binary file as the bytes $85 $02. The first byte $85 is the "sta" instruction (one variant of many - but let's keep it simple for now) and it is followed by a single byte giving the address of the location into which the byte in the "A" register is to be stored. We can see this address is location 2 in memory. Somehow, DASM has figured out from the code that the symbol WSYNC has a value of 2, and when it creates the binary file it replaces all occurrences of the symbol with the numeric value 2.
How did it get the value 2? Remember, WSYNC is one of the TIA registers. It appears to the 6502 as a memory location, as the TIA registers are "mapped" into locations 0 - $7F. The file "vcs.h" defines (in a roundabout way) the values and names (symbols) for all of the TIA registers. By including the file "vcs.h" as a part of the assembly for any source file, we automatically tell DASM the correct numeric value for all of the TIA register "names".
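As a rough illustration of that substitution, here is a small Python sketch. The dictionary TIA_SYMBOLS and the function assemble_sta are hypothetical names; the values are the ones DASM reports for these registers, and $85 is the "sta" variant that takes a single-byte address. This is not how vcs.h or DASM are actually implemented - it just shows a name being swapped for its number.

```python
# A few TIA register names and their addresses (a small subset, for illustration).
TIA_SYMBOLS = {
    "VSYNC":  0x00,
    "VBLANK": 0x01,
    "WSYNC":  0x02,
    "COLUBK": 0x09,
}

STA_ZEROPAGE = 0x85   # "sta" opcode for a one-byte address operand

def assemble_sta(line, symbols):
    """Assemble a line like 'sta WSYNC' into its two bytes."""
    mnemonic, operand = line.split()
    if mnemonic.lower() != "sta":
        raise ValueError("this sketch only understands sta")
    address = symbols[operand]            # look the symbol up in the table
    if address > 0xFF:
        raise ValueError("address does not fit in one byte")
    return bytes([STA_ZEROPAGE, address])

print(assemble_sta("sta WSYNC", TIA_SYMBOLS).hex())   # 8502
```

If the symbol isn't in the table, the lookup fails - which is the toy version of DASM complaining about a symbol it was never given a value for.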
That's why, at the top of most files, just after the processor statement, we see...
        processor 6502
        include "vcs.h"
You don't really need to know much about vcs.h at this stage - but be aware that a "standardised" version of this file is distributed with the DASM assembler as the '2600 support files package. I would advise you to always use the latest and greatest version of this file. Standards help us all.
So now we know basically what DASM does with symbols - it keeps an internal list of symbols, and their values, if known. DASM will keep going through the code and "resolving" the symbols into numeric values until every symbol has a value (or until a pass finds nothing new to resolve, in which case it gives an error). Once all symbols have been resolved, your code has been completely processed by the assembler, and it creates the binary image/file for you - and assembly is complete.
To summarise: DASM converts source-code consisting of instructions (mnemonics) and symbols into a binary form which can be run by the 6502. The assembler converts mnemonics into opcodes (numbers), and symbols into numbers which it calculates the value of during the assembly process.
DASM is a command-line program - that is, it runs under DOS (or whatever platform you happen to choose, provided you have a runnable version for that platform). DASM is provided with full source-code (it's written in C) so as long as you have a C-compiler handy, you can port it to just about any platform under the sun.
It does come with a manual - and it's always a good idea to familiarise yourself with its capabilities. In the interests of getting you up and running quickly, so you can actually assemble the sample kernel posted a session or two ago, here's what you need to type on the command-line...
dasm kernel.asm -lkernel.txt -f3 -v5 -okernel.bin
This is assuming that the file to assemble is named "kernel.asm" (.asm is a standard suffix for assembler files, but some prefer to use .s - you can use whatever you want, really, but I always use .asm). Anything prefixed with a minus-sign ("-") is a "switch" - which tells DASM something about what it is required to do. The -l switch, which we discussed very briefly, tells DASM to create a listing file - in this case, it will write a listing to the file "kernel.txt". The -o switch tells DASM what file to use for the output binary - in this case, the binary will be written to "kernel.bin". That file can be loaded into an emulator, or burned on an EPROM - it is the ROM file, in other words.
The other switches "-f3" and "-v5" control some internals of DASM - and for now just assume you need these whenever you assemble with DASM. Remember, if you're curious you can always read the manual!
If all goes well, DASM will output something like this...
DASM V2.20.05, Macro Assembler (C)1988-2003
START OF PASS: 1
----------------------------------------------------------------------
SEGMENT NAME                 INIT PC  INIT RPC FINAL PC FINAL RPC
                             f000 f000
RIOT                     [u] 0280 0280
TIA_REGISTERS_READ       [u] 0000 0000
TIA_REGISTERS_WRITE      [u] 0000 0000
INITIAL CODE SEGMENT         0000 ???? 0000 ????
----------------------------------------------------------------------
1 references to unknown symbols.
0 events requiring another assembler pass.
--- Symbol List (sorted by symbol)
AUDC0                    0015
AUDC1                    0016
AUDF0                    0017
AUDF1                    0018
AUDV0                    0019
AUDV1                    001a
COLUBK                   0009  (R )
COLUP0                   0006
COLUP1                   0007
COLUPF                   0008
CTRLPF                   000a
CXBLPF                   0006
CXCLR                    002c
CXM0FB                   0004
CXM0P                    0000
CXM1FB                   0005
CXM1P                    0001
CXP0FB                   0002
CXP1FB                   0003
CXPPMM                   0007
ENABL                    001f
ENAM0                    001d
ENAM1                    001e
GRP0                     001b
GRP1                     001c
HMBL                     0024
HMCLR                    002b
HMM0                     0022
HMM1                     0023
HMOVE                    002a
HMP0                     0020
HMP1                     0021
INPT0                    0008
INPT1                    0009
INPT2                    000a
INPT3                    000b
INPT4                    000c
INPT5                    000d
INTIM                    0284
NUSIZ0                   0004
NUSIZ1                   0005
Overscan                 f02c  (R )
PF0                      000d
PF1                      000e
PF2                      000f
Picture                  f01d  (R )
REFP0                    000b
REFP1                    000c
RESBL                    0014
Reset                    f000  (R )
RESM0                    0012
RESM1                    0013
RESMP0                   0028
RESMP1                   0029
RESP0                    0010
RESP1                    0011
RSYNC                    0003
StartOfFrame             f000  (R )
SWACNT                   0281
SWBCNT                   0283
SWCHA                    0280
SWCHB                    0282
T1024T                   0297
TIA_BASE_ADDRESS         0000  (R )
TIM1T                    0294
TIM64T                   0296
TIM8T                    0295
TIMINT                   0285
VBLANK                   0001  (R )
VDELBL                   0027
VDELP0                   0025
VDELP1                   0026
VerticalBlank            f014  (R )
VSYNC                    0000  (R )
WSYNC                    0002  (R )
--- End of Symbol List.
Complete.
Here we can actually SEE the symbol table, and the numeric values that DASM has assigned to the symbols. If you look at the listing file, wherever any of these symbols is used, you will see the corresponding number in the symbol table has been inserted into the binary.
There are lots of symbols there, as the vcs.h file defines just about everything you'll ever need to do with the TIA. The symbols which are actually USED in your code are marked with a (R ) - indicating "referenced".
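If you want to pick out just the referenced symbols, a few lines of Python will do it. This is only a sketch: the function parse_symbol_list is hypothetical, and it assumes the symbol list is printed one symbol per line in the form "NAME VALUE" with an optional "(R )" marker, between the "--- Symbol List" and "--- End of Symbol List." lines.

```python
def parse_symbol_list(output):
    """Return {name: (value, referenced)} from DASM's printed symbol list."""
    symbols = {}
    in_list = False
    for line in output.splitlines():
        if line.startswith("--- Symbol List"):
            in_list = True                 # the symbols start on the next line
            continue
        if line.startswith("--- End of Symbol List"):
            break
        if not in_list:
            continue                       # skip everything before the list
        parts = line.split()
        if len(parts) < 2:
            continue                       # blank or malformed line
        name = parts[0]
        value = int(parts[1], 16)          # values are printed in hexadecimal
        referenced = len(parts) > 2 and parts[2].startswith("(R")
        symbols[name] = (value, referenced)
    return symbols

sample = """--- Symbol List (sorted by symbol)
COLUBK                   0009  (R )
COLUP0                   0006
WSYNC                    0002  (R )
--- End of Symbol List.
Complete.
"""

symbols = parse_symbol_list(sample)
used = sorted(name for name, (value, ref) in symbols.items() if ref)
print(used)   # ['COLUBK', 'WSYNC']
```

Only the symbols your own code touches come out marked as referenced - everything else is just there because vcs.h defined it.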
Now you should be able to go and assemble the sample kernel I provided earlier. Don't be afraid to have a play with things, and see what happens! Experimenting is a big part of learning.
Soon we'll start playing with some TIA registers and seeing what happens to our screen when we do that! For now, though, make sure you are able to assemble and run the first kernel. If you have any problems, ask for assistance and I'm sure somebody will leap to your aid.