Cybearg

How Do Data Arrays Work?


This isn't really a programming question per se, but more of a conceptual thing.

 

When I do something like this:

 

 

LDX index
LDA DataSet,x
 
...
 
DataSet
 .byte 128
 .byte 64
 .byte 32
 .byte 16
 .byte 8
 .byte 4

 

... What exactly is going on, technically? Is the program saving the current location of the program's execution in the stack, then jumping to DataSet, incrementing by index bytes, reading a single byte, and returning to the location pushed to the stack?

 

Is there any modern equivalent of this in, say, C? Normally in C, you have to specify actual memory to be used to hold an array, but is there any way to do a "hard array" like you can do with the 6502 in Assembly (or batariBasic)?


Cybearg,

this is a lot like BASIC:

 

' using x and a vars instead of the x and a registers

for x = 0 to 5

read a

next

data 128

data 64

data 32

data 16

data 8

data 4

 

The trouble with this analogy is that if we ran it through a compiler, we would get x and a as variables in memory (or something similar), manipulated via the registers, something like this:

 

lda #0

sta x

Forloop ldx x

lda DataSet,x

sta a; store the variable

inx

stx x; store the variable

cpx #6

bne Forloop

 

Functionally this really isn't too different, but the pure Assembly example uses fewer instructions because it can use the registers directly as variables.

 

You can build and manipulate data areas in C and Java just like BASIC and I suspect the compiled output would show the same differences to native asm.
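A minimal C sketch of the same READ/DATA loop, assuming the six values are stored as a constant array. The names `data_set` and `sum_data_set` are hypothetical, and the sum only exists so the loop has a result to check. Here `x` plays the part of the BASIC loop variable; whether it lives in memory or in the X register is the compiler's decision.

```c
/* The six DATA values from the BASIC example, baked into the program
   image as a constant table instead of being READ at run time. */
static const unsigned char data_set[] = {128, 64, 32, 16, 8, 4};

/* Hypothetical helper: walks the table and returns the sum of its bytes. */
unsigned int sum_data_set(void)
{
    unsigned int total = 0;
    /* 'x' stands in for the BASIC loop variable (or the X register). */
    for (unsigned char x = 0; x < sizeof data_set; x++) {
        unsigned char a = data_set[x];   /* like READ A, or LDA DataSet,x */
        total += a;
    }
    return total;
}
```

Calling `sum_data_set()` visits all six entries, the same way the BASIC loop READs each DATA item once.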


Reading from or writing to memory locations-- whether using arrays or not-- doesn't affect the program counter. The address bus and data bus are separate from the program counter, so the 6502/6507 can read from or write to memory without needing to save the program counter's current location.
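To make that concrete, here is a toy C model of a single `LDA abs,X` (opcode $BD). It is a sketch, not a real emulator: the `Cpu6502` struct and `lda_abs_x` function are hypothetical, and the N/Z flag updates are left out. The point is that the instruction computes an address, does one read, and moves the program counter past its own three bytes; the stack pointer is never touched.

```c
#include <stdint.h>

/* Toy model of only the state LDA abs,X touches; not a full 6502 core. */
typedef struct {
    uint16_t pc;             /* program counter */
    uint8_t  sp;             /* stack pointer (stack lives in page 1) */
    uint8_t  a, x;           /* accumulator and X index register */
    uint8_t  mem[0x10000];   /* flat 64 KiB address space */
} Cpu6502;

/* Execute one LDA abs,X located at cpu->pc (N/Z flags omitted). */
void lda_abs_x(Cpu6502 *cpu)
{
    /* The 16-bit operand is stored low byte first. */
    uint16_t base = (uint16_t)(cpu->mem[(uint16_t)(cpu->pc + 1)]
                             | (cpu->mem[(uint16_t)(cpu->pc + 2)] << 8));
    uint16_t effective = (uint16_t)(base + cpu->x);  /* wraps at $FFFF */
    cpu->a = cpu->mem[effective];                     /* one data read */
    cpu->pc = (uint16_t)(cpu->pc + 3);                /* step past opcode + operand */
    /* cpu->sp is not used: no push, no jump, no return. */
}
```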


Is there any modern equivalent of this in, say, C? Normally in C, you have to specify actual memory to be used to hold an array, but is there any way to do a "hard array" like you can do with the 6502 in Assembly (or batariBasic)?

It's possible to get C to read from an arbitrarily located array, but it's only typical if you're writing low-level code on a platform without the usual C libraries and/or an MMU.

 

unsigned char *arbitrarymem = (unsigned char *)0x8000;
arbitrarymem[0] = 69;

If you try it on a platform with memory protection, the above will almost certainly cause a segfault.

 

I've never actually seen this used to read hardcoded data, though, like in the 6502/7 example, since there's no advantage over just declaring the array along with its contents. The compiler will do more or less the same thing under the hood, i.e. embed the data and point your pointer at the start of the embedded data.
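A small sketch of what "embed the data and point at the start of it" looks like in C, assuming a plain initialized `static const` array; the helper names are hypothetical. The array name decays to a pointer to its first element, so indexing and pointer arithmetic are the same base-plus-offset access that `LDA DataSet,x` performs.

```c
#include <stddef.h>

/* The compiler embeds these six bytes in the program image. */
static const unsigned char DataSet[] = {128, 64, 32, 16, 8, 4};

/* Array indexing: base address of DataSet plus offset i. */
unsigned char read_via_index(size_t i)
{
    return DataSet[i];
}

/* The same access spelled out: DataSet decays to &DataSet[0]. */
unsigned char read_via_pointer(size_t i)
{
    const unsigned char *base = DataSet;
    return *(base + i);
}
```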


You know, no matter how many times I see C code meant for a 6502-based machine, I can never convince myself that C makes sense on a 6502. Between the tight memory constraints and the acrobatics required because the '02 lacks a big stack, I found my high-level language balance on the '02 in FORTH...

 

(I say this, as I write C and C++ code for x86 and ARM, every day...)

 

-Thom


... What exactly is going on, technically? Is the program saving the current location of the program's execution in the stack, then jumping to DataSet, incrementing by index bytes, reading a single byte, and returning to the location pushed to the stack?

Not much going on here:

- the first instruction loads the value at address "index" to the X register

- the second instruction loads the value at address "DataSet" + "the value in the X register" into the accumulator.

 

As others pointed out the "LDA DataSet,X" would normally be executed in a loop. And you probably would rather use an immediate LDX #index (the difference is the '#' which means take the value of index rather than the value at the address index is pointing to).
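The `#` distinction can be mimicked in C with a toy memory array. `mem`, `x_reg`, `ldx_immediate` and `ldx_absolute` are hypothetical stand-ins for illustration, not part of any real library.

```c
#include <stdint.h>

static uint8_t mem[0x10000];   /* toy 64 KiB address space */
static uint8_t x_reg;          /* stands in for the X register */

/* LDX #value : the operand itself is loaded (immediate mode). */
void ldx_immediate(uint8_t value)
{
    x_reg = value;
}

/* LDX address : the byte stored at that address is loaded. */
void ldx_absolute(uint16_t address)
{
    x_reg = mem[address];
}
```

With `index` at $80 holding 3, `LDX #index` would put $80 in X, while `LDX index` would put 3 there.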

 

The stack is not affected by those instructions. Don't confuse "LDA DataSet,X" with a subroutine call; it's just the 6502 counterpart of the C idiom:

 

DataSet[index]

 

take care!


No, I understand all that. I'm asking, what happens behind-the-scenes?

 

This all came from me asking my C++ professor if I could define a static data set that didn't use memory, like these Assembly data sets do. You can move through the data like an array, but it doesn't actually put all those bytes into a memory array, so it saves a lot of RAM.

 

He seemed insistent that it's impossible to read data without loading it into memory, unless there was some kind of super-specialized, rare hardware in the CPU.

 

So I'm asking--how does it read the data without loading the full array into memory, and is there some simple equivalent in C that doesn't actually use RAM but merely increases the program's size, like how adding a data array uses up ROM instead of RAM?
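As a rough C counterpart, a `static const` table is the usual way to express "data baked into the program". This is a sketch under assumptions: where the bytes end up is decided by the toolchain and linker script, not by C itself. On many embedded targets read-only data is placed in flash/ROM; on a PC the operating system loads the executable image, table included, into RAM; and some Harvard-architecture parts (for example AVR with avr-gcc) need an extra attribute such as `PROGMEM` to keep it out of RAM. `dataset_lookup` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* 'static const': the table never changes, so toolchains typically emit it
   into a read-only section that the linker can map to ROM/flash. */
static const uint8_t DataSet[] = {128, 64, 32, 16, 8, 4};

#define DATASET_LEN (sizeof DataSet / sizeof DataSet[0])

/* Counterpart of LDX index / LDA DataSet,x, plus a bounds check
   that the 6502 version does not have (out of range returns 0). */
uint8_t dataset_lookup(uint8_t index)
{
    return index < DATASET_LEN ? DataSet[index] : 0;
}
```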


Not sure I completely understand your question or if this will answer it, but here's my take on it.

 

I think a modern equivalent to the code you posted might be closer to reading data off a disk or other external storage. PCs run programs from RAM, unlike the 2600, which runs from ROM. Since the 2600's program is stored and executed in ROM (which can be accessed in about the same time as its RAM), you can have lists of unchangeable data that are usable by the 6502's various instructions. Since disks take a long time to read compared with a single instruction's execution time on a modern system, it wouldn't make sense to access them directly like you would on a system executing programs from ROM.

 

Behind the scenes, the execution of an lda absolute,x might look something like this (disclaimer: this is speculation, I don't know exactly how the 6502 executes its instructions, but it would be similar to this):

 

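  - fetch opcode/decode ($BD = lda absolute,x)
  - fetch next byte (low byte of the base address) and add the contents of the X register; place the result on the low address bus (A0-A7)
  - fetch next byte (high byte of the base address) and place it on the high address bus (A8-A15) *
  - read the addressed byte: place the contents of the data bus into the accumulator

* if adding X carried out of the low byte (a page crossing), take another cycle to increment the high byte by 1. This is why lda absolute,x takes 4 cycles, or 5 when a page boundary is crossed.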
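The page-crossing rule in the last step can be written as a small C function. It only models the documented cycle count (4 cycles, 5 when a page boundary is crossed); `lda_abs_x_cycles` is a hypothetical name.

```c
#include <stdint.h>

/* Cycle count for LDA abs,X: 4, plus 1 when base + X lands in a
   different 256-byte page than base (the high byte must be fixed up). */
unsigned lda_abs_x_cycles(uint16_t base, uint8_t x)
{
    uint16_t effective = (uint16_t)(base + x);
    int page_crossed = (base & 0xFF00) != (effective & 0xFF00);
    return 4u + (page_crossed ? 1u : 0u);
}
```

This is one reason 2600 programmers often align tables so that indexed reads never straddle a page boundary.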


So I'm asking--how does it read the data without loading the full array into memory, and is there some simple equivalent in C that doesn't actually use RAM but merely increases the program's size, like how adding a data array uses up ROM instead of RAM?

 

 

The array is already "in memory". It is in the ROM address space; otherwise you wouldn't be able to access it.

The access is merely calculating the address from which to fetch the correct byte. It's just a simple memory access with an offset from a base address.

Nothing magic. Internally the CPU adds the X register to the label's address, and fetches a byte from the memory address that points to.

No modification to the program counter is required; as I said, it's just a simple memory fetch. The array is already "in" memory.


This all came from me asking my C++ professor if I could define a static data set that didn't use memory, like these Assembly data sets do. You can move through the data like an array, but it doesn't actually put all those bytes into a memory array, so it saves a lot of RAM.

If you define a static data set, it'll always use memory. In the case of the 2600 it'll use ROM, because that's the only memory this poor device has (apart from the 128 bytes of RAM); in the case of a modern computer it'll use RAM.

 

When programming the 2600 it is often sufficient to load static data into one of the processor registers and then store it directly to some TIA register, without using a temporary RAM variable, e.g.

 

COLUP0 equ $46

 

LDA #7

STA COLUP0

 

But, you can do exactly the same in C (if that's what you're after):

 

int* colup0 = (int *)0x46;

*colup0 = 7;

 

Any decent C compiler would optimize this to the same asm code as above.


Agree Jan, your C code is the equivalent of "poke &h46,7" in BASIC and should get translated directly.

 

However, you cannot access the processor's internal registers in C any more than you can in BASIC; stepping through a dataset with the internal index register in Cybearg's example, or conducting register dances and swaps, is low level only.


 

No. The code is not equivalent.

int is not an 8-bit (one byte) data type. The C code int *colup0 = (int*)0x46; *colup0 = 7 will most likely write to both colup0 AND colup1. TWO bytes.

You should use char instead. char *colup0.... etc
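A quick host-side C sketch of the difference, assuming an ordinary byte buffer stands in for the TIA registers. The helper names are hypothetical, and `memcpy` stands in for `*(int *)addr = 7` to avoid alignment and aliasing problems. On a 6502 C compiler such as cc65 `int` is 16 bits, so the int store would touch two registers; on a typical PC it touches four bytes.

```c
#include <stddef.h>
#include <string.h>

/* Counts bytes that no longer hold the 0xFF fill value. */
static int bytes_changed(const unsigned char *buf, size_t n)
{
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        if (buf[i] != 0xFF) {
            count++;
        }
    }
    return count;
}

/* Byte store, like STA COLUP0: exactly one byte changes. */
int store_as_char(void)
{
    unsigned char regs[2 * sizeof(int)];   /* stand-in for COLUP0, COLUP1, ... */
    memset(regs, 0xFF, sizeof regs);
    unsigned char *colup0 = &regs[0];
    *colup0 = 7;
    return bytes_changed(regs, sizeof regs);
}

/* Int-sized store at the same spot: sizeof(int) bytes change. */
int store_as_int(void)
{
    unsigned char regs[2 * sizeof(int)];
    memset(regs, 0xFF, sizeof regs);
    int value = 7;
    memcpy(&regs[0], &value, sizeof value);  /* same bytes as *(int *)&regs[0] = 7 */
    return bytes_changed(regs, sizeof regs);
}
```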


However, you cannot access the processor's internal registers in C any more than you can in BASIC; stepping through a dataset with the internal index register in Cybearg's example, or conducting register dances and swaps, is low level only.

Absolutely true, because this is the big idea behind C - abstract away the processor details to increase portability.

One last thing, you could use the "register" keyword in C. It's a hint to the compiler to keep a variable in a processor register if possible.


No. The code is not equivalent.

int is not an 8-bit (one byte) data type. The C code int *colup0 = (int*)0x46; *colup0 = 7 will most likely write to both colup0 AND colup1. TWO bytes.

You should use char instead. char *colup0.... etc

Yeah, my C-skills are a bit rusty - but since we're at it. Shouldn't it be "unsigned char*" ;-)


 

Yes, it should. Well spotted. However, for the case of 7 it makes no difference.

If you were trying to write, say, 128... then the compiler SHOULD warn you that the value is out of range (plain char is signed on many compilers, though that is implementation-defined), so no harm done.

But yes, unsigned char * is the way to go.
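A sketch of the usual embedded idiom for a byte-wide hardware register, assuming the $46 address used earlier in the thread. `TARGET_2600`, `fake_tia` and `set_player0_color` are hypothetical names: dereferencing address 0x46 on a desktop OS would crash, so without `TARGET_2600` the same macro shape points at a stand-in array. `volatile` tells the compiler that every access must really happen, which matters for hardware registers.

```c
#include <stdint.h>

#ifdef TARGET_2600
/* Real hardware: a volatile unsigned byte at a fixed address. */
#define COLUP0 (*(volatile uint8_t *)0x46)
#else
/* Host build: hypothetical stand-in for addresses $00-$7F. */
volatile uint8_t fake_tia[0x80];
#define COLUP0 (fake_tia[0x46])
#endif

void set_player0_color(uint8_t color)
{
    COLUP0 = color;   /* one byte written, like LDA #color / STA COLUP0 */
}
```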


 

No. The code is not equivalent.

int is not an 8-bit (one byte) data type. The C code int *colup0 = (int*)0x46; *colup0 = 7 will most likely write to both colup0 AND colup1. TWO bytes.

You should use char instead. char *colup0.... etc

Andrew,

I disagree, the code looks functionally equivalent because C lets us poke and peek directly into memory; it would require two BASIC POKE statements to push a 16-bit value, but we're only storing a 3-bit value, so one 8-bit poke is enough.

 

If the target processor had 16-bit registers like the 6809, the compiled asm would even look the same other than using a larger register:

 

LDD #7 ; load the D register with a 16-bit value

STD $3002 ; store 16-bit value starting at $3002

 

Agree it is more efficient to use smaller registers since the value fits:

 

LDA #7 ; load 1/2 of the D register with an 8-bit value

STA $3002 ; store just the 8-bit value

 

Of course, if your C is being compiled for an 8-bit processor you would see two loads and stores to push a 16-bit value, like two pokes in BASIC; in this instance the C is even more abstract and further removed from the hardware (a good argument against C for 8-bit CPUs).
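The "two loads and stores" case can be sketched in C as two byte writes; the function names are hypothetical. The order differs by CPU: the 6502 keeps the low byte of a 16-bit address first, while the big-endian 6809 `STD` writes the high byte (A) first.

```c
#include <stdint.h>

/* Low byte first, the order the 6502 uses for 16-bit addresses. */
void store16_little_endian(uint8_t *mem, uint16_t addr, uint16_t value)
{
    mem[addr]                 = (uint8_t)(value & 0xFF);  /* low byte  */
    mem[(uint16_t)(addr + 1)] = (uint8_t)(value >> 8);    /* high byte */
}

/* High byte first, as the 6809's STD does (A then B). */
void store16_big_endian(uint8_t *mem, uint16_t addr, uint16_t value)
{
    mem[addr]                 = (uint8_t)(value >> 8);    /* high byte */
    mem[(uint16_t)(addr + 1)] = (uint8_t)(value & 0xFF);  /* low byte  */
}
```

Either way, storing 7 as a 16-bit value writes two bytes, one of them zero, which is exactly why the int pointer touches the neighbouring register.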


 

:)

I think you've missed the point.

If you are writing a C equivalent of a byte store to a register, you need to use a byte pointer.

If you use an int pointer, then the C code/compiler IS going to write at least two bytes.

Cheers

A

 


Andrew,

my point was that the internal registers are not exposed in C, only memory locations (COLUP0 is not an internal register on the CPU).

 

Do you know of a way to set a pointer to an index register on the CPU and use it to step through a Dataset in C? ;)


 

I don't see the point of your point :)

 

But in answer to your question, the register is not a memory location, so you can't point to it. But you CAN use it as a pointer... use a register variable. It's a hint to the C compiler that the variable should be kept in a register.

register unsigned char *x = &COLUP0;
*x = 7;

Of course, it's just a hint to the compiler. And the above would generate horrible code, not at all what you would expect.

ldx #<COLUP0
lda #7
sta 0,x

You want to step through a dataset in C; what you're basically asking for is a C way to use an index register to access an array.

You leave that sort of stuff to the compiler. A good compiler will do it automatically. You MIGHT be able to tweak it by using a register variable.

register unsigned char x = 0;
unsigned char *ptr = DataSet;
while (ptr[x]) {
 x++;
}

The above might work. It *might* generate the following....

 ldx #0
more lda DataSet,x
 beq end
 inx
 jmp more


end

More likely, something like this...

 ldy #0
 lda #<DataSet
 sta ptr
 lda #>DataSet
 sta ptr+1
more lda (ptr),y
 beq end
 iny
 jmp more
end

... and now our 'X' is actually in 'Y' because the compiler needed to access the data via indirect indexed (zero page) addressing, emphasising that the register modifier is just a hint.
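For reference, a complete version of the loop above that builds with any C compiler. One assumption to flag: `while (ptr[x])` stops at the first zero byte, so the table needs a 0 terminator that the original `DataSet` does not have. `count_until_zero` is a hypothetical name.

```c
#include <stdint.h>

/* Zero-terminated so the loop below has a stopping point. */
static const uint8_t DataSet[] = {128, 64, 32, 16, 8, 4, 0};

uint8_t count_until_zero(void)
{
    register uint8_t x = 0;          /* only a hint; the compiler may ignore it */
    const uint8_t *ptr = DataSet;    /* not &DataSet, which has type uint8_t (*)[7] */
    while (ptr[x]) {
        x++;
    }
    return x;                        /* number of non-zero entries */
}
```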


 

Good post Andrew! My point was in response to Cybearg's opening question "is there a modern equivalent in C" to using an index register to step through data.

 

Jan made a good point about dropping a register hint, but as you've illustrated this is largely futile; we're not writing in Assembly and the optimiser easily gets confused!

 

We can do this in C by dropping into inline Assembly; we can even tell the compiler which registers we're taking direct control over so that it can preserve and restore them around the asm block:

 

http://locklessinc.com/articles/gcc_asm/
