

Any new updates on Caves of Kroz or Mystery Castle games


91 replies to this topic

#76 SoulBlazer

    River Patroller

  • 3,903 posts
  • Location:Providence RI

Posted Wed May 2, 2012 12:57 PM

Hey, stop 'nerding' up the place with all this code talk.

Just kidding! If it helps you guys figure out how to make more awesome Intellivision games...GO RIGHT AHEAD!!! :grin: :grin: :grin:


Tell me about it. I don't even know what programming language the NES uses, much less how to program for it. ;)

#77 Carl Mueller Jr

    Moonsweeper

  • 304 posts
  • Location:Kagoshima, Japan

Posted Wed May 2, 2012 1:05 PM


The Inty CPU is a ton of fun to write code for and that is a big draw for me.


I'm glad I'm not the only person who thinks this way. I cut my teeth on the TMS9900 and later 6502, but of the three, I feel like I can crank out CP-1610 the best. It just feels more straightforward, so I can focus on writing the program rather than figuring out, say, the best way to multiplex the accumulator. Don't get me wrong: I had a blast writing the 6502 code that I did. I just feel more productive on the CP-1610. I've poked at Z80 briefly, and while I can manage it, it's not my fave. I've been spoiled by 16 bit registers. :-)

(TMS9900 wasn't too bad, actually. It had slightly better addressing modes, and more registers. It's been 20 years since I've written any TMS9900 assembly though, so I can't really compare it to anything. And, any machine that renames "logical OR" to "Set Ones Corresponding," doesn't even have a proper stack, and numbers its buses so that bit 0 is the most significant bit has gotta be a little wacky in my opinion.)
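As a rough illustration of the two TMS9900 quirks mentioned above, here is a small Python sketch. It assumes 16-bit words; `tms_to_lsb0`, `tms_bit_mask`, and `soc` are hypothetical helper names, and `soc` only models the OR result of "Set Ones Corresponding", not its status-flag effects.

```python
def tms_to_lsb0(bit: int, width: int = 16) -> int:
    """Convert a TMS9900-style bit number (bit 0 = MSB) to LSB-0 numbering."""
    if not 0 <= bit < width:
        raise ValueError("bit out of range")
    return width - 1 - bit

def tms_bit_mask(bit: int, width: int = 16) -> int:
    """Mask for TMS9900 bit `bit` in a `width`-bit word."""
    return 1 << tms_to_lsb0(bit, width)

def soc(src: int, dst: int) -> int:
    """'Set Ones Corresponding': a bitwise OR of src into dst, kept to 16 bits."""
    return (src | dst) & 0xFFFF

print(hex(tms_bit_mask(0)))       # 0x8000: TMS9900 bit 0 is the MSB
print(hex(tms_bit_mask(15)))      # 0x1
print(hex(soc(0x00F0, 0x0F00)))   # 0xff0
```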


Yeah, the best thing about the CP1610 is that it just seems to simplify programming. It may not be very fast, but it seems a lot easier to implement more complex algorithms. It would be interesting to do a comparison to see what operations the CP1610 could do faster at its lower clock rate than the 6502 or Z80, for example.

On the other hand, I like the 6502 because there simply aren't a lot of ways to do things so you don't have to spend a lot of time juggling registers and figuring out how to optimize something. And it has that nifty zero page that shrinks the instruction size and speeds it up a bit.

The Z80 is overly complicated, but it does in fact have 16-bit registers and some 16-bit instructions. I particularly like the stack instructions since you can push and pull 16 bit values quite quickly – I use this technique for my Intellivision for Gameboy Color emulator to quickly grab new opcodes from the game ROMs.

The 8048 is a piece of junk, in my opinion. What, a 256 byte address space and something like a 16 byte stack? It does have two register sets, but this is a very, very limited processor. I can't believe they chose this as the main CPU for the Odyssey 2. Fortunately, it's kind of speedy… It was good enough to generate the waveform data for DK's sound.

Carl

#78 revolutionika

    Quadrunner

  • 10,504 posts
  • Location:NC

Posted Wed May 2, 2012 2:05 PM

I concur, the 8048 is not what it used to be. The 16 bit registers are lacking in comparison to the 6502. The processor doesn't run the 16 bit processor smoothly as we had all hoped. The CPU clock speed could be bumped up to compensate for the inferior opcodes and in turn run the ROMs much nicer. We should hope for more complex algorithms with the CP1610 for sure, as the clock rate is much more to programmers' liking.

#79 Carl Mueller Jr

    Moonsweeper

  • 304 posts
  • Location:Kagoshima, Japan

Posted Wed May 2, 2012 2:38 PM

I concur, the 8048 is not what it used to be. The 16 bit registers are lacking in comparison to the 6502. The processor doesn't run the 16 bit processor smoothly as we had all hoped. The CPU clock speed could be bumped up to compensate for the inferior opcodes and in turn run the ROMs much nicer. We should hope for more complex algorithms with the CP1610 for sure, as the clock rate is much more to programmers' liking.

A joke post, I gather ;-)

The 8048 doesn't have any 16 bit registers to my recollection. And the 6502's only 16 bit register is the program counter. Plus, both processors run faster than the CP1610, so I doubt its clock rate is necessarily to any programmer's liking. :-)

#80 DZ-Jay

    Quadrunner

  • 5,030 posts
  • Ranger Elf: Saviour of Christmas!
  • Location:NC, USA

Posted Wed May 2, 2012 2:39 PM

I concur, the 8048 is not what it used to be. The 16 bit registers are lacking in comparison to the 6502. The processor doesn't run the 16 bit processor smoothly as we had all hoped. The CPU clock speed could be bumped up to compensate for the inferior opcodes and in turn run the ROMs much nicer. We should hope for more complex algorithms with the CP1610 for sure, as the clock rate is much more to programmers' liking.


Well, I think it needs a higher bit-rate in the cowbell waveform.

#81 Fushek

    Stargunner

  • 1,087 posts
  • The saga of the Walking Dead continues soon!
  • Location:Clevelandish

Posted Wed May 2, 2012 3:03 PM


I concur, the 8048 is not what it used to be. The 16 bit registers are lacking in comparison to the 6502. The processor doesn't run the 16 bit processor smoothly as we had all hoped. The CPU clock speed could be bumped up to compensate for the inferior opcodes and in turn run the ROMs much nicer. We should hope for more complex algorithms with the CP1610 for sure, as the clock rate is much more to programmers' liking.


Well, I think it needs a higher bit-rate in the cowbell waveform.


I have a video game fever ... and the only prescription ... is more cowbell waveform.

Attached image: cowbell-blue_oyster_cult.gif


#82 intvnut

    Stargunner

  • 1,145 posts
  • Location:@R6 (top of stack)

Posted Wed May 2, 2012 6:34 PM

Yeah, the best thing about the CP1610 is that it just seems to simplify programming. It may not be very fast, but it seems a lot easier to implement more complex algorithms. It would be interesting to do a comparison to see what operations the CP1610 could do faster at its lower clock rate than the 6502 or Z80, for example.



I believe something as simple as a memory copy would be faster on the CP-1610, if you measured in bytes/sec. Here are two loops. The 6502 version below is limited to a maximum 128-byte copy, and I assume you can do the copy "backwards" for it, to merge your index register with the loop counter, which requires a "pre-decrement" and terminating at -1 rather than 0 because LDA sets flags...

        ; CP-1610 version
loop:   MVI@    R4,     R0    ; 8 cycles
        MVO@    R0,     R5    ; 9 cycles
        DECR    R1            ; 6 cycles
        BNEQ    loop          ; 9 cycles

vs.

        ; 6502 version.  Assume 'src' ptr is in ($10), 'dst' ptr is in ($12), 
        ; and X is number of bytes to copy.
        DEX                   ; pre-adjust X so ($10),X points to last byte
loop:   LDA     ($10), X      ; 5 cycles
        STA     ($12), X      ; 6 cycles
        DEX                   ; 2 cycles
        BMI     loop          ; 3 cycles

So, the Intellivision's CP-1610 takes 32 cycles to copy 2 bytes (one 16-bit word), and the 6502 takes 16 cycles to copy 1 byte. At the same clock rate, they copy at the same rate in bytes per second. But consider all the provisos that come with the 6502 version, such as being limited to 128 bytes, etc. And if you unroll even just one time, the advantage starts to tip toward the CP-1610 (24.5 vs. 27 cycles per 2 bytes copied).
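The cycle arithmetic above can be checked with a small Python model. It uses only the per-instruction counts quoted in the two loops, treats every branch as taken, and ignores any 6502 page-crossing penalty; `loop_cycles_per_byte` is a hypothetical name. The unrolled case is only modeled for the CP-1610, where two MVI@/MVO@ pairs share one DECR/BNEQ.

```python
def loop_cycles_per_byte(copy_cycles, overhead_cycles, bytes_per_copy, unroll=1):
    """Cycles per byte for a copy loop whose load/store body is repeated `unroll` times.

    copy_cycles:     cycles for one load+store (plus any per-element index update)
    overhead_cycles: cycles paid once per pass (loop counter + taken branch)
    bytes_per_copy:  bytes moved by one load+store (2 for a CP-1610 word)
    """
    total = unroll * copy_cycles + overhead_cycles
    return total / (unroll * bytes_per_copy)

# Cycle counts as quoted in the two loops above; every branch is treated as taken.
cp1610_copy = 8 + 9        # MVI@ R4,R0 + MVO@ R0,R5 (R4/R5 auto-increment)
cp1610_overhead = 6 + 9    # DECR R1 + BNEQ
m6502_copy = 5 + 6 + 2     # LDA + STA + DEX: each byte needs its own index step
m6502_overhead = 3         # branch back

print(loop_cycles_per_byte(cp1610_copy, cp1610_overhead, 2))            # 16.0
print(loop_cycles_per_byte(m6502_copy, m6502_overhead, 1))              # 16.0
print(loop_cycles_per_byte(cp1610_copy, cp1610_overhead, 2, unroll=2))  # 12.25, i.e. 24.5 per word
```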

(Ok, I expect a 6502 expert to tell me all the ways I screwed up in 3... 2... 1...)

On the other hand, I like the 6502 because there simply aren't a lot of ways to do things so you don't have to spend a lot of time juggling registers and figuring out how to optimize something. And it has that nifty zero page that shrinks the instruction size and speeds it up a bit.


The zero-page isn't just nifty. It's a necessity, since there aren't enough registers. :-) The ZP is your register set.

The Z80 is overly complicated, but it does in fact have 16-bit registers and some 16-bit instructions. I particularly like the stack instructions since you can push and pull 16 bit values quite quickly – I use this technique for my Intellivision for Gameboy Color emulator to quickly grab new opcodes from the game ROMs.


I admit my brief brush with it stuck largely to the 8080 subset, where I mostly used HL as the 16-bit pointer (memory accessed through HL is the "M" operand).

The 8048 is a piece of junk, in my opinion. What, a 256 byte address space and something like a 16 byte stack? It does have two register sets, but this is a very, very limited processor. I can't believe they chose this as the main CPU for the Odyssey 2. Fortunately, it's kind of speedy… It was good enough to generate the waveform data for DK's sound.


It's a reasonable microcontroller meant mainly for tasks such as scanning a keyboard or controlling simple children's toys. It was stretched beyond those limits in the O2. The 8051 is a bit nicer and even has an external fetch mode. (I don't remember if the 8048 does.)

Edited by intvnut, Wed May 2, 2012 6:35 PM.


#83 Carl Mueller Jr

    Moonsweeper

  • 304 posts
  • Location:Kagoshima, Japan

Posted Wed May 2, 2012 6:48 PM

That's interesting that the CP1610 can match the 6502 for some operations. But you're using variable pointers… Supposing that you use the indexed immediate mode on the 6502, it would definitely win – particularly if your store was to the zero page. And I am well, well aware of the necessity of the zero page. It's the only way to set up a variable pointer. Also, it would be limited to 256 bytes, not 128. After which you would have to increment the high byte of your zero page pointer.

That's one criticism of the CP1610… No index registers.

#84 revolutionika

    Quadrunner

  • 10,504 posts
  • Location:NC

Posted Wed May 2, 2012 7:58 PM


I concur, the 8048 is not what it used to be. The 16 bit registers are lacking in comparison to the 6502. The processor doesn't run the 16 bit processor smoothly as we had all hoped. The CPU clock speed could be bumped up to compensate for the inferior opcodes and in turn run the ROMs much nicer. We should hope for more complex algorithms with the CP1610 for sure, as the clock rate is much more to programmers' liking.

A joke post, I gather ;-)

The 8048 doesn't have any 16 bit registers to my recollection. And the 6502's only 16 bit register is the program counter. Plus, both processors run faster than the CP1610, so I doubt its clock rate is necessarily to any programmer's liking. :-)



LOL, did it almost sound like I knew what I was talking about? :grin:

#85 cmart604

    Quadrunner

  • 7,685 posts
  • Location:Vancouver

Posted Wed May 2, 2012 8:06 PM



I concur, the 8048 is not what it used to be. The 16 bit registers are lacking in comparison to the 6502. The processor doesn't run the 16 bit processor smoothly as we had all hoped. The CPU clock speed could be bumped up to compensate for the inferior opcodes and in turn run the ROMs much nicer. We should hope for more complex algorithms with the CP1610 for sure, as the clock rate is much more to programmers' liking.

A joke post, I gather ;-)

The 8048 doesn't have any 16 bit registers to my recollection. And the 6502's only 16 bit register is the program counter. Plus, both processors run faster than the CP1610, so I doubt its clock rate is necessarily to any programmer's liking. :-)



LOL, did it almost sound like I knew what I was talking about? :grin:


Lol! I almost pissed myself laughing when I saw that. I thought "Rev doesn't know what he's talking about, he's just talking out of his ass, and yet it sounds pretty good"! I love it when you guys talk Sanskrit, or whatever the hell language you're speaking. As Rev said, if it leads to more awesome games getting made then I'm all for it. :)

#86 revolutionika

    Quadrunner

  • 10,504 posts
  • Location:NC

Posted Wed May 2, 2012 8:30 PM

What was this thread about again?


In B 4 lock!

#87 intvnut

    Stargunner

  • 1,145 posts
  • Location:@R6 (top of stack)

Posted Wed May 2, 2012 10:26 PM

That's interesting that the CP1610 can match the 6502 for some operations. But you're using variable pointers… Supposing that you use the indexed immediate mode on the 6502, it would definitely win – particularly if your store was to the zero page. And I am well, well aware of the necessity of the zero page. It's the only way to set up a variable pointer. Also, it would be limited to 256 bytes, not 128. After which you would have to increment the high byte of your zero page pointer.


Fair enough -- if the "to" and "from" buffers are fixed, indexed immediate does shave a couple cycles. That's useful in some, but not all cases. The peculiar limitation to 128 in my code above was kinda to illustrate a point. It could easily have been modified to allow copying 256 bytes, but only if you pre-decremented the two pointers, so you could terminate the loop at 0 rather than -1.

The point is -- yes, the 6502 may be somewhat faster at certain things, but it's often less straightforward. At least to me, it can seem that way.

That's one criticism of the CP1610… No index registers.


Yes, some sort of indexed addressing mode would be nice. A "@R3[5]" type of mode (i.e., access the word 5 locations past the address in R3) would be very helpful. It would make accessing data structures much more convenient.
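To make the "@R3[5]" idea concrete, here is a toy Python model of word-addressed memory. `load_indexed` is what such a hypothetical mode would compute in one step; `load_without_index` is the kind of workaround needed with indirect-only addressing (copy the pointer, add the offset, read indirectly), which also clobbers a scratch register. Register names, the struct address, and the helper names are illustrative only, and no cycle counts are modeled.

```python
# Toy model: 16-bit word-addressed memory and a small register file.
mem = [0] * 0x10000
regs = {"R3": 0x0200, "R4": 0x0000}
mem[0x0200 + 5] = 0x1234          # a struct field 5 words past the base pointer

def load_indexed(mem, regs, base, disp):
    """Hypothetical '@R3[5]'-style mode: one read at regs[base] + disp."""
    return mem[(regs[base] + disp) & 0xFFFF]

def load_without_index(mem, regs, base, disp, scratch):
    """Workaround with indirect-only addressing: copy, add, then read."""
    regs[scratch] = regs[base]                         # copy the pointer
    regs[scratch] = (regs[scratch] + disp) & 0xFFFF    # add the field offset
    return mem[regs[scratch]]                          # plain indirect read

print(hex(load_indexed(mem, regs, "R3", 5)))              # 0x1234
print(hex(load_without_index(mem, regs, "R3", 5, "R4")))  # 0x1234, and R4 is now 0x205
```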

Edited by intvnut, Wed May 2, 2012 10:28 PM.


#88 GroovyBee

    Games Developer

  • 7,797 posts
  • Busy bee!
  • Location:North, England

Posted Thu May 3, 2012 1:08 AM

        ; 6502 version.  Assume 'src' ptr is in ($10), 'dst' ptr is in ($12),
        ; and X is number of bytes to copy.
        DEX                   ; pre-adjust X so ($10),X points to last byte
loop:   LDA     ($10), X      ; 5 cycles
        STA     ($12), X      ; 6 cycles
        DEX                   ; 2 cycles
        BMI     loop          ; 3 cycles


There are a couple of invalid instructions in there ;). You have to use the Y register for the LDA/STA using indirect addressing, and if you pass a value in X that is less than or equal to 127, you'll only copy one byte, because the value won't be negative the first time through the loop.
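A rough Python model of the index and flag behavior described above, assuming X is an 8-bit register and the branch tests the N flag left by DEX (bit 7 of the result). It ignores addressing modes entirely (plain lists stand in for the two buffers), and `copy_countdown` is a hypothetical name. BPL, which keeps looping while X is still non-negative, is shown for comparison.

```python
def copy_countdown(src, dst, x, branch):
    """Model: DEX / loop: LDA src,X / STA dst,X / DEX / BMI-or-BPL loop.

    Only the 8-bit index and N-flag behaviour is modeled. `branch` is "BMI"
    (loop while N set); any other value behaves like BPL (loop while N clear).
    Returns the number of bytes copied.
    """
    x = (x - 1) & 0xFF                 # pre-adjust X (DEX before the loop)
    copied = 0
    while True:
        dst[x] = src[x]                # LDA / STA
        copied += 1
        x = (x - 1) & 0xFF             # DEX wraps 0 -> 255
        negative = bool(x & 0x80)      # N flag is bit 7 of the DEX result
        taken = negative if branch == "BMI" else not negative
        if not taken:
            return copied

src = list(range(256))
print(copy_countdown(src, [0] * 256, 10, "BMI"))  # 1: X=8 is not negative, so BMI falls through
print(copy_countdown(src, [0] * 256, 10, "BPL"))  # 10
```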

To copy a straight 256 bytes you'd use something like :-

; 6502 version.  Assume 'src' ptr is in ($10), 'dst' ptr is in ($12),
; Copy 256 bytes
	LDY #0	; 2 cycles
loop:
	LDA ($10), Y 	 ; 5 cycles
	STA ($12), Y 	 ; 6 cycles
	INY 		 ; 2 cycles
	BNE loop	 ; 3 cycles if taken, 2 if not

If you used absolute,Y addressing for the source and destination addresses, you'd save 2 cycles per loop unless you crossed a page boundary.
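A Python model of the 256-byte loop and the absolute,Y alternative, assuming standard NMOS 6502 timings with no page-crossing penalty (e.g. page-aligned buffers). Lists indexed by Y stand in for the buffers, so the pointer setup is abstracted away. It shows that INY wrapping from 255 to 0 is what ends the loop after exactly 256 passes, and that switching both accesses to absolute,Y saves 2 cycles per pass. `run_copy` and the table names are hypothetical.

```python
# Standard NMOS 6502 cycle counts, assuming no page-crossing penalty.
INDIRECT_Y = {"LDA (zp),Y": 5, "STA (zp),Y": 6}
ABSOLUTE_Y = {"LDA abs,Y": 4, "STA abs,Y": 5}
INY = 2
BNE_TAKEN, BNE_NOT_TAKEN = 3, 2
LDY_IMM = 2

def run_copy(src, dst, body):
    """Simulate LDY #0 / loop: LDA / STA / INY / BNE loop; return (passes, cycles)."""
    y, passes = 0, 0
    cycles = LDY_IMM
    while True:
        dst[y] = src[y]                  # LDA src,Y / STA dst,Y
        passes += 1
        y = (y + 1) & 0xFF               # INY wraps 255 -> 0 and sets Z on 0
        cycles += sum(body.values()) + INY
        if y != 0:
            cycles += BNE_TAKEN          # Z clear: branch back
        else:
            cycles += BNE_NOT_TAKEN      # Z set: fall through
            return passes, cycles

src = list(range(256))
dst = [0] * 256
print(run_copy(src, dst, INDIRECT_Y))   # (256, 4097)
print(run_copy(src, dst, ABSOLUTE_Y))   # (256, 3585)
```

The 512-cycle difference is the 2-cycle saving multiplied by 256 passes.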

#89 intvnut

    Stargunner

  • 1,145 posts
  • Location:@R6 (top of stack)

Posted Thu May 3, 2012 6:01 AM

There are a couple of invalid instructions in there ;). You have to use the Y register for the LDA/STA using indirect addressing, and if you pass a value in X that is less than or equal to 127, you'll only copy one byte, because the value won't be negative the first time through the loop.


Which kinda underscores my point... I could never remember which of X and Y does indirect-indexed and indexed-indirect. :-) Ironic thing is, I had the answer in front of me when I looked up the cycle counts, but I guess I had Teflon-brain at that moment.

To copy a straight 256 bytes you'd use something like :-

; 6502 version.  Assume 'src' ptr is in ($10), 'dst' ptr is in ($12),
; Copy 256 bytes
	LDY #0	; 2 cycles
loop:
	LDA ($10), Y 	 ; 5 cycles
	STA ($12), Y 	 ; 6 cycles
	INY 		 ; 2 cycles
	BNE loop	 ; 3 cycles if taken, 2 if not


... which only copies exactly 256 bytes. What about "up to 256 bytes?"

If you used absolute,Y addressing for the source and destination addresses, you'd save 2 cycles per loop unless you crossed a page boundary.


I believe that's what Carl was calling "indexed immediate" (the 6502 documentation calls it absolute indexed addressing). I understood it to mean "LDA $1234, X" and "STA $1234, X". It works if the source and destination are fixed and all copies are 256 bytes or fewer.

Again, it kinda underscores my point: on the 6502 you can get there, and often get faster code, but the path is rarely as straight as it is on the CP-1610. That's not to say you can't do tricky things on the CP-1610 (you can do tricky things on any CPU), but contrary to Carl's claim that there are fewer ways to do things on the 6502, so you stay focused, I'd claim otherwise. :-)

Here's a fun one I think the 6502 might have more trouble with, esp when you consider some values are larger than 8 bits: http://spatula-city....y/dist_fast.asm Sure, you can use some zero-page variables and direct addressing to get there. Thankfully, most 6502 machines don't have to worry about reentrancy...

#90 GroovyBee

    Games Developer

  • 7,797 posts
  • Busy bee!
  • Location:North, England

Posted Thu May 3, 2012 6:30 AM

... which only copies exactly 256 bytes. What about "up to 256 bytes?"


Without adjusting the source and destination memory pointers to be off by one, I can't think of a quick way that doesn't use the X register to keep track in the loop. Adjusting the pointers isn't too bad, because you'd just use something like (CC65 syntax):

	lda #.lobyte(SrcAddress-1)
	sta $10
	lda #.hibyte(SrcAddress-1)
	sta $10+1

Or make a macro to wrap around the function call and the assembler would do all the work ;).
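Here is a Python model of the count-down loop that such pre-adjusted pointers allow, assuming both zero-page pointers are set one byte before their buffers (as in the ca65 snippet above) and Y is loaded with the byte count before the loop. Because Y runs from n down to 1, (ptr-1)+Y touches exactly ptr through ptr+n-1, and BNE falls through when DEY reaches 0. The sketch only accepts counts of 1 to 255, and `copy_up_to_255` is a hypothetical name.

```python
def copy_up_to_255(mem, src, dst, n):
    """Model: LDY #n / loop: LDA (src-1),Y / STA (dst-1),Y / DEY / BNE loop.

    mem is a 64K list standing in for memory; src and dst are buffer addresses.
    Assumes 1 <= n <= 255 so the count fits in Y and is nonzero.
    """
    if not 1 <= n <= 255:
        raise ValueError("n must be 1..255 for this sketch")
    src_ptr, dst_ptr = src - 1, dst - 1   # the pre-adjusted zero-page pointers
    y = n
    while True:
        mem[(dst_ptr + y) & 0xFFFF] = mem[(src_ptr + y) & 0xFFFF]
        y = (y - 1) & 0xFF                # DEY
        if y == 0:                        # BNE falls through once Y hits 0
            return

mem = [0] * 0x10000
mem[0x1000:0x1005] = [1, 2, 3, 4, 5]
copy_up_to_255(mem, 0x1000, 0x2000, 5)
print(mem[0x2000:0x2005])   # [1, 2, 3, 4, 5]
```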

Here's a fun one I think the 6502 might have more trouble with, esp when you consider some values are larger than 8 bits: http://spatula-city....y/dist_fast.asm Sure, you can use some zero-page variables and direct addressing to get there. Thankfully, most 6502 machines don't have to worry about reentrancy...


Cool! I need a distance computation for a project. I might even convert it to 6502 at some point :lol:.

#91 intvnut

    Stargunner

  • 1,145 posts
  • Location:@R6 (top of stack)

Posted Thu May 3, 2012 7:18 AM


Here's a fun one I think the 6502 might have more trouble with, esp when you consider some values are larger than 8 bits: http://spatula-city....y/dist_fast.asm Sure, you can use some zero-page variables and direct addressing to get there. Thankfully, most 6502 machines don't have to worry about reentrancy...


Cool! I need a distance computation for a project. I might even convert it to 6502 at some point :lol:.


Go for it. BTW, that particular copy of the source file says "GPL v2", but I later re-released pretty much all my library code into the public domain. The algorithm itself, as I said in the comments, comes from Graphics Gems and is, I believe, available to all.

It does work pretty well. For amusement sometime, you might try plotting an "error map" for the function. It's actually rather interesting.
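For the error-map suggestion, here is a Python sketch. The linked dist_fast.asm is not reproduced here, so `approx_dist` uses a stand-in: the common "max + min/2" (alpha-max-plus-beta-min) approximation, which may well differ from the routine in that file. The sketch computes the relative error against `math.hypot` over a grid; plotting `emap` (e.g. as an image) gives the kind of error map described above. Function names are hypothetical.

```python
import math

def approx_dist(dx, dy):
    """Stand-in approximation: max(|dx|,|dy|) + min(|dx|,|dy|)/2 (not necessarily dist_fast.asm's method)."""
    a, b = abs(dx), abs(dy)
    return max(a, b) + min(a, b) / 2

def error_map(size):
    """Relative error (approx - exact) / exact for 0 <= x, y < size; 0.0 at the origin."""
    emap = []
    for y in range(size):
        row = []
        for x in range(size):
            exact = math.hypot(x, y)
            row.append(0.0 if exact == 0 else (approx_dist(x, y) - exact) / exact)
        emap.append(row)
    return emap

emap = error_map(64)
worst = max(max(row) for row in emap)
print(f"worst overestimate: {worst:.1%}")   # worst overestimate: 11.8%
```

For this particular stand-in the error is never negative, and it peaks where the smaller component is half the larger one (e.g. dx=2, dy=1).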

#92 GroovyBee

    Games Developer

  • 7,797 posts
  • Busy bee!
  • Location:North, England

Posted Thu May 3, 2012 7:24 AM

The Graphics Gems books are an interesting series. I also have several of the Game Programming Gems books.



