Jump to content

Photo

Smooth Scrolling in C

gcc smooth scrolling

23 replies to this topic

#1 TheMole OFFLINE  

TheMole

    Dragonstomper

  • 819 posts
  • Location:Belgium

Posted Wed Feb 26, 2014 8:47 AM

After seeing Rasmus's great work, I decided I wanted to start working on my own smooth scrolling games for our beloved TI. Initially, I was set on using the F18A for the scrolling functionality, but alas... since I don't have a hardware setup right now and no current emulator supports the thing's scrolling registers I'm stuck with the good ol' tms9918a (for now).

 

Since I'm lazy by nature and I didn't feel like programming a whole game in assembly, I couldn't use Rasmus' excellent scrolling example code and had to re-implement it in C. I also didn't want to spend too much time transforming the assembly output from Magellan into something directly usable from C. I looked at adding a C exporter to Magellan, but the export .java source file alone was so daunting I decided to write my own tool to generate the scrolling patterns. Since I prefer to work in The Gimp to create the level, I wrote a simple command line program that takes a 16-color bitmap file that represents the scrolling map and generates a C header file with the pattern, color and nametable data (graphics mode only, for now). Maybe I'll look at turning this into a Gimp export filter at some point.

 

For those that are interested, the simplest horizontal scrolling C application that uses the exported header file is only 100 lines of code and looks something like this:

// Includes
#include "libti99/vdp.h"  	// Tursi's libti99, VDP functions
#include "tistdio.h"		// Quick set of functions for keyboard scanning
#include "level.h"		// Generated header file containing map data

#define SIT1	0x01
#define SIT2	0x03
#define CT	0xFF

// copy 8 pattern tables into VDP RAM
void init_patterntables()
{
	int frame = 0;
	
	// Write 8 pattern tables to VDP memory
	vdpmemcpy(0x800 * frame, patt_frame0, 768); frame++;
	vdpmemcpy(0x800 * frame, patt_frame1, 768); frame++;
	vdpmemcpy(0x800 * frame, patt_frame2, 768); frame++;
	vdpmemcpy(0x800 * frame, patt_frame3, 768); frame++;
	vdpmemcpy(0x800 * frame, patt_frame4, 768); frame++;
	vdpmemcpy(0x800 * frame, patt_frame5, 768); frame++;
	vdpmemcpy(0x800 * frame, patt_frame6, 768); frame++;
	vdpmemcpy(0x800 * frame, patt_frame7, 768);
}

// Copy colortable to VDP RAM
void init_colortable()
{
	// Init the first two black, the third one gray
	vdpmemcpy((CT * 0x40), colortable, 13);
}

void init_nametable()
{
	int x, y;
	
	for (x = 0; x < 32; x++)
		for (y = 0; y < 24; y++)
			vdpmemcpy((SIT1 * 0x400) + (x + (y * 32)), &(map[y][x]), 1);
}

void copy_pattern_block(int col, int frame, int backbuffer_sit)
{
	int row = frame * 3;

	col++;
	vdpmemcpy((backbuffer_sit * 0x400) + (row * 32), &(map[row][col]), 32); 	row++;
	vdpmemcpy((backbuffer_sit * 0x400) + (row * 32), &(map[row][col]), 32); 	row++;
	vdpmemcpy((backbuffer_sit * 0x400) + (row * 32), &(map[row][col]), 32);
}

int main(int argc, char *argv[])
{
	int x, prev_x;
	int frame, backbuffer_sit;
	
	// Init graphics system
	x = set_graphics(1);
	VDP_SET_REGISTER(VDP_REG_MODE1, x);
	VDP_SET_REGISTER(VDP_REG_PDT, 0);	
	VDP_SET_REGISTER(VDP_REG_CT, CT);	
	VDP_SET_REGISTER(VDP_REG_SIT, SIT1);
	VDP_SET_REGISTER(VDP_REG_COL, 0xF1);
	init_patterntables();
	init_colortable();
	init_nametable();

	prev_x = x = 0;
	backbuffer_sit = SIT2;
	while(1)
	{
		// Scan keys and do movement
		// scan_keys();

		// UP/'E' pressed, move forward
		if (check_key(2,0x4000)) 
			x++;

		frame = (x) % 8;
		if (x != prev_x)
		{
			// Move backbuffer to front and vice-versa
			if (frame == 0)
			{
				VDP_SET_REGISTER(VDP_REG_SIT, backbuffer_sit);
				backbuffer_sit = (backbuffer_sit == SIT1) ? SIT2 : SIT1;
			}

			// Advanced frame 1 pixel (aka move pattern descriptor table pointer one position up)
			VDP_SET_REGISTER(VDP_REG_PDT, frame);	

			// Write 3 rows of the next full frame to the backbuffer
			// Depending on frame this is either 0-2 (frame 0), 3-5 (frame 1), 6-8 (frame 2), ...
			copy_pattern_block(x >> 3, frame, backbuffer_sit);
			
			if (x > 1016)
				x = 0;

			prev_x = x;
		}
	}

        return 0;
}

Project files, FIAD file and disk image attached for those who want to see it in action (EA#5, ALEXKIDD). The generated code is in level.h and is untouched, what you see is what the tool generated. I tried making a video, but the result looked anything but smooth. If you want to run it in an emulator, I suggest MESS as Classic99's timing is a bit off and makes it look a bit jittery. As you'll see, it runs quite fast as-is, but I'm sure there's room for improvement as this is completely unoptimized code.

 

I'll make the tool available soon as well, but I need to clean it up a bit before it's fit for public consumption. I also need to add a binary file export function, 'cause the C header files are actually way too memory hungry for any practical use. In the future I hope to also add up-down and bitmap mode export functionality. Currently it just does what I needed it to do, and that's it.

 

Attached Files



#2 Asmusr OFFLINE  

Asmusr

    River Patroller

  • 3,113 posts
  • Location:Denmark

Posted Wed Feb 26, 2014 10:08 AM

Great, but it doesn't look like you're synchronizing with vertical refresh. I think that would do a lot for the smoothness... 



#3 TheMole OFFLINE  

TheMole

    Dragonstomper

  • Topic Starter
  • 819 posts
  • Location:Belgium

Posted Wed Feb 26, 2014 3:59 PM

True, I justed wanted to see how fast it could run without waiting for vsync. That's a thousand frame in slightly under 6 seconds, so 175fps or so. Plenty of room for game logic. Tursi's lib makes it as simple as one call to add vsync though, so when I turn this into a game it'll definitely be there.

I recorded a video (no vsync) on my mac, through mess. It still looks a lot smoother on screen than in the recording, and the colors don't do well with the youtube compression, but a video is always a bit nicer for people to look at, so here goes:

 



#4 Asmusr OFFLINE  

Asmusr

    River Patroller

  • 3,113 posts
  • Location:Denmark

Posted Wed Feb 26, 2014 4:39 PM

Did you try to measure the time of a main loop using a T(XXXX-YYYY) 'breakpoint' in Classic99?



#5 TheMole OFFLINE  

TheMole

    Dragonstomper

  • Topic Starter
  • 819 posts
  • Location:Belgium

Posted Thu Feb 27, 2014 2:52 AM

Did I do what with the what now? :)

Actually, I didn't know there was such a function. I can't seem to get it to work though. The loop for 8 frames starts at >CBCC and ends at >A2A6, so I would suspect adding a 'breakpoint' with T(CBCC-A2A6) would do the trick, but I see no output in the debug window... I'm probably just doing it wrong.



#6 Asmusr OFFLINE  

Asmusr

    River Patroller

  • 3,113 posts
  • Location:Denmark

Posted Thu Feb 27, 2014 5:02 AM

You must have Scroll Lock on for breakpoints to work.



#7 TheMole OFFLINE  

TheMole

    Dragonstomper

  • Topic Starter
  • 819 posts
  • Location:Belgium

Posted Thu Feb 27, 2014 12:30 PM

Ah, ok. Took me a while to find the scroll lock key on my laptop (had to find an external keyboard actually...).

 

It's a bit of a hassle figuring out where the while loop starts and ends, but if I instrumented the code correctly it seems that one iteration of the main while loop (without scanning for keys or vsync so free-running only automatic scrolling from left to right) takes on average 15000 cycles, or 0.005 seconds, which puts the framerate at around 200fps (which is close enough to my hand-timed number to make me believe I've used accurate breakpoints). I'd be curious to know how many cycles it takes with hand written assembly code.


Edited by TheMole, Thu Feb 27, 2014 12:31 PM.


#8 artrag OFFLINE  

artrag

    Stargunner

  • 1,258 posts

Posted Fri Feb 28, 2014 12:58 AM

Very nice! when do you release the tool ?



#9 TheMole OFFLINE  

TheMole

    Dragonstomper

  • Topic Starter
  • 819 posts
  • Location:Belgium

Posted Fri Feb 28, 2014 4:44 AM

If all goes well that should be very soon, I need to clean it up a little (add proper getopt support and put some safeguard tests in there to ensure it doesn't crash) and make sure it compiles on windows, but that shouldn't take long. Hopefully I'll find a couple of hours this week-end. Should also be useful for the MSX and (memory permitting) Colecovision folks.



#10 Asmusr OFFLINE  

Asmusr

    River Patroller

  • 3,113 posts
  • Location:Denmark

Posted Fri Feb 28, 2014 7:26 AM

Ah, ok. Took me a while to find the scroll lock key on my laptop (had to find an external keyboard actually...).

 

It's a bit of a hassle figuring out where the while loop starts and ends, but if I instrumented the code correctly it seems that one iteration of the main while loop (without scanning for keys or vsync so free-running only automatic scrolling from left to right) takes on average 15000 cycles, or 0.005 seconds, which puts the framerate at around 200fps (which is close enough to my hand-timed number to make me believe I've used accurate breakpoints). I'd be curious to know how many cycles it takes with hand written assembly code.

 

I will check what I get from pure assembler. It all boils down to the speed of updating the name/screen image table since all the patterns are pre-loaded. I think you're scrolling one pixel at a time, i.e. do you update 3 rows per frame, right?



#11 TheMole OFFLINE  

TheMole

    Dragonstomper

  • Topic Starter
  • 819 posts
  • Location:Belgium

Posted Fri Feb 28, 2014 8:30 AM

 

I will check what I get from pure assembler. It all boils down to the speed of updating the name/screen image table since all the patterns are pre-loaded. I think you're scrolling one pixel at a time, i.e. do you update 3 rows per frame, right?

 

Indeed, per-pixel. The following block of code copies the three rows from the map array, which is a 2D array organized per row (that way, you can bulk-copy 32 bytes per loop).

vdpmemcpy((backbuffer_sit * 0x400) + (row * 32), &(map[row][col]), 32); 	row++;
vdpmemcpy((backbuffer_sit * 0x400) + (row * 32), &(map[row][col]), 32); 	row++;
vdpmemcpy((backbuffer_sit * 0x400) + (row * 32), &(map[row][col]), 32);

The vdpmemcpy function comes from Tursi's lib, but it's so simple it probably compiles to the most efficient assembly already:

void vdpmemcpy(int pAddr, const unsigned char *pSrc, int cnt) {
	VDP_SET_ADDRESS_WRITE(pAddr);
	while (cnt--) {
		VDPWD=*(pSrc++);
	}
}

I can of course win some cycles by inlining the function, or replacing it by a macro to avoid the context switch overhead.

All-in-all, the code is so simple I'd be surprised if there's a huge speed difference once the obvious optimizations are done.


Edited by TheMole, Fri Feb 28, 2014 8:32 AM.


#12 Asmusr OFFLINE  

Asmusr

    River Patroller

  • 3,113 posts
  • Location:Denmark

Posted Fri Feb 28, 2014 9:35 AM

 

All-in-all, the code is so simple I'd be surprised if there's a huge speed difference once the obvious optimizations are done.

 

In the sample code I made for kl99 (see http://atariage.com/forums/topic/210888-smooth-scrolling/page-11#entry2894640) the scrolling code alone is taking about 10500 clock cycles. You could probably reduce this to about 9000 if you moved the copying loop into scratch pad. However, the technique I have been using in slightly different from yours.

 

You are using 8 pattern tables located all over VDP RAM. I'm only using the first 8K for pattern tables, so I have to split each table into two. This means I either need two maps (one for frame 0-3 and another for frames 4-7) or I need to set the msb of each the byte I copy in frames 4-7. In the demo I'm only using one map, so frames 4-7 take about 18000 clock cycles.

 

I guess your technique is better if you are able to fit your sprites patterns into the top half of a 2K segment. There is also an issue if you want to use disk access, in which case you have to save the buffers in the top of VDP RAM. But all in all it looks like it's perfectly possible to make a smooth scrolling game in C. Well done!  :)

 

BTW, I never had time to follow all the steps in setting up gcc on my own computer, is it possible to zip an installation and copy it over?



#13 artrag OFFLINE  

artrag

    Stargunner

  • 1,258 posts

Posted Sat Mar 1, 2014 2:45 AM

Actually TheMole's solution seems pretty brilliant, but there are side effects :-)

The problem is that 8 tables of patterns take 8*2K = 16K of vram.

This implies that in order to allocate sprites, two PNT's (768 bytes each) and the color tables you have to cut down the max number of  patterns used in 3 tables out of 8.

You will end to have 256 tiles for offsets 0,1,2,3,4 and about 128 tiles for offset 5,6 and 7

 

In the end, as hardly you can use 256 tiles on offsets 0-4 without having the same amount of tiles on offset 5-7 you will not get very different results by RasmusM's option that uses only 4 tables shared among the 8 offsets.

 

On MSX the PNT can safely be updated on ISR without need of double buffering it, so I used a more complex solution.

 

I use a single level map where each entry represents a couple of adjacent 8x8 tiles in the PNG image I use as input: say N the number of couples and assume 8 offsets for the scrolling. 

My tool generates an intermediate table Nx8 that for each couple (byte on the level map) returns the VRAM tile number for each offset phase.

 

The gain is that if the same tile appears on the screen in different offsets and the two offsets are stored in the same pattern table in VRAM, that tile can be loaded in VRAM only once.

The loss is that I have to update the PNT at each scroll step and in order to update the PNT I need to read a byte in the MAP, access to the intermediate table to get the tile number in VRAM, and then update the VRAM

Anyway on msx this extra passage is perfectly bearable and all the PNT update is fits in the vblank time.


Edited by artrag, Sat Mar 1, 2014 2:46 AM.


#14 Asmusr OFFLINE  

Asmusr

    River Patroller

  • 3,113 posts
  • Location:Denmark

Posted Sat Mar 1, 2014 3:33 AM

On MSX the PNT can safely be updated on ISR without need of double buffering it, so I used a more complex solution.

 

I use a single level map where each entry represents a couple of adjacent 8x8 tiles in the PNG image I use as input: say N the number of couples and assume 8 offsets for the scrolling. 

My tool generates an intermediate table Nx8 that for each couple (byte on the level map) returns the VRAM tile number for each offset phase.

 

The gain is that if the same tile appears on the screen in different offsets and the two offsets are stored in the same pattern table in VRAM, that tile can be loaded in VRAM only once.

The loss is that I have to update the PNT at each scroll step and in order to update the PNT I need to read a byte in the MAP, access to the intermediate table to get the tile number in VRAM, and then update the VRAM

Anyway on msx this extra passage is perfectly bearable and all the PNT update is fits in the vblank time.

 

I finally understand how your algorithm  works  :)

 

Can you give an estimate of how many tiles you save by this method, i.e. how often is the same pattern used in different offsets? Except for blank -> blank tile transitions I wouldn't think is was very often.



#15 TheMole OFFLINE  

TheMole

    Dragonstomper

  • Topic Starter
  • 819 posts
  • Location:Belgium

Posted Sat Mar 1, 2014 3:57 AM

I use a single level map where each entry represents a couple of adjacent 8x8 tiles in the PNG image I use as input: say N the number of couples and assume 8 offsets for the scrolling. 

My tool generates an intermediate table Nx8 that for each couple (byte on the level map) returns the VRAM tile number for each offset phase.

 

I think I understand what you're doing, and the concept is pretty neat! Depending on the level design there's often a lot of repeating patterns even within a single pair of tiles (look at some of my "grass" tiles, there's only 2 variants for the 8 scroll positions). I imagine you can save quite a bit there.

 

But a couple of things are still unclear to me. Like, your VRAM memory map layout. And do you simply upload the entire nametable during vblank, or do you only update the bytes that have a different resolution in the Nx8 table?

 

Also, wouldn't mind having a look at the tool you have.



#16 TheMole OFFLINE  

TheMole

    Dragonstomper

  • Topic Starter
  • 819 posts
  • Location:Belgium

Posted Sat Mar 1, 2014 4:15 AM

 

You are using 8 pattern tables located all over VDP RAM. I'm only using the first 8K for pattern tables, so I have to split each table into two. This means I either need two maps (one for frame 0-3 and another for frames 4-7) or I need to set the msb of each the byte I copy in frames 4-7. In the demo I'm only using one map, so frames 4-7 take about 18000 clock cycles.

 

I guess your technique is better if you are able to fit your sprites patterns into the top half of a 2K segment. 

 

BTW, I never had time to follow all the steps in setting up gcc on my own computer, is it possible to zip an installation and copy it over?

 

No problem. Mac, Windows or Linux? 

 

As for VRAM consumption, I think the SGT might be a problem or I'll have to be very smart about how I organize the sprite animations. My player character alone consists of 4 2x2 sprites (to get two color sprites of 32 pixels high), and 7 stances (standing, walking (3 animation frames), jumping, ducking and punching) in two directions (facing left and right). So if there were no duplication in the patterns that'd be 112 patterns (out of a maximum of 128 for the upper half of the table) for each direction. Enemies are typically 2 2x2 sprites, with two animation frames and two directions, so each enemy takes a theoretical maximum of 16 patterns. You'd need at least three enemy types per level, so worst case scenario I'd need to be able to store 320 patterns. In my VRAM map, there's room for 5 tables, but the animations for the different on-screen characters will need to line up nicely. I think that'll require another tool to prepare the data efficiently...

 

Splitting out the different animation frames in different SGT locations would be a good start (also minimizes the number of bytes I need to send to the VDP for each individual sprite update). Of course, that would mean that the enemy sprite animations cycles need to be synced for all on-screen enemies.

 

Still lots of interesting challenges to overcome before I can turn this into the game I have in mind :).



#17 TheMole OFFLINE  

TheMole

    Dragonstomper

  • Topic Starter
  • 819 posts
  • Location:Belgium

Posted Sat Mar 1, 2014 4:18 AM

Since I'm on my Mac right now, Mac toolchain is available here: https://dl.dropboxus...gcc-tms9900.zip

Let me know if you need Windows or Linux as well.

 



#18 Asmusr OFFLINE  

Asmusr

    River Patroller

  • 3,113 posts
  • Location:Denmark

Posted Sat Mar 1, 2014 4:44 AM

Since I'm on my Mac right now, Mac toolchain is available here: https://dl.dropboxus...gcc-tms9900.zip

Let me know if you need Windows or Linux as well.

 

 

Thank you, it's really Windows I need it for.



#19 artrag OFFLINE  

artrag

    Stargunner

  • 1,258 posts

Posted Sat Mar 1, 2014 6:26 AM

 

I think I understand what you're doing, and the concept is pretty neat! Depending on the level design there's often a lot of repeating patterns even within a single pair of tiles (look at some of my "grass" tiles, there's only 2 variants for the 8 scroll positions). I imagine you can save quite a bit there.

 

But a couple of things are still unclear to me. Like, your VRAM memory map layout. And do you simply upload the entire nametable during vblank, or do you only update the bytes that have a different resolution in the Nx8 table?

 

Also, wouldn't mind having a look at the tool you have.

My VRAM layout is very complex as I use a special hybrid mode that allows the use of what you call bitmap mode and two pages of tiles.

If you can run an msx emulator (try e.g.  http://www.bluemsx.com/ ) you can look at this rom

https://sites.google...redirects=0&d=1

 

My tools are scripts in Matlab, if you can run them, I've not problem to send you my files

 

PS
this is a tech demo on youtube


Edited by artrag, Sat Mar 1, 2014 11:06 AM.


#20 artrag OFFLINE  

artrag

    Stargunner

  • 1,258 posts

Posted Sun Mar 2, 2014 3:25 AM

 

I finally understand how your algorithm  works  :)

 

Can you give an estimate of how many tiles you save by this method, i.e. how often is the same pattern used in different offsets? Except for blank -> blank tile transitions I wouldn't think is was very often.

In general it depends on the graphic of your levels.

In the Uridium demo you know I save a huge amount of tiles: all the 8 levels take the very same tile banks where I have 

- 240 tiles in the middle bank and 245 tiles in the lower bank for offsets 0 and 2

- 253 tiles in the middle bank and 255 tiles in the lower bank for offsets 1 and 3



#21 TheMole OFFLINE  

TheMole

    Dragonstomper

  • Topic Starter
  • 819 posts
  • Location:Belgium

Posted Sun Mar 2, 2014 4:15 AM

PS

this is a tech demo on youtube

 

As I said before, this is an awesome demo. I've always wondered how you're doing the 3D scrolling though. I assume this is actually a pre-rendered animation? Would it be possible to use the same technique to show a 256*128 full motion video? If so, this could be a great way to port the famous bad apple demo to the MSX :).



#22 artrag OFFLINE  

artrag

    Stargunner

  • 1,258 posts

Posted Sun Mar 2, 2014 4:32 AM

My video shows only the double buffering feature of the VRAM layout. Actually any prerendered animation could be shown provided it fits in 256x128 and that lower and middle banks can share their color definition. if this is the bad apple demo 

it would be perfect for a port as it i black and white. Where can I find the loose frames ?

The Matlab tools that generate the data for the video computes a new PNT (512 bytes) per frame and which tiles have to be changed wrt the content of the current VRAM banks.

So the animations you see need a magarom of 1M (with page mapper) to be executed.

 

For scrolling things are a bit different : I do not update tiles, but only the PNT at each frame, passing through the Nx8 table you know.

As I can store in VRAM 256+256 tiles in page 0 and 256+256 tiles in page 1, nice levels can safely fit without other load than the I/O of 512 bytes per frame



#23 TheMole OFFLINE  

TheMole

    Dragonstomper

  • Topic Starter
  • 819 posts
  • Location:Belgium

Posted Sun Mar 2, 2014 4:34 AM

Here's the code for the pattern data generation tool. I haven't done any clean-up on this, and the makefile only works on OSX, but it should be straightforward for anyone with knowledge of C and makefiles to compile this on Windows as well.

 

It currently depends on SDL 2.0, so if you want to compile the code as-is, you'll need to have the developer libraries for that installed. I used it to import the bitmap and have a way of visualizing my progress. When this is in a more finished state I'll probably rip it out so it can be a real console program.

 

To use it, just launch the app with a 16-color index bmp as the only argument. It'll show a double-sized 256*192 screen of the image, you can use the arrow keys to scroll in 4 directions on 8-pixels steps, and use the numpad + and - keys to change the current frame. Press escape to exit. At exit it will write a file called level.h that contains the patterns for the 8 frames, the color table and the map data in a 2D array.

 

I will be updating this in the future, but I'm going on a business trip next week and don't know when I'll have time to work on this again, so I wanted to share what I had already.

Attached Files


Edited by TheMole, Sun Mar 2, 2014 4:35 AM.


#24 TheMole OFFLINE  

TheMole

    Dragonstomper

  • Topic Starter
  • 819 posts
  • Location:Belgium

Posted Sun Mar 2, 2014 4:42 AM

if this is the bad apple demo it would be perfect for a port as it i black and white. Where can I find the loose frames ?

 

Yup, that's the genesis version, which is definitely the best home console version. There's a NES and gameboy version out there as well, which are of course less impressive, but are cool as well.

 

No idea where the raw frames can be gotten, but the developer is an active poster on the sega16 forum (his username is Stef), and has posted the sources of the mega drive version, so maybe you can find something useful there.

http://www.sega-16.c...ple-demo-thread

 

By the way, this is a better version of the video. I probably would have had a geek teen orgasm if I saw this running on my mega drive back in the day...


Edited by TheMole, Sun Mar 2, 2014 4:47 AM.






Also tagged with one or more of these keywords: gcc, smooth scrolling

0 user(s) are browsing this forum

0 members, 0 guests, 0 anonymous users