Jump to content


Gauntlet by Donald R. Lebeau 1984

169 replies to this topic

#101 donlebeau OFFLINE  


    Space Invader

  • 29 posts

Posted Wed Jan 30, 2008 12:30 PM

Ring Buffers Introduction:

I have a a design philosophy that can be summed up in one sentence: "Make it work reliably and fast." This sentence can be broken down into the basic steps of a project:

1. Make it.
2. Make it work.
3. Make it work reliably.
4. Make it work reliably and fast.

The important thing is to approach a project in order. For instance a common mistake is to try to optimize a design (making it "fast") before you've laid down a foundation that works reliably. I tend to beat up on my developers if they are trying to find "clever" ways to optimize a design before they have the thing built, working, and bombproof. Usually they will miss a reliability issue whil trying to cut corners and can waste a ton of time.

I've worked on projects the took over a year to develop the first "proof of concept". It would be slow as heck, but bombproof. Then *after* everyone was satisfied with the functionality, we would start optiimizing. And you know what? We were able to optimize the project very quickly and make great performance because we had defined the entire project as a whole before looking at where time could be saved. Because we had the complete overall picture, we could see optimizations that we would have never seen if we had tried to optimize the individual parts separately before they were integrated. We also accounted for any reliably issues first and dealt with them before trying to speed things up. There's nothing more frustrating that trying to debug intermittent problems in cryptic, cleverly optimized code. Get those issues out of the way before you get clever and your life will be a lot easier.

Ok, why the lecture? Because I'm going the split the ring buffer discussion into four parts. Each will show one step in the process. The first three parts wil show what ring buffers are and how they work. The fourth part will show how to optimize them. By the time we get to the fourth part, you should understand ring buffers enough that the optimizations make sense. (You may want to punch me after reading the first part, but bear with me for a bit Besides, I'm having fun writing this stuff. :) )

For Gauntlet, these optimizations made the difference between a playable game and an unplayable excercise in drawing graphics on a screen.

Hopefully at some point in the discussion everyone will get *something* out of it.

Ring Buffers:

What is a ring buffer?

A ring buffer conceptually is a First-In First-Out (FIFO) buffer that doesn't run out of space. It is implemented as a fixed sized array that uses seperate "Put" and "Get" pointers to store and retreive data from the array. Any type of data can be stored in the ring buffer, all the way from bytes to, pointers to objects, to actual objects. For the sake of this discussion we will only be storing bytes in the ring buffer, but the mechanics (and code) is the same for whatever data is stored in them.

A ring buffer is basically a place to temporily store data. It is commonly refered to as a "queue" or just generically as a FIFO. The terms "ring buffer", "queue", and "FIFO" are pretty much interchangable. Usually the same underlying code supports any of these programming abstractions. For instance, when someone says that they are queuing data, there is usually a ring buffer implemented to act as the queue.

Why would you need them?

Ring buffers allow two or more asyncronous processes to talk to each other. One process can put stuff into the ring buffer and sometime later on another process can take it out. The order that the data is put into the queue is preserved, so the FIFO keeps the data in sequence. (If you didn't stumble on me switching the terms around in this description, we're doing good, otherwise go back and read the last paragraph again. :) )

Anytime you want to store data temporarily or buffer it over to to another process, a ring buffer is usually the approach you would take.

There are several advantages to using ring buffers:

1. Asyncronous passing of data between processes: Data can be passed between unrelated processes. There doesn't have to be a one-for-one relationship between putting the data into the ring buffer and taking it out. As long as the buffer doesn't overflow items can be put into the buffer and stored there until it's convienient to take them out. Sometimes a process will use a ring buffer to store data for itself asyncroniously - for example a command parser might gather enough keystrokes until it has enough an entire command sequence and then process it all at once.

2. Flow control: Ring buffers can be fitted with controls to throttle the process that is putting data into them so that the buffer doesn't overrun. This "handshaking" allows, for example, a high speed communication like to interface with a slower command parser.

3. Allows communication between seperate processes running in interrupt and background space: Passing data from an interrupt to a background task or vice versa is one of ths primary uses of ring buffers. The interrupt can react quickly to a communication or timer interrupt and then queue that work for later. This is how the ring buffers were used in Gauntlet. The ring buffer routines are re-entrant, so calling they can be called from an interrupt or from a background task without corrupting data.

4. Supports multiple sources and consumers of data: there is no limit to the number of data sources or syncs that a ring buffer can have. The ring buffer contents are stored in the order that they were received in *regardless* of the source, so it's also a way to sort a sequencing of events by time.

5. Can be monitored and used to optimize workload: Ring buffers can be fitted with counters to tell how much data is in them. This information can then be used to manage the workload across a number of data sources. I describe how Gauntlet does some of this in a previous post.

6. Keeps a history of data events which is useful for debugging: Because ring buffers have very low overhead, you can insert them into software components to track data streams. This allows you to dump them for dubugging purposes. For example, in the machine I'm currently working on, I get encoder updates as it spins. We use the encoder for positioning and product tracking. Although the software uses the encoder values immediately, it also stuffs them into a ring buffer. If we ever suspect that the machine is having an encoder problem, we can dump the contents of the ring buffer and see the last 1000 encoder updates. If there is a hardware problem, it becomes very obvious.

7. Very simple code: Ring buffer code is very simple and totally bombproof. One of the amazing thins is that as you optimize it for speed, the code actually gets *simpler*. That's becasue you can take advantage of microcomputer architecture to optimize some functionality that you would normally have to do in code. You'll see what I'm talking about as we get to the "and fast" part of the discussion.

So now we've laid the foundation and we can proceed to Step 1: Making our own ring buffer!

#102 happymonster OFFLINE  


    Chopper Commander

  • 121 posts

Posted Wed Jan 30, 2008 1:43 PM

Eagerly awaiting more! :)

#103 donlebeau OFFLINE  


    Space Invader

  • 29 posts

Posted Wed Jan 30, 2008 3:57 PM

Ring Buffers step 1 - Make it:

Let's make our own ring buffer! We're going to do this in the C language because the logic will be clearer and some of the features that we add later on will very convoluted to code in 6502 assmbly. I will give you the *final* code used in Gauntlet at the end - the optimized code is very tiny. It's the intermediate stages of the code that get unweildy and it would be a waste of time to present it here.

The first thing that we need is the ring buffer itself. We're going to be storing bytes in our example, but remember you can store *any* data type that your language supports. We want our ring buffer to hold 100 bytes of data. ere it is:

#define BUFSIZE 100

byte RingBuf[BUFSIZE];

There it is! Isn't it pretty? I know it looks a lot like the data arrays I described in my previous post, but it's different - it's named a "RingBuf". Duh. (Told you I was going to have fun with this... :) )

What makes a ring buffer different from an array is how the data is accessed. In an array you use a single index to access the data. A ring buffer uses two indexes to access the data. By convention these indexes are named "Put" and "Get". Let's make those:

int Put = 0;
int Get = 0;

I've allocated my indexes and set them both to zero. In an array, the first location is at index 0 and the last location is at index (ArraySize = 1). For our sample ring buffer which holds 100 bytes, the first data byte is at index 0 and the last one is at index 99. (I know I'm boring at least half of you by now...)

Our ring buffer is set up and already to go. BTW, did you notice that I never cleared the data in the ring buffer? Clearing the data in the buffer itself is optional, and most people never do it. Once you allocate the buffer and clear the Put and Get indexes it's ready for use. Even during use, most people never clear the contents - that is very time consuming and doesn't really accomplish anything. With ring buffers, all the internal parts are self cleaning.

This is a bare bones ring buffer. For 90% of your applications this is all that you'll need. You can add features like data counters and flow control if you need them, and I'll show you how to do that later on in the discussion. Keep in mind though that any feature you add to a ring buffer will drastically affect it's performance. So just adding bells and whistles that you don't really need is usually a bad idea.

One more thing: The ring buffer indexed have a concept of "ownership". The routine that is putting data into the ring buffer "owns" the Put index. The routine that gets data from the ring buffer "owns" the Get index. Routines never touch an index that they don't own. In designs where a routine may want to clear out the buffer, it would have to take ownership of all of the buffer variables temporarily in order to clear them. In those instances that routine may be required to temporarily disable interrupts or otherwise shut out any other routines that may be trying to access the ring buffer until it is safe to do so.

Ring buffers are by nature asyncronous devices, so all the code that uses them must follow the conventions to make it thread safe. The ring buffer code itself is thread safe, but sometimes the applications that use the ring buffer must also be aware of the context that the ring buffer is used in.

Here's our completed ring buffer code:

#define BUFSIZE 100

byte RingBuf[BUFSIZE];
int Put = 0;
int Get = 0;

There, we did it! Our ring buffer is now complete. We "made" it.

In our next discussion we'll learn how to "make it work".

#104 happymonster OFFLINE  


    Chopper Commander

  • 121 posts

Posted Wed Jan 30, 2008 4:32 PM

Shouldn't it be:

char RingBuf[BUFSIZE];


#105 Heaven/TQA ONLINE  



  • 11,196 posts
  • Location:Baden-Württemberg, Germany

Posted Wed Jan 30, 2008 4:35 PM

great read dan... ;) you should write a book... :)

#106 happymonster OFFLINE  


    Chopper Commander

  • 121 posts

Posted Wed Jan 30, 2008 5:08 PM

Yes. Sorry, didn't mean to derail your posts. :)

#107 donlebeau OFFLINE  


    Space Invader

  • 29 posts

Posted Wed Jan 30, 2008 7:36 PM

Shouldn't it be:

char RingBuf[BUFSIZE];

Depends on the compiler. Older compilers use "char". I'm currently using Visual Studio 6 which supports "byte".

Good catch though.

#108 happymonster OFFLINE  


    Chopper Commander

  • 121 posts

Posted Wed Jan 30, 2008 10:48 PM

I thought char was part of the C standard, while byte was maybe C99?

#109 potatohead OFFLINE  


    River Patroller

  • 4,404 posts
  • Location:Portland, Oregon

Posted Wed Jan 30, 2008 11:14 PM

This is great discussion. For me personally, seeing this stuff in context, is very educational. Appreciated.

#110 batari OFFLINE  



  • 6,680 posts
  • begin 644 contest

Posted Thu Jan 31, 2008 3:39 AM

Just do this in your head, and let's continue the discussion:

#ifndef byte
#define byte unsigned char


#111 remowilliams OFFLINE  



  • 10,607 posts
  • Location:Detonation Boulevard

Posted Thu Jan 31, 2008 12:53 PM

I'll chime in as well to say that this thread is a very interesting and informational read as well. It'd be neat in any context, but an awesome 8bit classic - that's just too cool :D

#112 donlebeau OFFLINE  


    Space Invader

  • 29 posts

Posted Thu Jan 31, 2008 2:11 PM

Ring Buffers step 2 - Make it work:

Now that we have our ring buffer, we can start making it work. But before we continue I wanted to address nameing conventions. We've been working with a sample ring buffer, but most applications have more than one. To keep my code readable, I name each of my ring buffers after the process that it gets the data from it. For example, in Gauntlet the ring buffer used to hold think module requests is named like this:

#define THINKBUFSIZE 100char ThinkBuf[THINKBUFSIZE];int ThinkPut = 0;int ThinkGet = 0;

Since the ring buffer is named after the consumer of the data, you immediately know you are making a request to the Think Module when you see the data getting put into a ring buffer.

Scary performance rant:

As a side note, ring buffers are very flexible and each one is usually optimized for a specific use. Because of this, there's a good chance that your running code won't have any "generic" ring buffers in it. For instance the ring buffer that you use to get data off of a serial port will be vastly different that the one that you use to request and object's action. Each of these will have only the features that they require and nothing extra. Because of their nature, every feature that you add to a ring buffer has a huge impact on it's throughput.

Many times the performance of the ring buffers is what makes or breaks system.

With today's blindingly fast CPU's and optimizing compilers, the problem isn't as bad as it used to be. You can write horrendously inefficient code and got away with it most of the time, but "back in the day" every CPU cycle counted and some people (like me) made a living pushing every ounce of perfomance out of the existing hardware. Yes, today you can have generic ring buffers
with all the bells and whistles in your toolbox. I have one that I built using the STL library that I use everywhere. It's horribly inefficient, but for most applications I can just drop it in and it performs well enough. But if I drop it in and it kills the syste performancee, then I have to write a custom one without all the unessasary fat. (BTW, If anyone's interested in my STL ring buffer code, let me know and I'll send it to you. If you have to ask what STL is, don't worry, you won't need it. :) )

The point is that you must understand ring buffers enough to know what features you really need and how to optimize the ring buffer if it ends up being too slow for your application.

An important thing to keep in mind as we go forward is that every ring buffer performance and reliability issue is solved during design time. You must know your performance requirements and your ring buffer's place in the architecture first in order to determine the required functionality, buffer size, flow control, and minimum throughput. It is possible to calculate all this stuff ahead and determine if your ring buffer will "work" before even writing a line of code. Sometimes you might have to take drastic measures like redesigning the hardware to get the performance that you need. That's the kind of stuff you better know ahead of time, not later on when the hardware is cast in stone.

Using ring buffers:

Ring buffers follow these four basic rules:

1. You put data into into them by storing the data at the Put indexed location and then incrementing the Put index.
2. You get data out of them by incrementing the Get index and reading the data out of the Get indexed location.
3. Both the Put and Get indexes must wrap around to the beginning of the buffer every time they pass the end.
4. If the Put index equals the Get index, the buffer is empty.

That's everything you need to know to implement a basic ring buffer. Let's see how this looks in C. For clarity, I've written this as wordy as possible. We can distill it down a bit after we discuss some common C notation. (I want to make sure everyone understands what is going on and nobody is stumbling around because of the C language itself.)

#define TESTBUFSIZE 100char TestBuf[TESTBUFSIZE];int TestPut = 0;int TestGet = 0;bool BufferIsEmpty(){   if ( TestPut == TestGet )     {       return True;     }   else     {       return False;     }}void PutData(char value){  TestBuf[TestPut] = value;  TestPut = TestPut + 1;  if ( TestPut > TESTBUFSIZE )  {     TestPut = 0;  }}char GetData(){  TestGet = TestGet + 1;  if ( TestGet > TESTBUFSIZE )  {     TestGet = 0;  }  return TestBuf[TestGet];}

Ok, we've defined a ring buffer and created functions to find out if the buffer is empty, to put data into it, and to get data from it. Let's explain what we did and clean up the code as we go along.

First is the function to determine if the buffer is empty:

bool BufferIsEmpty(){   if ( TestPut == TestGet )     {       return True;     }   else     {       return False;     }}

This function IsBufferEmpty() compares the TestPut index to the TestGet index and returns True if they are same and False if they are different. This follows rule number 4 above. The function returns a boolian value which can be either True or False. This code can be cleaned up because the test in the "if" statement ( TestPut == TestGet ) also returns a True or a False. Most programmers would just return the result of the test directly and would rewrite the function as:

bool BufferIsEmpty(){   return ( TestPut == TestGet );}

... which means the same thing and is a lot cleaner (trust me on this).

So we have a function to determine if the buffer is empty. It can be used like this:

if ( BufferIsEmpty() )  {     printf("The Buffer is empty\n");  }else  {     printf("The Buffer is not empty\n");  }

Now let's look at the function to put data into the buffer:

void PutData(char value){  TestBuf[TestPut] = value;  TestPut = TestPut + 1;  if ( TestPut > TESTBUFSIZE )  {     TestPut = 0;  }}

We pass in a byte value. The first line "TestBuf[TestPut] = value;" puts that value into the buffer location indexed by the Put index. The next line "TestPut = TestPut + 1;" increments the Put index. That implements rule number 1 on how to put data into the ring buffer.

Because we incremented the Put index, we must now check to see if we should wrap it around to the beginning of the buffer. This is rule number 3. If we don't do this, it's not a ring buffer anymore. It's a memory fill routine. :)

The most obvious way to do this is to test if the Put index is greater than the size of the buffer and to set it to zero if it is. That's what the code starting with the "if" statement does.

Let's clean up this code a bit. In C you can increment a variable directly. The statement "TestPut = TestPut + 1;" can be rewritten as "TestPut++;". The "++" means to increment the variable. Where you put the "++" is also important. If it appears before the variable name "++var" the variable is incremented before the variable isuse. If it appears after the variable name "var++" the variable is incremented after the variable is used. For example:

int xvar = 5;
int X = ++xvar;

int yvar = 5;
int y = yvar++;

In this example x = 6 and y = 5. Both xvar and yvar end up as 6. Got it?

We now have:

void PutData(char value){  TestBuf[TestPut] = value;  TestPut++;  if ( TestPut > TESTBUFSIZE )  {     TestPut = 0;  }}

We can furthur clean up the code by putting the "TestPut++" inside the brackets in the first line. This works because TestPut will be used before it is incremented. The code will function exactly the same as the original. Here's our final cleaned up version:

void PutData(char value){  TestBuf[TestPut++] = value;  if ( TestPut > TESTBUFSIZE )  {     TestPut = 0;  }}

Now let's look at the function that gets the data out of the ring buffer:

char GetData(){  TestGet = TestGet + 1;  if ( TestGet > TESTBUFSIZE )  {     TestGet = 0;  }  return TestBuf[TestGet];}

Using the same code cleanup methods we get:

char GetData(){  if ( ++TestGet > TESTBUFSIZE )  {     TestGet = 0;  }  return TestBuf[TestGet];}

Notice that the statement "if ( ++TestGet > TESTBUFSIZE )" increments the Get index before it uses it. Then it wraps it to the beginning of the buffer if necessary and after wrapping it, it uses it to return the contents of the buffer. It's important the wrap the Get index first before using it or you will be reading data from off the end of the buffer.

Now we have a working ring buffer, but before we continue I'd like to clean it up somewhat. The method that we use to wrap the indexes is simple, but it's not the best method. There is a faster and cleaner way to do it if the CPU supports it. The 6502 doesn't, but I'm showing this to you because any current CPU does and it's a huge performance increase: Here's the old way and the new way side-by-side:

Old:  if ( Index > BUFSIZE )    {    Index = 0;    }New:  Index = Index MOD BUFSIZE;

We're using the MOD operation to wrap out buffer. The Mod operator divides the first value by the second value and returns only the remainder as result. Here's some of the values Mod will return using 100 for the BUFSIZE:

1 MOD 100 returns 1
99 MOD 100 returns 99
100 MOD 100 returns 0
101 MOD 100 returns 1

As you can see, MOD will wrap the index just the way we want it to. It is also faster because it is an internal CPU operation instead of a branch like and "if" statement is. It took me years before I saw this trick, I always used to do it with the "if". Here's the final (I mean it this time!) ring buffer code along with some test code to run it:

#define TESTBUFSIZE 100char TestBuf[TESTBUFSIZE];int TestPut = 0;int TestGet = 0;bool BufferIsEmpty(){   return ( TestPut == TestGet );}void PutData(char value){  TestBuf[TestPut++] = value;  TestPut = TestPut MOD TESTBUFSIZE;}char GetData(){  TestGet = ++TestGet MOD TESTBUFSIZE;  return TestBuf[TestGet];}main(){  Char i;  char temp;  for ( i = 1, i <= 10; i++ )    {      PutData( I );    }  while ( !BufferIsEmpty() )    {      temp =  GetData();      Printf("\nRead data value %d", temp);    }  Printf("\nDone!");}

If you have any questions so far, let me know. After this we're going to add a feature or two and then we'll "Make it work reliably". Right now it works, but it's nowhere near bombproof (can you figure out why?).

Thanks again to Potatohead and Goochman who are just about to tell me how to make the code blocks work. :D

Edited by donlebeau, Thu Jan 31, 2008 6:16 PM.

#113 potatohead OFFLINE  


    River Patroller

  • 4,404 posts
  • Location:Portland, Oregon

Posted Thu Jan 31, 2008 2:12 PM

Donald. You can use the code tag for eazy cheezy intents with non-proportional font.

Edited by potatohead, Thu Jan 31, 2008 2:15 PM.

#114 Goochman OFFLINE  



  • 6,986 posts
  • Moongates to the Past

Posted Thu Jan 31, 2008 2:46 PM

In case you havent seen it - the codebox is inserted via:

[ codebox ]

[ / codebox ]

tags - remove the spaces I added so the tag shows up ;)

#115 Shannon ONLINE  


    Born To Be Insane

  • 7,840 posts
  • Pac-man Fever
  • Location:Arcade

Posted Fri Feb 1, 2008 12:46 AM

MOD is the same as using % right?

#116 donlebeau OFFLINE  


    Space Invader

  • 29 posts

Posted Fri Feb 1, 2008 12:02 PM

MOD is the same as using % right?

Yeah, it's the same thing. Thanks for point that out!

Last night I was wondering if I was wasting my time on the ring buffer description since there were probably some good tutorials already on the net. I searched around and found a number of them , but they were all either totally theory or way too cryptic to be understandable. Also the "conventional" way to use Put and Get indexes makes it impossible to do the CPU Architecture optimizations that actually make ring buffers useful. Made me feel a little better knowing that the stuff I'm writing isn't just a re-hash of other information that's already available.

I kind of feel bad for anyone that tries to learn about ring buffers using one of these other turorials. Even the stuff that was understandable was flat out wrong (if you ever want to optimize the performance).

It also strikes me as strange becasue I'm not presenting anything "new" here. I know a lot of other engineers that do it this way and we share tips that we've picked up over the years. I guess that most of the people who really know this stuff never have time to write turorials, so it's left to academics to write long winded white papers packed with theory and llittle practical experience.

That shouldn't really surprise because I've worked with too many PhD's who spout endless trivia about "software design practices" but couldn't program their way out of a paper bag. Those were the guys that wrote specs that caused fist fights. :)

Anyways, I'm moving this weekend so I won't be able to finish working on the next section until next week. Had hoped to get it finished by now, but it's still half done.

Have a nice weekend everybody!

Go <Insert your favorite team here>!!!!

Edited by donlebeau, Fri Feb 1, 2008 12:06 PM.

#117 happymonster OFFLINE  


    Chopper Commander

  • 121 posts

Posted Fri Feb 1, 2008 12:12 PM

Thanks for this Don. I'm following it so far and it makes sense to me. I've never used Ring Buffers as I am essentially a self-taught programmer and not an amazing one at that. So I'm eager to learn more and see how they are used in games.

Thanks! :)

#118 bf2k+ OFFLINE  



  • 1,773 posts
  • Location:Boot Factory BBS 2k+

Posted Fri Feb 1, 2008 1:46 PM

Ring Buffers Introduction:

(You may want to punch me after reading the first part, but bear with me for a bit Besides, I'm having fun writing this stuff. :) )


I am having a great time reading it!! Carry On!

#119 Bennet OFFLINE  


    Space Invader

  • 19 posts
  • Location:California, USA

Posted Fri Feb 1, 2008 1:59 PM

Go Don Go!

Or to be really optimal, you make it so that your ring buffer is 256 elements and then your get/put indices are bytes. Then when incrementing the byte past 255 it naturally wraps back to 0 for you. I was thinking this is what you'd have used on the 800, but 256 entries per list is a lot and would potentially use up the memory too fast.

Also, I've always avoided MOD (%) since it involves a divide instruction. Making your buffers powers of two in size, then just AND'ing against the appropriate bit fields is a much faster instruction. (Ex: buffer holds 64 elements, then you do 'index &= 64-1;' or 'index &= 0x3F'.


PS - Thanks for the email, it totally explained why the enemies would move through a mostly destroyed wall

#120 donlebeau OFFLINE  


    Space Invader

  • 29 posts

Posted Sat Feb 2, 2008 9:38 AM

Bennet, you must have been reading ahead. :) Will get to that in the "reliably" and "fast" sections.

Yup, Gauntlet used five of those 256 byte queues. Some of them were massively huge compared to the amount of data they actually handled, but I needed the speed more than anything else. One of the reasons that Gauntlet was such a memory hog is that usually to make something faster you have to use more memory or make duplicate optimized sections of code. Both approaches eat up your available memory pretty fast.

Sizing buffers on a byte or word boundry is always my first choice. If I can afford the space I just do it. It's just so much faster and cleaner than the other methods.

If I can't do that, I'll use the AND'ing method if I can get away with it. You get a big performance boost on older CPUs, but not as much on today's machines with their prefetches and single cycle math operations. It's probably not any faster than using MOD on the current crop of CPU's, but the code looks cleaner and it just feels better when you're being "clever". :)

It's kind of weird because I got into programming on the "ground floor". and watched some dramatic changes over the years. When I first started, you programmed in machine language - not "assembly language", you did the work of the assembler yourself on paper and then entered all the bytes in using front panel switches. They they made assemblers which saved you a ton of work. You didn't have any disassemblers or source code debuggers but that was ok because most of us could just read the machine code directly.

The big leap came with the C compilers. For the first time one line of code could put out many assembly language instructions. You were getting further away from the hardware and it was not a comfortable feeling. With machine or assembly you knew exactly what instructions the CPU would run and how many cycles it would take. Now it wasn't obvious and a lot of use would print listings with the assembly code interspaced with the C code just so we could see what was "really going on".

After a while we knew exactly what code the C compiler would spit out and we would write "the most efficient code" which sometimes looked pretty odd. But we knew exactly what the CPU was running and that was all that mattered. If you wanted to do anything time critical, you still used assembly language because of it's speed and predictability. But for everything else C worked just fine.

Then along came C++ with it's objects and overriding and all kinds of great programming features. Now we're in a world where a simple statement like "a = b;" could spew out thousands of assembly language instructions. We're so far from the hardware that we don't even thing about it. And CPUs have gotten so fast and compilers so good that code efficiency almost doesn't matter. I work on time-critical stuff today and it's creating objects on the fly, passing them around like there's no tomorrow, calling down ten levels to get some status (using text strings!!!) and all kinds of outlandishly inefficient stuff. But when I run this stuff against a performance meter it doesn't even register - it's just that fast.

Sometimes we look at the stuff we're writing today and we just have to say to ourselves "Get over it" and move on. The world has moved on I guess you have to get shed those dinosaur roots sometimes... :)

#121 donlebeau OFFLINE  


    Space Invader

  • 29 posts

Posted Fri Feb 15, 2008 3:21 PM

Ring Buffers step 3 - Make it work reliably:

So far, we have a ring buffer that works, but it's pretty dangerous. We can tell if it's empty, but have no way to tell if it's full or almost full. That may be ok under one of the following conditions:

1. You can always take items out of the buffer at least as fast as they get put into it.
2. There is only a set number of items that can be in the buffer, and the buffer is large enough to hold all of them.

If neither of these conditions apply, you must somehow control how fast the buffer gets filled. This is important because once a ring buffer overflows, you don't just lose a couple of items, you lose an entire buffer's worth of data! If the buff size is 100 and you stuff 101 items into it, the Put index will equal the Get index and you will now have an "empty" buffer. Stuff 102 items into it and the Put index will pass the Get index and you will have 1 item left in the buffer.

The first step to controlling how fast the buffer gets filled is to keep track of how much data is in the buffer. Let's add a counter to our test buffer:

#define TESTBUFSIZE 100char TestBuf[TESTBUFSIZE];int TestPut = 0;int TestGet = 0;int TestCount = 0;bool BufferIsEmpty(){   return ( TestPut == TestGet );}bool BufferHasSpace(){   return ( TestCount < TESTBUFSIZE );}void PutData(char value){  TestCount++;  TestBuf[TestPut++] = value;  TestPut = TestPut MOD TESTBUFSIZE;}char GetData(){  TestGet = ++TestGet MOD TESTBUFSIZE;  return TestBuf[TestGet];  TestCount--;}main(){  Char i;  char temp;  for ( i = 1, i <= 10; i++ )    {      PutData( i );    }  while ( !BufferIsEmpty() )    {      temp =  GetData();      Printf("\nRead data value %d.  There are %d Items left in the buffer", temp, TestCount);    }  Printf("\nDone!");}

Ok, we've added a new variable TestCount which will keep track of how much data is in our buffer. There are ways to calculate this on the fly, but all of them are klutzy and slow and not worth the effort. Using TestCount is easy, we increment it whenever we put data into the buffer and decrement it whenever we take data out of the buffer.

We've also added a BufferHasSpace() funtion to see if our buffer has any space left in it. Note that it checks if there's a maximum of 99 items in the buffer even though the buffer size can hold 100 items. Your ring buffer normally holds one less than the size of the buffer. This is because the BufferIsEmpty() function checks if the Put and Get indexes are equal, and they will be equal whne the buffer is empty or full.

You can use either the Put vs Get test or the TestCount to determine if the buffer is empty and each has it's advantages. If you've added TempCount use that because it's faster and your buffer can hold one more element - change the test in BufferHasSpace() to be ( TestCount = TESTBUFSIZE ) to get that extra byte. If you haven't added a buffer counter, just use the Put vs Get test.

Why not always use the counter? Because it's a big overhead hit. You're incrementing and decrementing the counter for every byte of data that goes through the buffer. That adds up, if you pass a megabyte of data through you buffer, you're performing 2 million extra operations. Depending on the CPU, that may be too costly.

By adding the counter, the routine that's putting data into the buffer can check if it's full before calling PutData(). Unfortunately, that keeps the buffer from overflowing, but it's usually not good enough. What happens if the buffer full? You usually can't hold on to the data until "later" unless you have another buffer somewhere, so you're only option is to drop the data on the floor and hopefully the buffer won't be full by the time the next piece of data arrives. Since most applications can't afford to lose data, you usually have to implement...

Flow Control:

If your data source can possibly overrun your buffer, you have to be able to tell it to slow down before it's too late. Most communications devices have the ability to turn on and off the source by some form of handshaking, but few of them happen instantly. If data has already arrived when the buffer is full, it's too late. You must determine when the buffer is "getting too full" and tell the input to stop sending and then tell it to start sending again when there's "enough room".

For this example, let's make two imaginary functions to start and stop the input and call them StartInput() and StopInput(). We don't really care what they do, but somehow they tell the sender to start and stop sending the data. For better performance We're also going to add a flag called InputStopped to keep track of what state the input is in.

The next thing we'll need is a point to tell us when to stop the input. How full do we want the buffer to get before we stop teh input? This point is called the HighWaterMark.

After the buffer hits the HighWaterMark, we want to empty it out to make room for more data. There's a point where we'll consider it "empty enough" to start accepting data again. This point is called the LowWatermark.

Depending on your application, the high and low water marks will be adjusted to work well with your input stream. If, for instance, your input will send about ten more characters before reacting to the StopInput() call, you must tell it to stop the input before you have less than ten spaces left in the buffer. Or if you input sends bursts of 50 charcters, you won't tell it to start sending until you have room for 50 characters.

For our example, let's assume that we'll stop the input when our buffer is 90% full and then start it again when it 10% full. Here's the code with the flow control added:

#define TESTBUFSIZE 100char TestBuf[TESTBUFSIZE];int TestPut = 0;int TestGet = 0;int TestCount = 0;bool InputStopped = false;#define HighWaterMark 90#define LowWaterMark 10bool BufferIsEmpty(){   return ( TestPut == TestGet );}bool BufferHasSpace(){   return ( TestCount < TESTBUFSIZE );}void PutData(char value){  TestCount++;  TestBuf[TestPut++] = value;  TestPut = TestPut MOD TESTBUFSIZE;  if ( InputStopped == false)    {      if ( TestCount >= HighWaterMark )       {         StopInput();         InputStopped = true;       }    }}char GetData(){  TestGet = ++TestGet MOD TESTBUFSIZE;  return TestBuf[TestGet];  TestCount--;  if ( InputStopped == true)    {      if ( TestCount <= LowWaterMark )       {         StartInput();         InputStopped = false;       }    }}

Ok, first let's look at the PutData() function. After buffering the character, we check if the input has been stopped. If it hasn't, we check if the buffer is too full and tell the input to stop if it is, and then set out InputStopped flag to true. We test if ( InputStopped == false) first becasue if we already told the input to stop, we don't want to call StopInput() again. This is a bit of an optimization becasueit's faster to test ( InputStopped == true) than to test ( TestCount <= LowWaterMark ). Also calling StopInput() repeatedly is probably not a good thing. We don't know what this function does, so let's just call it once to ensure we don't waste any time or break anything. (Some communications protocols send messages to the transmitter and you want want to spam them needlessly with redundant packets).

The GetData() function does the same thing only in reverse. After it gets a byte of data it checks to see if the input was stopped. If it was, it checks to see if it's time to start it up again. If the buffer is empty enough, it calls StartInput() and clears InputStopped. Now the buffer will start filling up again.

You now know enough to handle about almost every ring buffer application you'll ever run across. We haven't optimized anything yet and we'll cover some ways to do that in the next section. After that we'll look at how ring buffers were used in Gauntlet (finally!) along with some examples of why they were necessary.

#122 happymonster OFFLINE  


    Chopper Commander

  • 121 posts

Posted Fri Feb 15, 2008 3:34 PM

Well, that makes sense. But I'm still not really seeing how it's used so much.
I guess I've been spoilt by PC's and don't have to worry about data storage and storage methods so much. :)

#123 djmips OFFLINE  



  • 636 posts
  • scrolling
  • Location:Seattle

Posted Sat Feb 16, 2008 2:10 AM

Sometimes we look at the stuff we're writing today and we just have to say to ourselves "Get over it" and move on. The world has moved on I guess you have to get shed those dinosaur roots sometimes... :)

Well, there is still a place in this world for those who know how to 'program on the bare metal' and ironically it's a specialist position in the gaming industry. I'm sure there exists other fields as well, but uniquely, engine programming on consoles (coding 3D rendering, physics and collision systems for example) requires working with a spec that is unchanging, memory challenged and expected to improve without the ability to upgrade the hardware. Also these consoles often have specialized hardware that doesn't map to C or C++. For instance the Sony PS2 has a VLIW geometry processor called the VU which is almost exclusively programmed in assembler. The Sony PSP has a math coprocessor called the VFPU which also has a specialized instruction set. The PS3 has 7 SPUs (Cells) which require a very custom approach to keep the data flowing and processers occupied. All the recent Graphics Processors are now programmed in a Shader Language but it is but a small step up from the hardware level. Recently I coded a system for dynamically generating 3D terrain and it uses priority queues, and very efficient yet clean code. It ran on Sony PSP hardware and took advantage of the peculiarities of the PSP design. Optimized doesn't mean have to mean hard to read and I think that my code was simple yet fast.

With regard to the mantra about not optimizing until the end. It's not really true that experienced programmers don't optimize until the end. It should be stated that optimized is an overloaded term and the warning is for inexperienced programmers who really should be just getting something done instead of embedding themselves into a tarry mess of code optimization. It's also so very important to have good tools to measure performance. Optimizing blindly can be worse than not optimizing at all.

In fact, most experienced programmers who are concerned about performance, naturally think a lot about optimization from the very beginning, because the well know that the bad decisions that you make at the beginning might completely paint yourself into a corner towards the end of a project and you will be faced with insufficient time and resources to rewrite everything to accomodate a drastically different data format and associated algorithms. Using the correct data structures and algorithms at the beginning is a very important part of an optimized or high performing result. Paying attention to a lot of details like aliasing and cache friendly designs from the very start and as you go, is the only way to be sucessful at the end. It is a painful process indeed to rewrite vast amounts of code when projects are much too large and have airtight deadlines. It's the exact same issue with memory usage. You have to be on top of it from the beginning because trying to cram at the end can be disastrous.

Very good discussion on ring buffers especially with the 6502 angle. Very impressed with your game. I still love coding for the 6502 because it's a fun simple system and even though I considered myself an expert, I am constantly learning new things!

So don't think of yourself as a Dinosaur, CPUs are not going keep getting faster forever anyhow. If you really want to challenge yourself, get a job engine coding on the Sony PS3.

- David Galloway

edit - fix typo

Edited by djmips, Sat Feb 16, 2008 2:12 AM.

#124 Heaven/TQA ONLINE  



  • 11,196 posts
  • Location:Baden-Württemberg, Germany

Posted Sat Feb 16, 2008 2:26 AM

David... fully agree regarding optimisation... if you are experienced in a field you always "think" in a optimum way or in a way you know which could be pitfalls if you do not keep an eye on it at the beginning...

a simple but good examples are f.e.

- store data upside down so you can have the drawing loop skipping some CMPs
- table sizses 256 bytes on 8bit systems so you avoid checks again
- or on A800 using graphics mode 9 store gfx or textures preshifted (0x,x0)
etc etc etc

we all have done things "slow" in the past but when doing new things we think ahead or in such terms...

another example while coding Beyond Evil.

the item system is done not in a fast way as first I have to make it work... so here comes Don and tells wisely...make it run and then optimise...

in the gfx field where I have more "senior" experience I have wrote my sprite routines together with the pick up item routine "on the fly" in a optimum way because I know what slow things down prior writing the code...

But I guess that's what a "Senior level" makes the difference from a "newbie"... ;)

Edited by Heaven/TQA, Sat Feb 16, 2008 2:27 AM.

#125 Oswald OFFLINE  



  • 679 posts

Posted Sat Feb 16, 2008 2:18 PM

w o w, what an ammazing programming lesson. all my hats down, and all my thumbs up and my dog's :thumbsup: :) btw I more often than not fail into the trap of making it fast first, instead of working :o) waiting for the next chapter.

0 user(s) are browsing this forum

0 members, 0 guests, 0 anonymous users