
Propeller breaks the space time continuum

Most developers understand that it is possible to trade space for speed, e.g. by unrolling loops or using table lookups instead of complex calculations. This is particularly true of low-level programming, where you are often trying to squeeze out the maximum speed in the minimum space. But I've recently discovered that in Propeller Assembly (PASM) it's possible to maximize speed and minimize space simultaneously.

The Propeller is different from most processors in that there are no dedicated ALU registers. Instead, any 32-bit entry in processor RAM may be used as a register. Thus processor RAM can be looked at as a 512-entry register file. Or as 496 instructions (16 of the registers are dedicated to I/O), with each instruction taking a single register. Or as 496 32-bit data values. It is even possible (and sometimes necessary) to modify an instruction using ALU operations.

In the program I'm working on, I've programmed two 16.16 x 0.16 fixed-point multiplies, multiplying one value twice (by cos and sin). In pseudo code:

  frac = fixed & 0xFFFF
  int  = fixed >> 16
  frac_cos = cos * frac
  int_cos  = cos * int
  /* because the algorithm destroys the inputs, they must be reloaded */
  frac = fixed & 0xFFFF
  int  = fixed >> 16
  frac_sin = sin * frac
  int_sin  = sin * int
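For readers who want to experiment with the arithmetic, here is a minimal Python sketch of the split multiply in the pseudo code above. It is not the PASM routine itself. The function names are hypothetical, the values are assumed to be unsigned, and the final step that combines the two partial products (`int_prod + (frac_prod >> 16)`) is an assumption, since the pseudo code stops at the partial products.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def split_fixed(fixed):
    """Split an unsigned 16.16 value into its 16-bit integer and fraction halves."""
    return (fixed >> 16) & MASK16, fixed & MASK16


def mul_16_16_by_0_16(fixed, coeff):
    """Multiply an unsigned 16.16 value by an unsigned 0.16 coefficient.

    Each partial product is 16 bits x 16 bits, so it fits in 32 bits.
    How the halves are recombined is an assumption, not taken from the post.
    """
    int_part, frac_part = split_fixed(fixed)
    int_prod = coeff * int_part    # already in 16.16 format
    frac_prod = coeff * frac_part  # 0.32 format, needs a 16-bit shift
    return (int_prod + (frac_prod >> 16)) & MASK32


def scale_by_cos_and_sin(fixed, cos, sin):
    """Multiply one value twice, once by cos and once by sin (both 0.16)."""
    return mul_16_16_by_0_16(fixed, cos), mul_16_16_by_0_16(fixed, sin)
```

For example, `mul_16_16_by_0_16(0x00028000, 0x8000)` multiplies 2.5 by 0.5 and returns `0x00014000`, which is 1.25 in 16.16 format. Because Python does not destroy its inputs, the reload step in the pseudo code has no counterpart here.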

Now, in a normal speed optimization, you'd look and see that the same value is being calculated twice. So to speed things up, you save the value (more space) so the calculations don't have to be done again. But in PASM this sometimes doesn't save any time and ends up taking additional space!

  mov  frac, fixed       // frac = fixed
  and  frac, H0000FFFF   // frac &= 0xFFFF   H0000FFFF is a register which has the appropriate value preset
  mov  sav_frac, frac    // sav_frac = frac  save the value in a temporary register
  ...
  mov  frac, sav_frac    // reload value

As you can see, calculating the value requires 2 instructions. Saving the value requires 1 additional instruction, and the reload is 1 more. So recalculating the value requires the same number of instructions as save+reload, but save+reload also requires an extra register (sav_frac). So save+reload isn't any faster and requires more space!

Obviously this is a simple example, in which each value takes only a mov plus one ALU instruction to calculate. But the converse is also true. In another part of the program I am finding it's better to maintain additional variables rather than calculating the necessary value from the loop counters.
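The tally above can be sketched as a small Python cost model. This is an illustration under simplifying assumptions: every instruction and every extra register costs one entry of processor RAM, and every instruction takes the same time. The function names are hypothetical.

```python
def cost(instructions, extra_registers=()):
    """Count instructions (time) and RAM entries (space) for a code fragment."""
    return {
        "instructions": len(instructions),
        "entries": len(instructions) + len(extra_registers),
    }


# Calculate frac once, then recalculate it after the multiply destroys it.
recompute = cost([
    "mov frac, fixed", "and frac, H0000FFFF",  # first calculation
    "mov frac, fixed", "and frac, H0000FFFF",  # recalculation
])

# Calculate frac once, save it, then reload it after the multiply.
save_reload = cost(
    [
        "mov frac, fixed", "and frac, H0000FFFF",  # first calculation
        "mov sav_frac, frac",                      # save
        "mov frac, sav_frac",                      # reload
    ],
    extra_registers=["sav_frac"],
)


def saving_is_faster(recalc_instructions):
    """Save+reload costs 2 instructions, so it only wins past that."""
    return recalc_instructions > 2


def saving_is_smaller(recalc_instructions):
    """Save+reload costs 2 instructions plus 1 register, i.e. 3 entries."""
    return recalc_instructions > 3
```

Under this model both versions execute 4 instructions, but save+reload occupies 5 entries against 4 for recalculation. Saving a value only starts to pay off once recalculating it takes more than two or three instructions, which is the converse case mentioned above.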
