Most developers understand it is possible to trade off space for speed, e.g. unrolling loops or using table lookups instead of complex calculations. This is particularly true for low level programming where you are often trying to squeeze out the maximum speed in the minimum space. But I've recently discovered Propeller Assembly (PASM) it's possible to maximize speed and minimize space simultaneously.The Propeller is different from most processors in there are no dedicated ALU registers. Instead, any 32 bit entry in processor RAM may be used as a register. Thus processor RAM can be looked at as a 512 entry register file. Or 496 instructions (16 of the registers are dedicated to I/O), with each instruction taking a single register. Or 496 32 bit data values. It is even possible (and sometimes necessary) to modify an instruction using ALU operations.In the program I'm working on, I've programmed two 16.16 x 0.16 fixed point multiplies; multiplying one value twice (by cos & sin). In pseudo code:
frac = fixed & 0xFFFF int = fixed >> 16 frac_cos = cos * frac int_cos = cos * int/* because the algorithm destroys the inputs, they must be reloaded */ frac = fixed & 0xFFFF int = fixed >> 16 frac_sin = sin * frac int_sin = sin * int
Now, in a normal speed optimization, you'd look and see that the same value is being calculated twice. So to speed things up, you save the value (more space) so the calculations don't have to be done again. But in PASM this sometimes doesn't save any time and ends up taking additional space!
mov frac, fixed // frac = fixed and frac, H0000FFFFF // frac &= 0xFFFF H0000FFFF is a register which has the appropriate value preset mov sav_frac, frac // sav_frac = frac save the value in a temporary register... mov frac, sav_frac // reload value
As you can see, calculating the value requires 2 instructions. Saving the value requires 1 additional instruction, and the reload is an instruction. So calculating the value requires the same number of instructions as save+reload, but the save+reload requires an extra register (sav_frac). So save+reload isn't any faster and requires more space!Obviously this is a simple example which only required one instruction to calculate each value. But the converse is also true. In another part of the program I am finding it's better to maintain additional variables rather than calculating the necessary value from the loop counters.