Could one get the subpixel movement with a simple bitmask?
Under some circumstances. Suppose you have a bunch of objects whose sub-pixel velocity (0-255) is stored in v0..v7 and whose whole-pixel position is in p0..p7. Keeping track of the subpixel position of all those objects would require another eight bytes, which may not be available. As an alternative, if you have a free-running frame counter and a spare temp variable, you can use something like this:
temp=not (frame-1) and frame
if temp and v0 then p0=p0+1
if temp and v1 then p1=p1+1
etc.
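Here temp holds only the lowest set bit of the frame counter, so each bit of the velocity triggers a move on a different, evenly spaced set of frames. This will have a pixel's worth of 'jitter', but will save a bunch of memory. This approach is nice when combined with the other approach you allude to, because using both together gives less jitter than using the first alone, and more precision than using the second alone.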
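To see how far a given velocity byte actually moves an object, here is a rough Python simulation of the pseudocode above over one full wrap of an 8-bit frame counter. The names lowest_set_bit, pixels_moved and reverse_bits are made up for this sketch and are not from the original post. It assumes the frame counter is a single byte that counts 0..255 and wraps.

```python
def lowest_set_bit(frame):
    """8-bit version of `not (frame-1) and frame`: keeps only the lowest set bit."""
    return frame & ~(frame - 1) & 0xFF


def pixels_moved(velocity, frames=256):
    """Count whole-pixel steps taken over `frames` frames of a free-running counter."""
    moved = 0
    for frame in range(frames):
        temp = lowest_set_bit(frame & 0xFF)
        if temp & velocity:  # same test as `if temp and v0 then p0=p0+1`
            moved += 1
    return moved


def reverse_bits(value):
    """Mirror the eight bits of a byte (bit 0 <-> bit 7, and so on)."""
    return int(f"{value & 0xFF:08b}"[::-1], 2)


print(pixels_moved(0x01))              # bit 0 fires on every odd frame
print(pixels_moved(0x80))              # bit 7 fires only when frame == 128
print(pixels_moved(reverse_bits(64)))  # a speed meant as 64/256 pixel per frame
```

Running this shows that bit 0 of the velocity is worth 128 steps per 256 frames and bit 7 is worth a single step, so the bit weights are mirrored compared with an ordinary binary fraction. If you want the 0-255 value to read as "n/256 pixels per frame", one option (an assumption on my part, not something stated above) is to store it bit-reversed, for example from a precomputed table.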
Let's say we wanted three bits of subpixel movement. Before doing graphics calls, shift the numbers down? Or is that too expensive in cycles?
This is a fine approach if you can deal with the restricted range of motion it would imply. If your positions are single bytes and you shift them downward by three bits, only five bits are left for the whole-pixel part, so you'll be limited to 31 pixels (or double-pixels) worth of motion. That is probably excessively confining. On the other hand, taking off one bit might not be a problem.
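As a quick illustration of that range trade-off, here is a small Python sketch of byte-sized positions with the low bits used as the subpixel fraction. FRAC_BITS, to_pixel and max_pixel are names invented for this example, and it assumes each position fits in one byte.

```python
FRAC_BITS = 3  # three bits of subpixel precision, as in the question


def to_pixel(pos, frac_bits=FRAC_BITS):
    """Shift a one-byte fixed-point position down to a whole-pixel coordinate."""
    return (pos & 0xFF) >> frac_bits


def max_pixel(frac_bits):
    """Largest whole-pixel coordinate a single byte can reach."""
    return 0xFF >> frac_bits


pos = 0
velocity = 3  # 3/8 of a pixel per frame
for _ in range(8):
    pos = (pos + velocity) & 0xFF  # wrap like an 8-bit add

print(to_pixel(pos))               # whole-pixel position after 8 frames
print(max_pixel(3), max_pixel(1))  # reachable range with 3 bits vs 1 bit
```

With three fractional bits the object tops out at pixel 31, while giving up only one bit still leaves 0..127. The shift itself is just a few one-cycle-class instructions per object on most 8-bit CPUs, so the range limit is likely to matter more than the cycle cost.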