I'm sure somebody else has been doing this already, but I just thought of it yesterday and wanted to share!
I am currently working on the menu kernel for my game, and I had a "SLEEP 19" in the kernel. (I honestly can't believe I have that much wiggle room, but that is during the simplest point of the kernel!) Here, SLEEP just inserts 8 NOPs plus a "NOP 0" or "BIT VSYNC", which comes to 10 bytes of ROM. Of course, for longer sleep durations you could set up a BNE loop, but I am using both index registers at this point. Instead, I am just using PLA, since I am not using the stack pointer here. PLA only reads from the stack, so there is no danger of corrupting your TIA registers or any RAM. It does overwrite A and the N and Z flags and it increments the stack pointer, so it only fits where those are free. Each PLA takes 4 cycles and only 1 byte of ROM, so four of them plus the 3-cycle instruction cover the same 19 cycles in 6 bytes, saving me 4 bytes of ROM. Plus I am using it again later to save another single byte of ROM.
Of course, when I have to reset the stack pointer later, that will take another 3 bytes, so it doesn't save much in this case, but it can surely be helpful if you need it.
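Here is a rough sketch of the swap in DASM syntax, in case it helps to see the byte and cycle counts side by side. The labels, the org address, and the exact ordering are just my assumptions for illustration, and the LDX/TXS pair at the end is one common way to spend those 3 bytes on resetting the stack pointer, not necessarily what you would use in your own kernel.

```asm
        processor 6502

VSYNC   = $00               ; TIA register address (normally defined by vcs.h)

        org $F000           ; assumed ROM start, for illustration only

; Before: roughly what SLEEP 19 expands to (19 cycles, 10 bytes)
SleepNops
        bit VSYNC           ; 3 cycles, 2 bytes (or the illegal "nop 0")
        REPEAT 8
        nop                 ; 2 cycles, 1 byte each -> 16 cycles, 8 bytes
        REPEND

; After: the same delay built from PLA (19 cycles, 6 bytes)
SleepPla
        bit VSYNC           ; 3 cycles, 2 bytes
        REPEAT 4
        pla                 ; 4 cycles, 1 byte each -> 16 cycles, 4 bytes
        REPEND              ; each PLA overwrites A, sets N/Z, increments SP

; Later, once the index register is free and the stack is needed again
ResetStack
        ldx #$FF            ; 2 bytes
        txs                 ; 1 byte -> 3 bytes total
```

For delays that are a multiple of 4 cycles you can drop the BIT VSYNC entirely, and PHA (3 cycles, 1 byte) would be even smaller for odd counts, but that one writes to the stack address, so it is only safe if you know where SP is pointing.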