The Altirra Hardware Reference has some good info on this.
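When your DLI starts, there's not a great deal of time to do anything before the end of the normal display. Normally you can push 2 registers and STA WSYNC safely. I'm fairly sure that trying to push a third risks overrun, where the STA WSYNC lands too late and the CPU ends up waiting for the following line's horizontal blank instead.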
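To put numbers on the prologue: standard 6502 cycle counts are PHA 3, TXA/TYA 2 and STA absolute 4. The sketch below is a host-side Python model, not Atari code. It only totals those counts for the two- and three-register prologues and does not model where on the scanline the DLI begins or how much ANTIC DMA steals in between.

```python
# Host-side model: CPU cycle cost of common DLI prologues on an NMOS 6502.
# ANTIC DMA cycle stealing is NOT modelled, so real elapsed time is longer.
CYCLES = {
    "PHA": 3,
    "TXA": 2,
    "TYA": 2,
    "STA_ABS": 4,  # e.g. STA WSYNC
}


def prologue_cycles(ops):
    """Sum CPU cycles for a list of instruction mnemonics."""
    return sum(CYCLES[op] for op in ops)


# Push A and X, then STA WSYNC.
TWO_REGS = ["PHA", "TXA", "PHA", "STA_ABS"]
# Push A, X and Y, then STA WSYNC.
THREE_REGS = ["PHA", "TXA", "PHA", "TYA", "PHA", "STA_ABS"]

if __name__ == "__main__":
    print("2 registers + STA WSYNC:", prologue_cycles(TWO_REGS), "cycles")
    print("3 registers + STA WSYNC:", prologue_cycles(THREE_REGS), "cycles")
```

The third push adds 5 CPU cycles before the WSYNC write, and any DMA on that line stretches that further in real time.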
But you can get creative and leave the WSYNC out altogether. Just make sure any relevant changes, such as colour and character set changes, happen offscreen.
Pushing 3 registers and preloading 2 (e.g. with the new colour values) should take about long enough to ensure the stores land offscreen. If not, padding with a NOP or two should do it.
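The same reasoning as a host-side Python sketch. The prologue costs (13 cycles for three pushes, 4 for two immediate loads, 2 per NOP) are standard 6502 counts. `target` is a hypothetical minimum you would have to establish for your own display list, e.g. by stepping the DLI in an emulator. ANTIC's cycle stealing is not modelled.

```python
# Host-side model of a WSYNC-free DLI prologue.
# ASSUMPTION: `target` is a hypothetical minimum number of CPU cycles
# (excluding ANTIC DMA) that must pass before the first colour store.
import math

PUSH_THREE = 3 + 2 + 3 + 2 + 3  # PHA, TXA, PHA, TYA, PHA
PRELOAD_TWO = 2 + 2             # e.g. LDA #col1, LDX #col2
NOP = 2


def nops_needed(target):
    """Number of NOPs needed to pad the prologue to at least `target` cycles."""
    base = PUSH_THREE + PRELOAD_TWO
    if target <= base:
        return 0
    return math.ceil((target - base) / NOP)


if __name__ == "__main__":
    print("Prologue before first store:", PUSH_THREE + PRELOAD_TWO, "cycles")
    for t in (15, 17, 18, 21):
        print(f"target {t}: pad with {nops_needed(t)} NOP(s)")
```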
The number of "offscreen" cycles available varies with playfield DMA width, whether horizontal scrolling is active, and whether PMG DMA is enabled.
The raw figure is 34 cycles, but you lose 1 to the display list fetch, another 5 if PMG DMA is enabled, and more again if horizontal scrolling is enabled, in which case the loss is variable.
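That budget arithmetic as a small host-side Python helper. The 34/1/5 figures are the ones quoted above. The extra loss under horizontal scrolling is a caller-supplied, hypothetical parameter because it varies, and playfield width, which also affects the figure, is not modelled.

```python
# Host-side model of the offscreen CPU-cycle budget described above.
RAW_OFFSCREEN = 34  # raw offscreen cycles
DLIST_FETCH = 1     # always lost to the display list fetch
PMG_DMA = 5         # lost when player/missile DMA is enabled


def offscreen_budget(pmg_dma, hscrol_loss=0):
    """Cycles left for offscreen register writes.

    hscrol_loss: hypothetical extra cycles lost when horizontal scrolling
    is enabled (0 when it is off); the real value varies with HSCROL.
    """
    budget = RAW_OFFSCREEN - DLIST_FETCH
    if pmg_dma:
        budget -= PMG_DMA
    return budget - hscrol_loss


if __name__ == "__main__":
    print("No PMG DMA:  ", offscreen_budget(pmg_dma=False))
    print("With PMG DMA:", offscreen_budget(pmg_dma=True))
```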
Time-saving tips for DLIs: use zero page, and use immediate mode for colour/character changes where possible, although this typically means writing multiple DLI routines rather than just one.
You can also save time by not pushing registers to the stack at all. Instead, store them into the operands of LDA/LDX immediate instructions placed just before the RTI (self-modifying code):
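A rough idea of what those tips save, as a host-side Python model using standard 6502 load timings. The store is always absolute because the hardware registers are memory-mapped at $D000 and up, outside zero page. The indexed case ignores any cycles spent updating the index between DLIs.

```python
# Host-side comparison of ways to load a colour value before storing it
# to a hardware colour register (always STA absolute, 4 cycles).
LOAD_CYCLES = {
    "immediate": 2,   # LDA #$nn
    "zero_page": 3,   # LDA $nn
    "absolute": 4,    # LDA $nnnn
    "absolute_x": 4,  # LDA $nnnn,X (+1 if a page boundary is crossed)
}
STA_ABS = 4


def colour_change_cycles(mode, page_cross=False):
    """CPU cycles for one load-and-store colour change."""
    cycles = LOAD_CYCLES[mode] + STA_ABS
    if mode == "absolute_x" and page_cross:
        cycles += 1
    return cycles


if __name__ == "__main__":
    for mode in LOAD_CYCLES:
        print(f"{mode:>10}: {colour_change_cycles(mode)} cycles")
```

Immediate mode saves 2 cycles per change over an absolute or indexed load, and zero page saves 1 over absolute.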
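dli    sta savea+1   ; save A into the LDA operand below
       stx savex+1   ; save X into the LDX operand below
; dli stuff
savea  lda #$ff      ; operand overwritten on entry, restores A
savex  ldx #$ff      ; operand overwritten on entry, restores X
       rti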
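Rough savings from that trick, as a host-side Python model using standard NMOS 6502 cycle counts. There's no PHX/PLX, so X has to go through A on the stack. The `code_in_zero_page` option is an assumption for illustration: it only applies if the routine itself sits in zero page, so the STA/STX into the operand can use zero-page addressing. The trick also assumes the DLI can't be re-entered before it finishes, since there's only one save slot per register.

```python
# Host-side comparison of register save/restore costs in a DLI.


def stack_save_restore():
    """PHA, TXA, PHA on entry; PLA, TAX, PLA on exit."""
    a = 3 + 4              # PHA ... PLA
    x = (2 + 3) + (4 + 2)  # TXA, PHA ... PLA, TAX
    return a + x


def selfmod_save_restore(code_in_zero_page=False):
    """STA/STX into immediate operands on entry; LDA #/LDX # before RTI."""
    store = 3 if code_in_zero_page else 4  # STA/STX zp vs absolute
    load_imm = 2                           # LDA #$nn / LDX #$nn
    return 2 * (store + load_imm)


if __name__ == "__main__":
    print("Stack:               ", stack_save_restore())
    print("Self-modifying:      ", selfmod_save_restore())
    print("Self-modifying (zp): ", selfmod_save_restore(code_in_zero_page=True))
```

Saving A and X this way costs 12 cycles instead of 18, or 10 if the routine lives in zero page.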
Edited by Rybags, Fri Jan 20, 2012 7:54 AM.