It's not just about speed, but also about how much space and how many registers a long call takes.
ATXBasic's bank mechanism works as follows...
Thanks for the explanation. I had to look up 'thunk' (I'd never heard the term before), but having done so, I realise I use thunks too.
I opted to keep the bank number and destination address directly behind the JSR to the thunk, since I wanted to pass arguments in all three registers (A, X and Y). It is pretty expensive, but by avoiding inter-bank calls in performance-critical code, it works out well enough. Similar methods are used in the GOS ROM, the U1MB/Incognito PBI BIOS (which use 4 x 2KB banks) and the XEX loader (which uses 2 x 8KB banks). The advantage of the expensive method is that I don't need to care much whether a given JSR macro results in a bank traversal or not, since the macro picks the appropriate method depending on whether or not the target is in the same bank. Obviously, one still tries to optimise up front by positioning code so that inter-bank jumps are kept to a minimum.
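For anyone else new to the idea, here is a rough Python simulation of an "inline argument" thunk like the one described above. It is only a sketch under stated assumptions, not the actual GOS, PBI BIOS or loader code: the inline layout (one bank byte, then a little-endian target address), the thunk address `FAR_THUNK`, and the names `Machine`, `far_thunk` and `emit_call` are all hypothetical, and the bank switch is modelled as a plain variable rather than a hardware register write. The key trick is that JSR pushes the address of its own last byte, so the thunk can pull that address, read the three bytes that follow it, and push back an adjusted address so RTS skips the inline data. A, X and Y reach the target untouched. On real hardware the thunk presumably has to stash and restore those registers while it does this work, which is where much of the cost goes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

FAR_THUNK = 0x3F00  # hypothetical address of the thunk in non-banked memory
JSR = 0x20          # 6502 JSR absolute opcode


@dataclass
class CPU:
    a: int = 0
    x: int = 0
    y: int = 0
    sp: int = 0xFF
    mem: bytearray = field(default_factory=lambda: bytearray(0x10000))

    def push(self, value: int) -> None:
        self.mem[0x100 + self.sp] = value & 0xFF  # 6502 stack lives in page 1
        self.sp = (self.sp - 1) & 0xFF

    def pull(self) -> int:
        self.sp = (self.sp + 1) & 0xFF
        return self.mem[0x100 + self.sp]


class Machine:
    def __init__(self) -> None:
        self.cpu = CPU()
        self.bank = 0  # currently selected bank (stand-in for a bank register)
        # Hypothetical routine table: (bank, address) -> Python stand-in for banked code
        self.routines: Dict[Tuple[int, int], Callable[[CPU], None]] = {}

    def jsr(self, site: int, handler: Callable[[], None]) -> int:
        """Emulate JSR at `site`: push site+2 (high byte first), run handler, then RTS."""
        ret = site + 2
        self.cpu.push(ret >> 8)
        self.cpu.push(ret & 0xFF)
        handler()
        lo = self.cpu.pull()
        hi = self.cpu.pull()
        return ((hi << 8) | lo) + 1  # RTS resumes at pulled address + 1

    def far_thunk(self) -> None:
        lo = self.cpu.pull()
        hi = self.cpu.pull()
        ret = (hi << 8) | lo  # points at the last byte of the caller's JSR
        bank = self.cpu.mem[ret + 1]
        target = self.cpu.mem[ret + 2] | (self.cpu.mem[ret + 3] << 8)
        ret += 3  # skip the 3 inline bytes so RTS lands after them
        self.cpu.push(ret >> 8)
        self.cpu.push(ret & 0xFF)
        saved = self.bank
        self.bank = bank  # select the target's bank
        self.routines[(bank, target)](self.cpu)  # A, X, Y passed through as-is
        self.bank = saved  # restore the caller's bank before returning


def emit_call(caller_bank: int, target_bank: int, target: int) -> bytes:
    """Hypothetical macro logic: plain JSR within a bank, thunk + inline data across banks."""
    if caller_bank == target_bank:
        return bytes([JSR, target & 0xFF, target >> 8])
    return bytes([JSR, FAR_THUNK & 0xFF, FAR_THUNK >> 8,
                  target_bank, target & 0xFF, target >> 8])


# Usage: call a routine at $4000 in bank 3 from non-banked code at $2000 (bank 0).
m = Machine()
site = 0x2000
code = emit_call(caller_bank=0, target_bank=3, target=0x4000)
m.cpu.mem[site:site + len(code)] = code


def add_regs(cpu: CPU) -> None:
    # Stand-in for a banked routine taking its arguments in A, X and Y
    cpu.a = (cpu.a + cpu.x + cpu.y) & 0xFF


m.routines[(3, 0x4000)] = add_regs
m.cpu.a, m.cpu.x, m.cpu.y = 1, 2, 3
pc = m.jsr(site, m.far_thunk)  # execution resumes just past the inline bytes
```

The same-bank case costs three bytes and no thunk overhead, while the cross-bank case costs six bytes plus the thunk's work, which is why it still pays to arrange code so that hot paths stay within one bank.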