I'm not exactly sure what the limitations of batari Basic are (I've heard it leverages the ARM), but I'm writing this for a stock 4K cart in assembly, so I was very particular (read: overcomplicated) about how I handled it.
Apologies if this is a little hard to follow; please let me know if I need to clarify something.
Background info:
The enemies are drawn using a rotating 16-byte buffer. Each byte selects a 4-kernel-line subroutine (move, move+4, draw, etc.) and a NUSIZ setting (1 copy, two close, etc.), both of which are updated every 4 kernel lines (the subroutine is in the top 4 bits, NUSIZ in the bottom 3). The play area is 64 kernel lines tall, so the 16-byte buffer covers the screen.
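To make the byte layout concrete, here is a rough model of it in Python rather than 6502 (just to show the bit math). The function names and the `start` rotation offset are made up for illustration; only the 4-bit/3-bit split and the 16 x 4 = 64 line coverage come from the description above.

```python
BUFFER_LEN = 16        # bytes in the enemy buffer
LINES_PER_ENTRY = 4    # kernel lines governed by each byte

def unpack_entry(entry):
    """Split one buffer byte into (subroutine id, NUSIZ value)."""
    subroutine = (entry >> 4) & 0x0F   # top 4 bits pick the subroutine
    nusiz = entry & 0x07               # bottom 3 bits are the NUSIZ copies setting
    return subroutine, nusiz

def entry_for_line(buffer, start, line):
    """Return the byte that governs kernel line `line` (0..63).

    `start` is a hypothetical rotation offset: the buffer index that
    is drawn at the top of the play area this frame.
    """
    index = (start + line // LINES_PER_ENTRY) % BUFFER_LEN
    return buffer[index]
```

With this layout, kernel lines 0-3 share one byte, lines 4-7 the next, and so on, wrapping around the 16-byte buffer.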
The player is drawn with the ball in a rotating 64 byte buffer, updated every kernel line.
I make sure to track the player and enemy positions every kernel line so I know their absolute positions on screen.
For collision:
In the enemy update cycle (once every 4 kernel lines), there is a collision check. It takes (player position - enemy position), using whichever enemy you collided with, as an index into a 256-entry table (covering all possible relative positions) to determine which enemy was hit (#%10000000 is the leftmost, #%01000000 is the middle, #%00100000 is the rightmost). This is then XORed with the 16-byte enemy buffer pointer (I only need the lowest 4 bits, as it's only 16 bytes long) to record which enemy the player collided with. The result is stored in memory to be referenced later (as this was already too much logic in the kernel).
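A rough Python model of that packing step, in case it helps. The table contents here are placeholders (I'm assuming 8-pixel-wide enemies spaced 16 pixels apart purely for illustration), and the function names are hypothetical; the mask values and the 4-bit pointer are as described above.

```python
LEFT, MIDDLE, RIGHT = 0b10000000, 0b01000000, 0b00100000

def build_hit_table():
    """Build an illustrative 256-entry table indexed by relative position.

    Assumed geometry (not from the real game): three 8-pixel-wide
    enemies whose left edges sit 0, 16 and 32 pixels from the enemy
    position. Every other offset maps to 0 (no hit).
    """
    table = [0] * 256
    for first, mask in ((0, LEFT), (16, MIDDLE), (32, RIGHT)):
        for delta in range(first, first + 8):
            table[delta] = mask
    return table

def pack_hit(table, player_x, enemy_x, buffer_index):
    """Combine the hit mask and the buffer pointer into one byte."""
    delta = (player_x - enemy_x) & 0xFF      # 8-bit wraparound, like a subtract on the 6502
    return table[delta] ^ (buffer_index & 0x0F)

def unpack_hit(record):
    """Recover (hit mask, buffer index) from the stored byte."""
    return record & 0xE0, record & 0x0F
```

Since the mask only uses the top 3 bits and the pointer only the bottom 4, the two never overlap, so the XOR behaves the same as an OR here and both halves can be recovered with simple masks.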
Outside the kernel, I use the stored enemy pointer to determine which byte in the enemy table the player collided with, and the surrounding bytes to determine which enemies were present on that line. I then update the enemy buffer to "remove" that enemy so it won't appear next frame.
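One way to picture the "remove" step is as a lookup from the set of enemies still alive to a new NUSIZ value plus a horizontal offset. This is only a sketch under assumptions: it supposes a row is at most three copies at close spacing, and the `FORMATIONS` table and `remove_enemy` name are invented for the example. The NUSIZ values themselves are the standard TIA ones (0 = one copy, 1 = two copies close, 2 = two copies medium, 3 = three copies close; close copies are 16 pixels apart, medium 32).

```python
LEFT, MIDDLE, RIGHT = 0x80, 0x40, 0x20

# presence mask -> (NUSIZ copies value, x offset in pixels of the first copy)
FORMATIONS = {
    LEFT | MIDDLE | RIGHT: (3, 0),   # three copies close
    LEFT | MIDDLE:         (1, 0),   # two copies close
    MIDDLE | RIGHT:        (1, 16),  # two copies close, shifted one slot right
    LEFT | RIGHT:          (2, 0),   # two copies medium (gap where the middle was)
    LEFT:                  (0, 0),   # one copy
    MIDDLE:                (0, 16),
    RIGHT:                 (0, 32),
}

def remove_enemy(present, hit_mask):
    """Clear the hit enemy and return the new (NUSIZ, x offset).

    Returns None when no enemies are left on the row.
    """
    remaining = present & ~hit_mask & 0xE0
    if remaining == 0:
        return None
    return FORMATIONS[remaining]
```

For example, hitting the leftmost of three enemies leaves two close copies that have to start 16 pixels further right, which is the kind of case where the neighbouring move bytes need adjusting as well as the NUSIZ bits.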
So, I'm actually only doing one collision check every 4 kernel lines (once for each set of enemies).