Anywhere we're waiting for a timer to complete. Just the top and bottom areas of the screen, I seem to recall.
But there were a number of different tasks, done with different priorities and in different orders. Some tasks were rather big and could only be done in certain places in the frame. Others were 'filler' tasks that took very little processing time, so they could fill in the gaps left over after the big tasks had run and make sure we used every single drop of processing time that was available.
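The big-task/filler-task idea above can be sketched roughly as follows. This is only an illustration in Python, not the original scheduler: the task names, cycle costs, priorities, and the 2800-cycle window are all hypothetical numbers picked to show how small tasks soak up the time a big task leaves behind.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    cycles: int    # assumed worst-case cost in CPU cycles (hypothetical)
    priority: int  # lower number = scheduled first (hypothetical scheme)


def fill_window(budget, tasks):
    """Greedily pack tasks into one timer window, highest priority first.

    Returns (names of tasks that run, cycles left unused). A task that does
    not fit is skipped, so smaller 'filler' tasks can still use the gap.
    """
    chosen = []
    for task in sorted(tasks, key=lambda t: t.priority):
        if task.cycles <= budget:
            chosen.append(task.name)
            budget -= task.cycles
    return chosen, budget


# Hypothetical workload for one blanking window of 2800 cycles.
tasks = [
    Task("scroll_tilemap", 1800, 0),   # big task
    Task("collision", 1200, 1),        # big task, will not fit this window
    Task("update_score", 300, 2),      # filler
    Task("poll_joystick", 150, 3),     # filler
    Task("advance_rng", 40, 4),        # filler
]

chosen, leftover = fill_window(2800, tasks)
print(chosen, leftover)
```

Here the second big task is deferred to a later window, while the three fillers run in the gap it could not use.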
Most excellent design - a tile-mapped, Mario-world-style game without extra hardware requires all the available time in both blanks, plus careful load balancing/queueing.
This is one feature I think could be improved in bB; unless I am mistaken, only one of the vertical blanks is available for the game loop.
Virtual World BASIC (inspired by bB and BD) sports two game loops to allow BASIC to run in both vertical blanks, giving the programmer instant access to granular load balancing.
It was initially designed to step down to 30 Hz and use an entire frame for repositioning the full-screen playfield camera and sprites. That kept programming in BASIC very simple, like back in the day, by minimizing the need for the programmer to worry about load balancing.
It has since evolved to use DLIs, like the Atari home computers. These give the programmer fine-grained load balancing control: a region of the screen can be updated from either of the blanks, leaving more time for additional tasks that the programmer can load balance on a per-frame basis. The architecture is more complex, though, as you've described.
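To put rough numbers on the trade-off, here is a back-of-the-envelope sketch in Python. It assumes the conventional NTSC VCS frame layout (3 VSYNC + 37 vertical blank + 192 visible + 30 overscan lines, 76 CPU cycles per line); real kernels, including the ones discussed in this thread, may use different line counts. It is not a description of how Virtual World BASIC is actually implemented, and the function names are made up for the example.

```python
CYCLES_PER_LINE = 76   # 6507 CPU cycles per NTSC scanline
VSYNC_LINES = 3
VBLANK_LINES = 37      # top blank (conventional layout, an assumption here)
VISIBLE_LINES = 192
OVERSCAN_LINES = 30    # bottom blank (conventional layout, an assumption here)

TOTAL_LINES = VSYNC_LINES + VBLANK_LINES + VISIBLE_LINES + OVERSCAN_LINES


def top_blank_cycles():
    return VBLANK_LINES * CYCLES_PER_LINE


def bottom_blank_cycles():
    return OVERSCAN_LINES * CYCLES_PER_LINE


def budget_per_update(frames_per_update, use_both_blanks=True):
    """Blank-time cycles available to game logic per game update."""
    per_frame = top_blank_cycles()
    if use_both_blanks:
        per_frame += bottom_blank_cycles()
    return frames_per_update * per_frame


print(TOTAL_LINES)                                  # lines per frame
print(budget_per_update(1, use_both_blanks=False))  # 60 Hz, one blank only
print(budget_per_update(1))                         # 60 Hz, both blanks
print(budget_per_update(2))                         # 30 Hz, both blanks
```

Under these assumptions, running game code in both blanks nearly doubles the per-frame budget compared to using the top blank alone, and updating at 30 Hz doubles it again at the cost of a lower update rate.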
I think your analogy about a few weeks to write a great game in BASIC compared to six months in asm is spot on. But unless you find a way to alleviate the load (like the first method of stealing a frame, or using a 32-bit co-processor to update the framebuffer), the architecture and concept for load balancing/queueing become just as important for the BASIC programmer as for the assembly programmer.
As I was working with ANTIC, I realized DLIs gave Atari BASIC programmers the ability to organize regions of the screen and exert fine-grained load balancing control - another great influence for making this design architecture accessible to the BASIC programmer. I also tried to simplify it so it's easier to use in Virtual World BASIC - DLIs are fairly complicated to use in Atari BASIC.
I think much of the development speed improvement of BASIC over asm on the VCS comes from being isolated from kernel load balancing - an architecture the BASIC programmer (thankfully) never has to worry about.
I see the BD 168-scanline kernel without WSYNC is yielding 500 extra cycles of processing power to draw that display. That's a fantastic optimization - your kernel tree must be perfectly balanced, with no branch taking even a single extra cycle, for that to work!
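One plausible reading of the 500-cycle figure: a zero-page STA WSYNC costs 3 cycles, so dropping it from each of 168 scanlines frees 168 x 3 = 504 cycles. The Python sketch below works that out and shows the kind of check a WSYNC-free kernel has to pass, namely that every path through a scanline adds up to exactly 76 cycles. The per-path cycle lists are hypothetical and are not taken from the BD kernel.

```python
CYCLES_PER_LINE = 76    # 6507 CPU cycles per NTSC scanline
STA_WSYNC_CYCLES = 3    # zero-page store
KERNEL_SCANLINES = 168  # kernel height mentioned above


def wsync_savings(lines=KERNEL_SCANLINES):
    """Cycles freed by not executing STA WSYNC once per scanline."""
    return lines * STA_WSYNC_CYCLES


def is_balanced(paths):
    """True if every code path through one scanline takes exactly 76 cycles."""
    return all(sum(costs) == CYCLES_PER_LINE for costs in paths.values())


# Hypothetical paths: a taken branch costs 3 cycles, a not-taken branch 2,
# so the not-taken side needs one extra cycle of work to stay in step.
balanced_paths = {
    "branch_taken": [3, 3, 70],
    "branch_not_taken": [2, 4, 70],
}
unbalanced_paths = {
    "branch_taken": [3, 3, 70],
    "branch_not_taken": [2, 3, 70],  # one cycle short: display would drift
}

print(wsync_savings())
print(is_balanced(balanced_paths), is_balanced(unbalanced_paths))
```

Without WSYNC to resynchronize the CPU at the end of each line, a one-cycle mismatch on any path accumulates from line to line, which is why the balance has to be exact.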