There are a lot of things going on here.
First, about the gap. The Atari OS cassette handler enforces a minimum leader length before it will read blocks from tape. On write, a 20 second leader of mark tone is written, and on read, 10 seconds of leader are ignored. The cassette handler does not read anything at all from this leader; it simply waits 10 seconds before resetting POKEY and beginning to look for a sync mark. Anything encoded on the tape during that time is just ignored. This means that any leader shorter than 10 seconds will not work on real hardware unless you cheat by pressing Play late, after the OS thinks you've already done so (by pressing a key at the tone). If the leader is short, the OS will skip part of the first block before it starts reading and then fail.
The problem with this is that many CAS files have too short a leader, and depending on how the emulator handles the file it may or may not work. If the emulator routes the bytes directly to SERIN without timing delays then it will happen to work, because the tape will effectively be held back until the cassette handler is ready to receive it. In Altirra, this doesn't work because it advances the tape during the 10 second wait, as happens on real hardware. Therefore, to make these tapes work, the emulator checks whether the first block is a data block with a gap that is too short and extends the gap to 10 seconds if needed.
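To make that rule concrete, here is a minimal sketch of the leader extension in Python. It is not Altirra's actual code: the `Block` type, the `kind` labels and the function name are made up for illustration. The only things taken from the description above are the 10 second threshold and the condition that the first block must be a data block.

```python
from dataclasses import dataclass

# The 10 second leader that the OS cassette handler skips on read.
MIN_LEADER_MS = 10_000


@dataclass
class Block:
    kind: str    # "data" or "fsk" (hypothetical labels for this sketch)
    gap_ms: int  # gap/leader preceding the block, in milliseconds


def extend_short_leader(blocks):
    """Return a copy of the block list with the first gap padded to 10 s.

    The padding is applied only when the first block is a data block
    whose gap is shorter than the leader skip; anything else is left
    with its original timing.
    """
    if not blocks:
        return []
    first = blocks[0]
    if first.kind == "data" and first.gap_ms < MIN_LEADER_MS:
        first = Block(first.kind, MIN_LEADER_MS)
    return [first] + list(blocks[1:])
```

With this sketch, a tape that starts with a data block after a 3 second gap gets its first gap stretched to 10 seconds, while a tape that starts with an FSK block is returned unchanged.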
This brings us to the next part of the puzzle. Note that I said "if the first block is a data block." Starting in Altirra 2.50, I added support for the FSK block type, which encodes raw bit data that can't be encoded in a regular data block. Crucially, this is only produced by newer tape decoders, notably a8cas. The tape images don't have clean leaders; they have small glitches that are encoded as FSK:
CASIMAGE: FSK block @ 0:08.327: 8327ms gap, 1 transition (0.2 ms)
CASIMAGE: FSK block @ 0:10.506: 2179ms gap, 1 transition (0.1 ms)
CASIMAGE: FSK block @ 0:10.941: 435ms gap, 1 transition (0.2 ms)
CASIMAGE: Data block @ 0:10.941: 6047ms gap, 132 data bytes @ 600 baud

CASIMAGE: FSK block @ 0:01.191: 1191ms gap, 1 transition (0.2 ms)
CASIMAGE: FSK block @ 0:03.370: 2179ms gap, 1 transition (0.1 ms)
CASIMAGE: FSK block @ 0:03.805: 435ms gap, 1 transition (0.3 ms)
CASIMAGE: FSK block @ 0:05.124: 1318ms gap, 1 transition (0.1 ms)
CASIMAGE: Data block @ 0:05.124: 4729ms gap, 132 data bytes @ 600 baud
In both cases, because the first block is an FSK block, the 10 second extension doesn't occur, and the tape blocks are used with their original timing. In the first case, the leader is 16 seconds, and crucially, two of the FSK-encoded glitches are after the 10 second mark. The OS cassette handler picks one of these up as a start bit and then runs its sync mark code, computing a bogus baud rate and reading garbage. In the second case, with the shortened leader, the glitches are all before the 10 second mark and thus simply ignored by the cassette handler, so it loads.
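The timing can be checked by hand from the two logs. The sketch below adds up the gap values to get the tape position of each glitch and of the first data block, then reports which glitches land at or after the 10 second mark. The function name and the tuple layout are made up for this example, and it ignores the sub-millisecond pulse widths, so positions can differ from the logged timestamps by a millisecond.

```python
LEADER_SKIP_MS = 10_000  # the cassette handler ignores the first 10 seconds


def glitches_after_skip(blocks):
    """blocks: list of (kind, gap_ms) tuples in tape order.

    Returns (late_glitch_positions_ms, first_data_start_ms), where a
    "late" glitch is an FSK block at or after the 10 second mark.
    """
    position = 0
    late = []
    for kind, gap_ms in blocks:
        position += gap_ms
        if kind == "data":
            return late, position
        if position >= LEADER_SKIP_MS:
            late.append(position)
    return late, None


# Gap values copied from the two CASIMAGE logs above.
tape1 = [("fsk", 8327), ("fsk", 2179), ("fsk", 435), ("data", 6047)]
tape2 = [("fsk", 1191), ("fsk", 2179), ("fsk", 435), ("fsk", 1318),
         ("data", 4729)]

print(glitches_after_skip(tape1))  # ([10506, 10941], 16988)
print(glitches_after_skip(tape2))  # ([], 9852)
```

The first tape has two glitches past the 10 second mark, which is where the cassette handler is already hunting for a start bit. The second tape has none.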
When C: acceleration is on, the OS cassette handler is bypassed and the emulator handles the leader delay and the sync mark measurement. Crucially, the emulator's decoder has false start bit detection, so when it doesn't see a valid train of pulses after the start bit it resets the state machine and continues looking for the actual sync mark.
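As a rough illustration of what false start bit detection means, here is a sketch that is not the emulator's real decoder. It assumes the sync mark is the usual pair of $55 bytes, which with start and stop bits gives 20 alternating bits and therefore 20 evenly spaced edges. A candidate start edge is accepted only if the next 19 intervals are all close to the same length; otherwise it is dropped and the search resumes at the next edge. The function name, the edge-list input and the 25% tolerance are all assumptions made for the example.

```python
def find_sync_mark(edges_ms, intervals_needed=19, tolerance=0.25):
    """Return (start_index, baud) of the first plausible sync mark, or None.

    edges_ms: ascending timestamps (in ms) of level changes on the data
    line. An edge that is not followed by a train of evenly spaced edges
    is treated as a false start bit and skipped.
    """
    for i in range(len(edges_ms) - intervals_needed):
        window = edges_ms[i:i + intervals_needed + 1]
        gaps = [b - a for a, b in zip(window, window[1:])]
        bit_ms = sum(gaps) / len(gaps)
        # Every interval must be within the tolerance of the average.
        if all(abs(g - bit_ms) <= tolerance * bit_ms for g in gaps):
            return i, round(1000.0 / bit_ms)
    return None


# Two 0.2 ms glitches followed by a clean 600 baud sync mark.
bit = 1000.0 / 600
sync = [6988.0 + k * bit for k in range(20)]
edges = [506.0, 506.2, 941.0, 941.2] + sync
print(find_sync_mark(edges))  # (4, 600)
```

A decoder without this check would take the edge at index 0 as the start bit and measure a baud rate from whatever intervals follow it, which is the failure described above for the OS cassette handler.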
That brings us to the difference between Altirra 2.90 and 3.10. This has to do with the width of the glitch. In Altirra 3.00, I increased the precision of the internal tape data stream from 16 kHz to 32 kHz to better support high baud rates. In 2.90, the glitch is too small to be picked up, but in 3.00+ there is enough precision for it to be picked up in SKSTAT, which then causes the OS cassette handler to see the false start bit. Increasing the width of the SKSTAT filter would make the tape load again, but this reduces the maximum data rate that could be loaded, and there are already problems with people encoding CAS images at data rates that are difficult or impossible to encode in FSK, such as 2000-6000 baud.
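Some back-of-the-envelope numbers show why widening the filter is a tradeoff. This is just arithmetic on the figures quoted above: the glitch widths from the logs, the 16 kHz and 32 kHz stream rates, and a few baud rates. Treating the stream rate as a plain sample rate is an assumption of this sketch, and the helper names are made up.

```python
def bit_period_ms(baud):
    """Duration of one bit in milliseconds at the given baud rate."""
    return 1000.0 / baud


def samples(duration_ms, sample_rate_hz):
    """How many stream samples a pulse of the given width spans."""
    return duration_ms * sample_rate_hz / 1000.0


# Glitch widths from the logs, measured in samples at each stream rate.
for rate in (16_000, 32_000):
    print(rate, [round(samples(w, rate), 1) for w in (0.1, 0.2, 0.3)])

# Bit periods: about 1.667 ms at 600 baud, 0.5 ms at 2000 baud and
# 0.167 ms at 6000 baud.
for baud in (600, 2000, 6000):
    print(baud, round(bit_period_ms(baud), 3))
```

At 6000 baud a single bit is shorter than the 0.2 ms glitch, so any filter wide enough to reject the glitch would also reject legitimate bits at that rate.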
So, ultimately, the issue is the glitches in the CAS image. If you don't need FSK encoding, I would recommend turning it off during the tape decode, as it is only necessary for tapes that use non-standard blocks. There is a question about whether a 0.2 ms pulse can actually be decoded with a standard 410/1010 FSK decoder with 4 kHz/5.3 kHz tones. I would argue that these glitches probably should be filtered out in the encoding process, but that would require some validation against the real hardware.