Traditionally, ffmpeg would build the mp4 container while the transcoded media is written to disk (in a single contiguous mdat box after ftyp) and then put the track descriptions and sample tables in a moov at the end of the file. That's efficient for a single-pass write, because you can't know how much space the moov needs before you've processed the media.
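To make that layout concrete, here is a minimal Python sketch that walks the top-level boxes of a file and reports their order. The names `iter_boxes` and `make_box` are made up for illustration, and the sample bytes are a synthetic stand-in rather than real ffmpeg output.

```python
import struct

def iter_boxes(data, start=0, end=None):
    """Yield (type, offset, size) for each box found in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit "largesize" follows the type field.
            size = struct.unpack_from(">Q", data, pos + 8)[0]
            header = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the file.
            size = end - pos
        if size < header:
            raise ValueError("corrupt box at offset %d" % pos)
        yield box_type.decode("ascii", "replace"), pos, size
        pos += size

def make_box(box_type, payload=b""):
    """Build a box with a plain 32-bit size header (illustration only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

# Synthetic file in the "moov at the end" layout described above.
sample = (
    make_box(b"ftyp", b"isom\x00\x00\x02\x00isom")
    + make_box(b"mdat", b"\x00" * 32)
    + make_box(b"moov")
)
layout = [box_type for box_type, _, _ in iter_boxes(sample)]
print(layout)  # ['ftyp', 'mdat', 'moov']
```

Running the same walker over a real file (read in binary mode) should show whether the moov sits before or after the mdat.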
But when you loaded the file into a <video> element, it would of course need to buffer the entire file to find the moov box needed to decode the NAL units (in the case of avc1).
A simple solution was then to repackage by simply moving the moov from the end of the file to before the mdat (adjusting the chunk offsets). Back in the day, that would make your video start instantly!
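A rough Python sketch of that repackaging step, for illustration. It assumes a single mdat, plain 32-bit box sizes, and a moov that sits after the mdat; `faststart`, `patch_offsets` and `boxes` are hypothetical names, not anyone's actual implementation. As far as I know, ffmpeg's `-movflags +faststart` performs the same kind of second pass.

```python
import struct

# Boxes on the path from moov down to the chunk offset tables.
CONTAINERS = {b"moov", b"trak", b"mdia", b"minf", b"stbl"}

def boxes(buf, start, end):
    """Yield (type, offset, size) for the boxes in buf[start:end]."""
    pos = start
    while pos + 8 <= end:
        size, kind = struct.unpack_from(">I4s", buf, pos)
        if size < 8:
            raise ValueError("only plain 32-bit box sizes are handled here")
        yield kind, pos, size
        pos += size

def patch_offsets(moov, start, end, delta):
    """Add delta to every stco/co64 entry found below moov[start:end]."""
    for kind, pos, size in boxes(moov, start, end):
        if kind in CONTAINERS:
            patch_offsets(moov, pos + 8, pos + size, delta)
        elif kind in (b"stco", b"co64"):
            fmt, width = (">I", 4) if kind == b"stco" else (">Q", 8)
            # Layout: header (8), version/flags (4), entry_count (4), entries.
            count = struct.unpack_from(">I", moov, pos + 12)[0]
            for i in range(count):
                at = pos + 16 + i * width
                old = struct.unpack_from(fmt, moov, at)[0]
                struct.pack_into(fmt, moov, at, old + delta)

def faststart(data):
    """Return a copy of data with the moov moved in front of the mdat."""
    found = {kind: (pos, size) for kind, pos, size in boxes(data, 0, len(data))}
    if b"moov" not in found or b"mdat" not in found:
        raise ValueError("expected both a moov and an mdat box")
    moov_off, moov_size = found[b"moov"]
    mdat_off, _ = found[b"mdat"]
    if moov_off < mdat_off:
        return data  # already front-loaded
    moov = bytearray(data[moov_off:moov_off + moov_size])
    # Chunk offsets are absolute file offsets, and the media shifts
    # forward by exactly the size of the moov inserted in front of it.
    patch_offsets(moov, 8, len(moov), len(moov))
    return (data[:mdat_off] + bytes(moov)
            + data[mdat_off:moov_off] + data[moov_off + moov_size:])

def box(kind, payload=b""):
    return struct.pack(">I4s", 8 + len(payload), kind) + payload

# Synthetic file: one chunk whose offset (24) points at b"SAMPLE".
stco = box(b"stco", struct.pack(">II", 0, 1) + struct.pack(">I", 24))
moov_box = box(b"moov", box(b"trak", box(b"mdia", box(b"minf", box(b"stbl", stco)))))
original = box(b"ftyp", b"isom\x00\x00\x00\x00") + box(b"mdat", b"SAMPLE") + moov_box
moved = faststart(original)
order = [kind for kind, _, _ in boxes(moved, 0, len(moved))]
# In this synthetic file the stco entry is the last 4 bytes of the moov.
new_offset = struct.unpack_from(">I", moved, 16 + len(moov_box) - 4)[0]
```

A real tool would also have to handle 64-bit box sizes and stco entries that overflow 32 bits after the shift; this sketch deliberately does not.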
This is basically what CMAF is. The moov and ftyp get sent at the beginning (and are frequently written as an init segment), and then the rest of the stream is a continuous sequence of moofs and mdats, chunked as per gstreamer/ffmpeg specifics.
I was thinking progressive MP4, with sample table in the moov. But yes, cmaf and other fragmented MP4 profiles have ftyp and moov at the front, too.
Rather than putting the media in a contiguous blob, CMAF interleaves it with moofs that hold the sample byte ranges and timing. Moreover, while this interleaving allows most of the CMAF file to be progressively streamed to disk as the media is created, it has the same catch-22 as the "progressive" MP4 file: the index (sidx, in the case of CMAF) cannot be written at the start of the file until all the media it indexes has been processed.
When writing CMAF, ffmpeg will usually omit the segment index, which makes fast seeking painful. To insert the `sidx` (after ftyp+moov but before the moof+mdat pairs) you need to repackage (but not re-encode).
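As a rough illustration of why this takes a second pass, here is a Python sketch that collects the byte range of each moof+mdat pair, which is the size information a `sidx` entry would reference. The names `top_level` and `fragment_ranges` are hypothetical; the sketch assumes plain 32-bit box sizes and skips the per-fragment durations, which would have to be read from inside each moof.

```python
import struct

def top_level(data):
    """Yield (type, offset, size) for each top-level box."""
    pos = 0
    while pos + 8 <= len(data):
        size, kind = struct.unpack_from(">I4s", data, pos)
        if size < 8:
            raise ValueError("only plain 32-bit box sizes are handled here")
        yield kind, pos, size
        pos += size

def fragment_ranges(data):
    """Return [(offset, byte_length)] for every moof+mdat pair."""
    ranges = []
    moof_start = None
    for kind, pos, size in top_level(data):
        if kind == b"moof":
            moof_start = pos
        elif kind == b"mdat" and moof_start is not None:
            # A fragment spans from the start of its moof to the end of its mdat.
            ranges.append((moof_start, pos + size - moof_start))
            moof_start = None
    return ranges

def box(kind, payload=b""):
    return struct.pack(">I4s", 8 + len(payload), kind) + payload

# Synthetic fragmented file: ftyp + moov, then two moof+mdat pairs.
stream = (
    box(b"ftyp", b"cmfc\x00\x00\x00\x00")
    + box(b"moov")
    + box(b"moof", b"\x00" * 4) + box(b"mdat", b"\x00" * 10)
    + box(b"moof", b"\x00" * 4) + box(b"mdat", b"\x00" * 20)
)
ranges = fragment_ranges(stream)
print(ranges)  # [(24, 30), (54, 40)]
```

None of these lengths are known until the last fragment has been written, which is why the index can only go at the front after the whole file has been scanned.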