Fabien Sanglard's non-blog

  




Progressive playback: An atom story.



November 15th, 2011

Introduction


I have been doing a lot of work with video containers recently, especially figuring out interoperability between iOS/Android and optimizing progressive playback. In particular it seems Android devices fail to perform progressive playback on certain files while iOS and VLC succeed:

Why ?

As usual, understanding things all the way down proved extremely worthwhile.


Analysis


A movie file is called a container. There are several kinds of containers, but the most common on mobile platforms are:


Within the container, data is organized in "atoms". As you can see in the drawing on the left, a typical movie container features four atoms:

  1. ftyp atom: The magic number part of the file. The body of this atom also contains the branding and version of the container format. With quicktime/MOV it is always "qt  ".
  2. moov atom: The metadata, containing the description of the codecs used in the mdat atom. It also contains the sub-atoms "stco" and "co64", which hold absolute offsets to the chunks of media data in the mdat atom.
  3. wide atom: A dirty hack explained later.
  4. mdat atom: The interleaved compressed audio and video streams. It accounts for 95% of the file size. Most of the time the codecs used are H.263 for video and AAC for audio.

Note: Why is the wide atom a dirty hack? Because its only purpose in life is to be overwritten. Atom sizes are coded on 4 bytes, hence the maximum size of an mdat atom is 4GB. To allow it to grow further, the mdat atom header can be moved up by 8 bytes thanks to this padding, and a special atom header can then be used to code its size on 8 bytes instead of 4... which raises the limit from 4 gigabytes to 9 exabytes.
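
To make the note above concrete, here is a minimal Python sketch of the two header layouts and of the overwrite itself. It assumes the usual convention that a 4-byte size field equal to 1 means "the real size follows the type on 8 bytes". The function names `read_atom_header` and `widen_mdat` are made up for this illustration and do not come from any library.

```python
import struct

def read_atom_header(buf, offset=0):
    """Return (atom_type, total_size, header_size) for the atom at `offset`."""
    size, atom_type = struct.unpack_from(">I4s", buf, offset)
    header_size = 8
    if size == 1:
        # Special marker: the real size is an 8-byte integer after the type.
        (size,) = struct.unpack_from(">Q", buf, offset + 8)
        header_size = 16
    return atom_type.decode("latin-1"), size, header_size

def widen_mdat(buf, wide_offset):
    """Overwrite a 'wide' atom plus a compact 'mdat' header (8 + 8 bytes)
    with a single 16-byte extended 'mdat' header. `buf` is a bytearray."""
    wide_type, wide_size, _ = read_atom_header(buf, wide_offset)
    mdat_type, mdat_size, _ = read_atom_header(buf, wide_offset + 8)
    if (wide_type, wide_size, mdat_type) != ("wide", 8, "mdat"):
        raise ValueError("expected an 8-byte 'wide' atom right before 'mdat'")
    # The atom now starts 8 bytes earlier, so its total size grows by 8.
    struct.pack_into(">I4sQ", buf, wide_offset, 1, b"mdat", mdat_size + 8)

# Compact header: 4-byte size, 4-byte type.
small = struct.pack(">I4s", 24, b"mdat") + b"\x00" * 16
# Extended header: size field set to 1, type, then the 8-byte size.
big = struct.pack(">I4sQ", 1, b"mdat", 5 * 1024**3)
print(read_atom_header(small))  # ('mdat', 24, 8)
print(read_atom_header(big))    # ('mdat', 5368709120, 16)

# 'wide' padding followed by a compact 'mdat' holding 4 bytes of payload.
buf = bytearray(struct.pack(">I4sI4s", 8, b"wide", 12, b"mdat") + b"abcd")
widen_mdat(buf, 0)
print(read_atom_header(buf))    # ('mdat', 20, 16)
```

The payload does not move: it started 16 bytes after the wide atom before the overwrite, and it still starts 16 bytes after the new header.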

Now when a file like this is accessed over HTTP, the player performs progressive playback as follows:

  1. Receives the "ftyp" atom and checks that the container format, version and branding are supported.
  2. Receives the "moov" atom, checks that the required codecs are available and uses the "stco" sub-atoms to start decoding the video and audio streams.
  3. Receives the "mdat" atom, buffers the content and makes it available so the codecs can decompress it.

Since the "ftyp" and "moov" atoms are only a few KB, progressive playback can start within a few seconds.
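
The order in which the player meets the atoms can be checked by walking the top-level atom headers. The sketch below is only an illustration: `top_level_atoms` and `atom` are hypothetical helpers, and the tiny "movie" is fabricated in memory with dummy payloads (a real ftyp body carries more than the brand).

```python
import struct

def top_level_atoms(data):
    """Yield (type, offset, size) for each top-level atom found in `data`."""
    offset = 0
    while offset + 8 <= len(data):
        size, kind = struct.unpack_from(">I4s", data, offset)
        if size == 1:
            # Extended header: the real size is stored on 8 bytes.
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
        elif size == 0:
            # A size of 0 means "this atom runs to the end of the file".
            size = len(data) - offset
        if size < 8:
            break  # malformed header, stop rather than loop forever
        yield kind.decode("latin-1"), offset, size
        offset += size

def atom(kind, payload=b""):
    """Build a compact atom (demo helper, not a real library call)."""
    return struct.pack(">I4s", 8 + len(payload), kind) + payload

# Dummy file laid out in the order described above.
movie = (atom(b"ftyp", b"qt  ") + atom(b"moov", b"\x00" * 20)
         + atom(b"wide") + atom(b"mdat", b"\x00" * 100))

for kind, offset, size in top_level_atoms(movie):
    print(f"{kind} offset={offset} size={size}")
# ftyp offset=0 size=12
# moov offset=12 size=28
# wide offset=40 size=8
# mdat offset=48 size=108
```

Only the headers are read, so the same walk works on a real file without loading the mdat payload, provided the reader seeks from header to header instead of slicing a byte string.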


Problem


In order to start playing a movie file right away, the metadata contained in the "moov" atom is paramount to the player. If the movie file atoms are ordered as previously described, everything works as expected... but most video editors (ffmpeg, quicktime, flash video) generate atoms in the wrong order (as seen on the right), with the "moov" atom last.

If you try to load a file structured like this on an Android device over the internet, you get an error message like this:



Progressive playback is not possible and you have to download the entire file before you can start watching the video. But if we try to open this file with an iOS device or VLC they are able to start playback within seconds:

How ?



The answer is pretty obvious and can be observed via Wireshark:



iOS and VLC open a second HTTP connection to the server using the not so well known "Range" HTTP header:

  1. The first HTTP request features a "Range: bytes=0-" HTTP header field, so the movie is downloaded from the start.
  2. As soon as the player detects an "mdat" atom without having seen the "moov" atom, it opens a second connection with a "Range: bytes=4726467-" HTTP header field. This skips most of the file, up to the end, and retrieves the "moov" atom.

Thanks to the second connection, the "moov" atom is retrieved faster and progressive playback can start right away without waiting for the entire file to be downloaded.
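
The offset in the second request does not have to be guessed: the mdat header announces the atom's size, so the next atom starts at "mdat offset + mdat size". The sketch below shows one way a player could derive the Range value from the first few bytes it has received. This is an assumption about the logic, not the actual iOS or VLC code. The function name, the atom sizes and the example.com URL are invented for the demo, with the sizes picked so that the result matches the capture above.

```python
import struct
from urllib.request import Request

def second_request_range(received):
    """Return the Range value for a follow-up request, or None if not needed.

    `received` holds the first bytes of the file downloaded so far.
    """
    offset = 0
    while offset + 8 <= len(received):
        size, kind = struct.unpack_from(">I4s", received, offset)
        if size == 1:
            # Extended header: real size on 8 bytes after the type.
            if offset + 16 > len(received):
                return None
            (size,) = struct.unpack_from(">Q", received, offset + 8)
        if kind == b"moov":
            return None  # metadata comes first, nothing to do
        if kind == b"mdat":
            if size == 0:
                return None  # mdat runs to the end of file, no moov after it
            return f"bytes={offset + size}-"
        if size < 8:
            return None  # malformed header
        offset += size
    return None

# Made-up layout: 32-byte ftyp, 8-byte wide, then an mdat header.
prefix = (struct.pack(">I4s", 32, b"ftyp") + b"\x00" * 24
          + struct.pack(">I4s", 8, b"wide")
          + struct.pack(">I4s", 4726427, b"mdat"))

value = second_request_range(prefix)
print(value)  # bytes=4726467-

# Building the request only; nothing is sent over the network here.
request = Request("http://example.com/movie.mov", headers={"Range": value})
print(request.get_header("Range"))  # bytes=4726467-
```

Note that 48 bytes of the file are enough to make the decision, which fits with the second connection being opened almost immediately.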

Solution


The Android video player elects NOT to open a second connection and waits for the entire file to download instead. The only solution is to fix those files and reorder the atoms inside. This can be done:
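
Whatever tool does the reordering, one detail deserves attention: since "stco" stores absolute file offsets, moving the "moov" atom in front of "mdat" pushes the media data down by the size of "moov", and every offset must be increased by that amount. The Python sketch below illustrates only that patching step on a single, complete stco atom. It assumes the standard stco layout (8-byte header, 4 bytes of version and flags, a 4-byte entry count, then 32-bit offsets). `shift_stco` and `make_stco` are hypothetical names, and locating stco inside the nested moov atom as well as handling co64 are left out.

```python
import struct

def shift_stco(stco_atom, delta):
    """Return a copy of a complete 'stco' atom with each offset raised by delta."""
    size, kind = struct.unpack_from(">I4s", stco_atom, 0)
    if kind != b"stco":
        raise ValueError("not an stco atom")
    # Bytes 8..11: version and flags. Bytes 12..15: number of entries.
    (count,) = struct.unpack_from(">I", stco_atom, 12)
    offsets = struct.unpack_from(f">{count}I", stco_atom, 16)
    patched = bytearray(stco_atom)
    # Raises struct.error if an offset no longer fits in 32 bits;
    # a real tool would have to switch to co64 in that case.
    struct.pack_into(f">{count}I", patched, 16, *(o + delta for o in offsets))
    return bytes(patched)

def make_stco(offsets):
    """Build an stco atom from a list of offsets (demo helper)."""
    n = len(offsets)
    return struct.pack(f">I4sII{n}I", 16 + 4 * n, b"stco", 0, n, *offsets)

# Two chunks at made-up offsets, and a made-up moov size of 120 bytes.
original = make_stco([48, 1048])
moov_size = 120
patched = shift_stco(original, moov_size)
print(struct.unpack_from(">2I", patched, 16))  # (168, 1168)
```

Skipping this step would leave a file whose atoms are in the right order but whose offsets still point 120 bytes too early.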


Comments (16)


#1 - Daniel (NessDan) - 11/22/2011 - 23:32
Incredibly informative! When I got to the part about atoms, I decided to find a small video I took with my cell phone and open it up in a text editor and at first all I saw was gibberish, but before giving up I looked closer and I saw the FTYP! To be exact, I saw this: "ftyp3gp5" I was extremely happy after seeing that. I then looked for MOOV and WIDE but couldn't find them and instead saw MDAT. After that I just went back to reading and once I got to the 2nd part, I realized what could've happened and sure enough, the MOOV was at the bottom of the file!

There was one thing I was trying to understand which was how VLC and iOS open a 2nd connection with "Range: bytes=4726467-" as the HTTP header. Could you shed more light on what exactly this does? I'm not incredibly familiar with what's going on there.

Thanks for the information, I absolutely love learning new stuff like this. Keep it up!
#2 - Fabien Sanglard - 11/23/2011 - 12:07
@Daniel

The Range HTTP header lets a client start downloading a resource at a given offset. In the VLC/iOS case, it makes it possible to skip the mdat atom, reach the moov atom and start playback immediately.
#3 - Daniel (NessDan) - 11/23/2011 - 15:33
Thanks, I understand it more now. So that means the number of bytes will be different depending on the size of the file.
#4 - Fabien Sanglard - 11/23/2011 - 15:35
Yes it does, for each file the number will be the offset to access the moov atom.
#5 - Luc Trudeau - 11/25/2011 - 09:05
Great post, killer stuff! When you say in your problem statement : "Most video editors" do you mean most video encoders?
Also, I'm surprised this post was not in your RSS feed :(
#6 - Fabien Sanglard - 11/25/2011 - 10:05
@Luc:

1/ Yes, this is what I meant.
2/ I did not put it in the RSS because I did not think it would interest a lot of people.
#7 - G Troupel - 11/26/2011 - 09:21
Shouldn't the WIDE chunk be after the FTYP chunk rather than after the MOOV, if its purpose is to extend the FTYP chunk ?
Or maybe I misunderstood you.
#8 - Fabien Sanglard - 11/26/2011 - 13:00
@G Troupel

The purpose of the WIDE atom is to allow the mdat header to move "back" 12 bytes and encode its length on 8 bytes (+4 bytes to indicate the length is a special case). Hence WIDE must be just before the mdat atom.
#9 - PypeBros - 11/27/2011 - 06:13
neat use of HTTP range-request option ;)
#10 - Daniel Lew - 11/27/2011 - 09:17
Fascinating. I dealt with this issue years ago, and at the time the engineer working on it told me "some videos can be played progressively, some can't" with no further explanation. We ended up ditching the project for time, so we never got to the root of the problem. Thanks for explaining why this happens.
#11 - Kiran Rao - 04/19/2012 - 10:15
I've been trying to get an H.264 encoded RTSP stream to play properly on various Android devices. I've been struggling no end (and failing of course). Thanks for this incredibly informative post.
The Android docs do mention that "the moov atom must precede any mdat atoms, but must succeed the ftyp atom", but I had absolutely no clue as to what it meant - until I read this post.

I have a question - in an RTSP stream, are the ftyp and moov atoms present only in the first (or initial few) packets? Or does the container mandate them in every packet?
#12 - Fabien Sanglard - 04/21/2012 - 21:00
I would assume it is in the first packet only.
#13 - Bijay - 05/12/2012 - 03:26
This content is so helpful for me to understand why mp4 videos were not playing while streaming. i have been struggling for long to understand these stuff but finally i got it right on my head!

but as for solution part i didn't get it well.... can you please further explain how can i make these video play while streaming and as for android documentation i found that how can i really do this stuffs, is there any programs that would help me or any thing that i can do to stream mp4 videos on android!!!!

Anyway thanks for this very informative post.. at-least i am relief at the moment!!!!
#14 - Arnaud - 07/01/2012 - 08:15
I must be stupid because this makes sense until the part before the code. You say that the container has 4 atoms, and the movie data is in the 4th one. Then why do you have to offset and request the last range of bytes to get the moov atom? In your schematic it's before the mdata. Unless the moov atom is at the very end of the file, which makes zero sense to me, why would you have to do this?
#15 - Sander - 07/12/2012 - 04:36
@Kiran

RTSP doesn't contain atoms since those are part of the container and RTSP is itself a container for the video & audio. For H.264 and MPEG-4 Part 2 the metadata is part of the SDP answer to the RTSP DESCRIBE request. This is done at the start and should get the decoder the right information. For H.264, this same metadata is in most cases also provided as separate frames: The Sequence Parameter Set (SPS) and Picture Parameter Set (PPS).

Why do some H.264 streams not play on Android? Many reasons, but one I found is that there are hardcoded limits on the H.264 Profile and Level in the Android platform. This is a bummer since these profiles & Levels won't be playable at normal framerates, but they should be playable at far lower framerates (think IP Cameras at 1-5 FPS).
#16 - vaffangool - 03/25/2013 - 08:00
I'm not sure I understand how this applies to VLC.

On the desktop I often use VLC to view incomplete video files--while they are still being downloaded by another application (almost always Google Chrome).

While VLC is capable of making its own connections over various protocols, I never imagined it was aware of the requests being made by the browser. I assumed it guessed the contents of the MOOV atom by analysing MDAT packets.

Now that I think about it, the downloads are usually H.264 streams (with parameter NAL's), often muxed into the Flash container format (with XML metadata), and VLC has problems opening partial downloads of otherwise-encoded or -containered video files. Are you saying that those problematic files can be opened if the download is initiated through VLC?

 

@2011