Atoms, Boxes, Parents, Children & hex (oh my)
An MPEG-4 file is made of a number of discrete units called atoms (they were called atoms in the first version of the specification; now they are prosaically called 'boxes'). An atom has a basic format: 4 bytes for its length, followed by 4 bytes for its name.
Anything beyond those basic 8 bytes is either optional & defined by the hierarchy it is found in (moov.udta.meta.XXXX atoms have a format defined by QuickTime), or defined by the atom itself. The ftyp atom is ALWAYS first, and has a specific format - it tells what type of file it is & the basic versioning of the atom structures.
In the above example the moov atom has a length of 0x00001D38 or 7480 bytes. Immediately following the moov name however is a new atom. This is the mvhd atom, and its length is 0x0000006C or 108 bytes. Because 108 bytes is less than 7480 bytes, the mvhd atom is a child atom of the moov atom. The MPEG-4 specification says that either an atom can be a parent atom (as moov is a parent to mvhd) or it can carry some sort of information on it (as ftyp & mvhd show above), but not both.
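As a minimal sketch of how those two headers can be read, the Python below unpacks the 4-byte length and 4-byte name of each atom. The function name read_atom_header and the 16 sample bytes are made up for illustration (the bytes simply encode the moov and mvhd lengths quoted above); this is not AtomicParsley's own code.

```python
import struct

def read_atom_header(buf, offset=0):
    """Return (length, name) of the atom whose 8-byte header starts at offset."""
    # Lengths are stored big-endian; the name is 4 raw bytes.
    length, name = struct.unpack_from(">I4s", buf, offset)
    return length, name.decode("latin-1")

# Hypothetical sample: moov's header immediately followed by mvhd's header.
sample = bytes.fromhex("00001D38" "6D6F6F76" "0000006C" "6D766864")

moov_len, moov_name = read_atom_header(sample, 0)
child_len, child_name = read_atom_header(sample, 8)

# mvhd starts 8 bytes into moov and ends before moov does, so it is a child.
is_child = 8 + child_len <= moov_len

print(moov_name, moov_len)    # moov 7480
print(child_name, child_len)  # mvhd 108
print(is_child)               # True
```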
The length of the atom is determined by the length of itself PLUS any and all atoms in the level immediately below it - not all the way down to the end of the hierarchy. For example, the moov atom sums the length of the mvhd atom and other atoms on the same level (not shown), but not children of mvhd - mvhd sums those lengths. The atoms in the level below sum the lengths in the atoms below them until you get to the end of a hierarchy. At that point the sum of that atom is:
4 bytes for the atom length
4 bytes for the atom name
plus the length of any data the atom carries
The minimum length of an atom then would be 8 bytes.
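The length arithmetic above can be sketched as follows. The helper make_atom and the atom names (aaaa, bbbb, prnt, top) are hypothetical, chosen only to show that a parent's length is its own 8 bytes plus the lengths of its immediate children, each of which already accounts for whatever sits below it.

```python
import struct

def make_atom(name, payload=b""):
    """Prefix payload with a 4-byte big-endian length and the 4-byte name."""
    return struct.pack(">I4s", 8 + len(payload), name) + payload

def atom_length(atom):
    """Read back the length field stored in the first 4 bytes."""
    return struct.unpack_from(">I", atom, 0)[0]

empty = make_atom(b"bbbb")                     # nothing but the header
leaf = make_atom(b"aaaa", b"\x00" * 100)       # header + 100 bytes of data
parent = make_atom(b"prnt", leaf + empty)      # header + two children
top = make_atom(b"top ", parent)               # header + one child

print(atom_length(empty))   # 8
print(atom_length(leaf))    # 108
print(atom_length(parent))  # 124 = 8 + 108 + 8
print(atom_length(top))     # 132 = 8 + 124
```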
The 'Atom Is A Parent Or Holds Data' rule is made to be broken. Often the atom under moov.trak.mdia.minf.stbl.stsd is a parent and also contains data. Apple's DRM implementation breaks this rule further. The other standard atom that breaks this rule is moov.udta.meta, for historical reasons. Still, the MPEG-4 container is relatively easy to understand & highly flexible.
The most important part of an MPEG-4 file is the mdat atom - it's where the actual raw information for the file is stored. This top level atom takes up the bulk of an MPEG-4 file. However, the moov atom comprises a number of different atoms and hierarchies, and provides for basic functionality - like specifying the dimensions of a video file, or the duration of a song.
uuid atoms are user-defined atoms, and are similar to normal atoms, but their name is 8 bytes (4 bytes holding 'uuid', followed by the name of the uuid atom). Sony PSP mp4 files notably use uuid atoms. AtomicParsley supports setting & reading its own uuid atoms to carry supplemental metadata.
stco & mdat
When atoms are added, modified or removed, the tree changes, and the lengths of the affected atoms need to be re-determined. If the mdat atom moves relative to the beginning of the file, further adjustments need to be made. The free atom is meant to minimize this exact behavior.
The mdat data is made up of 'chunks' - these chunks are referenced in moov to provide for seeking within the file, and to tell the player where the beginning of the media data is. This information is stored on the moov.trak.mdia.minf.stbl.stco Sample Table Chunk Offset atom. This atom has a particular structure:
Each entry in the stco atom (and there can be multiple stco atoms) needs to be readjusted when the mdat atom moves.
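A rough sketch of that readjustment is below. It assumes the stco layout from the ISO base media file format (8-byte header, 1 version byte, 3 flag bytes, a 4-byte entry count, then one 4-byte big-endian offset per chunk); the function name shift_stco and the sample offsets are invented for illustration.

```python
import struct

def shift_stco(stco, delta):
    """Return a copy of an stco atom with every chunk offset moved by delta bytes."""
    length, name = struct.unpack_from(">I4s", stco, 0)
    if name != b"stco":
        raise ValueError("not an stco atom")
    # Assumed layout: bytes 8-11 are version/flags, bytes 12-15 the entry count.
    (count,) = struct.unpack_from(">I", stco, 12)
    offsets = struct.unpack_from(f">{count}I", stco, 16)
    moved = struct.pack(f">{count}I", *(off + delta for off in offsets))
    # Header, version/flags and count are unchanged; only the table is rewritten.
    return stco[:16] + moved + stco[16 + 4 * count:]

# Hypothetical stco atom with three chunk offsets.
old_offsets = [40, 4136, 8232]
stco = struct.pack(">I4sII", 16 + 4 * len(old_offsets), b"stco", 0, len(old_offsets))
stco += struct.pack(">3I", *old_offsets)

# Suppose 1024 bytes of metadata were inserted ahead of mdat.
shifted = shift_stco(stco, 1024)
new_offsets = list(struct.unpack_from(">3I", shifted, 16))
print(new_offsets)  # [1064, 5160, 9256]
```

The atom's own length does not change here, since only the values in the table are rewritten; a file with several tracks would need the same pass over each track's stco atom.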
Known iTunes Metadata Atoms
Metadata to be used with iTunes comes in the moov.udta.meta.ilst hierarchy. The atoms directly under the ilst atom have specific names, but they do not carry the data directly. The children of these named atoms (the data atom) carry the actual information. The 4 letter code of the parent is listed below, while the atom flags after the data atom are listed in the Class column. It is the class of the data atom that broadly determines whether text or numbers or binary data is contained.
1 Genre comes on 2 atoms - standard genres are on gnre; custom genres are on ©gen; only 1 is permitted at a time.
Text metadata has a limit of 255 bytes. It comes in UTF-8 (no BOM), and isn't null terminated.
Unsigned integer metadata is 8 bits wide (a limit of 255 for tracknum, for example). Most have a format (cpil is 4 NULL bytes, then the value) specific to that atom. Only numerical data can be carried for most of these (except purl & egid). Vinyl taggers of "A1": complain to Apple.
Here is a sample of metadata - compilation (true) & tracknumber (2 of 5):
And for those thinking "Heavens to Murgatroid, how did cpil's 21 become 15 in the pic above... gosh, golly" - hex.
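To make the decimal-to-hex point concrete, here is a small sketch that assembles a cpil atom set to true. It assumes the data atom layout implied above (1 version byte and 3 flag bytes carrying the class, then 4 NULL bytes, then the value); make_atom and make_data_atom are hypothetical helper names.

```python
import struct

def make_atom(name, payload=b""):
    """Prefix payload with a 4-byte big-endian length and the 4-byte name."""
    return struct.pack(">I4s", 8 + len(payload), name) + payload

def make_data_atom(atom_class, value):
    """Build a data atom: version/flags (the class), 4 NULL bytes, then the value."""
    return make_atom(b"data", struct.pack(">II", atom_class, 0) + value)

# Compilation = true: class 21, a single byte holding 1.
cpil = make_atom(b"cpil", make_data_atom(21, b"\x01"))

print(len(cpil))        # 25
print(cpil.hex())
# 00000019 6370696c 00000011 64617461 00000015 00000000 01 (spaces added here)
print(hex(21))          # 0x15
```

The class is written as the number 21, but a hex dump of the file shows the byte 15.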
There is also another form of tagging that iTunes uses internally for a few inaccessible tags. Called the reverse DNS style (or something along that line), this form is shown below:
Atom ---- @ 39852 of size: 72, ends @ 39924
Atom mean @ 39860 of size: 28, ends @ 39888
Atom name @ 39888 of size: 16, ends @ 39904
Atom data @ 39904 of size: 20, ends @ 39924
where the mean atom carries the reverse DNS domain (com.apple.iTunes) & the name atom carries the descriptor for the contents of the data atom.
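The sizes in that listing can be reproduced with a short sketch. It assumes that the mean and name atoms each carry 4 version/flag bytes ahead of their text (consistent with a 28-byte mean atom holding the 16-character domain) and that the data atom uses the usual class plus 4 NULL bytes. The descriptor "tool", the 4-byte value and the class of 1 are placeholders, not taken from a real file.

```python
import struct

def make_atom(name, payload=b""):
    """Prefix payload with a 4-byte big-endian length and the 4-byte name."""
    return struct.pack(">I4s", 8 + len(payload), name) + payload

def make_reverse_dns(domain, descriptor, value, atom_class=1):
    """Build a '----' atom holding mean, name and data children."""
    # Assumption: mean and name start with 4 version/flag bytes, all zero.
    mean = make_atom(b"mean", b"\x00" * 4 + domain.encode("utf-8"))
    name = make_atom(b"name", b"\x00" * 4 + descriptor.encode("utf-8"))
    data = make_atom(b"data", struct.pack(">II", atom_class, 0) + value)
    return make_atom(b"----", mean + name + data)

atom = make_reverse_dns("com.apple.iTunes", "tool", b"abcd")

# Walk the three children and collect (name, size) pairs.
sizes = []
offset = 8
while offset < len(atom):
    size, child = struct.unpack_from(">I4s", atom, offset)
    sizes.append((child.decode("latin-1"), size))
    offset += size

print(len(atom))  # 72
print(sizes)      # [('mean', 28), ('name', 16), ('data', 20)]
```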
The only style of metadata defined in the ISO Base Media File Format is what amounts to a single atom, cprt - and the format described is in the 3gp asset style. In fact, the ISO copyright notice is identical to the 3gp copyright asset. This copyright notice is the only common tag available to all MPEG-4 files and derivatives.
The major brands that iTunes writes are listed at http://www.mp4ra.org/filetype.html, but iTunes-style metadata isn't defined in any publicly available document - its format is determined by the types of files that iTunes & the iTunes Music Store produce & provide. Since the goal of AtomicParsley is to set metadata that is maximally compatible with iTunes, the iTunes-style format of metadata is fully supported.
The 3GPP assets are a family of metadata tags that the 3gp specification allows. These atoms differ in a number of ways from the more common iTunes style. There is no data atom; information is carried directly on the atom. Most 3gp assets have a language setting - so dozens of like-named atoms are permitted, differing only in the language used (around 480 languages).
A new style of metadata emerged with the foobar2000 0.9.x series. For whatever reason, this style typically duplicates the iTunes-style metadata. There is a double artwork tag, the artist is listed twice - it is heavy with redundancy. It is also non-compliant. A generic tool isn't allowed to create its own atoms - a mechanism exists for extending the format with supplemental functionality - the uuid atom form. foobar2000 doesn't use this mechanism. Nero has also adopted this tagging style with their freeware tagging tools. It seems to also write some tags in the reverse DNS form in the com.apple.iTunes domain.
The newest style of tagging was recently added at the MPEG4 Registration Authority. Currently, there is no known tool that can read or set this style of metadata.