The Ogg container format is being promoted by the Xiph Foundation for use with its Vorbis and Theora codecs. Unfortunately, a number of technical shortcomings in the format render it ill-suited to most, if not all, use cases. This article examines the most severe of these flaws.
Overview of Ogg
The basic unit in an Ogg stream is the page, consisting of a header followed by one or more packets from a single elementary stream. A page can contain up to 255 packets, and a packet can span any number of pages. The following table describes the page header.
Field | Size (bits) | Description |
---|---|---|
capture_pattern | 32 | magic number “OggS” |
version | 8 | always zero |
flags | 8 | packet continuation, BOS, EOS markers
granule_position | 64 | abstract timestamp |
bitstream_serial_number | 32 | elementary stream number |
page_sequence_number | 32 | incremented by 1 each page |
checksum | 32 | CRC of entire page |
page_segments | 8 | length of segment_table |
segment_table | variable | list of packet sizes |
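For readers who prefer code to tables, a minimal sketch of the same fixed-size header as a C struct follows (byte order and packing directives are omitted for clarity; the variable-length segment_table comes immediately after it in the stream):

```c
#include <stdint.h>

/* Sketch of the fixed part of an Ogg page header as listed in the table
 * above. The segment_table (page_segments bytes) follows it, and the
 * packet data follows the segment table. */
struct ogg_page_header {
    char     capture_pattern[4];      /* magic number "OggS" */
    uint8_t  version;                 /* always zero */
    uint8_t  flags;                   /* continuation, BOS, EOS bits */
    uint64_t granule_position;        /* abstract timestamp */
    uint32_t bitstream_serial_number; /* elementary stream number */
    uint32_t page_sequence_number;    /* incremented by 1 each page */
    uint32_t checksum;                /* CRC of entire page */
    uint8_t  page_segments;           /* length of segment_table */
    /* uint8_t segment_table[page_segments]; follows in the stream */
};
```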
Elementary stream types are identified by looking at the payload of the first few pages, which contain any setup data required by the decoders. For full details, see the official format specification.
Generality
Ogg, legend tells, was designed to be a general-purpose container format. To most multimedia developers, a general-purpose format is one in which encoded data of any type can be encapsulated with a minimum of effort.
The Ogg format defined by the specification does not fit this description. For every format one wishes to use with Ogg, a complex mapping must first be defined. This mapping defines how to identify a codec, how to extract setup data, and even how timestamps are to be interpreted. All this is done differently for every codec. To correctly parse an Ogg stream, every such mapping ever defined must be known.
Under this premise, a centralised repository of codec mappings would seem like a sensible idea, but alas, no such thing exists. It is simply impossible to obtain an exhaustive list of defined mappings, which makes the task of creating a complete implementation somewhat daunting.
One brave soul, Tobias Waldvogel, created a mapping, OGM, capable of storing any Microsoft AVI compatible codec data in Ogg files. This format saw some use in the wild, but was frowned upon by Xiph, and it was eventually displaced by other formats.
True generality is evidently not to be found with the Ogg format.
A good example of a general-purpose format is Matroska. This container can trivially accommodate any codec; all it requires is a unique string to identify the codec. For codecs requiring setup data, a standard location for this is provided in the container. Furthermore, an official list of codec identifiers is maintained, meaning all information required to fully support Matroska files is available from one place.
Matroska also has probably the greatest advantage of all: it is in active, widespread use. Historically, standards derived from existing practice have proven more successful than those created by a design committee.
Overhead
When designing a container format, one important consideration is that of overhead, i.e. the extra space required in addition to the elementary stream data being combined. For any given container, the overhead can be divided into a fixed part, independent of the total file size, and a variable part growing with increasing file size. The fixed overhead is not of much concern, its relative contribution being negligible for typical file sizes.
The variable overhead in the Ogg format comes from the page headers, mostly from the segment_table field. This field uses a most peculiar encoding, somewhat reminiscent of Roman numerals. In Roman times, numbers were written as a sequence of symbols, each representing a value, the combined value being the sum of the constituent values.
The segment_table field lists the sizes of all packets in the page. Each value in the list is coded as a run of bytes with the value 255, followed by a final byte with a smaller value; the packet size is simply the sum of all these bytes. Any strictly additive encoding such as this has the distinct drawback of the coded length being linearly proportional to the encoded value. A value of 5000, a reasonable packet size for video of moderate bitrate, requires no less than 20 bytes to encode.
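As a rough sketch of what a demuxer has to do, the lacing values of a segment table can be turned into packet sizes as follows (a value below 255 terminates a packet; a packet whose final lacing value is 255 continues on the next page):

```c
#include <stddef.h>
#include <stdint.h>

/* Sum lacing values from the segment table into packet sizes.
 * Returns the number of complete packets found; *tail_len receives the
 * accumulated size of a packet left unterminated at the end of the page,
 * i.e. one that continues on the next page, or 0 if the last packet
 * ended within this page. */
static size_t parse_lacing(const uint8_t *segment_table, size_t n_segments,
                           size_t *packet_sizes, size_t *tail_len)
{
    size_t n_packets = 0;
    size_t cur = 0;

    for (size_t i = 0; i < n_segments; i++) {
        cur += segment_table[i];
        if (segment_table[i] < 255) {   /* a value below 255 ends a packet */
            packet_sizes[n_packets++] = cur;
            cur = 0;
        }
    }
    *tail_len = cur;                    /* non-zero: packet spans to next page */
    return n_packets;
}
```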
On top of this we have the 27-byte page header which, although paling in comparison to the packet size encoding, is still much larger than necessary. Starting at the top of the list:
- The version field could be disposed of, a single-bit marker being adequate to separate this first version from hypothetical future versions. One of the unused positions in the flags field could be used for this purpose.
- A 64-bit granule_position is complete overkill. 32 bits would be more than enough for the vast majority of use cases. In extreme cases, a one-bit flag could be used to signal an extended timestamp field.
- 32-bit elementary stream number? Are they anticipating files with four billion elementary streams? An eight-bit field, if not smaller, would seem more appropriate here.
- The 32-bit page_sequence_number is inexplicable. The intent is to allow detection of page loss due to transmission errors. ISO MPEG-TS uses a 4-bit counter per 188-byte packet for this purpose, and that format is used where packet loss actually happens, unlike any use of Ogg to date.
- A mandatory 32-bit checksum is nothing but a waste of space when using a reliable storage/transmission medium. Again, a flag could be used to signal the presence of an optional checksum field.
With the changes suggested above, the page header would shrink from 27 bytes to 12 bytes in size.
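To make that figure concrete, one possible layout of such a reduced header is sketched below. This is merely an illustration of the suggestions above, not a worked-out proposal, and the exact field sizes are my assumptions; they add up to 12 bytes on the wire:

```c
#include <stdint.h>

/* Hypothetical 12-byte page header incorporating the suggestions above:
 * version field dropped (a flag bit suffices), 32-bit timestamp, 8-bit
 * stream number, small sequence counter, checksum made optional. */
struct compact_page_header {
    char     capture_pattern[4]; /* magic number */
    uint8_t  flags;              /* incl. version marker and checksum-present bits */
    uint32_t granule_position;   /* 32-bit timestamp, extendable via a flag */
    uint8_t  stream_number;      /* elementary stream number */
    uint8_t  page_sequence;      /* small loss-detection counter, as in MPEG-TS */
    uint8_t  page_segments;      /* length of segment_table */
    /* optional 32-bit checksum and the segment table would follow */
};
```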
We thus see that in an Ogg file, the packet size fields alone contribute an overhead of 1/255, or approximately 0.4%. This is a hard lower bound which cannot be attained even in theory, since the page headers add further to it. In reality the overhead tends to be closer to 1%.
Contrast this with the ISO MP4 file format, which can easily achieve an overhead of less than 0.05% with a 1 Mbps elementary stream.
Latency
In many applications end-to-end latency is an important factor. Examples include video conferencing, telephony, live sports events, interactive gaming, etc. With the codec layer contributing as little as 10 milliseconds of latency, the amount imposed by the container becomes an important factor.
Latency in an Ogg-based system is introduced at both the sender and the receiver. Since the page header depends on the entire contents of the page (packet sizes and checksum), a full page of packets must be buffered by the sender before a single bit can be transmitted. This sets a lower bound for the sending latency at the duration of a page.
On the receiving side, playback cannot commence until packets from all elementary streams are available. Hence, with two streams (audio and video) interleaved at the page level, playback is delayed by at least one page duration (two if checksums are verified).
Taking both send and receive latencies into account, the minimum end-to-end latency for Ogg is thus twice the duration of a page, triple if strict checksum verification is required. If page durations are variable, the maximum value must be used in order to avoid buffer underflows.
Minimum latency is clearly achieved by minimising the page duration, which in turn implies sending only one packet per page. This is where the size of the page header becomes important. The header for a single-packet page is 27 + packet_size/255 bytes in size. For a 1 Mbps video stream at 25 fps this gives an overhead of approximately 1%. With a typical audio packet size of 400 bytes, the overhead becomes a staggering 7%. The average overhead for a multiplex of these two streams is 1.4%.
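The figures above follow from simple arithmetic. A quick sketch, using the packet sizes assumed in the text, is shown below; the combined 1.4% additionally depends on the audio bitrate, so it is left out here:

```c
#include <stdio.h>

/* Per-page overhead for single-packet pages: 27 header bytes plus one
 * lacing byte per full 255 bytes of packet data plus a terminating byte. */
static double overhead(int packet_size)
{
    int lacing = packet_size / 255 + 1;              /* segment table bytes */
    return (27.0 + lacing) / packet_size;
}

int main(void)
{
    /* 1 Mbps video at 25 fps gives 5000-byte packets */
    printf("video: %.2f%%\n", 100 * overhead(5000)); /* approx. 1% */
    /* a typical 400-byte audio packet */
    printf("audio: %.2f%%\n", 100 * overhead(400));  /* approx. 7% */
    return 0;
}
```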
As it stands, the Ogg format is clearly not a good choice for a low-latency application. The key to low latency is small packets and fine-grained interleaving of streams, and although Ogg can provide both of these, by sending a single packet per page, the price in overhead is simply too high.
ISO MPEG-PS has an overhead of 9 bytes on most packets (a 5-byte timestamp is added a few times per second), and Microsoft’s ASF has a 12-byte packet header. My suggestions for compacting the Ogg page header would bring it in line with these formats.
Random access
Any general-purpose container format needs to allow random access for direct seeking to any given position in the file. Despite this goal being explicitly mentioned in the Ogg specification, the format only allows the most crude of random access methods.
While many container formats include an index allowing a time to be directly translated into an offset into the file, Ogg has nothing of this kind, the stated rationale for the omission being that this would require two-pass multiplexing, the second pass creating the index. This is obviously not true; the index could simply be written at the end of the file. Those objecting that this index would be unavailable in a streaming scenario are forgetting that seeking is impossible there regardless.
The method for seeking suggested by the Ogg documentation is to perform a binary search on the file, after each file-level seek operation scanning for a page header, extracting the timestamp, and comparing it to the desired position. When the elementary stream encoding allows only certain packets as random access points (video key frames), a second search will have to be performed to locate the entry point closest to the desired time. In a large file (sizes upwards of 10 GB are common), 50 seeks might be required to find the correct position.
A typical hard drive has an average seek time of roughly 10 ms, giving a total time for the seek operation of around 500 ms, an annoyingly long time. On a slow medium, such as an optical disc or files served over a network, the times are orders of magnitude longer.
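In code, the bisection just described looks roughly like the sketch below. The helpers scan_for_page() and page_time() are hypothetical placeholders for "seek and scan for the next OggS capture pattern" and "parse that page's granule_position through the appropriate codec mapping"; each loop iteration costs at least one file-level seek:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helpers, for illustration only. */
int64_t scan_for_page(FILE *f, int64_t offset); /* next "OggS" at/after offset */
double  page_time(FILE *f, int64_t page);       /* granule_position -> seconds */

/* Sketch of the bisection seek described above. A second, similar
 * search is still needed afterwards to locate the nearest key frame. */
int64_t ogg_bisect(FILE *f, int64_t lo, int64_t hi, double target)
{
    while (hi - lo > 65536) {                   /* stop at ~one max-size page */
        int64_t mid  = lo + (hi - lo) / 2;
        int64_t page = scan_for_page(f, mid);   /* one seek plus a scan */
        if (page_time(f, page) < target)
            lo = page;
        else
            hi = page;
    }
    return lo;
}
```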
A factor further complicating the seeking process is the possibility of header emulation within the elementary stream data. To safeguard against this, one has to read the entire page and verify the checksum. If the storage medium cannot provide data much faster than during normal playback, this provides yet another substantial delay towards finishing the seeking operation. This too applies to both network delivery and optical discs.
Although optical disc usage is perhaps in decline today, one should bear in mind that the Ogg format was designed at a time when CDs and DVDs were rapidly gaining ground, and network-based storage is most certainly on the rise.
The final nail in the coffin of seeking is the codec-dependent timestamp format. At each step in the seeking process, the timestamp parsing specified by the codec mapping corresponding to the current page must be invoked. If the mapping is not known, the best one can do is skip pages until one with a known mapping is found. This delays the seeking and complicates the implementation, both bad things.
Timestamps
A problem as old as multimedia itself is that of synchronising multiple elementary streams (e.g. audio and video) during playback; badly synchronised A/V is highly unpleasant to view. By the time Ogg was invented, solutions to this problem had long since been explored and were well understood. The key to proper synchronisation lies in tagging elementary stream packets with timestamps, packets carrying the same timestamp being intended for simultaneous presentation. The concept is as simple as it seems, so it is astonishing to see the amount of complexity with which the Ogg designers managed to imbue it. So bizarre is it that I have devoted an entire article to the topic, and will not cover it further here.
Complexity
Video and audio decoding are time-consuming tasks, so containers should be designed to minimise extra processing required. With the data volumes involved, even an act as simple as copying a packet of compressed data can have a significant impact. Once again, however, Ogg lets us down. Despite the brevity of the specification, the format is remarkably complicated to parse properly.
The unusual and inefficient encoding of the packet sizes limits the page size to somewhat less than 64 kB. To still allow individual packets larger than this limit, it was decided to allow packets spanning multiple pages, a decision with unfortunate implications. A page-spanning packet as it arrives in the Ogg stream will be discontiguous in memory, a situation most decoders are unable to handle, and reassembly, i.e. copying, is required.
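What this means for an implementation is sketched below: the fragments of a spanning packet arrive in separate page buffers and must be copied into one contiguous buffer before a typical decoder can be handed the packet. The function and parameter names are of course only illustrative:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reassemble a page-spanning packet from its per-page fragments into a
 * contiguous buffer supplied by the caller. This copy is exactly the
 * extra work a contiguous layout would have avoided. */
static size_t reassemble_packet(uint8_t *dst, const uint8_t *const frags[],
                                const size_t frag_len[], size_t n_frags)
{
    size_t total = 0;
    for (size_t i = 0; i < n_frags; i++) {
        memcpy(dst + total, frags[i], frag_len[i]);
        total += frag_len[i];
    }
    return total;
}
```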
The knowledgeable reader may at this point remark that the MPEG-TS format also splits packets into pieces requiring reassembly before decoding. There is, however, a significant difference there. MPEG-TS was designed for hardware demultiplexing feeding directly into hardware decoders. In such an implementation the fragmentation is not a problem. Rather, the fine-grained interleaving is a feature allowing smaller on-chip buffers.
Buffering is also an area in which Ogg suffers. To keep the overhead down, pages must be made as large as practically possible, and page size translates directly into demultiplexer buffer size. Playback of a file with two elementary streams thus requires 128 kB of buffer space. On a modern PC this is perhaps nothing to be concerned about, but in a small embedded system, e.g. a portable media player, it can be relevant.
In addition to the above, a number of other issues, some of them minor, others more severe, make Ogg processing a painful experience. A selection follows:
- 32-bit random elementary stream identifiers mean a simple table-lookup cannot be used. Instead the list of streams must be searched for a match. While trivial to do in software, it is still annoying, and a hardware demultiplexer would be significantly more complicated than with a smaller identifier.
- Semantically ambiguous streams are possible. For example, the continuation flag (bit 1) may conflict with continuation (or lack thereof) implied by the segment table on the preceding page. Such invalid files have been spotted in the wild.
- Concatenating independent Ogg streams forms a valid stream. While finding a use case for this strange feature is difficult, an implementation must of course be prepared to encounter such streams. Detecting and dealing with these adds pointless complexity.
- Unusual terminology: inventing new terms for well-known concepts is confusing for the developer trying to understand the format in relation to others. A few examples:
Ogg name | Usual name |
---|---|
logical bitstream | elementary stream
grouping | multiplexing
lacing value | packet size (approximately)
segment | imaginary element serving no real purpose
granule position | timestamp
Final words
We have found the Ogg format to be a dubious choice in just about every situation. Why then do certain organisations and individuals persist in promoting it with such ferocity?
When challenged, three types of reaction are characteristic of the Ogg campaigners.
On occasion, these people will assume an apologetic tone, explaining how Ogg was only ever designed for simple audio-only streams, and this is no doubt true (ignoring that it is as bad for these as for anything else). Why then, I ask again, do they continue to tout Ogg as the one-size-fits-all solution they have just admitted it is not?
More commonly, the Ogg proponents will respond with hand-waving arguments best summarised as "Ogg isn't bad, it's just different." My reply to this assertion is twofold:
- Being too different is bad. We live in a world where multimedia files come in many varieties, and a decent media player will need to handle the majority of them. Fortunately, most multimedia file formats share some basic traits, and they can easily be processed in the same general framework, the specifics being taken care of at the input stage. A format deviating too far from the standard model becomes problematic.
- Ogg is bad. When every angle of examination reveals serious flaws, bad is the only fitting description.
The third reaction bypasses all technical analysis: Ogg is patent-free, a claim I am not qualified to directly discuss. Assuming it is true, it still does not alter the fact that Ogg is a bad format. Being free from patents does not magically make Ogg a good choice as file format. If all the standard formats are indeed covered by patents, the only proper solution is to design a new, good format which is not, this time hopefully avoiding the old mistakes.
I’d love to see your analysis of the Matroska container format.
Thanks for your analysis. For many years I’ve read on the FFmpeg mailing list that Ogg is a terrible format, but no one has ever explained why they think so.
I’m now curious to see any response from the Ogg guys justifying their design.
Also it might be interesting to hear your recommendation for a good container format, or if one doesn’t exist, what would it look like?
Great write up. I complained about this recently. I added HTML5 video to my site and the Ogg videos look horrible. The animation is super stuttery. It has nothing on H264 at this point.
Poor video quality should be blamed on Theora, not Ogg.
The Theora codec is not poor either. Poor Theora video may be due to poor encoding; see the following:
http://people.xiph.org/~greg/video/ytcompare/comparison.html
We were talking about the Ogg container here. Please leave Theora out of it.
Serving MPEG4 files costs something like 5 million dollars per year, only to be allowed to use it without being sued. Ogg is free. That’s the killer feature. It’s not because MP4 is better that Ogg is bad. It gets the job done, for free. Companies like Spotify are quite happy with it.
Do you have a reference for that claim?
I’m not sure about Ogg, but Spotify is using Vorbis [1]. I’ve been wondering if they wrap it inside Ogg or not, but whatever it is, it seems to be working quite well for them in terms of latency etc.
Speaking of latency, you drove home the point that latency is important and that the need to minimize it increases overhead, but nowhere did you get into actual figures of just _how_ much latency you're talking about in some of the practical applications you mention as requiring low latency. If you just tried to use Ogg with, say, "average" page length for telephone-quality audio, how much latency would that create in a telephony application? You say that the overhead would hike up to 7% if you used just one packet per page in order to minimize latency, but how much real need is there to actually use that kind of a minimal setting? What kind of a page length would be needed for practically acceptable latency, and what kind of an overhead would that incur? (I'm not at all sure that ~1.5% or so really matters at all. 7% is bordering on being significant enough for an argument, but it was left unclear whether that's the kind of setting you'd actually need.)
I understood the issues about timestamps etc., but for all I know, the section on latency could be either academic banter or valuable insight depending on these kinds of actual implications you left out.
[1] http://www.spotify.com/en/help/faq/#tech
We are indeed using Ogg Vorbis, but we don’t have any problems with Ogg latency like Måns describes.
Random access is annoying, though, because of network latency. Jumping within a song is unbearably slow over the internet because of the binary search. We actually slapped a very simple index on the beginning of the file to get rid of the problem.
Mozilla has an effort that might be of use for everyone. http://github.com/cpearce/OggIndex/
The latency issues I mentioned would only be evident in real-time interactive systems like telephony or gaming. Simple music streaming isn’t that sensitive.
Found these:
http://www.streamcrest.com/License%20Calculator9.html
http://www.mpegla.com/main/programs/M4V/Pages/Agreement.aspx
http://www.microsoft.com/windows/windowsmedia/licensing/mpeg4faq.aspx#MPEG4VideoFAQ_1_2
Those are all about codecs, not the MP4 container.
Check the despotify source code and you’ll see.
This is only talking about the container, not the codec. There’s no reason you shouldn’t be able to mux theora and vorbis in a mp4 container, avoiding the h264 patents while still getting a good container.
Is it legal? That is, the mp4 container is not covered by any patents?
Really!? No, seriously – if this can be done, wouldn’t it satisfy everybody? First I’ve heard of this… where can we go to find out more about this?
“Serving MPEG4 files costs something like 5 million dollars per year” – you are getting all kinds of things mixed up, please get your facts straight. The 5 million number you are touting probably refers to the annual cap that the MPEG LA charges for a H.264 license. However, this is the absolute *maximum* that any entity has to pay, online streaming is free and the entrance fees are in the few dollars/cents range.
But we are talking about containers, not codecs here. There are plenty of alternative containers without licensing bodies behind them. They are no less “free” than Ogg.
Despite the fact (as it has already been pointed out) that you’re failing to address the correct topic, your claim that
>Serving MPEG4 files costs something like 5 million dollars per year, only to be allowed to use it without being sued.
…is simply incorrect.
It costs you NO LICENSING FEES AT ALL, as long as you are serving that content for free (as in “free beer”) — fees apply only if you HAVE YOUR CONSUMERS PAY for the content you provide.
I won’t go into great lengths here because it’s all concisely written in the document you’ll find at
http://www.mpegla.com/main/programs/avc/Documents/AVC_TermsSummary.pdf
Just have a look at the (b) scenarios…
And know that
http://www.mpegla.com/Lists/MPEG%20LA%20News%20List/Attachments/226/n-10-02-02.pdf
You are confusing MP4, the file format, with AVC, the video codec. I have yet to see any evidence or royalties being payable for use of the MP4 file format.
Your assessment of the failures of OGG’s current design could be valid.
As things pertain to the history of OGM, it might be noteworthy that OGM was initially only usable with closed source windows tools. (See wikipedia)
False. Although Tobias’ tools may have been closed source, anyone could have created open source alternatives. I don’t know if that happened for writing, but open source readers sure existed. I wrote one myself.
This presentation nicely summarizes MPEG-LA licensing costs:
http://www.mpegla.com/main/programs/avc/Documents/avcweb.ppt
$5M is a cap that applies to large volume distributors, like browser vendors.
That is talking about MPEG4 part 10, aka AVC, aka H.264. The MP4 file format I mentioned is part 12+14 and has nothing to do with H.264/AVC.
If the mp4 container is covered by patents, then the MPEG LA can ask you to pay any amount! There is no upper or lower limit on what a patent holder can ask. There is no legal requirement that a patent holder license the technology to you, whether for free or for any fee; they can refuse to license it to you at all. Patent holders normally do not license to individuals, and individuals are not legally exempt from violating patents. If you violate patents and cannot afford penalties running into millions of dollars, jail is your home. Free educational materials for children are not exempted from video/audio codec and container patents, even for 2-year-olds. There is no point describing the beauty of Miss Universe; you cannot touch her!!
1) The MPEG LA has no patent pool for the MP4 container on offer, so your arguments have no factual basis in reality.
2) MP4 is not the only alternative container. There are plenty of others available, for example Matroska. Same as for Ogg, there are no known or claimed patents on Matroska.
License terms: http://www.mpegla.com/main/programs/AVC/Documents/AVC_TermsSummary.pdf
Free video distribution doesn't cost anything until 31 Dec 2010, but you nonetheless have to get a license or you will be sued. Later it can/will cost up to $5 million per year. Providers of paid video already must pay for the license. And creators of encoders/decoders have to pay $5 million a year, too.
You seem to have missed the news that the royalty-free period for Internet video was extended through 2015.
http://www.mpegla.com/Lists/MPEG%20LA%20News%20List/Attachments/226/n-10-02-02.pdf
Why was it extended? Could it be because they’re not sure they own the market yet? The whole thing sounds a bit too much like the classic pusher the-first-one-is-free gambit for me.
Which means the point is to concentrate on a good free format. Not to race into some proprietary corral we then have to climb back out of all over again.
All the extension means is we have another 6 years to develop a GOOD, free codec.
Many of your points sound reasonable, but your argument is strongly undermined by the fact that you offer not a single apples-to-apples comparison between ogg and any other container format in your article. On a section-by-section basis:
Generalities/codec mapping:
You complain about how there is no global mapping, but do not assert that other containers have one.
Overhead:
The breakdown of where space is wasted is informative and mostly reasonable, but some of the points seem to be a reach, such as the checksum being unneeded, and your suggestion of implementing the functionality in optional fields seems like a bad idea to me in general, since it will make the header variable-length, which is something to strongly avoid in my experience. Finally, when you do "compare" ogg to mp4, you compare some rather hand-wavey numbers for ogg to a different scenario for mp4.
Latency:
You fire off a bunch of numbers here, but then offer no comparison to the alternatives. In fact you don’t even provide an explanation of how other formats avoid this latency in theory, much less in practice, and instead of showing how bad the latency is, you use it as a platform to show that a naive reaction to the issue will cause bad header overhead.
Random Access:
In this section you list quite a few worst-case numbers for disk accesses (why isn’t it being pre-cached by the filesystem?) and then end with no comparison to alternatives at all.
Complexity:
Once again you have a bunch of statements of problems you have with the format, but no comparisons to "good" formats. In addition, this section is particularly weak, with statements like "implementation is annoying" and "ambiguity is bad".
Final Words:
“We have shown” is a rather specific claim to make, which you have not remotely achieved. This pretty much sums up the whole article, which is titled “Ogg objections”, but then tries in the text to bill itself as a rigorous analysis of ogg, which it is not.
If you had matched the tone of the article to the title, this would be reasonable, but you only hurt your position when you throw around phrases like, “True generality is evidently not to be found with the Ogg format.”, “The Ogg format is clearly not a good choice for a low-latency application.”, and “We have shown the Ogg format to be a dubious choice in just about every situation.”. You have demonstrated NONE of the above claims, and by making them you have rendered me skeptical of the rest of your claims.
Lacking an index is not a fatal design flaw for the container. It can be added afterwards with no incompatible changes and minimal disruption. In fact, Ogg is growing an index for fast seeking over high latency connections right now: http://wiki.xiph.org/Ogg_Index
I agree with the comment on Matroska.
Discarding the issue of the patent is not serious and, I may add, probably biased.
This discussion cannot be done only on the basis of technical qualities.
There is a long list of situations where the worse technical solution was chosen based on arguments completely outside the technical debate, for instance the use of MS DOS in the first IBM PC, which has led to the monopoly of the worst OS technology ever, and it still holds…
The fact that ogg is patent-free is a tremendous argument for ogg and the key reason for its support, no matter what the technical issues are. If you don't see it, you're either blind or purposely naive.
Now, I agree on the technical side. Ogg is not a good container, at least for the broad purpose the Xiph Foundation pushes.
A critical example is streaming. The fact that the first page needs to be passed means that building a streaming server for ogg is incredibly complicated, generates irregular streams (page 0 and then page 13542, for instance), and prevents any implementation using UDP (and I do not mention the issues with page numbering in the various logical streams and the implied latency).
When you compare to the amount of work needed to stream a headerless container such as mpeg, this simply becomes ridiculous.
Hence, yes, we need a light versatile headerless container. But, no, some patent-encumbered solution, such as mpeg, despite its great technical advantages, is not an option.
The next step would probably be a proposal for such a container. And, instead of whining here (which includes me), someone should do the job and bring forward the proposal…
I think this guy is right. You should try to join the container-design community, because your claims show good experience in the field. The tone of your article may nevertheless have shocked some of them, and developers get flamey in that kind of situation :/ But there is always room for progress, nothing is perfect, and your cooperation could gain the FS community a new member. And you could help refine a great, widely used container for the web :) :)
Do you actually make and encode video for a living? I do. H.264 is a pretty stellar codec for delivery. My experience with Ogg supports the statements made in criticism of it. It’s just not as efficient.
Great comments on codec vs. container education, I run into that all the time.
Ogg has been championed by the “it must be TRULY free or it’s just EVIL!” community. I could care less. If we have to pay for quality, businesses will. If not, we’ll use another – but we’ll use one that is mature and actually WORKS.
the container and format wars are not about technology, they are about control. whether your analysis is valid or not is mostly irrelevant. what is relevant is whether one company is controlling the rights to dominant standardized codecs. the single most important issue to me is getting decoders and encoders under a royalty-free [a|l]gpl style license. fwiw, hth.
_J
You are correct, it is almost certainly more about control and less about technology. Am I the only one to see the hypocrisy in an organisation like Xiph, ostensibly campaigners for freedom, engaging in a battle for control in this manner?
The Xiph foundation is in a battle to give the public a chance to legally enjoy audio and video.
Buying an audio CD, DVD, Blu-ray, etc. does not mean you have the right to hear or see it.
You have to play it on a player whose manufacturer paid patent royalties, collected from you, to the patent holders.
It is illegal for you to write a player even if you know how to write one, except for the codecs and container formats developed by Xiph. There are a few other patent-free codecs and container formats developed by others too.
Last time I checked:
– mpeg1 is out of patents for sure
– mp2 is out of patents for sure
– the mpeg container is out of patents for sure as well…
Now if you want to play with the bleeding edge we got dirac and snow as codec and mkv and nut as container…
Now, tell me again why we should pick something like Ogg?
I think you should update the section on the index. The ogg container does indeed have a newly-implemented seek index to avoid precisely the kind of b-search you describe. The specification (http://wiki.xiph.org/Ogg_Index) is still a work in progress, but it’s implemented in ffmpeg2theora and Firefox nightlies are ‘index-aware’ too.
After all these years of insisting indexes are not needed, they are now adding one. Isn’t backtracking beautiful?
Considering that the alternative is permanent denial, backtracking doesn’t sound so bad now, does it?
Having written the second ogm filter for directshow I can only agree with everything, ogg is just that typical “would not touch with a ten foot pole” thing.
Wow, I'm surprised the Matroska container wasn't mentioned in the article at all. It has less overhead than .mp4, better random access, better streaming capabilities, better compatibility with different video formats (almost all except the new Dirac format, but that's being worked on), better latency, and is totally open and free, often used for h.264 movies on the net, even though it has very little industrial support and is not played by ps3/X-box out of the box. The Ogg container sux, yes, and nobody seriously uses it for video, and it's a bad representative of open containers, but so is the mess that is mp4.
Are you saying it’s OK to become dependent on H264/AVC now because payment is not required until 2015? A royalty-free period is merely there to get more parties dependent on the format so that there is a bigger, more willing cow to milk later.
Would it not be better to work towards open, royalty-free formats? Would the Web have become as pervasive as it is if the use of HTML and HTTP required royalty payments?
Excellent post.
The point of making an apples-to-apples comparison is not without merit, but it doesn’t change the core of this argument; a ‘good’ container format would be one which doesn’t have any of these failings.
With that said, what would be the 'best' container right now, ignoring the issues you present here?
And most importantly: if we're defining standards for the future, what's the theoretical 'best' practice?
Yes, you are probably one of the few. If you don’t see the difference of organizations like Xiph with patent stacking organizations that could withdraw free use of a technology at any time, then you need a pair of very special glasses.
Nice article, your evidence looks pretty reasonable.
I believe Ogg’s “patent-free” claim is heavily overblown; as the Wikipedia page points out, with an area this heavily patented you simply don’t know until somebody sues you.
Stream of consciousness replies to your article…
The whining about using 32 *whole bits* for certain fields doesn’t do much for me. I’m not sure I actually believe in integers narrower than 32 bits any more. However, the “add these bytes up to get the length” method is dumb, so I’d want to replace that with a 32-bit integer field…
I’m confused about the phrase “reliable storage/transmission medium”. What is that? I understand that the “average” bit-flip in an audio or video stream will not be noticed, but this doesn’t magically erase the value of checksums. Additionally, your desire for short checksums makes them useless — the shorter it is, the more likely random gibberish can be spewed into the payload and still pass the checksum.
Your persistent drum banging for sub-32-bit fields and also for speed of computation are incompatible. It’s typically substantially more time expensive to work on narrower fields and/or on unaligned data.
The overhead gets all the way up to one whole percent. OMFG, call the Marines! Feh.
I don’t care about startup latency until it makes it to 0.3 second. For asynchronous streaming (say a phone call or video conference) most of the time, mostly “nothing happened” is encoded. We can probably put several packets of that in a frame, introduce latency, and have no visible effect. So latency may be a red herring.
Timestamps in the container aren’t sufficient to get presentation simultaneity in multiple streams, unless all the decoders introduce exactly the same latency. Yeah. Right.
128kB? Really? Can I buy a DRAM that small any more?
I have to agree with other responders, above. This is a one-sided rant until it is put in a contrasting context with other containers.
Not every device is a desktop computer.
As someone who has written a parser for OGG (and several other container formats), I have to agree, the format has serious “issues” – like there is no actual fixed length for the codec identifier, nor is it null terminated. You are supposed to have a known list you are looking for, and so you don’t need to know the length or have a null byte. What if you just want to display the codecs used to the user?
Also the division between, f.e., OGG and Vorbis is not very clean.
Obviously the patent issue *is* a big issue politically, but I am not sure how big of a practical issue it is for most people. Personal users don’t have to worry about it. Open source users can bitch about it and then download a reference implementation “illegally” with no issues. Big companies can easily pay the licensing costs. I think the only ones who will truly be affected are small companies.
Thanks for this write-up. It's good to have a cogent summation of all the problems with this format. Many of the objections I have read seem to charge that you failed to demonstrate how other container formats are better than Ogg in various areas. Perhaps the whole discussion should have been prefaced with "This assumes general technical knowledge of how multimedia is stored in a container format." So much of what Ogg does is counter-intuitive to those "skilled in the art".
I believe that if we are to use an open video container, Matroska is the way to go. It is much better than ogm and has much less overhead.
Try UTF if you want a way of representing arbitrarily-large integers compactly. It is widely implemented and used, so you do not need to write any new code.
If all you want to do is represent integers, you can do better than UTF-8 (see http://guru.multimedia.cx/utf-8/ for an example). UTF-8 has some advantages for strings – null never appears within another character, etc. – but those don't matter for storing integers.
Well, patent-free stuff is all well and good, but I refuse to use it if it doesn't do the job properly. I'm very skeptical of the Xiph Foundation. I don't use any of their codecs or container formats.
Matroska is by far the best container at the moment, with mp4 coming second.
I personally use technology in a way that makes patents a non-issue.
but yeah I guess for browsers like Firefox and Opera they would have to stick with Xiph’s technology because they can’t fork up the licensing cost of using MPEG-4 family of technologies.
Google did purchase On2 so we can all hope they will free the VP8 codec. Matroska could be used as the container as it supports streaming (although not used for this purpose currently by anyone I know of).
THANK YOU! Someone finally got the word out.
Sometimes I wonder what’s patented in the MP4FF once you take out BIFS, scene graphs, initial object descriptors, and all those other weird things that are grouped with it that no one ever uses. Has anyone ever looked into this?
“The segment_table field lists the sizes of all packets in the page. Each value in the list is coded as a run of bytes with the value 255, followed by a final byte with a smaller value; the packet size is simply the sum of all these bytes.”
Wow. Amazing! Welcome to 1985-way (PC) of doing things.
I’ve disliked OGG before, but now I will surely stay far away from it and never touch it again. Just goes to show how much developers really suck.
Why not post a new format that works? You seem to have all the knowledge to release a top notch design. Why don’t you? It would be great to get a single working container.
You have to consider ogg is not expected to be used with non-Xiph format streams. They just aim at containing Theora, Vorbis and the other official Xiph Foundation formats.
And you don’t have to code your own muxer/demuxer from scratch since there’s libogg, which is BSD-style licensed, and you can even pack it in your own proprietary program and charge any amount for it or use it as you wish. There’s also Icecast, which streams ogg, and its code is open too.
If C is not your language, you can cheat and copy the code into your own language.
But if you are just trying to use ogg for streams in any format, ogg is not the right container so feel free to use any other you find fits better.
As a long-time advocate of Ogg Vorbis (who recently switched back to MP3 again) I would love to hear (even a brief) analysis of the Vorbis codec. Though I suppose this would be a lot more verbose than talking about a container, even just a few words would be insightful.
Thank you for the article though!
.:excitatory
First, thanks for such a nice report and discussion.
I think in some sense we are forgetting one of the most important reasons to use ogg as a container: "community". Free software projects and initiatives need our support to grow. And yes, anybody who uses ogg knows there is something wrong with it (for now), but we all expect it to be solved, and anyway we support it as much as we can (of course, where it's possible and where its bad performance is less evident).
Just think for a while about how much free software you have used in your "computer user" life with worse performance than the proprietary alternative…
And now, if nobody uses a monolithic kernel because microkernels have better performance, then who will support Linux?
in that sense performance is not so important, no?
We use ogg because it is what we have. Of course it is good to know it's not perfect, so then… try to fix it. But please don't stop using it, don't stop supporting it… and don't try to convince people to stop doing it :) …
ogg, even with its shortcomings, is being used a lot in success stories: let's say firefox, spotify, icecast radios … let's try to help improve it, not detract from it …
thanks again,
ll.
The problem is that Xiph are refusing to fix the problems, preferring instead to turn a blind eye and chant “patents, patents, patents”.
Newsflash: microkernels do have worse performance; they have theoretically better reliability…
While I’ll admit the Ogg container is lacking, at least they seem to have taken to heart some of its shortcomings and proposed some improvements for the future. Ogg Skeleton seems to be an attempt to provide the “codec id” and “indexing” you’re looking for.
Coupled with the extension on MPEG LA, hopefully by the time the licensing becomes an issue, Ogg will have improved itself to the point where it’s a better competitor, providing a free AND good container format.
If the same marketing effort had been poured on mkv or nut we’d have already something good, free and widespread.
Read the article once more. It’s not possible to fix Ogg’s shortcomings in compatible ways. If you are going to go the incompatible route, you might as well switch to something else entirely.
And what is it with people continuing to talk about *free* container formats? Containers are not codecs. There have never been patent issues with containers in the real world…
Poison Ivy leaves are free, but that doesn’t magically make them suitable as toilet paper if you are in the woods.
“Being free from patents does not magically make Ogg a good choice as file format. If all the standard formats are indeed covered by patents, the only proper solution is to design a new, good format which is not, this time hopefully avoiding the old mistakes.”
Please do so.
Freedom from patents is a very, very important choice for users, so much so that we are happily willing to ignore the numerous technical problems you describe and use this broken format. This is not an issue of magic, but practical issues out-of-band to the technical discussion you provide here.
It doesn’t hurt that Ogg Vorbis provides excellent perceived sound quality relative to bitrate.
A checksum is useful for protecting data at rest. Other than that, I agree with your assessment. It is pretty odd to lavishly allocate 32 bits to a stream id that really doesn’t need more than ~10 distinct values but only use 8 bits for a size value that will frequently need to be over 255.
Also, the escaping method is weak; if values over 250 were used to encode the length’s length, the expansion from these lame format choices would at least be only one byte.
A 32-bit CRC is a very weak checksum and is not suitable for protecting data in long-term storage. Furthermore, filesystems like ZFS already provide integrity checks. File formats (other than archive formats) should not concern themselves with such details.
Re: Concatenating independent Ogg streams forms a valid stream.
I personally make frequent use of this particular feature when dealing with music albums where the tracks are seamlessly mixed together (e.g. Paul Oakenfold’s Tranceport). I can rip the individual tracks and tag them and leave them separate for individual listening, but also make a single combined file by concatenating all of the tracks–and still maintain the separate tag info. I could get by without this feature, but it is still nice to have and does have a valid use case.
What if you want to examine all the metadata for one of these concatenated files? You’ll have to parse the whole thing. Assigning metadata to specific regions of a file is possible, or could easily be added, in MP4 or Matroska. Besides, it’s not safe to blindly concatenate Ogg files. First you must ensure that the elementary stream IDs in each of them are unique.
Real World use of Audio in OGG here…
I use Native Instruments Traktor software for live performance, and have a few dozen tunes in OGG. The sound quality is certainly no better than a well encoded MP3, but it seems to add undue overhead while processing these files.
Due to the complex looping and editing you can do on the fly, it seems this format is, in my personal experience, ill suited to quick, random access and playback.
I’m not even close to the level of technical expertise of you lot, but it’s a real world example of a free format just not being good for my use. FLAC performance, also under the Xiph aegis, is awesome however.
$0.02 added.
I’d edit the final section of your article – I think the main reason a lot of people defend Ogg is that they get confused around the whole container vs codec issue.
At least one of the comments backs this view up, in believing there is a royalty cost to the MPEG container.
Of course there is the hard-line view that even the royalty-free MPEG format is still based on patents, rather than being patent-free.
(But then we enter the territory of what ‘patent-free’ actually means, and whether it is safer to use a patented but royalty free standard)
Thanks for an informative post. I am not very knowledgeable of the innards behind the various codecs and components, yet I found this information helpful in terms of explaining motivations behind Ogg’s adoption (or lack of, in case of Apple).
I do have one quick question–and forgive my ignorance if I’m mistaking scenarios here: per your comment, “Those objecting that this index would be unavailable in a streaming scenario are forgetting that seeking is impossible there regardless.” When you say “seeking is impossible there”, does “there” mean “when streaming”? For example, when I watch a youtube video or some other flash videos, I have the ability to seek to a position, and the stream will re-buffer and begin from that point. What is going on the server (sending) side when a client “seeks” on a stream?
Cheers,
Alex
When I wrote streaming, I was primarily thinking of true streaming as in broadcast, where the receiver has no control over the stream being received. On youtube and similar sites, a certain degree of random access is possible. If random access is possible, the client can begin by fetching the index regardless of where it is stored, and the argument falls apart once again. Seeking in a network-delivered stream is also one of the cases where an index is a necessity due to the high latencies involved (I mentioned this somewhere in the text).
> This discussion cannot be done only on the basis
> of technical qualities.
Spoken like somebody who doesn’t actually require this technology for their everyday work.
You might as well say “let’s use an egg as the multimedia container format … it’s not patented.” Or maybe we should use a raincoat, or a bumble bee.
The technical discussion comes first. Ogg is disqualified solely on the basis of that. We don’t need to continue the discussion. Ogg has been dead and buried for a decade while the entire world embraced H.264.
> The next step would probably be a proposal for such a container.
Yeah, that happened in 1998 and it was over by 2001. So welcome to the party. The container format that was chosen was the one that everybody had already been using for years at that point: the QuickTime container. That is about as surprising as Apache running on Unix.
What’s happening today is the Web is trying to move the video player out of plug-ins and into browsers. Maybe it seems like online video is just starting to you but it’s not. You’ve been watching H.264 in your FlashPlayer in Firefox for years now. YouTube runs on iPods and all the smartphones. We’re way, way into this.
> The fact that ogg is patent-free is a tremendous argument for ogg
Ogg is not patent-free unless it lives on Mars or something. It may not be patented, but it likely infringes patents. But again, that is immaterial if it is not up to the technical task.
> MS DOS in the first IBM PC
That’s supposed to be an argument in favor of Ogg?
Nice breakdown of the format. However, in the end *users* just want to see that kitty movie on their screen. They don’t really care about bits or bytes.
An informative read overall, but I am baffled by this sentence.
“Those objecting that this index would be unavailable in a streaming scenario are forgetting that seeking is impossible there regardless”
I suppose this would be true of a live stream, but the ability to seek without having to download the entire file is one of the few valid reasons to use streaming delivery. This is especially important if publishing agreements require streaming delivery.
If you can’t seek, an index will do you no good. If you can seek, you can go fetch the index wherever it is. There is no problem.
Ogg sucks, film at 11. What about Theora+Vorbis in .mkv?
As a previous commenter pointed out, you don’t provide a single comparison of ogg with another container on a like for like basis.
As it is all you end up doing is involuntarily providing ammunition to incompetent pundits like John Gruber who can barely string 2 lines of perl together.
Also why not discuss the technical trade offs involved in each point? Any engineer worth the name knows there is no ideal solution to any technical issue, only a balance of trade-offs and choosing among them based on your needs!
For example what exactly is the problem created by the codec specific mappings when there is only 1 combination of audio and video codecs in widespread use on the web and implemented by the web browsers?
Also, you are just plain wrong with your last comment about “all these years” and indexes – I was at a linux.conf.au presentation 3.5 years ago where it and other similar issues, like seeking with the ogg format, were discussed, so please stop rewriting history.
“For example what exactly is the problem created by the codec specific mappings when there is only 1 combination of audio and video codecs in widespread use on the web and implemented by the web browsers?”
But that isn’t the case, is it? Until recently, web browsers haven’t implemented any video file support. Ogg+Theora playback is probably going to be implemented in Chrome, Mozilla and Opera while H.264 is going to be implemented in Chrome, Safari and IE (http://en.wikipedia.org/wiki/HTML5_video#Browser_support). If Theora is going to be the only codec that will be supported for the Ogg container, doesn’t that undermine developments like Dirac?
“Also you are just plain wrong about with your last comment about ‘all these years’ and indexes – I was at an linux-conf au presentation 3.5 years ago where it and other similiar issues like seeking with the ogg format were discussed, so pls stop rewriting history.”
So the indices were discussed 3.5 years ago and are still at draft stage?
One can read Monty’s reply at
http://people.xiph.org/~xiphmont/lj-pseudocut/o-response-1.html
Why not use Matroska instead of Ogg? As far as I can tell, Matroska avoids all the problems with Ogg, and is still open. If there are problems with Matroska, it should be relatively easy to develop a new format that fixes the problems.
I think a lot of people don’t seem to realize that Ogg is a container format and Theora and Vorbis are not bound to it at all.
Container formats are much simpler to develop, and much easier to avoid patents with than audio or video codecs like Vorbis and Theora.
Somebody should write patches for Firefox, Chrome, and common open source audio/video editors (if they don’t already) to support Matroska. If Ogg really is that bad, being optimistic, I can imagine Matroska or another format basically replacing Ogg within ~6 months.
Matroska already replaced Ogg for all practical purposes about 6 years ago, and now that Google chose Matroska for their Webm thing, we can probably expect support for it in all major browsers.