123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513 |
- <?xml version="1.0" encoding="US-ASCII"?>
- <!DOCTYPE rfc SYSTEM "rfc2629.dtd">
- <?rfc toc="yes"?>
- <?rfc tocompact="yes"?>
- <?rfc tocdepth="3"?>
- <?rfc tocindent="yes"?>
- <?rfc symrefs="yes"?>
- <?rfc sortrefs="yes"?>
- <?rfc comments="yes"?>
- <?rfc inline="yes"?>
- <?rfc compact="yes"?>
- <?rfc subcompact="no"?>
- <rfc category="std" docName="draft-ietf-codec-opus-update-10"
- ipr="trust200902" updates="6716">
- <front>
- <title abbrev="Opus Update">Updates to the Opus Audio Codec</title>
- <author initials="JM" surname="Valin" fullname="Jean-Marc Valin">
- <organization>Mozilla Corporation</organization>
- <address>
- <postal>
- <street>331 E. Evelyn Avenue</street>
- <city>Mountain View</city>
- <region>CA</region>
- <code>94041</code>
- <country>USA</country>
- </postal>
- <phone>+1 650 903-0800</phone>
- <email>jmvalin@jmvalin.ca</email>
- </address>
- </author>
- <author initials="K." surname="Vos" fullname="Koen Vos">
- <organization>vocTone</organization>
- <address>
- <postal>
- <street></street>
- <city></city>
- <region></region>
- <code></code>
- <country></country>
- </postal>
- <phone></phone>
- <email>koenvos74@gmail.com</email>
- </address>
- </author>
- <date day="24" month="August" year="2017" />
- <abstract>
- <t>This document addresses minor issues that were found in the specification
- of the Opus audio codec in RFC 6716. It updates the normative decoder implementation
- included in the appendix of RFC 6716. The changes fixes real and potential security-related
- issues, as well minor quality-related issues.</t>
- </abstract>
- </front>
- <middle>
- <section title="Introduction">
- <t>This document addresses minor issues that were discovered in the reference
- implementation of the Opus codec. Unlike most IETF specifications, Opus is defined
- in <xref target="RFC6716">RFC 6716</xref> in terms of a normative reference
- decoder implementation rather than from the associated text description.
- That RFC includes the reference decoder implementation as Appendix A.
- That's why only issues affecting the decoder are
- listed here. An up-to-date implementation of the Opus encoder can be found at
- <eref target="https://opus-codec.org/"/>.</t>
- <t>
- Some of the changes in this document update normative behaviour in a way that requires
- new test vectors. The English text of the specification is unaffected, only
- the C implementation is. The updated specification remains fully compatible with
- the original specification.
- </t>
- <t>
- Note: due to RFC formatting conventions, lines exceeding the column width
- in the patch are split using a backslash character. The backslashes
- at the end of a line and the white space at the beginning
- of the following line are not part of the patch. A properly formatted patch
- including all changes is available at
- <eref target="https://www.ietf.org/proceedings/98/slides/materials-98-codec-opus-update-00.patch"/>
- and has a SHA-1 hash of 029e3aa88fc342c91e67a21e7bfbc9458661cd5f.
- </t>
- </section>
- <section title="Terminology">
- <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
- "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
- document are to be interpreted as described in <xref
- target="RFC2119">RFC 2119</xref>.</t>
- </section>
- <section title="Stereo State Reset in SILK">
- <t>The reference implementation does not reinitialize the stereo state
- during a mode switch. The old stereo memory can produce a brief impulse
- (i.e. single sample) in the decoded audio. This can be fixed by changing
- silk/dec_API.c at line 72:
- </t>
- <figure>
- <artwork><![CDATA[
- <CODE BEGINS>
- for( n = 0; n < DECODER_NUM_CHANNELS; n++ ) {
- ret = silk_init_decoder( &channel_state[ n ] );
- }
- + silk_memset(&((silk_decoder *)decState)->sStereo, 0,
- + sizeof(((silk_decoder *)decState)->sStereo));
- + /* Not strictly needed, but it's cleaner that way */
- + ((silk_decoder *)decState)->prev_decode_only_middle = 0;
-
- return ret;
- }
- <CODE ENDS>
- ]]></artwork>
- </figure>
- <t>
- This change affects the normative output of the decoder, but the
- amount of change is within the tolerance and too small to make the testvector check fail.
- </t>
- </section>
- <section anchor="padding" title="Parsing of the Opus Packet Padding">
- <t>It was discovered that some invalid packets of very large size could trigger
- an out-of-bounds read in the Opus packet parsing code responsible for padding.
- This is due to an integer overflow if the signaled padding exceeds 2^31-1 bytes
- (the actual packet may be smaller). The code can be fixed by decrementing the
- (signed) len value, instead of incrementing a separate padding counter.
- This is done by applying the following changes at line 596 of src/opus_decoder.c:
- </t>
- <figure>
- <artwork><![CDATA[
- <CODE BEGINS>
- /* Padding flag is bit 6 */
- if (ch&0x40)
- {
- - int padding=0;
- int p;
- do {
- if (len<=0)
- return OPUS_INVALID_PACKET;
- p = *data++;
- len--;
- - padding += p==255 ? 254: p;
- + len -= p==255 ? 254: p;
- } while (p==255);
- - len -= padding;
- }
- <CODE ENDS>
- ]]></artwork>
- </figure>
- <t>This packet parsing issue is limited to reading memory up
- to about 60 kB beyond the compressed buffer. This can only be triggered
- by a compressed packet more than about 16 MB long, so it's not a problem
- for RTP. In theory, it could crash a file
- decoder (e.g. Opus in Ogg) if the memory just after the incoming packet
- is out-of-range, but our attempts to trigger such a crash in a production
- application built using an affected version of the Opus decoder failed.</t>
- </section>
- <section anchor="resampler" title="Resampler buffer">
- <t>The SILK resampler had the following issues:
- <list style="numbers">
- <t>The calls to memcpy() were using sizeof(opus_int32), but the type of the
- local buffer was opus_int16.</t>
- <t>Because the size was wrong, this potentially allowed the source
- and destination regions of the memcpy() to overlap on the copy from "buf" to "buf".
- We believe that nSamplesIn (number of input samples) is at least fs_in_khZ (sampling rate in kHz),
- which is at least 8.
- Since RESAMPLER_ORDER_FIR_12 is only 8, that should not be a problem once
- the type size is fixed.</t>
- <t>The size of the buffer used RESAMPLER_MAX_BATCH_SIZE_IN, but the
- data stored in it was actually twice the input batch size
- (nSamplesIn<<1).</t>
- </list></t>
- <t>The code can be fixed by applying the following changes to line 78 of silk/resampler_private_IIR_FIR.c:
- </t>
- <figure>
- <artwork><![CDATA[
- <CODE BEGINS>
- )
- {
- silk_resampler_state_struct *S = \
- (silk_resampler_state_struct *)SS;
- opus_int32 nSamplesIn;
- opus_int32 max_index_Q16, index_increment_Q16;
- - opus_int16 buf[ RESAMPLER_MAX_BATCH_SIZE_IN + \
- RESAMPLER_ORDER_FIR_12 ];
- + opus_int16 buf[ 2*RESAMPLER_MAX_BATCH_SIZE_IN + \
- RESAMPLER_ORDER_FIR_12 ];
-
- /* Copy buffered samples to start of buffer */
- - silk_memcpy( buf, S->sFIR, RESAMPLER_ORDER_FIR_12 \
- * sizeof( opus_int32 ) );
- + silk_memcpy( buf, S->sFIR, RESAMPLER_ORDER_FIR_12 \
- * sizeof( opus_int16 ) );
-
- /* Iterate over blocks of frameSizeIn input samples */
- index_increment_Q16 = S->invRatio_Q16;
- while( 1 ) {
- nSamplesIn = silk_min( inLen, S->batchSize );
-
- /* Upsample 2x */
- silk_resampler_private_up2_HQ( S->sIIR, &buf[ \
- RESAMPLER_ORDER_FIR_12 ], in, nSamplesIn );
-
- max_index_Q16 = silk_LSHIFT32( nSamplesIn, 16 + 1 \
- ); /* + 1 because 2x upsampling */
- out = silk_resampler_private_IIR_FIR_INTERPOL( out, \
- buf, max_index_Q16, index_increment_Q16 );
- in += nSamplesIn;
- inLen -= nSamplesIn;
-
- if( inLen > 0 ) {
- /* More iterations to do; copy last part of \
- filtered signal to beginning of buffer */
- - silk_memcpy( buf, &buf[ nSamplesIn << 1 ], \
- RESAMPLER_ORDER_FIR_12 * sizeof( opus_int32 ) );
- + silk_memmove( buf, &buf[ nSamplesIn << 1 ], \
- RESAMPLER_ORDER_FIR_12 * sizeof( opus_int16 ) );
- } else {
- break;
- }
- }
-
- /* Copy last part of filtered signal to the state for \
- the next call */
- - silk_memcpy( S->sFIR, &buf[ nSamplesIn << 1 ], \
- RESAMPLER_ORDER_FIR_12 * sizeof( opus_int32 ) );
- + silk_memcpy( S->sFIR, &buf[ nSamplesIn << 1 ], \
- RESAMPLER_ORDER_FIR_12 * sizeof( opus_int16 ) );
- }
- <CODE ENDS>
- ]]></artwork>
- </figure>
- </section>
- <section title="Integer wrap-around in inverse gain computation">
- <t>
- It was discovered through decoder fuzzing that some bitstreams could produce
- integer values exceeding 32-bits in LPC_inverse_pred_gain_QA(), causing
- a wrap-around. The C standard considers
- this behavior as undefined. The following patch to line 87 of silk/LPC_inv_pred_gain.c
- detects values that do not fit in a 32-bit integer and considers the corresponding filters unstable:
- </t>
- <figure>
- <artwork><![CDATA[
- <CODE BEGINS>
- /* Update AR coefficient */
- for( n = 0; n < k; n++ ) {
- - tmp_QA = Aold_QA[ n ] - MUL32_FRAC_Q( \
- Aold_QA[ k - n - 1 ], rc_Q31, 31 );
- - Anew_QA[ n ] = MUL32_FRAC_Q( tmp_QA, rc_mult2 , mult2Q );
- + opus_int64 tmp64;
- + tmp_QA = silk_SUB_SAT32( Aold_QA[ n ], MUL32_FRAC_Q( \
- Aold_QA[ k - n - 1 ], rc_Q31, 31 ) );
- + tmp64 = silk_RSHIFT_ROUND64( silk_SMULL( tmp_QA, \
- rc_mult2 ), mult2Q);
- + if( tmp64 > silk_int32_MAX || tmp64 < silk_int32_MIN ) {
- + return 0;
- + }
- + Anew_QA[ n ] = ( opus_int32 )tmp64;
- }
- <CODE ENDS>
- ]]></artwork>
- </figure>
- </section>
- <section title="Integer wrap-around in LSF decoding" anchor="lsf_overflow">
- <t>
- It was discovered -- also from decoder fuzzing -- that an integer wrap-around could
- occur when decoding bitstreams with extremely large values for the high LSF parameters.
- The end result of the wrap-around is an illegal read access on the stack, which
- the authors do not believe is exploitable but should nonetheless be fixed. The following
- patch to line 137 of silk/NLSF_stabilize.c prevents the problem:
- </t>
- <figure>
- <artwork><![CDATA[
- <CODE BEGINS>
- /* Keep delta_min distance between the NLSFs */
- for( i = 1; i < L; i++ )
- - NLSF_Q15[i] = silk_max_int( NLSF_Q15[i], \
- NLSF_Q15[i-1] + NDeltaMin_Q15[i] );
- + NLSF_Q15[i] = silk_max_int( NLSF_Q15[i], \
- silk_ADD_SAT16( NLSF_Q15[i-1], NDeltaMin_Q15[i] ) );
-
- /* Last NLSF should be no higher than 1 - NDeltaMin[L] */
- <CODE ENDS>
- ]]></artwork>
- </figure>
- </section>
- <section title="Cap on Band Energy">
- <t>On extreme bit-streams, it is possible for log-domain band energy levels
- to exceed the maximum single-precision floating point value once converted
- to a linear scale. This would later cause the decoded values to be NaN (not a number),
- possibly causing problems in the software using the PCM values. This can be
- avoided with the following patch to line 552 of celt/quant_bands.c:
- </t>
- <figure>
- <artwork><![CDATA[
- <CODE BEGINS>
- {
- opus_val16 lg = ADD16(oldEBands[i+c*m->nbEBands],
- SHL16((opus_val16)eMeans[i],6));
- + lg = MIN32(QCONST32(32.f, 16), lg);
- eBands[i+c*m->nbEBands] = PSHR32(celt_exp2(lg),4);
- }
- for (;i<m->nbEBands;i++)
- <CODE ENDS>
- ]]></artwork>
- </figure>
- </section>
- <section title="Hybrid Folding" anchor="folding">
- <t>When encoding in hybrid mode at low bitrate, we sometimes only have
- enough bits to code a single CELT band (8 - 9.6 kHz). When that happens,
- the second band (CELT band 18, from 9.6 to 12 kHz) cannot use folding
- because it is wider than the amount already coded, and falls back to
- white noise. Because it can also happen on transients (e.g. stops), it
- can cause audible pre-echo.
- </t>
- <t>
- To address the issue, we change the folding behavior so that it is
- never forced to fall back to LCG due to the first band not containing
- enough coefficients to fold onto the second band. This
- is achieved by simply repeating part of the first band in the folding
- of the second band. This changes the code in celt/bands.c around line 1237:
- </t>
- <figure>
- <artwork><![CDATA[
- <CODE BEGINS>
- b = 0;
- }
-
- - if (resynth && M*eBands[i]-N >= M*eBands[start] && \
- (update_lowband || lowband_offset==0))
- + if (resynth && (M*eBands[i]-N >= M*eBands[start] || \
- i==start+1) && (update_lowband || lowband_offset==0))
- lowband_offset = i;
-
- + if (i == start+1)
- + {
- + int n1, n2;
- + int offset;
- + n1 = M*(eBands[start+1]-eBands[start]);
- + n2 = M*(eBands[start+2]-eBands[start+1]);
- + offset = M*eBands[start];
- + /* Duplicate enough of the first band folding data to \
- be able to fold the second band.
- + Copies no data for CELT-only mode. */
- + OPUS_COPY(&norm[offset+n1], &norm[offset+2*n1 - n2], n2-n1);
- + if (C==2)
- + OPUS_COPY(&norm2[offset+n1], &norm2[offset+2*n1 - n2], \
- n2-n1);
- + }
- +
- tf_change = tf_res[i];
- if (i>=m->effEBands)
- {
- <CODE ENDS>
- ]]></artwork>
- </figure>
- <t>
- as well as line 1260:
- </t>
- <figure>
- <artwork><![CDATA[
- <CODE BEGINS>
- fold_start = lowband_offset;
- while(M*eBands[--fold_start] > effective_lowband);
- fold_end = lowband_offset-1;
- - while(M*eBands[++fold_end] < effective_lowband+N);
- + while(++fold_end < i && M*eBands[fold_end] < \
- effective_lowband+N);
- x_cm = y_cm = 0;
- fold_i = fold_start; do {
- x_cm |= collapse_masks[fold_i*C+0];
- <CODE ENDS>
- ]]></artwork>
- </figure>
- <t>
- The fix does not impact compatibility, because the improvement does
- not depend on the encoder doing anything special. There is also no
- reasonable way for an encoder to use the original behavior to
- improve quality over the proposed change.
- </t>
- </section>
- <section title="Downmix to Mono" anchor="stereo">
- <t>The last issue is not strictly a bug, but it is an issue that has been reported
- when downmixing an Opus decoded stream to mono, whether this is done inside the decoder
- or as a post-processing step on the stereo decoder output. Opus intensity stereo allows
- optionally coding the two channels 180-degrees out of phase on a per-band basis.
- This provides better stereo quality than forcing the two channels to be in phase,
- but when the output is downmixed to mono, the energy in the affected bands is cancelled
- sometimes resulting in audible artifacts.
- </t>
- <t>As a work-around for this issue, the decoder MAY choose not to apply the 180-degree
- phase shift. This can be useful when downmixing to mono inside or
- outside of the decoder (e.g. user-controllable).
- </t>
- </section>
- <section title="New Test Vectors">
- <t>Changes in <xref target="folding"/> and <xref target="stereo"/> have
- sufficient impact on the testvectors to make them fail. For this reason,
- this document also updates the Opus test vectors. The new test vectors now
- include two decoded outputs for the same bitstream. The outputs with
- suffix 'm' do not apply the CELT 180-degree phase shift as allowed in
- <xref target="stereo"/>, while the outputs without the suffix do. An
- implementation is compliant as long as it passes either set of vectors.
- </t>
- <t>
- Any Opus implementation
- that passes either the original test vectors from <xref target="RFC6716">RFC 6716</xref>
- or one of the new sets of test vectors is compliant with the Opus specification. However, newer implementations
- SHOULD be based on the new test vectors rather than the old ones.
- </t>
- <t>The new test vectors are located at
- <eref target="https://www.ietf.org/proceedings/98/slides/materials-98-codec-opus-newvectors-00.tar.gz"/>.
- The SHA-1 hashes of the test vectors are:
- <figure>
- <artwork>
- <![CDATA[
- e49b2862ceec7324790ed8019eb9744596d5be01 testvector01.bit
- b809795ae1bcd606049d76de4ad24236257135e0 testvector02.bit
- e0c4ecaeab44d35a2f5b6575cd996848e5ee2acc testvector03.bit
- a0f870cbe14ebb71fa9066ef3ee96e59c9a75187 testvector04.bit
- 9b3d92b48b965dfe9edf7b8a85edd4309f8cf7c8 testvector05.bit
- 28e66769ab17e17f72875283c14b19690cbc4e57 testvector06.bit
- bacf467be3215fc7ec288f29e2477de1192947a6 testvector07.bit
- ddbe08b688bbf934071f3893cd0030ce48dba12f testvector08.bit
- 3932d9d61944dab1201645b8eeaad595d5705ecb testvector09.bit
- 521eb2a1e0cc9c31b8b740673307c2d3b10c1900 testvector10.bit
- 6bc8f3146fcb96450c901b16c3d464ccdf4d5d96 testvector11.bit
- 338c3f1b4b97226bc60bc41038becbc6de06b28f testvector12.bit
- f5ef93884da6a814d311027918e9afc6f2e5c2c8 testvector01.dec
- 48ac1ff1995250a756e1e17bd32acefa8cd2b820 testvector02.dec
- d15567e919db2d0e818727092c0af8dd9df23c95 testvector03.dec
- 1249dd28f5bd1e39a66fd6d99449dca7a8316342 testvector04.dec
- b85675d81deef84a112c466cdff3b7aaa1d2fc76 testvector05.dec
- 55f0b191e90bfa6f98b50d01a64b44255cb4813e testvector06.dec
- 61e8b357ab090b1801eeb578a28a6ae935e25b7b testvector07.dec
- a58539ee5321453b2ddf4c0f2500e856b3966862 testvector08.dec
- bb96aad2cde188555862b7bbb3af6133851ef8f4 testvector09.dec
- 1b6cdf0413ac9965b16184b1bea129b5c0b2a37a testvector10.dec
- b1fff72b74666e3027801b29dbc48b31f80dee0d testvector11.dec
- 98e09bbafed329e341c3b4052e9c4ba5fc83f9b1 testvector12.dec
- 1e7d984ea3fbb16ba998aea761f4893fbdb30157 testvector01m.dec
- 48ac1ff1995250a756e1e17bd32acefa8cd2b820 testvector02m.dec
- d15567e919db2d0e818727092c0af8dd9df23c95 testvector03m.dec
- 1249dd28f5bd1e39a66fd6d99449dca7a8316342 testvector04m.dec
- d70b0bad431e7d463bc3da49bd2d49f1c6d0a530 testvector05m.dec
- 6ac1648c3174c95fada565161a6c78bdbe59c77d testvector06m.dec
- fc5e2f709693738324fb4c8bdc0dad6dda04e713 testvector07m.dec
- aad2ba397bf1b6a18e8e09b50e4b19627d479f00 testvector08m.dec
- 6feb7a7b9d7cdc1383baf8d5739e2a514bd0ba08 testvector09m.dec
- 1b6cdf0413ac9965b16184b1bea129b5c0b2a37a testvector10m.dec
- fd3d3a7b0dfbdab98d37ed9aa04b659b9fefbd18 testvector11m.dec
- 98e09bbafed329e341c3b4052e9c4ba5fc83f9b1 testvector12m.dec
- ]]>
- </artwork>
- </figure>
- Note that the decoder input bitstream files (.bit) are unchanged.
- </t>
- </section>
- <section anchor="security" title="Security Considerations">
- <t>This document fixes two security issues reported on Opus and that affect the
- reference implementation in <xref target="RFC6716">RFC 6716</xref>: CVE-2013-0899
- <eref target="https://nvd.nist.gov/vuln/detail/CVE-2013-0899"/>
- and CVE-2017-0381 <eref target="https://nvd.nist.gov/vuln/detail/CVE-2017-0381"/>.
- CVE- 2013-0899 theoretically could have caused an information leak. The leaked
- information would have gone through the decoder process before being accessible
- to the attacker. It is fixed by <xref target="padding"/>.
- CVE-2017-0381 could have resulted in a 16-bit out-of-bounds read from a fixed
- location. It is fixed in <xref target="lsf_overflow"/>.
- Beyond the two fixed CVEs, this document adds no new security considerations on top of
- <xref target="RFC6716">RFC 6716</xref>.
- </t>
- </section>
- <section anchor="IANA" title="IANA Considerations">
- <t>This document makes no request of IANA.</t>
- <t>Note to RFC Editor: this section may be removed on publication as an
- RFC.</t>
- </section>
- <section anchor="Acknowledgements" title="Acknowledgements">
- <t>We would like to thank Juri Aedla for reporting the issue with the parsing of
- the Opus padding. Thanks to Felicia Lim for reporting the LSF integer overflow issue.
- Also, thanks to Tina le Grand, Jonathan Lennox, and Mark Harris for their
- feedback on this document.</t>
- </section>
- </middle>
- <back>
- <references title="Normative References">
- <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml"?>
- <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.6716.xml"?>
- </references>
- </back>
- </rfc>
|