Using Simulcast in Session Description Protocol (SDP) and RTP Sessions

Ericsson, Gronlandsgatan 31, SE-164 60 Stockholm, Sweden, bo.burman@ericsson.com
Ericsson, Torshamnsgatan 23, SE-164 83 Stockholm, Sweden, magnus.westerlund@ericsson.com
Cisco, 170 West Tasman Drive, San Jose, CA 95134, United States of America, snandaku@cisco.com
Cisco, 170 West Tasman Drive, San Jose, CA 95134, United States of America, mzanaty@cisco.com

Keywords: Conference, multi-party, middlebox, MCU, SFU, media, video, restrictions, RTCP, RID, RtpStreamId

Abstract

In some application scenarios, it may be desirable to send multiple
differently encoded versions of the same media source in different RTP
streams. This is called simulcast. This document describes how to
accomplish simulcast in RTP and how to signal it in the Session
Description Protocol (SDP). The described solution uses an RTP/RTCP
identification method to identify RTP streams
belonging to the same media source and makes an extension to SDP to
indicate that those RTP streams are different simulcast formats of that
media source. The SDP extension consists of a new media-level SDP
attribute that expresses capability to send and/or receive simulcast RTP
streams.

Status of This Memo
This is an Internet Standards Track document.
This document is a product of the Internet Engineering Task Force
(IETF). It represents the consensus of the IETF community. It has
received public review and has been approved for publication by
the Internet Engineering Steering Group (IESG). Further
information on Internet Standards is available in Section 2 of
RFC 7841.
Information about the current status of this document, any
errata, and how to provide feedback on it may be obtained at
.
Copyright Notice
Copyright (c) 2021 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this
document must include Simplified BSD License text as described in
Section 4.e of the Trust Legal Provisions and are provided without
warranty as described in the Simplified BSD License.
Table of Contents
Introduction
Definitions
Terminology
Requirements Language
Use Cases
Reaching a Diverse Set of Receivers
Application-Specific Media Source Handling
Receiver Media-Source Preferences
Overview
Detailed Description
Simulcast Attribute
Simulcast Capability
Offer/Answer Use
Generating the Initial SDP Offer
Creating the SDP Answer
Offerer Processing the SDP Answer
Modifying the Session
Use with Declarative SDP
Relating Simulcast Streams
Signaling Examples
Single-Source Client
Multisource Client
Simulcast and Redundancy
RTP Aspects
Outgoing from Endpoint with Media Source
RTP Middlebox to Receiver
Media-Switching Mixer
Selective Forwarding Middlebox
RTP Middlebox to RTP Middlebox
Network Aspects
Bitrate Adaptation
Limitation
IANA Considerations
Security Considerations
References
Normative References
Informative References
Requirements
Acknowledgements
Contributors
Authors' Addresses
Introduction

Most of today's multiparty video-conference solutions make use of
centralized servers to reduce the bandwidth and CPU consumption in the
endpoints. Those servers receive RTP streams from each participant and
send some suitable set of possibly modified RTP streams to the rest of
the participants, which usually have heterogeneous capabilities (screen
size, CPU, bandwidth, codec, etc.). One of the biggest issues is how to
perform RTP stream adaptation to different participants' constraints
with the minimum possible impact on both video quality and server
performance.

Simulcast is defined in this memo as the act of simultaneously
sending multiple different encoded streams of the same media source --
e.g., the same video source encoded with different video-encoder types or
image resolutions. This can be done in several ways and for different
purposes. This document focuses on the case where it is desirable to
provide a media source as multiple encoded streams over RTP towards an intermediary so that the
intermediary can provide the wanted functionality by selecting which RTP
stream(s) to forward to other participants in the session, and more
specifically how the identification and grouping of the involved RTP
streams are done.

The intended scope of the defined mechanism is to support negotiation
and usage of simulcast when using SDP offer/answer and media transport
over RTP. The media transport topologies considered are point-to-point
RTP sessions, as well as centralized multiparty RTP sessions, where a
media sender will provide the simulcasted streams to an RTP middlebox or
endpoint, and middleboxes may further distribute the simulcast streams
to other middleboxes or endpoints. Simulcast could be used point to point between
middleboxes as part of a distributed multiparty scenario. Usage of
multicast or broadcast transport is out of scope
and left for future extensions.

This document describes a few scenarios that motivate the use of
simulcast and also defines the needed RTP/RTCP and SDP signaling for
it.

Definitions

Terminology

This document makes use of the terminology defined in "A Taxonomy of Semantics and
Mechanisms for Real-Time
Transport Protocol (RTP) Sources" and "RTP Topologies". The following terms are
especially noted or here defined:
RTP mixer:
An RTP middlebox, in the wide sense of the term, encompassing
the middlebox topologies described in "RTP Topologies".
RTP session:
An association among a group of
participants communicating with RTP, as defined in and amended by .
RTP stream:
A stream of RTP packets containing media
data, as defined in .
RTP switch:
A common short term for the terms
"switching RTP mixer", "source projecting middlebox", and "video
switching Multipoint Control Unit (MCU)", as discussed in .
Simulcast stream:
One encoded stream or dependent
stream from a set of concurrently transmitted encoded streams and
optional dependent streams, all sharing a common media source, as
defined in . For example, HD and thumbnail
video simulcast versions of a single media source sent
concurrently as separate RTP streams.
Simulcast format:
Different formats of a simulcast
stream serve the same purpose as alternative RTP payload types in
nonsimulcast SDP: to allow multiple alternative media formats for
a given RTP stream. As for multiple RTP payload types on the
"m=" line in offer/answer, any one of
the negotiated alternative formats can be used in a single RTP
stream at a given point in time, but not more than one (based on
RTP timestamp). What format is used can change dynamically from
one RTP packet to another.
Requirements Language
The key words "MUST", "MUST NOT",
"REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT",
"RECOMMENDED", "NOT RECOMMENDED",
"MAY", and "OPTIONAL" in this document are
to be interpreted as
described in BCP 14
when, and only when, they appear in all capitals, as shown here.
Use Cases

The use cases of simulcast described in this document relate to a
multiparty communication session where one or more central nodes are
used to adapt the view of the communication session towards individual
participants and facilitate the media transport between participants.
Thus, these cases target the RTP mixer type of topology.

There are two principal approaches for an RTP mixer to provide this
adapted view of the communication session to each receiving
participant:
Transcoding (decoding and re-encoding) received RTP streams with
characteristics adapted to each receiving participant. This often
includes mixing or composition of media sources from multiple
participants into a mixed media source originated by the RTP mixer.
The main advantage of this approach is that it achieves
close-to-optimal adaptation to individual receiving
participants. The main
disadvantages are that it can be very computationally expensive for
the RTP mixer, typically degrades media Quality of Experience (QoE),
for example by adding end-to-end delay for the receiving participants, and
requires the RTP mixer to have access to media content.
Switching a subset of all received RTP streams or substreams to
each receiving participant, where the used subset is typically
specific to each receiving participant. The main advantages of this
approach are that it is computationally cheap to the RTP mixer, has
very limited impact on media QoE, and does not require the RTP mixer
to have (full) access to media content. The main disadvantage is
that it can be difficult to combine a subset of received RTP streams into a
perfect fit for the resource situation of a receiving participant. It
is also a disadvantage that sending multiple RTP streams consumes
more network resources from the sending participant to the RTP
mixer.
The use of simulcast relates to the latter approach, where it is more
important to reduce the load on the RTP mixer and/or minimize QoE impact
than to achieve an optimal adaptation of resource usage.

Reaching a Diverse Set of Receivers

The media sources provided by a sending participant potentially
need to reach several receiving participants that differ in terms of
available resources. The receiver resources that typically differ
include, but are not limited to:
Codec:
This includes codec type (such as RTP payload
format MIME type) and can include codec configuration. A couple of
codec resources that differ only in codec configuration will be
"different" if they are somehow not "compatible", such as if they
differ in video codec profile or the transport packetization
configuration.
Sampling:
This relates to how the media source is
sampled, in spatial as well as temporal domain. For video
streams, spatial sampling affects image resolution, and temporal
sampling affects video frame rate. For audio, spatial sampling
relates to the number of audio channels, and temporal sampling
affects audio bandwidth. This may be used to suit different
rendering capabilities or needs at the receiving endpoints.
Bitrate:
This relates to the number of bits sent per
second to transmit the media source as an RTP stream, which
typically also affects the QoE for the receiving user.
Letting the sending participant create a simulcast of a few
differently configured RTP streams per media source can be a good
trade-off when using an RTP switch as middlebox, instead of sending a
single RTP stream and using an RTP mixer to create individual
transcodings to each receiving participant.

This requires that the receiving participants can be categorized in
terms of available resources and that the sending participant can
choose a matching configuration for a single RTP stream per category
and media source. For example, a set of receiving participants differ
only in screen resolution; some are able to display video with at most
360p resolution, and some support 720p resolution. A sending
participant can then reach all receivers with best possible resolution
by creating a simulcast of RTP streams with 360p and 720p resolution
for each sent video media source.

The maximum number of simulcasted RTP streams that can be sent is
mainly limited by the amount of processing and uplink network
resources available to the sending participant.

Application-Specific Media Source Handling

The application logic that controls the communication session may
include special handling of some media sources. It is, for example,
commonly the case that the media from a sending participant is not
sent back to itself.

It is also common that a currently active speaker participant is
shown in larger size or higher quality than other participants (the
sampling or bitrate aspects described under "Reaching a Diverse Set of Receivers")
in a receiving client. Many conferencing systems do not send the
active speaker's media back to the sender itself, which means there is
some other participant's media that instead is forwarded to the active
speaker -- typically the previous active speaker. This way, the
previously active speaker is needed both in larger size (to current
active speaker) and in small size (to the rest of the participants),
which can be solved with a simulcast from the previously active
speaker to the RTP switch.

Receiver Media-Source Preferences

The application logic that controls the communication session may
allow receiving participants to state preferences on the
characteristics of the RTP stream they would like to receive, for example in
terms of the aspects listed under "Reaching a Diverse Set of Receivers".
Sending a simulcast of RTP streams is one way of accommodating
receivers with conflicting or otherwise incompatible preferences.

Overview

This memo defines SDP signaling that
covers the above described simulcast use cases and functionalities. A
number of requirements for such signaling are elaborated in the "Requirements" section.

The Restriction Identifier (RID) mechanism, as defined in "RTP Payload Format Restrictions", enables an SDP offerer or answerer to
specify a number of different RTP stream restrictions for a rid-id by
using the "a=rid" line. Examples of such restrictions are maximum
bitrate, maximum spatial video resolution (width and height), maximum
video frame rate, etc. Each rid-id may also be restricted to use only a
subset of the RTP payload types in the associated SDP media description.
Those RTP payload types can have their own configurations and parameters
affecting what can be sent or received, using the "a=fmtp" line as well
as other SDP attributes.

A new SDP media-level attribute, "a=simulcast", is defined. The
attribute describes, independently for "send" and "receive" directions, the
number of simulcast RTP streams as well as potential alternative formats
for each simulcast RTP stream. Each simulcast RTP stream, including
alternatives, is identified using the RID identifier (rid-id), defined
in "RTP Payload Format Restrictions".
a=simulcast:send 1;2,3 recv 4
If this line is included in an SDP offer, the "send" part
indicates the offerer's capability and proposal to send two simulcast
RTP streams. Each simulcast stream is described by one or more RTP
stream identifiers (rid-ids), and each group of rid-ids for a simulcast
stream is separated by a semicolon (";"). When a simulcast stream has
multiple rid-ids that are separated by a comma (","), they describe
alternative representations for that particular simulcast RTP stream.
Thus, the "send" part shown above is interpreted as an intention to send two
simulcast RTP streams. The first simulcast RTP stream is identified and
restricted according to rid-id 1. The second simulcast RTP stream can be
sent as two alternatives, identified and restricted according to rid-ids
2 and 3. The "recv" part of the line shown here indicates that the offerer
desires to receive a single RTP stream (no simulcast) according to
rid-id 4.

A more complete example SDP-offer media description is provided
in .

That SDP media description can be
interpreted at a high level to
say that the offerer is capable of sending two simulcast RTP streams:
one H.264 encoded stream in up to 720p resolution, and one additional
stream encoded as either H.264 or VP8 with a maximum resolution of
320x180 pixels. The offerer can receive one H.264 stream with maximum
720p resolution.

The receiver of this SDP offer can generate an SDP answer that
indicates what it accepts. It uses the "a=simulcast" attribute to
indicate simulcast capability and specify what simulcast RTP streams and
alternatives to receive and/or send. An example of such an answering
"a=simulcast" attribute, corresponding to the above offer, is:
a=simulcast:recv 1;2 send 4
With this SDP answer, the answerer indicates in the "recv" part that
it wants to receive the two simulcast RTP streams. It has removed an
alternative that it doesn't support (rid-id 3). The "send" part confirms
to the offerer that it will receive one stream for this media source
according to rid-id 4. The corresponding, more complete example SDP
answer media description could look like .

It is assumed that a single SDP media description is used to describe
a single media source. This is aligned with the concepts defined in
and will work in a WebRTC context, both with
and without BUNDLE grouping of media descriptions.

To summarize, the "a=simulcast" line describes "send"- and
"receive"-direction simulcast streams separately. Each direction can in
turn describe one or more simulcast streams, separated by semicolons. The
identifiers describing simulcast streams on the "a=simulcast" line are
rid-ids, as defined by "a=rid" lines in "RTP Payload Format Restrictions". Each simulcast stream can be offered as
a list of alternative rid-ids, with each alternative separated by a comma
as shown in the example offer above. A detailed specification can be found in
"Detailed Description", and more detailed examples are outlined in
"Signaling Examples".

Detailed Description

This section provides further details to the overview in "Overview". First, formal syntax is provided, followed by the rest of the SDP
attribute definition in "Simulcast Capability". "Relating Simulcast Streams" provides the
definition of the RTP/RTCP mechanisms used. The section concludes
with a number of examples.

Simulcast Attribute

This document defines a new SDP media-level "a=simulcast"
attribute, with a value according to a formal syntax that uses ABNF and its update, "Case-Sensitive String Support in ABNF".

The "a=simulcast" attribute has a parameter in the form of one or
two simulcast stream descriptions, each consisting of a direction
("send" or "recv"), followed by a list of one or more simulcast
streams. Each simulcast stream consists of one or more alternative
simulcast formats. Each simulcast format is identified by a simulcast
stream identifier (rid-id). The rid-id MUST have the form of an RTP
stream identifier, as described by "RTP Payload Format Restrictions".

In the list of simulcast streams, each simulcast stream is
separated by a semicolon (";"). Each simulcast stream can, in turn, be
offered in one or more alternative formats, represented by rid-ids,
separated by commas (","). Each rid-id can also be specified as
initially paused, indicated by
prepending a "~" to the rid-id. The reason to allow separate initial
pause states for each rid-id is that pause capability can be specified
individually for each RTP payload type referenced by a rid-id. Since
pause capability specified via the "a=rtcp-fb" attribute applies only
to specified payload types, and a rid-id specified by "a=rid" can refer
to multiple different payload types, it is infeasible to pause streams
with rid-id where any of the related RTP payload type(s) do not have
pause capability.

Simulcast Capability

Simulcast capability is expressed through a new media-level SDP attribute, "a=simulcast". The use of this
attribute at the session level is undefined. Implementations of this
specification MUST NOT use it at the session level and MUST ignore it
if received at the session level. Extensions to this specification may
define such session-level usage. Each SDP media description MUST
contain at most one "a=simulcast" line.

There are separate and independent sets of simulcast streams in the
"send" and "receive" directions. When listing multiple directions, each
direction MUST NOT occur more than once on the same line.

Simulcast streams using undefined rid-ids MUST NOT be used as valid
simulcast streams by an RTP stream receiver. The direction for a
rid-id MUST be aligned with the direction specified for the
corresponding RTP stream identifier on the "a=rid" line.

The listed number of simulcast streams for a direction sets a limit
to the number of supported simulcast streams in that direction. The
order of the listed simulcast streams in the "send" direction suggests
a proposed order of preference, in decreasing order: the rid-id listed
first is the most preferred, and subsequent streams have progressively
lower preference. The order of the listed rid-ids in the "recv"
direction expresses which simulcast streams are preferred, with
the leftmost being most preferred. This can be of importance if the
number of actually sent simulcast streams has to be reduced for some
reason.

rid-ids that have explicit dependencies on other rid-ids (even in the same media
description) MAY be used.

Use of more than a single, alternative simulcast format for a
simulcast stream MAY be specified as part of the
attribute parameters by expressing the simulcast stream as a
comma-separated list of alternative rid-ids. The order of the rid-id
alternatives within a simulcast stream is significant; the rid-id
alternatives are listed from (left) most preferred to (right) least
preferred. For the use of simulcast, this overrides the normal codec
preference as expressed by format-type ordering on the "m=" line,
using regular SDP rules. This is to enable a separation of general
codec preferences and simulcast-stream configuration
preferences. However, the choice of which alternative to use per
simulcast stream is independent, and there is currently no mechanism
for the offerer to force the answerer to choose the same alternative
for multiple simulcast streams.
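
The attribute structure described so far (directions, semicolon-separated simulcast streams, comma-separated alternatives in preference order, and the optional "~" pause prefix) can be illustrated with a small parser. This is only a sketch under stated assumptions, not a normative implementation: the names `Alternative`, `Simulcast`, and `parse_simulcast` are hypothetical, whitespace handling is more lenient than the formal syntax, and rid-ids are not checked against "a=rid" lines.

```python
from dataclasses import dataclass, field


@dataclass
class Alternative:
    rid: str
    paused: bool = False  # True when the rid-id carried a "~" prefix


@dataclass
class Simulcast:
    # Each direction holds a list of simulcast streams; each stream is a
    # list of alternatives ordered from most to least preferred.
    send: list = field(default_factory=list)
    recv: list = field(default_factory=list)


def parse_simulcast(line: str) -> Simulcast:
    """Parse one "a=simulcast" line (hypothetical helper, lenient on spaces)."""
    prefix = "a=simulcast:"
    if not line.startswith(prefix):
        raise ValueError("not an a=simulcast line")
    tokens = line[len(prefix):].split()
    # One or two (direction, stream-list) pairs are expected.
    if len(tokens) not in (2, 4):
        raise ValueError("expected one or two direction descriptions")
    result = Simulcast()
    seen_dirs = set()
    seen_rids = set()
    for direction, stream_list in zip(tokens[0::2], tokens[1::2]):
        if direction not in ("send", "recv"):
            raise ValueError(f"unknown direction {direction!r}")
        if direction in seen_dirs:
            raise ValueError(f"direction {direction!r} listed twice")
        seen_dirs.add(direction)
        streams = []
        for stream in stream_list.split(";"):
            alternatives = []
            for alt in stream.split(","):
                paused = alt.startswith("~")
                rid = alt[1:] if paused else alt
                if not rid:
                    raise ValueError("empty rid-id")
                # A rid-id may appear only once per "a=simulcast" line.
                if rid in seen_rids:
                    raise ValueError(f"rid-id {rid!r} occurs more than once")
                seen_rids.add(rid)
                alternatives.append(Alternative(rid, paused))
            streams.append(alternatives)
        setattr(result, direction, streams)
    return result


offer = parse_simulcast("a=simulcast:send 1;2,3 recv 4")
```

Here `offer.send` holds two simulcast streams, the second with rid-ids 2 and 3 as alternatives in that preference order, and `offer.recv` holds the single stream with rid-id 4.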
A simulcast stream can use a codec defined such that the same RTP
synchronization source (SSRC) can change RTP payload type multiple
times during a session, possibly even on a per-packet basis. A typical
example is a speech codec that makes use of formats for Comfort Noise and/or dual-tone multifrequency
(DTMF).

If RTP stream
pause/resume is supported, any rid-id MAY be
prefixed by a "~" character to indicate that the corresponding
simulcast stream is paused already from the start of the RTP
session. In this case, support for RTP stream pause/resume
MUST also be included under the same "m=" line where
"a=simulcast" is included. All RTP payload types related to such an
initially paused simulcast stream MUST be listed in the
SDP as pause/resume capable as specified by the RTP pause/resume specification -- e.g., by using the "*" wildcard format for
"a=rtcp-fb".

An initially paused simulcast stream in the "send" direction for the
endpoint sending the SDP MUST be considered equivalent to an
unsolicited locally paused stream and handled accordingly.
Initially paused simulcast streams are resumed as described by the RTP
pause/resume specification. An RTP stream receiver that wishes to
resume an unsolicited locally paused stream needs to know the SSRC of
that stream.
The SSRC of an initially paused simulcast stream can be obtained from
an RTP stream sender RTCP Sender Report (SR) or Receiver Report (RR)
that includes both the desired SSRC as initial SSRC in the source
description (SDES) chunk, optionally a MID SDES item (if used and if rid-ids are not
unique across "m=" lines), and the rid-id value in an RtpStreamId RTCP SDES
item.

If the endpoint sending the SDP includes a "recv"-direction
simulcast stream that is initially paused, then the remote RTP sender
receiving the SDP SHOULD put its RTP stream in an unsolicited locally
paused state. The simulcast stream sender does not put the stream in
the locally paused state if there are other RTP stream receivers in
the session that do not mark the simulcast stream as initially paused.
However, in centralized conferencing, the RTP sender usually does not
see the SDP signaling from RTP receivers and cannot make this
determination. The reason for requiring that an initially paused "recv" stream
be considered locally paused by the remote RTP sender instead of
making it equivalent to implicitly sending a pause request is that
the pausing RTP sender cannot know which receiving SSRC owns the
restriction when Temporary Maximum Media Stream Bit Rate Request
(TMMBR) and Temporary Maximum Media Stream Bit Rate Notification
(TMMBN) are used for pause/resume signaling; this is because the RTP
receiver's SSRC
in the "send" direction is sometimes not yet known.

Use of the redundant audio data format
could be seen as a form of simulcast for loss-protection
purposes, but it is not considered conflicting with the mechanisms
described in this memo and MAY therefore be used as any other format.
In this case, the "red" format, rather than the carried formats, SHOULD
be the one to list as a simulcast stream on the "a=simulcast"
line.

The media formats and corresponding characteristics of simulcast
streams SHOULD be chosen such that they are different -- e.g., as
different SDP formats with differing "a=rtpmap" and/or "a=fmtp" lines,
or as differently defined RTP payload format restrictions. If this
difference is not required, it is RECOMMENDED to use RTP duplication
procedures instead of simulcast. To avoid
complications in implementations, a single rid-id
MUST NOT occur more than once per "a=simulcast" line. Note that this
does not eliminate use of simulcast as an RTP duplication mechanism,
since it is possible to define multiple different rid-ids that are
effectively equivalent.

Offer/Answer Use
Note:
The inclusion of "a=simulcast" or the use of simulcast
does not change any of the interpretation or Offer/Answer
procedures for other SDP attributes, such as "a=fmtp" or "a=rid".

Generating the Initial SDP Offer

An offerer wanting to use simulcast for a media description SHALL
include one "a=simulcast" attribute in that media description in the
offer. An offerer listing a set of receive simulcast streams and/or
alternative formats as rid-ids in the offer MUST be prepared to
receive RTP streams for any of those simulcast streams and/or
alternative formats from the answerer.

Creating the SDP Answer

An answerer that does not understand the concept of simulcast
will also not know the attribute and will remove it in the SDP
answer, as defined in existing SDP offer/answer procedures. Since SDP session-level simulcast is
undefined in this memo, an answerer that receives an offer with the
"a=simulcast" attribute on the SDP session level SHALL remove it in the
answer. An answerer that understands the attribute but receives
multiple "a=simulcast" attributes in the same media description
SHALL disable use of simulcast by removing all "a=simulcast" lines
for that media description in the answer.

An answerer that does understand the attribute and wants to
support simulcast in an indicated direction SHALL reverse
directionality of the unidirectional direction parameters -- "send"
becomes "recv" and vice versa -- and include it in the answer.

An answerer that receives an offer with simulcast containing an
"a=simulcast" attribute listing alternative rid-ids MAY keep all the
alternative rid-ids in the answer, but it MAY also choose to remove
any nondesirable alternative rid-ids in the answer. The answerer
MUST NOT add any alternative rid-ids in the "send" direction in the answer
that were not present in the offer receive direction. The answerer
MUST be prepared to receive any of the receive-direction rid-id
alternatives and MAY send any of the "send"-direction alternatives
that are part of the answer.

An answerer that receives an offer with simulcast that lists a
number of simulcast streams MAY reduce the number of simulcast
streams in the answer, but it MUST NOT add simulcast streams.

An answerer that receives an offer without RTP stream
pause/resume capability MUST NOT mark any simulcast streams as
initially paused in the answer.

An RTP stream answerer capable of pause/resume that receives an
offer with RTP stream pause/resume capability MAY mark any rid-ids
that refer to pause/resume capable formats as initially paused in
the answer.

An answerer that receives an indication in an offer of a rid-id
being initially paused SHOULD mark that rid-id as initially paused
also in the answer, regardless of direction, unless it has good
reason for the rid-id not being initially paused. One reason to
remove an initial pause in the answer compared to the offer could be,
for example, that all "receive"-direction simulcast streams for a
media source the answerer accepts in the answer would otherwise be
paused.

Offerer Processing the SDP Answer

An offerer that receives an answer without "a=simulcast" MUST NOT
use simulcast towards the answerer. An offerer that receives an
answer with "a=simulcast" without any rid-id in a specified
direction MUST NOT use simulcast in that direction.

An offerer that receives an answer where some rid-id alternatives
are kept MUST be prepared to receive any of the kept "send"-direction
rid-id alternatives and MAY send any of the kept "receive"-direction
rid-id alternatives.

An offerer that receives an answer where some of the rid-ids are
removed compared to the offer MAY release the corresponding
resources (codec, transport, etc.) in its "receive" direction and MUST NOT send any RTP packets corresponding to the removed rid-ids.

An offerer that offered some of its rid-ids as initially paused
and receives an answer that does not indicate RTP stream
pause/resume capability MUST NOT initially pause any simulcast
streams.

An offerer with RTP stream pause/resume capability that receives
an answer where some rid-ids are marked as initially paused SHOULD
initially pause those RTP streams, even if they were marked as
initially paused also in the offer, unless it has good reason for
those RTP streams not being initially paused. One such reason could be,
for example, that the answerer would otherwise initially not
receive any media of that type at all.

Modifying the Session

Offers inside an existing session follow the same rules as for
initial SDP offer, with these additions:
rid-ids marked as initially paused in the offerer's "send"
direction SHALL reflect the offerer's opinion of the current
pause state at the time of creating the offer. This is purely
informational, and RTP stream
pause/resume signaling in the ongoing
session SHALL take precedence in case of any conflict or
ambiguity.
rid-ids marked as initially paused in the offerer's "receive"
direction SHALL (as in an initial offer) reflect the offerer's
desired rid-id pause state. Except for the case where the
offerer already paused the corresponding RTP stream through
RTP stream pause/resume signaling,
this is identical to the conditions at an initial offer.
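
Since offers inside an existing session are answered by the same rules as an initial offer, the answerer-side handling described under "Creating the SDP Answer" can be sketched as follows. This is a minimal illustration under stated assumptions, not a complete implementation: the function names, the `acceptable` set of rid-ids, and the `max_streams` limit are hypothetical, simulcast streams are plain lists of rid-id strings, and initial-pause marking is left out.

```python
def answer_direction(offered_streams, acceptable, max_streams=None):
    """Filter one offered direction down to what the answerer accepts.

    Alternatives and whole streams may be removed, never added, and the
    offered preference order is preserved.
    """
    kept = []
    for alternatives in offered_streams:
        remaining = [rid for rid in alternatives if rid in acceptable]
        if remaining:  # drop a stream when no alternative is acceptable
            kept.append(remaining)
    if max_streams is not None:
        kept = kept[:max_streams]  # reduce the number of simulcast streams
    return kept


def build_answer(offer_send, offer_recv, acceptable, max_streams=None):
    """Return (answer_send, answer_recv), reversing the offer's directions."""
    answer_recv = answer_direction(offer_send, acceptable, max_streams)
    answer_send = answer_direction(offer_recv, acceptable, max_streams)
    return answer_send, answer_recv


def format_simulcast(send, recv):
    """Serialize to an "a=simulcast" line, or None when nothing is left."""
    parts = []
    for name, streams in (("recv", recv), ("send", send)):
        if streams:
            stream_list = ";".join(",".join(alts) for alts in streams)
            parts.append(name + " " + stream_list)
    if not parts:
        return None  # no "a=simulcast" line: simulcast is not used
    return "a=simulcast:" + " ".join(parts)


# Offer "a=simulcast:send 1;2,3 recv 4"; the answerer does not support rid-id 3.
send, recv = build_answer([["1"], ["2", "3"]], [["4"]], {"1", "2", "4"})
line = format_simulcast(send, recv)
```

With these inputs, `line` is "a=simulcast:recv 1;2 send 4", matching the answer shown in the overview. Returning `None` when both directions end up empty corresponds to omitting the attribute from the answer.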
Creation of SDP answers and processing of SDP answers inside an
existing session follow the same rules as described above for
initial SDP offer/answer.

Session modification restrictions in "RTP Payload Format
Restrictions"
also apply.

Use with Declarative SDP

This document does not define the use of "a=simulcast" in
declarative SDP, partly because use of the simulcast format identification
is not defined for use in declarative SDP. If concrete use cases
for simulcast in declarative SDP are identified in the future, the
authors of this memo expect that additional specifications will
address such use.

Relating Simulcast Streams

Simulcast RTP streams MUST be related on the RTP
level through RtpStreamId, as specified in the
SDP "a=simulcast" attribute
parameters.
This is sufficient as long as there is only a single media source per
SDP media description. When using BUNDLE, where
multiple SDP media descriptions jointly specify a single RTP session,
the SDES MID (Media Identification) mechanism in BUNDLE allows relating RTP
streams back to individual media descriptions, after which the
RtpStreamId relations described above can be used.
Use of the RTP header extension for the RTCP
source description items for both MID
and RtpStreamId identifications can be important to ensure rapid
initial reception, required to correctly interpret and process the RTP
streams. Implementers of this specification MUST
support the RTCP source description (SDES) item method and
SHOULD support the RTP header extension method to signal
RtpStreamId on the RTP level.
NOTE:
For the case where it is clear from SDP that the
RTP PT uniquely maps to a corresponding RtpStreamId, an RTP receiver
can use RTP PT to relate simulcast streams. This can sometimes
enable decoding even in advance of receiving RtpStreamId
information in RTCP SDES and/or RTP header extensions.
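
The relation steps above (MID to find the media description when BUNDLE is used, RtpStreamId to find the simulcast stream, and the RTP payload type as an early hint when it maps uniquely) can be sketched as a small receiver-side lookup. This is only an illustration under stated assumptions: `StreamBinder`, its methods, and the example payload types and identifiers are hypothetical, and parsing of RTCP SDES items or RTP header extensions is assumed to happen elsewhere.

```python
class StreamBinder:
    """Binds SSRCs to (mid, rid-id) pairs for one RTP session (sketch)."""

    def __init__(self, pt_to_rid=None):
        # Optional {payload_type: rid-id} map, usable only when SDP makes
        # the payload type map uniquely to one RtpStreamId.
        self.pt_to_rid = dict(pt_to_rid or {})
        self.bindings = {}  # ssrc -> (mid, rid)

    def on_sdes(self, ssrc, mid=None, rid=None):
        """Record identifiers seen in RTCP SDES or an RTP header extension."""
        old_mid, old_rid = self.bindings.get(ssrc, (None, None))
        # Newly received values replace earlier ones; missing values keep
        # whatever was learned before.
        self.bindings[ssrc] = (mid or old_mid, rid or old_rid)

    def lookup(self, ssrc, payload_type):
        """Return (mid, rid) for a packet; either may be None if unknown."""
        mid, rid = self.bindings.get(ssrc, (None, None))
        if rid is None:
            # Fall back to the payload type before any RtpStreamId arrives.
            rid = self.pt_to_rid.get(payload_type)
        return mid, rid


binder = StreamBinder({97: "1", 98: "2"})      # assumed 1:1 PT-to-rid mapping
early = binder.lookup(0x1111, 98)              # before any SDES information
binder.on_sdes(0x1111, mid="v0", rid="2")      # identifiers arrive later
bound = binder.lookup(0x1111, 98)
```

In this sketch, `early` carries only the rid-id inferred from the payload type, while `bound` carries both the MID and the RtpStreamId once they have been signaled.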
RTP streams MUST only use a single alternative rid-id at a time
(based on RTP timestamps) but MAY change format (and rid-id) on a
per-RTP packet basis. This corresponds to the existing (nonsimulcast)
SDP offer/answer case when multiple formats are included on the "m="
line in the SDP answer, enabling per-RTP packet change of RTP payload
type.
Signaling Examples
These examples describe a client-to-video-conference service, using
a centralized media topology with an RTP mixer.
Single-Source Client
Alice is calling in to the mixer with a simulcast-enabled client
capable of a single media source per media type. The client can send
a simulcast of two video resolutions and frame rates: HD 1280x720p at
30 fps and thumbnail 320x180p at 15 fps. This is defined below using the
"a=imageattr" attribute. In this example, only the
"pt" "a=rid" parameter is used to
describe simulcast stream formats, effectively achieving a 1:1 mapping
between RtpStreamId and media formats (RTP payload types).
Alice's Offer:
The only thing in the SDP that indicates simulcast capability is
the line in the video media description containing the "simulcast"
attribute. The included "a=fmtp" and "a=imageattr" parameters
indicate that sent simulcast streams can differ in video
resolution. The RTP header extension for RtpStreamId is offered to
avoid issues with the initial binding between RTP streams (SSRCs)
and the RtpStreamId identifying the simulcast stream and its
format.
The answer from the server indicates that it, too, is simulcast
capable. Should it not have been simulcast capable, the
"a=simulcast" line would not have been present, and communication
would have started with the media negotiated in the SDP. Also, the
usage of the RtpStreamId RTP header extension is accepted.
Since the server is the simulcast media receiver, it reverses the
direction of the "simulcast" and "rid" attribute parameters.
Multisource Client
Fred is calling in to the same conference as in the example above
with a two-camera, two-display system, thus capable of handling two
separate media sources in each direction, where each media source is
simulcast enabled in the "send" direction. Fred's client is restricted
to a single media source per media description.
The first two simulcast streams for the first media source use
different codecs, H264-SVC and H264. These two simulcast streams also have
a temporal dependency. Two different video codecs, VP8 and H264, are offered as alternatives
for the third simulcast stream for the first media source. Only the
highest-fidelity simulcast stream is sent from the start, the
lower-fidelity streams being initially paused.
The second media source is offered with three different simulcast
streams. All video streams of this second media source are loss
protected by RTP retransmission. In
addition, all but the highest-fidelity simulcast stream are
initially paused. Note that the lower-resolution simulcast stream is
prioritized over the medium-resolution simulcast stream.
Fred's client is also using BUNDLE to send all RTP streams from
all media descriptions in the same RTP session on a single media
transport. Although using many different simulcast streams in this
example, the use of RtpStreamId as simulcast stream identification
enables use of a low number of RTP payload types.
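To make the stream list, the alternative formats, and the initially paused streams discussed in this example more concrete, the following is a minimal Python sketch of parsing the value of an "a=simulcast" attribute. It assumes the syntax in which simulcast streams are separated by ";", alternative formats by ",", and a leading "~" marks a stream as initially paused. The function and class names, and the rid-id values used in the usage note, are hypothetical and are not taken from Fred's offer.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SimulcastId:
    rid: str      # rid-id referring to an "a=rid" line
    paused: bool  # True when the rid-id was prefixed with "~"


def parse_simulcast(value: str) -> Dict[str, List[List[SimulcastId]]]:
    """Parse the value part of "a=simulcast:<value>".

    Returns a dict keyed by "send" and/or "recv". Each entry is a list of
    simulcast streams in the order given, and each simulcast stream is a
    list of alternative formats in the order given.
    """
    tokens = value.split()
    if len(tokens) not in (2, 4):
        raise ValueError("expected one or two '<direction> <list>' pairs")
    result: Dict[str, List[List[SimulcastId]]] = {}
    for direction, stream_list in zip(tokens[0::2], tokens[1::2]):
        if direction not in ("send", "recv") or direction in result:
            raise ValueError(f"bad or repeated direction: {direction!r}")
        streams = []
        for alt_list in stream_list.split(";"):
            alternatives = []
            for sc_id in alt_list.split(","):
                paused = sc_id.startswith("~")
                rid = sc_id[1:] if paused else sc_id
                if not rid:
                    raise ValueError("empty rid-id")
                alternatives.append(SimulcastId(rid, paused))
            streams.append(alternatives)
        result[direction] = streams
    return result


def reverse_directions(parsed):
    """Swap "send" and "recv", as the answerer's view of the same streams."""
    swap = {"send": "recv", "recv": "send"}
    return {swap[direction]: streams for direction, streams in parsed.items()}
```

For instance, `parse_simulcast("send 1;~2,3")` yields two simulcast streams in the send direction, the second of which has two alternative formats, with rid-id 2 initially paused.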
Note that when using both BUNDLE and "a=rid", it is recommended to use the RTP
header extension for the RTCP
source description items for carrying
these RTP stream-identification fields, which is consequently also
included in the SDP.
Note also that for "a=rid",
the corresponding RtpStreamId SDES attribute RTP header extension is
named rtp-stream-id.
Simulcast and Redundancy
The example in this section looks at applying simulcast with
audio and video redundancy formats.
The audio media description uses codec and bitrate restrictions,
combined with the RTP
payload for redundant audio data for enhanced packet-loss
resilience. The video media description applies both resolution and
bitrate restrictions, combined with Forward Error Correction (FEC)
in the form of flexible
FEC and RTP
retransmission.
The audio source is offered to be sent as two simulcast
streams. The first simulcast stream is encoded with Opus,
restricted to 64 kbps (rid-id=1), and the second simulcast stream
(rid-id=2) is encoded with either G.711, or G.711 combined with
linear predictive coding (LPC) for redundancy and explicit comfort
noise (CN). Both simulcast streams include telephone-event
capability. In this example, stand-alone LPC is not offered as a
possible payload type for the second simulcast stream's RID, which
could be motivated by, for example, not providing sufficient
quality.
The video source is offered to be sent as two simulcast streams,
both with two alternative simulcast formats. Redundancy and repair
are offered in the form of both flexible FEC and RTP retransmission.
The flexible FEC is not bound to any particular RTP streams and is
therefore able to be used across all RTP streams that are being sent
as part of this media description.
RTP Aspects
This section discusses what the different entities in a simulcast
media path can expect to happen on the RTP level. This is explored from
source to sink by starting in an endpoint with a media source that is
simulcasted to an RTP middlebox. That RTP middlebox sends media sources
to other RTP middleboxes (cascaded middleboxes), as well as
selecting some simulcast format of the media source and sending it to
receiving endpoints. Different types of RTP middleboxes and their usage
of the different simulcast formats result in several different
behaviors.
Outgoing from Endpoint with Media Source
The most straightforward simulcast case is the RTP streams being
emitted from the endpoint that originates a media source. When
simulcast has been negotiated in the sending direction, the endpoint
can transmit up to the number of RTP streams needed for the negotiated
simulcast streams for that media source. Each RTP stream (SSRC) is
identified by associating it with
an RtpStreamId SDES item, transmitted in RTCP and possibly also as an
RTP header extension. In cases where multiple media sources have been
negotiated for the same RTP session and thus BUNDLE is used, the MID SDES item will also be
sent, similarly to the RtpStreamId.
Each RTP stream might not be continuously transmitted due to any of
the following reasons: temporarily paused using Pause/Resume, sender-side application logic
temporarily pausing it, or lack of network resources to transmit this
simulcast stream. However, all simulcast streams that have been
negotiated have active and maintained SSRCs (at least in regular RTCP
reports), even if no RTP packets are currently transmitted. The
relation between an RTP stream (SSRC) and a particular simulcast
stream is not expected to change, except in exceptional situations
such as SSRC collisions. At SSRC changes, the usage of MID and
RtpStreamId should enable the receiver to correctly identify the RTP
streams even after an SSRC change.
RTP Middlebox to Receiver
RTP streams in a multiparty RTP session can be used in multiple
different ways when the session utilizes simulcast at least on the
media-source-to-middlebox legs. This is to a large degree due to the
different RTP middlebox behaviors, but also the needs of the
application. This text assumes that the RTP middlebox will select a
media source and choose which simulcast stream for that media source
to deliver to a specific receiver. In many cases, at most one
simulcast stream per media source will be forwarded to a particular
receiver at any instant in time, even if the selected simulcast stream
may vary. For cases where this does not hold due to application needs,
the RTP stream aspects will fall under the middlebox-to-middlebox
case (see "RTP Middlebox to RTP Middlebox").
The selection of which simulcast streams to forward towards the
receiver is application specific. However, in conferencing
applications, active speaker selection is common. In case the number
of media sources possible to forward, N, is less than the total number
of media sources available in a multimedia session, the current and
previous speakers (up to N in total) are often the ones forwarded. To
avoid the need for media-specific processing to determine the current
speaker(s) in the RTP middlebox, the endpoint providing a media source
may include metadata, such as the RTP header
extension for client-to-mixer audio level indication.
The possibilities for stream switching are media type specific, but
for media types with significant interframe dependencies in the
encoding, like most video coding, the switching needs to be made at
suitable switching points in the media stream that break or otherwise
deal with the dependency structure. Even if switching points can be
included periodically, it is common to use mechanisms like Full Intra Requests to request switching
points from the endpoint performing the encoding of the media
source.
Inclusion of the RtpStreamId SDES item for an SSRC in the
middlebox-to-receiver direction should only occur when use of
RtpStreamId has
been negotiated in that direction. It is worth noting that one can
signal multiple RtpStreamIds when simulcast signaling indicates only
a single simulcast stream, allowing one to use all of the RtpStreamIds
as alternatives for that simulcast stream. One reason for including
the RtpStreamId in the middlebox-to-receiver direction for an RTP
stream is to let the receiver know which restrictions apply to the
currently delivered RTP stream. In case the RtpStreamId is negotiated
to be used, it is important to remember that the used identifiers will
be specific to each signaling session. Even if the central entity can
attempt to coordinate, it is likely that the RtpStreamIds need to be
translated to the leg-specific values. The below cases will assume
that RtpStreamId is not used in the mixer-to-receiver
direction.
Media-Switching Mixer
This section discusses the behavior in cases where the RTP
middlebox behaves like the media-switching mixer in
RTP topologies. The fundamental aspect
here is that the media sources delivered from the middlebox will be
the mixer's conceptual or functional ones. For example, one media
source may be the main speaker in high-resolution video, while a
number of other media sources are thumbnails of each
participant.
The above results in the RTP stream produced by the mixer being
one that switches between a number of received incoming RTP streams
for different media sources and in different simulcast versions. The
mixer selects the media source to be sent as one of the RTP streams
and then selects among the available simulcast streams for the most
appropriate one. The selection criteria include available bandwidth
on the mixer-to-receiver path and restrictions based on the
functional usage of the RTP stream delivered to the receiver. As an
example of the latter, it is unnecessary to forward a full HD video
to a receiver if the display area is just a thumbnail. Thus,
restrictions may exist to not allow some simulcast streams to be
forwarded for some of the mixer's media sources.
This will result in a single RTP stream being used for each of
the RTP mixer's media sources. At any point in time, this RTP stream
is a selection of one particular RTP stream arriving to the mixer,
where the RTP header-field values are rewritten to provide a
consistent, single RTP stream. If the RTP mixer doesn't receive any
incoming stream matched to this media source, the SSRC will not
transmit but be kept alive using RTCP. The SSRC and thus RTP stream
for the mixer's media source is expected to be long-term stable. It
will only be changed by signaling or other disruptive events. Note
that although the above talks about a single RTP stream, there can
in some cases be multiple RTP streams carrying the selected
simulcast stream for the originating media source, including
redundancy or other auxiliary RTP streams.
The mixer may communicate the identity of the originating media
source to the receiver by including the Contributing Source (CSRC) field with the
originating media source's SSRC value. Note that due to the
possibility that the RTP mixer switches between simulcast versions
of the media source, the CSRC value may change, even if the media
source is kept the same.
It is important to note that any MID SDES item from the
originating media source needs to be removed and not be associated
with the RTP stream's SSRC. That is, there is nothing in the
signaling between the mixer and the receiver that is structured
around the originating media sources, only the mixer's media
sources. If they were associated with the SSRC, the receiver
would likely believe that there has been an SSRC collision and
the RTP stream is spurious, because it doesn't carry the identifiers used
to relate it to the correct context. However, this is not true for
CSRC values, as long as they are never used as an SSRC. In these cases,
one could provide CNAME and MID as SDES items. A receiver could use
this to determine which CSRC values are associated with the
same originating media source.
If RtpStreamIds are used in the scenario described by this
section, it should be noted that the RtpStreamId on a particular
SSRC will change based on the actual simulcast stream selected for
switching. These RtpStreamId identifiers will be local to this leg's
signaling context. In addition, the defined RtpStreamIds and their
parameters need to cover all the media sources and simulcast streams
received by the RTP mixer that can be switched into this media
source, sent by the RTP mixer.
Selective Forwarding Middlebox
This section discusses the behavior in cases where the RTP
middlebox behaves like the Selective Forwarding Middlebox in RTP
topologies. Applications
for this type of RTP middlebox result in each originating
media source having a corresponding media source on the leg
between the middlebox and the receiver. A Selective Forwarding
Middlebox (SFM) could go as far as exposing all the simulcast
streams for a media source; however, this section will focus on
having a single simulcast stream that can contain any of the
simulcast formats. This section will assume that the SFM projection
mechanism works on the media-source level and maps one of the media
source's simulcast streams onto one RTP stream from the SFM to the
receiver.
This usage will result in the individual RTP stream(s) for
one media source being able to switch between being active and
paused, based on
the subset of media sources the SFM wants to provide the receiver
for the moment. With SFMs, there exist no reasons to use CSRC to
indicate the originating stream, as there is a one-to-one
media-source mapping. If the application requires knowing the
simulcast
version received to function well, then RtpStreamId should be
negotiated on the SFM-to-receiver leg. Which simulcast stream
is being forwarded is not made explicit unless RtpStreamId is used
on the leg.
Any MID SDES items being sent by the SFM to the receiver are only
those agreed between the SFM and the receiver, and no MID values
from the originating side of the SFM are to be forwarded.
An SFM could expose corresponding RTP streams for all the media
sources and their simulcast streams and then, for any media source
that is to be provided, forward one selected simulcast stream.
However, this is not recommended, as it would unnecessarily increase
the number of RTP streams and require the receiver to detect
switching between simulcast streams in a timely manner. The above usage requires the
same SFM functionality for switching, while avoiding the
uncertainties of detecting in a timely manner that an RTP stream ends. The
benefit would be that the received simulcast stream would be
implicitly provided by which RTP stream would be active for a media
source. However, using RtpStreamId to make this explicit also
exposes which alternative format is used. The conclusion is that
using one RTP stream per simulcast stream is unnecessary. The issue
with timely detection of the end of streams, independent of whether they are
stopped temporarily or long term, is that there is no explicit
indication that the transmission has intentionally been stopped. The
RTCP-based pause and resume
mechanism
includes a PAUSED indication that provides the last RTP sequence
number transmitted prior to the pause. Due to usage, the timeliness
of this solution depends on when delivery using RTCP can occur in
relation to the transmission of the last RTP packet. If no explicit
information is provided at all, then detection based on
nonincreasing RTCP SR field values and timers needs to be used to
determine a pause in RTP packet delivery. As a result, when the last
RTP packet arrives (if it arrives), one usually
cannot determine that this will be the last. That it was the last is
something that one learns later.
RTP Middlebox to RTP Middlebox
This relates to the transmission of simulcast streams between RTP
middleboxes or other usages where one wants to enable the delivery of
multiple simultaneous simulcast streams per media source, but the
transmitting entity is not the originating endpoint. For a particular
direction between middleboxes A and B, this looks very similar to the
originating-to-middlebox case on a media-source basis. However, in
this case, there are usually multiple media sources, originating from
multiple endpoints. This can create situations where limitations in
the number of simultaneously received media streams can arise -- for
example, due to limitation in network bandwidth. In this case, a subset
of not only the simulcast streams but also media sources can be
selected. As a result, individual RTP streams can become
paused at any point and later be resumed based on various criteria.
The MIDs used between A and B are the ones agreed between these two
entities in signaling. The RtpStreamId values will also be provided
to ensure explicit information about which simulcast stream they are.
The RTP-stream-to-MID and -RtpStreamId associations should here be
long-term stable.
Network Aspects
In this memo, simulcast is defined as the act of sending multiple
alternative encoded streams of the same underlying media
source. Transmitting multiple independent streams that originate from
the same
source could potentially be done in several different ways using
RTP. A general discussion on considerations for use of the different RTP
multiplexing alternatives can be found in "Guidelines for Using the Multiplexing Features of
RTP to Support Multiple Media Streams". Discussion and
clarification on how to handle multiple streams in an RTP session can be
found in .
The network aspects that are relevant for simulcast are:
Quality of Service (QoS):
When using simulcast, it might be
of interest to prioritize a particular simulcast stream, rather than
applying equal treatment to all streams. For example, lower-bitrate
streams may be prioritized over higher-bitrate streams to minimize
congestion or packet losses in the low-bitrate streams. Thus, there
is a benefit to using a simulcast solution with good QoS support.
NAT/FW traversal:
Using multiple RTP sessions incurs
more cost for NAT/FW traversal unless they can reuse the same
transport flow, which can be achieved by multiplexing negotiation using SDP port
numbers.
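The idea of prioritizing some simulcast streams over others when network resources are limited can be sketched as follows. This is a minimal Python illustration under stated assumptions, not a congestion-control algorithm: the caller is assumed to supply the streams in priority order (for example, the order in which they appear on the "a=simulcast" line), the per-stream minimum useful and maximum bitrates are hypothetical application parameters, and the greedy policy is only one possible choice.

```python
from typing import Dict, List, Tuple


def allocate_bitrate(streams: List[Tuple[str, int, int]],
                     available_kbps: int) -> Dict[str, int]:
    """Greedily share a bitrate budget among simulcast streams.

    streams: (rid, min_useful_kbps, max_kbps) tuples in priority order,
             highest priority first (assumed to be given by the caller).
    Returns rid -> granted kbps, where 0 means the stream is paused
    until the limiting condition clears.
    """
    allocation: Dict[str, int] = {}
    remaining = available_kbps
    for rid, min_kbps, max_kbps in streams:
        if remaining >= min_kbps:
            granted = min(max_kbps, remaining)
            allocation[rid] = granted
            remaining -= granted
        else:
            # Too little left for this stream to be useful: pause it.
            allocation[rid] = 0
    return allocation
```

With a hypothetical thumbnail stream `("thumb", 100, 300)` listed before an HD stream `("hd", 800, 2500)`, a budget of 1000 kbps grants the thumbnail its 300 kbps and pauses the HD stream, since the remaining 700 kbps is below the HD stream's assumed minimum useful bitrate.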
Bitrate Adaptation
Use of multiple simulcast streams can require a significant amount
of network resources. The aggregate bandwidth for all simulcast
streams for a media source (and thus SDP media description) is bounded
by any SDP "b=" line applicable to that media source. It is assumed
that a suitable congestion-control mechanism is used by the
application to ensure that it doesn't cause persistent congestion. If
the amount of available network resources varies during an RTP session
such that it does not match what is negotiated in SDP, the bitrate
used by the different simulcast streams may have to be reduced
dynamically. When a simulcasting media source uses a single media
transport for all of the simulcast streams, it is likely that a joint
congestion control across all simulcast streams is used for that media
source. Which simulcast streams to prioritize when allocating available
bitrate among the simulcast streams in such adaptation SHOULD be taken
from the simulcast stream order on the "a=simulcast" line and ordering
of alternative simulcast formats. Simulcast
streams that have pause/resume capability and that would be given such
a low bitrate by the adaptation process that they are considered not
really useful can be temporarily paused until the limiting condition
clears.
Limitation
The chosen approach has a limitation that relates to the use of a
single RTP session for all simulcast formats of a media source, which
comes from sending all simulcast streams related to a media source under
the same SDP media description.
It is not possible to use different simulcast streams on different
media transports, which limits the possibilities for applying different QoS to
different simulcast streams. When using unicast, QoS mechanisms based on
individual packet marking are feasible, since they do not require
separation of simulcast streams into different RTP sessions to apply
different QoS.
It is also not possible to separate different simulcast streams into
different multicast groups to allow a multicast receiver to pick the
stream it wants, rather than receive all of them. In this case, the only
reasonable implementation is to use different RTP sessions for each
multicast group so that reporting and other RTCP functions operate as
intended. Such simulcast usage in a multicast context is out of scope for
the current document and would require additional specification.
IANA Considerations
This document registers a new media-level SDP attribute,
"simulcast", in the "att-field (media level only)" registry within the
"Session Description Protocol (SDP) Parameters" registry, according to the
procedures of and .
Contact name, email:
The IESG (iesg@ietf.org)
Attribute name:
simulcast
Long-form attribute name:
Simulcast stream description
Charset dependent:
No
Attribute value:
sc-value; see of RFC
8853.
Purpose:
Signals simulcast capability for a set of RTP
streams
Mux category:
NORMAL
Security Considerations
The simulcast capability, configuration attributes, and parameters
are vulnerable to attacks in signaling.
A false inclusion of the "a=simulcast" attribute may result in
simultaneous transmission of multiple RTP streams that would otherwise
not be generated. The impact is limited by the media description joint
bandwidth, shared by all simulcast streams irrespective of their number.
However, there may be a large number of unwanted RTP streams that will
impact the share of bandwidth allocated for the originally wanted RTP
stream.
A hostile removal of the "a=simulcast" attribute will result in
simulcast not being used.
Integrity protection and source authentication of all SDP signaling,
including simulcast attributes, can mitigate the risks of such attacks
that attempt to alter signaling.
Security considerations related to the use of "a=rid" and the
RtpStreamId SDES item are covered in
and . There are no additional
security concerns related to their use in this specification.
References
Normative References
Key words for use in RFCs to Indicate Requirement Levels
In many standards track documents several words are used to signify the requirements in the specification. These words are often capitalized. This document defines these words as they should be interpreted in IETF documents. This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.
An Offer/Answer Model with Session Description Protocol (SDP)
This document defines a mechanism by which two entities can make use of the Session Description Protocol (SDP) to arrive at a common view of a multimedia session between them. In the model, one participant offers the other a description of the desired session from their perspective, and the other participant answers with the desired session from their perspective. This offer/answer model is most useful in unicast sessions where information from both participants is needed for the complete view of the session. The offer/answer model is used by protocols like the Session Initiation Protocol (SIP). [STANDARDS-TRACK]
RTP: A Transport Protocol for Real-Time Applications
This memorandum describes RTP, the real-time transport protocol. RTP provides end-to-end network transport functions suitable for applications transmitting real-time data, such as audio, video or simulation data, over multicast or unicast network services. RTP does not address resource reservation and does not guarantee quality-of-service for real-time services. The data transport is augmented by a control protocol (RTCP) to allow monitoring of the data delivery in a manner scalable to large multicast networks, and to provide minimal control and identification functionality. RTP and RTCP are designed to be independent of the underlying transport and network layers. The protocol supports the use of RTP-level translators and mixers.
Most of the text in this memorandum is identical to RFC 1889 which it obsoletes. There are no changes in the packet formats on the wire, only changes to the rules and algorithms governing how the protocol is used. The biggest change is an enhancement to the scalable timer algorithm for calculating when to send RTCP packets in order to minimize transmission in excess of the intended rate when many participants join a session simultaneously. [STANDARDS-TRACK]SDP: Session Description ProtocolThis memo defines the Session Description Protocol (SDP). SDP is intended for describing multimedia sessions for the purposes of session announcement, session invitation, and other forms of multimedia session initiation. [STANDARDS-TRACK]Augmented BNF for Syntax Specifications: ABNFInternet technical specifications often need to define a formal syntax. Over the years, a modified version of Backus-Naur Form (BNF), called Augmented BNF (ABNF), has been popular among many Internet specifications. The current specification documents ABNF. It balances compactness and simplicity with reasonable representational power. The differences between standard BNF and ABNF involve naming rules, repetition, alternatives, order-independence, and value ranges. This specification also supplies additional rule definitions and encoding for a core lexical analyzer of the type common to several Internet specifications. [STANDARDS-TRACK]Case-Sensitive String Support in ABNFThis document extends the base definition of ABNF (Augmented Backus-Naur Form) to include a way to specify US-ASCII string literals that are matched in a case-sensitive manner.RTP Stream Pause and ResumeWith the increased popularity of real-time multimedia applications, it is desirable to provide good control of resource usage, and users also demand more control over communication sessions. 
This document describes how a receiver in a multimedia conversation can pause and resume incoming data from a sender by sending real-time feedback messages when using the Real-time Transport Protocol (RTP) for real- time data transport. This document extends the Codec Control Message (CCM) RTP Control Protocol (RTCP) feedback package by explicitly allowing and describing specific use of existing CCMs and adding a group of new real-time feedback messages used to pause and resume RTP data streams. This document updates RFC 5104.Ambiguity of Uppercase vs Lowercase in RFC 2119 Key WordsRFC 2119 specifies common key words that may be used in protocol specifications. This document aims to reduce the ambiguity by clarifying that only UPPERCASE usage of the key words have the defined special meanings.Negotiating Media Multiplexing Using the Session Description Protocol (SDP)RTP Payload Format RestrictionsRTP Stream Identifier Source Description (SDES)A Framework for Session Description Protocol (SDP) Attributes When MultiplexingInformative ReferencesRTP Payload for Redundant Audio DataThis document describes a payload format for use with the real-time transport protocol (RTP), version 2, for encoding redundant audio data. [STANDARDS-TRACK]Real-time Transport Protocol (RTP) Payload for Comfort Noise (CN)RTP Retransmission Payload FormatRTP retransmission is an effective packet loss recovery technique for real-time applications with relaxed delay bounds. This document describes an RTP payload format for performing retransmissions. Retransmitted RTP packets are sent in a separate stream from the original RTP stream. It is assumed that feedback from receivers to senders is available. In particular, it is assumed that Real-time Transport Control Protocol (RTCP) feedback as defined in the extended RTP profile for RTCP-based feedback (denoted RTP/AVPF) is available in this memo. 
Requirements
The following requirements are met by the defined solution to support
the use cases:
REQ-1: Identification:
   REQ-1.1: It must be possible to identify a set of simulcasted RTP streams as originating from the same media source in SDP signaling.
   REQ-1.2: An RTP endpoint must be capable of identifying the simulcast stream that a received RTP stream is associated with, knowing the content of the SDP signaling.
REQ-2: Transport usage. The solution must work when using:
   REQ-2.1: Legacy SDP with separate media transports per SDP media description.
   REQ-2.2: Bundled SDP media descriptions.
REQ-3: Capability negotiation. The following must be possible:
   REQ-3.1: The sender can express capability of sending simulcast.
   REQ-3.2: The receiver can express capability of receiving simulcast.
   REQ-3.3: The sender can express the maximum number of simulcast streams that can be provided.
   REQ-3.4: The receiver can express the maximum number of simulcast streams that can be received.
   REQ-3.5: The sender can detail the characteristics of the simulcast streams that can be provided.
   REQ-3.6: The receiver can detail the characteristics of the simulcast streams that it prefers to receive.
REQ-4: Distinguishing features. It must be possible to have different simulcast streams use different codec parameters, as can be expressed by SDP format values and RTP payload types.
REQ-5: Compatibility. It must be possible to use simulcast in combination with other RTP mechanisms that generate additional RTP streams:
   REQ-5.1: RTP retransmission.
   REQ-5.2: RTP Forward Error Correction.
   REQ-5.3: Related payload types such as audio Comfort Noise and/or DTMF.
   REQ-5.4: A single simulcast stream can consist of multiple RTP streams, to support codecs where a dependent stream depends on a set of encoded and dependent streams, each potentially carried in its own RTP stream.
REQ-6: Interoperability. It must be possible to use the solution in:
   REQ-6.1: Interworking with non-simulcast legacy clients using a single media source per media type.
   REQ-6.2: A WebRTC environment with a single media source per SDP media description.
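As an illustration of how REQ-1 and REQ-3 surface in signaling, the following Python sketch parses an "a=simulcast" attribute line and counts the simulcast streams offered per direction. It is a minimal sketch, not a normative or complete parser. It assumes the attribute syntax defined in the body of this document: "send" and "recv" directions, ";" between simulcast streams, "," between alternative formats for one stream, and a "~" prefix for an initially paused format. The names SimulcastId, parse_simulcast, and max_streams are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulcastId:
    """One identifier from a simulcast stream list (hypothetical helper type)."""
    rid: str       # RtpStreamId value referenced by the attribute
    paused: bool   # True when the identifier carries the "~" prefix


def parse_simulcast(line):
    """Parse an 'a=simulcast:' line into {direction: [[SimulcastId, ...], ...]}.

    The outer list holds one entry per simulcast stream; the inner list
    holds the alternative formats offered for that stream.
    """
    prefix = "a=simulcast:"
    if not line.startswith(prefix):
        raise ValueError("not a simulcast attribute")
    tokens = line[len(prefix):].split()
    if len(tokens) not in (2, 4):
        raise ValueError("expected one or two '<direction> <stream-list>' pairs")

    result = {}
    for direction, stream_list in zip(tokens[0::2], tokens[1::2]):
        if direction not in ("send", "recv") or direction in result:
            raise ValueError("bad or repeated direction: %r" % direction)
        streams = []
        for alt_list in stream_list.split(";"):
            alternatives = []
            for ident in alt_list.split(","):
                paused = ident.startswith("~")
                rid = ident[1:] if paused else ident
                if not rid:
                    raise ValueError("empty identifier in stream list")
                alternatives.append(SimulcastId(rid, paused))
            streams.append(alternatives)
        result[direction] = streams
    return result


def max_streams(parsed, direction):
    """Number of simulcast streams expressed for a direction (REQ-3.3, REQ-3.4)."""
    return len(parsed.get(direction, []))


if __name__ == "__main__":
    parsed = parse_simulcast("a=simulcast:send 1,2;~3 recv 4")
    print(max_streams(parsed, "send"), max_streams(parsed, "recv"))
```

In this example line, the sender offers two simulcast streams (the first with two alternative formats, the second initially paused) and is willing to receive one. The characteristics of each identified format (REQ-3.5, REQ-3.6) are carried separately by the corresponding "a=rid" lines, which this sketch does not parse.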
Acknowledgements
The authors would like to thank everyone who provided feedback during the development of this document.
Contributors
Two contributors from Ericsson provided important material for the first draft versions of this document. Contributors from Cisco, Google, and Mozilla contributed significantly to subsequent versions.
Authors' Addresses
Ericsson
Gronlandsgatan 31
SE-164 60 Stockholm
Sweden
bo.burman@ericsson.com

Ericsson
Torshamnsgatan 23
SE-164 83 Stockholm
Sweden
magnus.westerlund@ericsson.com

Cisco
170 West Tasman Drive
San Jose, CA 95134
United States of America
snandaku@cisco.com

Cisco
170 West Tasman Drive
San Jose, CA 95134
United States of America
mzanaty@cisco.com