We are going back to the application and transport layers.
Multimedia networking applications
Why learn about this?
Video is the predominant traffic on the internet
- Popular services
- All delivered as OTT
OTT: Over the top - video services are delivered on top of the internet; the ISP only carries the bits and does not itself provide the video service
Multimedia: Audio
- Analog audio signal sampled at a constant rate
- Telephone: 8000 samples/sec
- CD music: 44100 samples/sec
- Each sample quantized
- e.g. 2^8 = 256 possible quantized values
- Each quantized value represented by bits
Converting the bits back to analog causes some quality reduction (the quantization error cannot be undone)
- ADC: Analog-to-digital converter
- DAC: Digital-to-analog converter
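A minimal PCM sketch in Python of the sampling/quantization steps above, assuming telephone parameters (8000 samples/sec, 8-bit samples) and a made-up 440 Hz test tone:

```python
# Sample a 440 Hz tone at the telephone rate of 8000 samples/sec and
# quantize each sample to 8 bits (2^8 = 256 levels), as in the notes above.
import math

SAMPLE_RATE = 8000          # telephone-quality: 8000 samples/sec
LEVELS = 2 ** 8             # 8-bit quantization -> 256 possible values

def sample_and_quantize(freq_hz: float, duration_s: float) -> list[int]:
    samples = []
    for n in range(int(SAMPLE_RATE * duration_s)):
        analog = math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)  # ADC input, in [-1, 1]
        # Quantize: map [-1, 1] onto 0..255 (this rounding is where quality is lost)
        samples.append(round((analog + 1) / 2 * (LEVELS - 1)))
    return samples

def dequantize(q: int) -> float:
    # DAC side: map the 8-bit value back to [-1, 1]; the rounding error remains
    return q / (LEVELS - 1) * 2 - 1

chunk = sample_and_quantize(440.0, 0.02)   # one 20 msec chunk
print(len(chunk), "samples = 160 bytes at 8 bits/sample")
```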
Humans are more sensitive to lower frequencies, so we can remove the higher frequencies and record our audio at lower bit rates
Multimedia: Video
- Sequence of images displayed at a constant rate
- Digital image: Array of pixels (each pixel represented by bits)
- Coding: Use redundancy within and between images to decrease number of bits used to encode image
- Spatial (within image)
- Temporal (from one image to next)
- CBR (constant bit rate): Video encoding rate fixed
-> Predictable bandwidth, but visual quality varies with scene complexity
- VBR (variable bit rate): Video encoding rate changes as the amount of spatial/temporal coding changes
-> Roughly constant visual quality
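A toy sketch of temporal coding (not a real codec): send a frame as only the pixels that changed since the previous frame. This is also why a busy scene pushes a VBR encoder's rate up while a static scene pulls it down:

```python
# Frames here are flat lists of 8-bit pixel values.

def encode_delta(prev_frame: list[int], frame: list[int]) -> list[tuple[int, int]]:
    # Send (index, new_value) only for pixels that differ -> fewer bits
    return [(i, v) for i, (p, v) in enumerate(zip(prev_frame, frame)) if p != v]

def decode_delta(prev_frame: list[int], delta: list[tuple[int, int]]) -> list[int]:
    frame = list(prev_frame)
    for i, v in delta:
        frame[i] = v
    return frame

f0 = [10, 10, 10, 10]          # reference frame, sent in full
f1 = [10, 10, 90, 10]          # next frame: only one pixel changed
delta = encode_delta(f0, f1)   # -> [(2, 90)]: far smaller than the full frame
assert decode_delta(f0, delta) == f1
```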
Multimedia networking: 3 application types
- Streaming, stored audio/video
- Streaming: Can begin playout before downloading the entire file
- Stored (at server): Can transmit faster than the audio/video will be rendered (implies client-side buffering)
- Conversational (two-way live) voice/video over IP
- Interactive nature of human-to-human conversation limits delay tolerance
- e.g. Skype, Zoom
- Streaming live (one-way live) audio, video
- e.g. live sporting event (soccer, football)
Streaming stored video
- Video frames are captured and recorded at a constant rate
- The server streams the data out in the same timed fashion
- Data arrives at the client at (ideally) that same rate
- It is sent to the decoder and played out
Streaming: at a later point in time (2), the client is playing out an early part of the video while the server is still sending a later part of the video
Challenges
- Continuous playout constraint: Once client playout begins, playback must match the original timing
- But network delays are variable (jitter), so we need a client-side buffer to match the playout requirements
- Other challenges:
- Client interactivity: pause, fast-forward, rewind, jump through the video
- Video packets may be lost and need to be retransmitted
More realistically:
- There is variable network delay between packets being sent and arriving
- So the arrival staircase is uneven
- Playback stalls if a packet has not arrived by its scheduled playout time
- Including a buffer at the client side ensures we have packets in hand before starting playback
- In the cumulative-data graph there is no problem as long as the black (playout) curve never crosses the blue (arrival) curve
The larger the playout delay, the longer we have to wait before playback, but the lower the chance of a stall
Client-side buffering, playout
- Initial fill: buffer until playout begins at tp
- Playout begins at tp
- Buffer fill level varies over time as the fill rate x(t) varies while the playout rate r is constant
If data arrives at a slower rate than r for long enough, the buffer drains and the video freezes
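A tiny simulation (with made-up numbers) of the client-side buffer: variable fill rate x(t), constant playout rate r, and a stall whenever the buffer runs dry:

```python
PLAYOUT_RATE = 100                 # r: units of data consumed per tick (constant)
arrivals = [150, 120, 40, 30, 20, 180, 120]   # x(t): variable due to jitter

def simulate(initial_fill_ticks: int) -> int:
    """Return the number of stalled ticks for a given initial buffering time."""
    buffer = sum(arrivals[:initial_fill_ticks])   # fill before playout begins (tp)
    stalls = 0
    for x in arrivals[initial_fill_ticks:]:
        buffer += x
        if buffer >= PLAYOUT_RATE:
            buffer -= PLAYOUT_RATE                # playout consumes r per tick
        else:
            stalls += 1                           # buffer empty -> video freezes
    return stalls

print(simulate(initial_fill_ticks=1))  # short initial delay -> 1 stall
print(simulate(initial_fill_ticks=3))  # longer initial delay -> 0 stalls
```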
Streaming multimedia: UDP
- Server sends at a rate appropriate for the client
- Often: send rate = encoding rate = constant rate -> push-based streaming (server push)
- Transmission rate can be oblivious to congestion levels
- Short playout delay to remove network jitter
- Error recovery: application level, time permitting
- RTP [RFC 3550]: multimedia payload types
- UDP may not go through firewalls
Streaming multimedia: HTTP
- Multimedia file retrieved via HTTP GET -> pull-based streaming (client pull)
- Send at maximum possible rate under TCP
- Fill rate fluctuates due to TCP congestion control and retransmissions
- Larger playout delay: smooths the TCP delivery rate
- HTTP/TCP passes more easily through firewalls
Voice over IP
- Need to maintain the conversational aspect
- Small delay requirement
- Includes application-level delays as well as network delay
- Session initialization: How does the callee advertise its IP address, port number, encoding algorithms?
- Value-added services: call forwarding, screening, recording
- Emergency services: 911
Characteristics
- Speaker audio: alternating talk spurts and silent periods
- 64 kbps during a talk spurt
- Packets only generated during talk spurts
- 20 msec chunks at 8 Kbytes/sec: 160 bytes of data per chunk
- Application-layer header added to each chunk
- Chunk + header encapsulated into a UDP or TCP segment
- Application sends a segment into the socket every 20 msec during a talk spurt
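A quick sanity check of the chunk arithmetic above (64 kbps, 20 msec chunks):

```python
ENCODING_RATE_BPS = 64_000          # 64 kbps while talking
CHUNK_INTERVAL_S = 0.020            # one chunk every 20 msec

bytes_per_sec = ENCODING_RATE_BPS / 8          # = 8000 bytes/sec (8 Kbytes/sec)
chunk_bytes = bytes_per_sec * CHUNK_INTERVAL_S # = 160 bytes per chunk
print(bytes_per_sec, chunk_bytes)              # 8000.0 160.0
```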
Packet loss, delay
- Network loss: IP datagram lost due to network congestion (router buffer overflow)
- Delay loss: IP datagram arrives too late for playout at the receiver
- Delays: processing, queueing, end-system delays
- Typical maximum tolerable delay: 400 ms
- Loss tolerance: depending on the voice encoding and loss concealment, packet loss rates between 1 and 10 percent can be tolerated
Our brain has its own error recovery: even if there is a little dropout noise in a call, we can mentally fill in some of the missing parts
Delay jitter
Similar to the earlier graph, the variable network delay is the jitter.
So we introduce a client-side playout delay.
Fixed playout delay
- Receiver attempts to play out each chunk exactly q msec after the chunk was generated
- Chunk has timestamp t: play out chunk at t + q
- Chunk arrives after t + q: data arrives too late for playout: data "lost"
- Tradeoff in choosing q:
- Large q: less packet loss
- Small q: better interactive experience
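A minimal sketch of the fixed playout rule, assuming q = 150 msec and millisecond timestamps:

```python
Q_MS = 150  # fixed playout delay q (tradeoff: loss vs. interactivity)

def handle_chunk(gen_ts_ms: int, arrival_ts_ms: int) -> str:
    playout_ts = gen_ts_ms + Q_MS
    if arrival_ts_ms > playout_ts:
        return "late -> treated as lost"        # delay loss
    return f"buffer until t={playout_ts} then play"

print(handle_chunk(gen_ts_ms=0,  arrival_ts_ms=90))    # on time
print(handle_chunk(gen_ts_ms=20, arrival_ts_ms=300))   # arrives after t+q
```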
Adaptive playout delay
Goal: Low playout delay, low late loss rate
Approach: Adaptive playout delay adjustment
- Estimate network delay
- Adjust at beginning of each talk spurt
- Silent periods compressed and elongated
- Chunks still play out every 20 msec during a talk spurt
Alpha is a knob we can turn to weight the estimate toward history or toward recent samples
d_i is the estimated average delay (the center of the delay distribution's bell curve)
The first chunk of a talk spurt is scheduled with this estimated delay plus a margin; later chunks in the spurt follow it at fixed spacing
Q: How does the receiver determine whether a packet is the first in a talk spurt?
If no loss, the receiver looks at successive timestamps: a difference of more than 20 msec means a new talk spurt has begun
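A sketch of the adaptive estimator in the standard EWMA form (alpha is the knob above; K is an assumed safety margin): d tracks the average network delay, v its deviation, and the playout delay for a talk spurt is set from both.

```python
ALPHA = 0.05   # small alpha -> rely on history; large alpha -> recent samples
K = 4          # safety margin in units of delay deviation

d = 0.0        # estimated average delay
v = 0.0        # estimated average deviation of the delay

def on_chunk(send_ts: float, recv_ts: float) -> None:
    """Update the estimates from one (timestamp, arrival) pair."""
    global d, v
    sample = recv_ts - send_ts
    d = (1 - ALPHA) * d + ALPHA * sample
    v = (1 - ALPHA) * v + ALPHA * abs(sample - d)

def playout_time(first_chunk_ts: float) -> float:
    """Playout time for the first chunk of a talk spurt; later chunks
    in the spurt follow every 20 msec after it."""
    return first_chunk_ts + d + K * v

for send, recv in [(0.0, 0.090), (0.020, 0.130), (0.040, 0.145)]:
    on_chunk(send, recv)
print(playout_time(first_chunk_ts=0.060))
```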
Recovery from packet loss
When data is lost completely, we can:
- Let it go (do nothing)
- Recover via retransmission at the application level
- FEC (forward error correction), sketched below
- Send enough redundant bits to allow recovery without needing a retransmission
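A minimal FEC sketch using a simple XOR-parity scheme (an assumed example; the lecture does not name a specific code): for every group of n chunks, send one extra parity chunk, and any single lost chunk in the group can be rebuilt without retransmission.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(chunks: list[bytes]) -> bytes:
    parity = bytes(len(chunks[0]))
    for c in chunks:
        parity = xor_bytes(parity, c)
    return parity

group = [b"aaaa", b"bbbb", b"cccc"]
parity = make_parity(group)   # one extra chunk sent alongside the group

# Suppose chunk 1 is lost in the network: XOR of the survivors and the
# parity reconstructs it exactly.
recovered = xor_bytes(xor_bytes(group[0], group[2]), parity)
assert recovered == group[1]
```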
Piggybacking lower-quality stream
A low-quality copy of the previous chunk is attached to the next packet.
This means the receiver can play the lower-quality copy if the original packet is lost
Error concealment: generate something similar to the lost chunk (e.g. repeat or interpolate from neighboring chunks), which sounds better than a gap
Packet interleaving
- No redundancy added
- Reorder the audio units and interleave them across packets
This means that if a packet is lost, the missing pieces are scattered rather than one long gap
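A sketch of interleaving with depth 4 (the depth and unit labels are made up): one lost packet now removes scattered slivers of audio instead of a contiguous run, which is much easier to conceal.

```python
def interleave(units: list[str], depth: int) -> list[list[str]]:
    # Packet i carries units i, i+depth, i+2*depth, ...
    return [units[i::depth] for i in range(depth)]

units = ["u0", "u1", "u2", "u3", "u4", "u5", "u6", "u7"]
packets = interleave(units, depth=4)
# packets == [['u0','u4'], ['u1','u5'], ['u2','u6'], ['u3','u7']]
# Losing packets[1] drops u1 and u5 -- two short gaps, not a long run.
print(packets)
```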
Protocols for real-time conversational applications
RTP (Real-time Transport Protocol)
- Specifies packet structure for packets carrying audio and video data
- Carries an additional media-specific header
- Timestamp
- Sequence number
Whole idea: if two applications both use RTP, they should be interoperable, but this is not always the case in real life.
Sometimes people talk about RTP as a family of 3 protocols:
RTP: media data (UDP)
RTCP: used for signalling/feedback (UDP)
RTSP: carries the commands (TCP)
We do not want to lose data carrying commands, hence TCP
RTP header:
- Sequence number
- Timestamp
- Source ID (SSRC)
- Payload type: the receiver needs to know which encoding/compression the data uses
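A sketch of packing the fixed 12-byte RTP header as laid out in RFC 3550; payload type 0 is PCM u-law, the 64 kbps telephone encoding seen earlier.

```python
import struct

def rtp_header(payload_type: int, seq: int, timestamp: int, ssrc: int) -> bytes:
    version = 2
    byte0 = version << 6          # padding/extension/CSRC-count bits left 0
    byte1 = payload_type & 0x7F   # marker bit left 0
    # network byte order: 2 single bytes, 16-bit seq, 32-bit timestamp, 32-bit SSRC
    return struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)

hdr = rtp_header(payload_type=0, seq=1, timestamp=160, ssrc=0x1234)
print(len(hdr), hdr.hex())        # 12-byte header
```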
RTP and QoS
- RTP does not provide any mechanism to ensure timely delivery or other QoS guarantees
- RTP encapsulation is only seen at the end systems (not at intermediate routers)
- It can be viewed as a transport-layer protocol implemented in the application, running on top of UDP
WebRTC
RTP used to be used for VoD; now companies are using different flavors of RTP.
RTP has been integrated into browsers, which makes it possible to build RTC capabilities via simple JavaScript APIs.
SIP: Session Initiation Protocol
Whole idea: come up with a system for VoIP over the internet that works much more internationally than:
- Country code
- Phone number
The callee can be reached via an e-mail-like address, no matter where the callee roams and no matter what IP device the callee is using
Dynamic Adaptive Streaming over HTTP (DASH)
Notes on streaming:
- RTP/RTSP/RTCP streaming has challenges
- Implementations are often not compatible with one another
- Needs complex servers (scheduling at the server side)
- The protocols use both TCP and UDP (firewall problems)
- Difficult to cache data (no web caching)
- Advantages
- Short end-to-end latency
VoD:
- Plain HTTP streaming can be wasteful: it just GETs a whole video that might not be viewed fully
Now the world has moved to DASH
Main idea of DASH:
- Use the HTTP protocol to stream media
- Divide media into small chunks (streamlets)
Advantages:
- Server is simple
- No firewall problems
- Standard web caching works (as for images)
Disadvantages:
- Media segments are longer compared to RTP packets
- Latency can be quite high due to buffering (so it does not work for video conferencing)
How it works:
- Take a whole video
- Cut it into small pieces (streamlets)
- Encode each piece into files of different quality (transcoding)
MPD (media presentation description): describes what is available and at what quality
- The web server provides this as a playlist/manifest
The client runs an ABR (adaptive bitrate) algorithm which determines which segment to download; a sketch follows below
i.e. if the bandwidth is low, it will pull the lower quality version.
Streamlets are stitched back together at the client side.
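A minimal rate-based ABR sketch (real players use more elaborate algorithms; the bitrate ladder and safety factor here are assumptions): pick the highest quality whose bit rate fits the measured download bandwidth, then GET that segment over HTTP.

```python
BITRATES_KBPS = [300, 750, 1500, 3000]   # qualities listed in the manifest

def choose_bitrate(measured_kbps: float, safety: float = 0.8) -> int:
    """Highest bitrate fitting within a safety fraction of the bandwidth."""
    usable = measured_kbps * safety
    fitting = [b for b in BITRATES_KBPS if b <= usable]
    return fitting[-1] if fitting else BITRATES_KBPS[0]

print(choose_bitrate(2000))   # -> 1500: bandwidth is ample
print(choose_bitrate(500))    # -> 300: bandwidth is low, pull lower quality
```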
DASH has replaced the older VoD streaming protocols
Recently the focus is on low latency, for live video streaming