July 18, 2018

Important QnA on Multimedia Computing


Text Books

T1 Ze-Nian Li & Mark S. Drew, "Fundamentals of Multimedia", Pearson Education, 2004


Question. 1
Explain the difference between Temporal and Frequency masking.

Answer:
Frequency Masking: When an audio signal consists of multiple frequencies, the sensitivity of the ear changes with the relative amplitude of the signals. If two frequencies are close together and one has a lower amplitude than the other, the quieter one may not be heard.

Temporal Masking:
  • After the ear hears a loud sound consisting of multiple frequencies, it takes a short while before it can hear a quieter sound close in frequency.
  • Temporal masking is handled by using a 50% overlap between consecutive transform windows and then applying frequency masking.
  • MPEG Audio exploits frequency masking by quantizing each filter-bank band with adaptive values taken from nearby bands, as defined by a lookup table.

Question. 2
Consider the following set of protocols: SIP, RTSP, RSVP, RTCP, and RTP on top of UDP. If you want to design a protocol stack (control and data plane) for a Video-On-Demand (VOD) service between client and server, (a) which protocols would you use and why, and (b) in which order would you apply your selected protocols? Explain how the protocol stack of selected protocols would be used.

Answer:
To design the Video-On-Demand service, the following protocols would be used:
RTP: carries the video data. Running RTP over UDP enables real-time transmission.
RTCP: the control protocol that accompanies RTP. It lets the receiver give feedback to the sender so that parameters can be adjusted during the streaming session.
RTSP: VOD uses commands such as play, stop, pause, fast forward, and rewind, and RTSP specifies the signaling for these control tasks between the client and the server.
RSVP: if the VOD traffic also needs resource reservation at the IP level, RSVP should be enabled so that the session can use the Integrated Services guaranteed service.

The selected protocols would be applied in this order:
(1) RSVP reserves bandwidth for the VOD session.
(2) Once the resources are reserved, the VOD traffic is transmitted via RTP, which runs on top of UDP/IP.
(3) Alongside RTP, RTCP provides control feedback about the status of the traffic and the receiver, and RTSP provides the signaling channel used to play, stop, and otherwise control the VOD playback.

Question. 3
Compress the string “DEAF!DEFEATED” using the LZW algorithm. Find the compression ratio, supposing that the characters are originally represented in 8 bits (A = 65, B = 66, ..., Y = 89, Z = 90, ! = 33).

Answer:
Input: DEAF!DEFEATED

Initial dictionary:

Code   String
065    A
068    D
069    E
070    F
084    T
033    !

Encoding steps:

Previous   Current       Output   Code   String
D          E             068      256    DE
E          A             069      257    EA
A          F             065      258    AF
F          !             070      259    F!
!          D             033      260    !D
D          E             -        -      -
DE         F             256      261    DEF
F          E             070      262    FE
E          A             -        -      -
EA         T             257      263    EAT
T          E             084      264    TE
E          D             069      265    ED
D          End Of File   068      -      -

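The compressed output is 068 069 065 070 033 256 070 257 084 069 068 (11 codes for 13 input characters).
Compression ratio = 13/11 ≈ 1.18, counting each output code as one symbol. If each output code is instead stored in 9 bits (the minimum for codes up to 265), the ratio is (13 × 8)/(11 × 9) = 104/99 ≈ 1.05.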
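
The encoding steps can be checked with a short Python sketch. It assumes, as in the answer, that single characters are emitted as their ASCII codes and that new dictionary entries start at 256; the function name lzw_compress is only illustrative.

```python
def lzw_compress(text, first_code=256):
    """LZW-encode text; single characters map to their ASCII codes."""
    # Initial dictionary: every distinct character of the input.
    dictionary = {ch: ord(ch) for ch in set(text)}
    next_code = first_code
    previous = ""
    output = []
    for current in text:
        candidate = previous + current
        if candidate in dictionary:
            # Keep extending the match while it is still in the dictionary.
            previous = candidate
        else:
            # Emit the code of the longest match and register the new string.
            output.append(dictionary[previous])
            dictionary[candidate] = next_code
            next_code += 1
            previous = current
    if previous:
        # Flush the last pending match at end of input.
        output.append(dictionary[previous])
    return output


codes = lzw_compress("DEAF!DEFEATED")
print(codes)
print(len("DEAF!DEFEATED"), "characters ->", len(codes), "codes")
```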

Question. 4
Suppose we use a predictor as follows: 

Also, suppose we adopt the quantizer equation

If the input signal has values as follows 20  32  42  54   78   97 115  129  150  173   205
Then find the following using DPCM
a. Predicted signal
b. Error signal
c. Quantized error signal
d. Reconstructed signal

Answer:
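
The predictor and quantizer equations are missing from the question above, so no definite answer can be given here. As a rough sketch only, the Python code below assumes a predictor and quantizer of the kind used in the DPCM example of textbook T1: the predictor is the truncated average of the two previous reconstructed values, and the quantizer uses steps of 16. Both functions are assumptions and may differ from what the question intended; swap in the actual equations if they are known.

```python
import math


def predict(prev1, prev2):
    # ASSUMED predictor: truncated mean of the two previous reconstructed values.
    return math.trunc((prev1 + prev2) / 2)


def quantize(error):
    # ASSUMED quantizer: step size 16, reconstruction at the middle of each step.
    return 16 * math.trunc((255 + error) / 16) - 256 + 8


def dpcm(signal):
    """Return predicted, error, quantized-error and reconstructed sequences."""
    first = signal[0]
    # The first sample is taken as sent uncoded, so its error is zero.
    result = {"predicted": [first], "error": [0],
              "quantized": [0], "reconstructed": [first]}
    prev2 = prev1 = first
    for value in signal[1:]:
        p = predict(prev1, prev2)
        e = value - p
        q = quantize(e)
        r = p + q  # the decoder can rebuild this from p and q alone
        result["predicted"].append(p)
        result["error"].append(e)
        result["quantized"].append(q)
        result["reconstructed"].append(r)
        prev2, prev1 = prev1, r
    return result


signal = [20, 32, 42, 54, 78, 97, 115, 129, 150, 173, 205]
for name, values in dpcm(signal).items():
    print(name, values)
```

Note that prediction uses the reconstructed values rather than the original samples, so that the encoder and decoder stay in step.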




Question. 5
Complete the following table:


               # of frequency bands   Min # of Samples   Max # of Samples
MPEG Layer 1
MPEG Layer 2
MPEG Layer 3

Answer:


               # of frequency bands   Min # of Samples   Max # of Samples
MPEG Layer 1   32                     12                 12
MPEG Layer 2   32                     36                 36
MPEG Layer 3   32                     12                 36

Question. 6
You have to design a DVR server that may serve up to 100 simultaneous users over a 1 Gbps connection. One of the key requirements is to support trick modes for video, including features like 16X rewind. Typical video will be stored in high definition at approximately 20 Mbps and 24 frames per second. Each I frame takes 8x data, each P frame takes 2x, and each B frame takes 1x. What would be the average size of each I, P, and B frame in bytes?

Answer:
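At 24 fps, assume the GOP pattern IBBPBBPBBPBB (12 frames, i.e. 0.5 sec of video).
This GOP contains 1 I frame, 3 P frames, and 8 B frames.
With I = 8X, P = 2X, and B = 1X, one GOP takes
1*8X + 3*2X + 8*1X = 22X
22X = 0.5 sec of video
1 sec -> 20 Mb
0.5 sec -> 10 Mb
22X = 10 Mb = 10^7 bits

X = 10^7 bits / 22 = 10^7 / (22*8) bytes ≈ 56818 bytes

B frame = X ≈ 56818 bytes, P frame = 2X ≈ 113636 bytes, I frame = 8X ≈ 454545 bytes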
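
The same calculation can be written as a small Python sketch. It assumes the 12-frame GOP pattern IBBPBBPBBPBB used in the answer and takes 1 Mb as 10^6 bits; the function name frame_sizes is only illustrative.

```python
def frame_sizes(bitrate_bps, fps, gop, weights):
    """Average size in bytes of each frame type for a repeating GOP."""
    gop_seconds = len(gop) / fps              # 12 frames at 24 fps = 0.5 s
    gop_bits = bitrate_bps * gop_seconds      # bits available for one GOP
    units = sum(weights[kind] for kind in gop)  # 1*8 + 3*2 + 8*1 = 22
    unit_bytes = gop_bits / units / 8         # X, the size of one B frame
    return {kind: weight * unit_bytes for kind, weight in weights.items()}


sizes = frame_sizes(20_000_000, 24, "IBBPBBPBBPBB", {"I": 8, "P": 2, "B": 1})
for kind, size in sizes.items():
    print(kind, round(size), "bytes")
```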

Question. 7
Using the following frequency table, compress the string “DEED” (originally in 8-bit representation) and find the compression ratio using arithmetic compression (64-bit arithmetic).
Character   A    B    C    D    E    F    G    H
Frequency   10   20   10   20   50   70   90   30

Answer:

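Total frequency = 300. Each range boundary is the cumulative probability truncated to two decimal places.

Characters   Frequency   Range
A            10          [0, 0.03)
B            20          [0.03, 0.1)
C            10          [0.1, 0.13)
D            20          [0.13, 0.2)
E            50          [0.2, 0.36)
F            70          [0.36, 0.6)
G            90          [0.6, 0.9)
H            30          [0.9, 1)
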
Compression

Symbol    Low          High        Range
(start)   0            1.0         1.0
D         0.13         0.2         0.07
E         0.144        0.1552      0.0112
E         0.14624      0.148032    0.001792
D         0.14647296   0.1465984   0.00012544

After compression, take the lower value 0.14647296 as the code for the string.

Compression ratio = 64/18 =3.555
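
The interval narrowing can be reproduced with the Python sketch below. It uses the truncated ranges from the table (not the exact fractions out of 300) and Python's decimal module so that the digits match the hand calculation; the names RANGES and encode_interval are only illustrative.

```python
from decimal import Decimal

# Symbol ranges as (low, high) strings, copied from the range table.
RANGES = {
    "A": ("0", "0.03"), "B": ("0.03", "0.1"), "C": ("0.1", "0.13"),
    "D": ("0.13", "0.2"), "E": ("0.2", "0.36"), "F": ("0.36", "0.6"),
    "G": ("0.6", "0.9"), "H": ("0.9", "1"),
}


def encode_interval(message, ranges):
    """Narrow [low, high) once per symbol and return the final interval."""
    low, high = Decimal(0), Decimal(1)
    for symbol in message:
        width = high - low
        sym_low, sym_high = ranges[symbol]
        # Compute the new high first because it depends on the old low.
        high = low + width * Decimal(sym_high)
        low = low + width * Decimal(sym_low)
        print(symbol, low, high, high - low)
    return low, high


low, high = encode_interval("DEED", RANGES)
```

Any number inside the final interval identifies the string; the answer picks the lower bound.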

Question. 8
Ten channels, each with a 150-kHz bandwidth, are to be multiplexed together. What is the minimum bandwidth of the link if there is a need for a guard band of 20 kHz between the channels to prevent interference?

Answer:
For ten channels, we need at least nine guard bands. This means that the required bandwidth is at least
10 × 150 kHz + 9 × 20 kHz = 1500 kHz + 180 kHz = 1680 kHz
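
As a quick check, the formula generalizes to n channels with n - 1 guard bands. The Python sketch below is only illustrative; the function name is hypothetical.

```python
def fdm_min_bandwidth_khz(channels, channel_khz, guard_khz):
    """Minimum link bandwidth: all channels plus one guard band between each pair."""
    return channels * channel_khz + (channels - 1) * guard_khz


bandwidth = fdm_min_bandwidth_khz(10, 150, 20)
print(bandwidth, "kHz")
```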

Question. 9
Integrating RTP/RTCP-type protocols works better for applications like Skype than relying on RSVP. State True or False and briefly explain the reason.

Answer:
True. RSVP will not provide QoS across multiple autonomous systems, which matters because the user base is worldwide. While RTP can be used with or without RSVP (i.e. explicit reservation) for Skype, the application uses an RTCP-like feedback mechanism to measure delay, jitter, and loss rates and adjusts its approach accordingly. In other words, rather than expecting quality from the network, it modifies application behaviour to align with the observed network characteristics.

Question. 10
The LZ compression is used to compress a text document. The dictionary table contains 512 entries. The average character per word in the document is 5. Calculate the compression achieved.

Answer:
The dictionary table contains 512 entries. Hence, each entry is addressed by a 9-bit code word (2^9 = 512), and the index of each word in the document is a 9-bit number. If the compression scheme was not applied, each character would have been represented by a 7-bit ASCII code. Since there is an average of 5 characters per word, each word would need 5 × 7 = 35 bits. After compression, each word is represented by a 9-bit number. Hence, the compression ratio is 35:9, i.e. about 3.89:1.
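
The arithmetic can be sketched in Python as follows, assuming one dictionary index per word and 7-bit ASCII characters as in the answer; the function name is hypothetical.

```python
def word_bits(entries, chars_per_word, bits_per_char=7):
    """Bits per word before (raw) and after (dictionary index) compression."""
    index_bits = (entries - 1).bit_length()   # bits needed to address all entries
    raw_bits = chars_per_word * bits_per_char
    return raw_bits, index_bits


raw_bits, index_bits = word_bits(512, 5)
print(f"{raw_bits}:{index_bits} = {raw_bits / index_bits:.2f}")
```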

Question. 11
Calculate the storage capacity per image for a Full HD image given its resolution as 1920x1080 pixels and a color format of 24 bits/pixel. (SVGA normally denotes 800x600; the stated resolution of 1920x1080 is Full HD.)

Answer:
Image Resolution=1920x1080 pixels, Colour Depth=24bits/pixel
Storage capacity = Resolution X (no. of bits per pixel / 8 bits per byte)
    = 1920x1080 x(24/8) bytes = 6220800 bytes
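
The formula above for an uncompressed image can be checked with a short Python sketch; the function name is only illustrative.

```python
def image_bytes(width, height, bits_per_pixel):
    """Uncompressed image size in bytes."""
    return width * height * bits_per_pixel // 8


size = image_bytes(1920, 1080, 24)
print(size, "bytes")
```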