[ZhiminXu Notes] Neural Adaptive Video Streaming with Pensieve

2017-10-10

Posted by 许智敏 (Zhimin Xu)

Overview:

This week, I read the paper 'Neural Adaptive Video Streaming with Pensieve', published at ACM SIGCOMM 2017 by Hongzi Mao, Ravi Netravali, and Mohammad Alizadeh.

First of all, state-of-the-art ABR algorithms suffer from a key limitation: they use fixed control rules based on simplified or inaccurate models of the deployment environment. As a result, existing schemes inevitably fail to achieve optimal performance across a broad set of network conditions and QoE objectives.

This paper presents Pensieve, a system that generates ABR algorithms using reinforcement learning (RL), which learns to make ABR decisions solely through observations of the resulting performance of past decisions.

The authors evaluate Pensieve with trace-driven and real-world experiments spanning a wide variety of network conditions, QoE metrics, and video properties. In all considered scenarios, Pensieve outperforms the best state-of-the-art scheme, with improvements in average QoE of 12%–25%. Pensieve also generalizes well, outperforming existing schemes even on networks for which it was not explicitly trained.

The authors evaluate Pensieve using a full system implementation. Their implementation deploys Pensieve’s neural network model on an ABR server, which video clients query to get the bitrate to use for the next chunk; client requests include observations about throughput, buffer occupancy, and video properties. This design removes the burden of performing neural network computation from video clients, which may have limited computation power, e.g., TVs, mobile devices, etc.
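The paper does not spell out the client–server protocol, so here is a rough sketch of the query loop (the handler names, JSON field names, and port are hypothetical, and a trivial rate-based rule stands in for the trained neural network):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

BITRATE_LADDER = [300, 750, 1200, 1850, 2850, 4300]  # kbps, as in the paper's setup

def pick_bitrate(obs):
    """Placeholder policy standing in for the neural network:
    pick the highest bitrate at or below the last measured throughput."""
    feasible = [b for b in BITRATE_LADDER if b <= obs["throughput_kbps"]]
    return feasible[-1] if feasible else BITRATE_LADDER[0]

class ABRHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Client posts its observations: throughput, buffer occupancy, video info.
        length = int(self.headers["Content-Length"])
        obs = json.loads(self.rfile.read(length))
        body = json.dumps({"next_bitrate_kbps": pick_bitrate(obs)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# To serve: HTTPServer(("", 8333), ABRHandler).serve_forever()
```

Because the heavy computation lives in `pick_bitrate` on the server, the client only needs to issue one small HTTP request per chunk.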

Method:
Pensieve
Pensieve trains the neural network using A3C [30], a state-of-the-art actor-critic RL algorithm.
The goal of learning is to maximize the expected cumulative discounted reward.

[Figure f2]
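Concretely, the agent maximizes E[Σ_t γ^t r_t], where r_t is the per-chunk QoE reward. A minimal sketch of computing these discounted returns from one trajectory:

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for each step of a trajectory."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):  # accumulate from the last chunk back
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```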

RL-generated ABR algorithms can automatically optimize for different network characteristics and QoE objectives.

Training Methodology:
Pensieve trains ABR algorithms in a simple simulation environment that faithfully models the dynamics of video streaming with real client applications.
This is achieved by disabling slow-start-restart on the video server. Disabling slow-start-restart can increase traffic burstiness, but recent standards efforts address the same problem more gracefully by pacing the initial burst sent by TCP following an idle period.
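On Linux, slow-start-restart corresponds to the `tcp_slow_start_after_idle` sysctl (the paper does not name the exact knob used in their setup, so this mapping is my assumption):

```shell
# Disable TCP slow-start restart after idle (Linux; requires root).
# With this off, a connection that sits idle between chunk downloads
# keeps its congestion window instead of re-entering slow start.
sysctl -w net.ipv4.tcp_slow_start_after_idle=0
```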

Basic Training Algorithm:
Pensieve’s training algorithm uses A3C, as shown in the following figure:

[Figure f1]
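In outline, each actor-critic update moves the policy parameters along ∇θ log π(a|s) scaled by the advantage A = R − V(s), plus an entropy bonus for exploration. A toy numpy sketch for a linear softmax policy (illustrative only; this is not the paper's network, and the parameter shapes are my assumption):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def a3c_update(theta, state, action, ret, value, lr=1e-4, beta=0.1):
    """One actor gradient step; theta has shape (n_actions, n_features)."""
    probs = softmax(theta @ state)          # pi(a|s)
    advantage = ret - value                 # A = R - V(s), V from the critic
    # grad of log pi(action|s) w.r.t. theta: (onehot - probs) outer state
    onehot = np.zeros(len(probs))
    onehot[action] = 1.0
    grad_logp = np.outer(onehot - probs, state)
    # gradient of the entropy H = -sum p log p through the softmax
    g = probs * (np.log(probs) + 1.0)
    grad_entropy = -np.outer(g - probs * g.sum(), state)
    return theta + lr * (grad_logp * advantage + beta * grad_entropy)
```

With a positive advantage, the update raises the probability of the action that was taken; the entropy term keeps the policy from collapsing too early.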

To further enhance and speed up training, Pensieve spawns multiple learning agents in parallel, as suggested by the A3C paper. By default, Pensieve uses 16 parallel agents. Each learning agent is configured to experience a different set of input parameters (e.g., network traces). The agents continually send their {state, action, reward} tuples to a central agent, which aggregates them to produce a single ABR algorithm model.
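A minimal sketch of this parallel-agent pattern, with threads and a shared queue standing in for the paper's 16 worker processes (all names and the placeholder policy are hypothetical; the actual model update is elided):

```python
import queue
import threading

def worker(agent_id, traces, experience_q):
    """Roll out on this agent's own traces; ship tuples to the central agent."""
    for state in traces:
        action = state % 6          # placeholder policy over 6 bitrates
        reward = float(state)       # placeholder QoE reward
        experience_q.put((state, action, reward))
    experience_q.put(None)          # sentinel: this worker is done

def central_agent(experience_q, n_workers):
    """Aggregate {state, action, reward} tuples into one batch."""
    batch, done = [], 0
    while done < n_workers:
        item = experience_q.get()
        if item is None:
            done += 1
        else:
            batch.append(item)      # real system: compute gradients, update model
    return batch

def train(n_workers=4):
    q = queue.Queue()
    threads = [threading.Thread(target=worker, args=(i, range(5), q))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    batch = central_agent(q, n_workers)
    for t in threads:
        t.join()
    return batch
```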

Enhancement for multiple videos:
Handling the variation across videos (e.g., in the number of available bitrates and chunks) would require the neural network to take a variable-sized set of inputs and produce a variable-sized set of outputs.
[Figure f3]
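One way to keep a fixed-size network across videos (roughly in the spirit of the multi-video enhancement, though this exact mechanism is my assumption, not the paper's) is to size the action space for the largest bitrate ladder and mask out unavailable bitrates before the softmax:

```python
import numpy as np

def masked_softmax(logits, available):
    """Zero out probabilities of bitrates this video does not offer.

    logits:    scores over the maximum-sized action space
    available: boolean mask, True where the bitrate exists for this video
    """
    z = np.where(available, logits, -np.inf)   # exp(-inf) = 0 kills masked actions
    e = np.exp(z - z[available].max())
    return e / e.sum()
```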

Implementation:
To generate ABR algorithms, Pensieve passes the past k = 8 bandwidth measurements to a 1D convolution layer (CNN) with 128 filters, each of size 4 with stride 1. Next chunk sizes are passed to another 1D-CNN with the same shape. Results from these layers are then aggregated with the other inputs in a hidden layer that uses 128 neurons, followed by a softmax output (Figure 5). The critic network uses the same NN structure, but its final output is a single linear neuron (with no activation function). During training, the authors use a discount factor γ = 0.99, which implies that current actions are influenced by roughly 100 future steps. The learning rates for the actor and critic are configured to be 10^-4 and 10^-3, respectively. Additionally, the entropy factor decays from 1 to 0.1 over 10^5 iterations. They keep all these hyperparameters fixed throughout their experiments.
While some tuning is useful, the authors found that Pensieve performs well for a wide range of hyperparameter values, so they did not use sophisticated hyperparameter tuning methods. They implemented this architecture using TensorFlow. For compatibility, they leveraged the TFLearn deep learning library’s TensorFlow API to declare the neural network during both training and testing.
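To make the shapes concrete, here is a numpy sketch of the actor's forward pass with random weights (the layer sizes follow the description above; the scalar-input count and weight initialization are my assumptions, and the real implementation uses TFLearn/TensorFlow):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def conv1d(x, w):
    """x: (T,), w: (filters, k) -> (filters, T - k + 1), ReLU activation."""
    k = w.shape[1]
    windows = np.stack([x[i:i + k] for i in range(len(x) - k + 1)])
    return relu(windows @ w.T).T

def actor_forward(throughputs, chunk_sizes, scalars, params):
    """throughputs: past k=8 bandwidth samples; chunk_sizes: 6 next-chunk sizes;
    scalars: remaining inputs (buffer level, last bitrate, chunks left)."""
    h1 = conv1d(throughputs, params["conv_tput"]).ravel()   # 128 filters x 5 positions
    h2 = conv1d(chunk_sizes, params["conv_size"]).ravel()   # 128 filters x 3 positions
    h = np.concatenate([h1, h2, scalars])
    hidden = relu(params["W"] @ h + params["b"])            # 128-neuron hidden layer
    return softmax(params["W_out"] @ hidden)                # probs over 6 bitrates

params = {
    "conv_tput": rng.normal(0, 0.1, (128, 4)),   # 128 filters, size 4, stride 1
    "conv_size": rng.normal(0, 0.1, (128, 4)),
    "W": rng.normal(0, 0.01, (128, 128 * 5 + 128 * 3 + 3)),
    "b": np.zeros(128),
    "W_out": rng.normal(0, 0.1, (6, 128)),
}
probs = actor_forward(rng.random(8), rng.random(6), rng.random(3), params)
```

The critic would reuse the same body but replace `W_out`/`softmax` with a single linear output neuron estimating V(s).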
Pensieve runs on a standalone ABR server.

EVALUATION:
Network traces: To evaluate Pensieve and state-of-the-art ABR algorithms on realistic network conditions, the authors created a corpus of network traces by combining several public datasets: a broadband dataset provided by the FCC and a 3G/HSDPA mobile dataset collected in Norway.

Adaptation algorithms:
(1) Buffer-Based (BB)
(2) Rate-Based (RB)
(3) BOLA
(4) MPC
(5) robustMPC

Experimental setup:
dash.js (version 2.4) with a playback buffer capacity of 60 seconds.
H.264/MPEG-4 video encoded at bitrates of {300, 750, 1200, 1850, 2850, 4300} kbps.
Google Chrome browser (version 53) and Apache video server (version 2.4.7).
Mahimahi to emulate the network conditions.
80 ms RTT.

QoE metrics:
They use the general QoE metric from the MPC paper.
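That metric sums per-chunk quality minus penalties for rebuffering and bitrate switches: QoE = Σ_n q(R_n) − μ Σ_n T_n − Σ_n |q(R_{n+1}) − q(R_n)|, where R_n is chunk n's bitrate, T_n its rebuffering time, and μ the rebuffer penalty. A sketch of the linear variant (q(R) = R; μ = 4.3 as I recall for QoE_lin, so treat that default as an assumption):

```python
def qoe_linear(bitrates_mbps, rebuffer_secs, mu=4.3):
    """QoE_lin: total quality minus rebuffering and smoothness penalties."""
    quality = sum(bitrates_mbps)
    rebuffering = mu * sum(rebuffer_secs)
    smoothness = sum(abs(b - a) for a, b in zip(bitrates_mbps, bitrates_mbps[1:]))
    return quality - rebuffering - smoothness
```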

[Figure f1]

[Figure f2]

[Figure f3]

Gains:
a. In all of the considered scenarios, Pensieve is able to rival or outperform the best existing scheme.
b. Pensieve’s ABR algorithms are able to maintain high levels of performance both in the presence of new network conditions and new video properties.
c. Performance is largely unaffected by factors such as the neural network architecture and the latency between the video client and the ABR server.

I think this is good work for us to reference when applying learning methods to HTTP streaming.