This paper is a data-analysis study.
We all know there are problems in current video delivery. But most of us don't know:
- what the problems are,
- what causes these problems, and
- how much improvement we would get by fixing them.
The authors set out to answer these questions.
1st step: data collection
The authors collect many video delivery sessions. Each session has a feature vector X and a performance vector Y.
X = [ASN, CDN, Content provider (Site), VoD or Live, Player type, Browser, Connection type]
Y = [Buffering ratio, Join time, Average bitrate, Join failures]
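To make the two vectors concrete, here is a minimal sketch of one session record. The field names, types, units, and example values are my own assumptions for illustration, not taken from the paper's dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Features:
    """Feature vector X of one session (field names are assumed)."""
    asn: str
    cdn: str
    site: str
    vod_or_live: str
    player_type: str
    browser: str
    connection_type: str


@dataclass(frozen=True)
class Performance:
    """Performance vector Y of one session (units are assumed)."""
    buffering_ratio: float   # fraction of the session spent buffering
    join_time: float         # startup delay, assumed to be in seconds
    average_bitrate: float   # assumed to be in kbps
    join_failed: bool        # True if the video never started


@dataclass(frozen=True)
class Session:
    x: Features
    y: Performance


# A made-up session, only to show the shape of the data.
example = Session(
    x=Features("AS100", "CDN1", "SiteA", "VoD", "Flash", "Chrome", "DSL"),
    y=Performance(buffering_ratio=0.02, join_time=1.5,
                  average_bitrate=1800.0, join_failed=False),
)
```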
2nd step: picking out the bad feature vectors X (problem clusters).
A feature vector X is bad if many of the sessions that share X are bad sessions.
Conclusion: there are many bad Xs (problem clusters).
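A rough sketch of this step, assuming each session has already been labelled good or bad from its Y vector. The two thresholds (`min_sessions`, `bad_ratio_threshold`) are values I made up to pin down "many"; the paper's exact criterion may differ.

```python
from collections import Counter


def find_problem_clusters(sessions, min_sessions=3, bad_ratio_threshold=0.5):
    """Return {X: bad_ratio} for every feature vector X that looks bad.

    sessions: iterable of (x, is_bad) pairs, where x is a hashable
    feature vector (e.g. a tuple) and is_bad is a bool.
    Both thresholds are illustrative assumptions.
    """
    total = Counter()
    bad = Counter()
    for x, is_bad in sessions:
        total[x] += 1
        if is_bad:
            bad[x] += 1
    return {
        x: bad[x] / n
        for x, n in total.items()
        if n >= min_sessions and bad[x] / n >= bad_ratio_threshold
    }


# Toy data: 3 sessions per feature vector.
sessions = [
    (("AS1", "CDN1", "SiteA"), True),
    (("AS1", "CDN1", "SiteA"), True),
    (("AS1", "CDN1", "SiteA"), False),
    (("AS2", "CDN1", "SiteA"), False),
    (("AS2", "CDN1", "SiteA"), False),
    (("AS2", "CDN1", "SiteA"), True),
]
problem_clusters = find_problem_clusters(sessions)
```

In the toy data only ("AS1", "CDN1", "SiteA") is flagged, because two of its three sessions are bad.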
3rd step: finding the common features (critical clusters) shared by the bad Xs (problem clusters).
For example:
[A, B, C] = bad
[A, B, D] = bad
[A, E, F] = bad
Then we can conclude that A is a bad feature (a critical cluster).
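The toy example above can be checked with a few lines of code. This is only a simplified sketch of the idea (count how many bad vectors each feature appears in), not necessarily the authors' actual algorithm; the function name `feature_coverage` is mine.

```python
from collections import Counter


def feature_coverage(bad_vectors):
    """Map each feature to the fraction of bad vectors that contain it."""
    n = len(bad_vectors)
    counts = Counter(f for vec in bad_vectors for f in set(vec))
    return {f: c / n for f, c in counts.items()}


bad_vectors = [("A", "B", "C"), ("A", "B", "D"), ("A", "E", "F")]
coverage = feature_coverage(bad_vectors)

# Features that appear in every bad vector are the candidate critical clusters.
critical = sorted(f for f, cov in coverage.items() if cov == 1.0)
print(critical)  # ['A']
```

B appears in two of the three bad vectors, so it only partly explains them, while A explains all three.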
Conclusions:
(1) There is a non-trivial number of problem clusters that are prevalent (i.e., recurrent problems) and persistent (i.e., long lasting);
(2) The majority of these problem clusters are covered by a small number of critical clusters, i.e., a few potential causes that can explain most of these observations;
(3) Most critical clusters correspond to either the Site, the ASN, or the CDN;
(4) While the critical attributes for different quality metrics are similar, the specific values of ASN, Site, or CDN that appear as critical clusters vary quite significantly.

This picture shows the most problematic clusters.
4th step: how much improvement we get by fixing these critical clusters
- Just selecting the top 1% of critical clusters (in terms of coverage) can yield a potential improvement of 15-55% across the quality metrics.
- Proactively alleviating the dominant critical clusters observed in history can also yield close-to-optimal improvement (60-85% of the upper bound) in the future.
- Even a simple reactive strategy of waiting 1 hour for critical clusters to emerge and then taking remedial actions to address them can reduce problem sessions by up to 51%.
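A back-of-the-envelope sketch of the "top 1% by coverage" idea. It assumes that each problem session is attributed to at most one critical cluster and that fixing a cluster removes all of its attributed problem sessions, which is optimistic and may not match how the paper estimates improvement. The cluster names and counts are made up.

```python
def coverage_of_top_clusters(critical_clusters, total_problem_sessions,
                             top_fraction=0.01):
    """Fraction of problem sessions covered by the top critical clusters.

    critical_clusters: dict mapping a cluster name to the number of
    problem sessions attributed to it (no session counted twice).
    top_fraction: share of clusters to fix, ranked by coverage.
    """
    ranked = sorted(critical_clusters.values(), reverse=True)
    k = max(1, int(len(ranked) * top_fraction))  # always fix at least one
    return sum(ranked[:k]) / total_problem_sessions


# Toy numbers: 4 critical clusters, 100 problem sessions in total.
clusters = {"Site=S1": 40, "ASN=AS7": 30, "CDN=C2": 20, "ASN=AS9": 10}
covered = coverage_of_top_clusters(clusters, total_problem_sessions=100,
                                   top_fraction=0.5)
```

With these toy numbers, fixing the top half of the clusters (two of four) covers 70 of the 100 problem sessions.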
Some thoughts
- Some critical clusters may be impossible to fix.
- A good reference to cite in our Introduction.
- Data analysis is very important. We need to take advantage of the data from Fangkuan.