TESTING METHODOLOGY 

THE METHODOLOGY OF THE P3 CONNECT MOBILE BENCHMARK IS THE RESULT OF MORE THAN 15 YEARS OF TESTING MOBILE NETWORKS. TODAY, NETWORK TESTS ARE CONDUCTED IN MORE THAN 80 COUNTRIES. OUR METHODOLOGY WAS CAREFULLY DESIGNED TO EVALUATE AND OBJECTIVELY COMPARE THE PERFORMANCE AND SERVICE QUALITY OF MOBILE NETWORKS FROM THE USERS’ PERSPECTIVE. 

The P3 connect Mobile Benchmark in the Netherlands comprises the results of extensive voice and data drivetests and walktests as well as a sophisticated crowdsourcing approach.

 

DRIVETESTS AND WALKTESTS
The drivetests and walktests in the Netherlands took place from February 12 to March 1, 2019. All samples were collected during the day, between 8.00 a.m. and 10.00 p.m. The network tests covered inner-city, outer metropolitan and suburban areas. Measurements were also taken in smaller towns and on the connecting highways. The two measurement cars together covered about 2,140 kilometres in the cities, about 620 km in towns and about 3,730 km on the roads – resulting in a total of 6,490 kilometres. The combination of test areas has been selected to provide representative test results across the Dutch population. The areas selected for the 2019 test account for approximately 5.7 million people, or roughly 33.5 per cent of the total population of the Netherlands.

The drivetests covered 21 cities and 31 towns. Additionally, one team conducted walktests in seven cities and on railway journeys between 14 destinations. The exact routes are shown here; all visited cities and towns are listed in the box below.

Cities and Towns NL2019.png

The two drive-test cars as well as the battery-powered backpacks of the walktest teams were equipped with arrays of Samsung Galaxy S9 smartphones for the simultaneous measurement of voice and data services.

VOICE TESTING
One smartphone per operator in each car was used for the voice tests, setting up test calls from one car to another. The walktest team also carried one smartphone per operator for the voice tests. In this case, the smartphones called a stationary counterpart. The audio quality of the calls was evaluated using the HD-voice capable and ITU standardised POLQA wideband algorithm. All smartphones used for the voice tests were set to VoLTE preferred mode. In networks or areas where this modern 4G-based voice technology was not available, they would perform a fallback to 3G or 2G.

As a KPI introduced in 2018, we assess the so-called P90 value for call setup times. P90 values specify the threshold in a statistical distribution below which 90 per cent of the gathered values fall.
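To illustrate the P90 definition, the following minimal sketch computes a 90th percentile with the nearest-rank method. The text does not state which percentile method P3 applies, so the nearest-rank choice, the function name p90 and the sample values are assumptions made purely for illustration.

```python
def p90(values):
    """Nearest-rank 90th percentile: the smallest sample such that
    at least 90 per cent of all samples are at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    # ceil(0.9 * n) in integer arithmetic, as a 1-based rank
    rank = (9 * len(ordered) + 9) // 10
    return ordered[rank - 1]

# Hypothetical call setup times in seconds (illustrative, not measured data)
setup_times = [1.2, 1.4, 1.1, 1.3, 2.8, 1.5, 1.2, 1.6, 1.3, 4.0]
print(p90(setup_times))  # 2.8
```

With ten samples, the ninth-smallest value is returned, so a single slow outlier (4.0 s here) does not determine the result.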

In order to account for typical smartphone use during the voice tests, background data traffic was generated through random injection of small amounts of HTTP traffic. The voice scores account for 34 per cent of the total results. 

DATA TESTING
Data performance was measured using three more Galaxy S9 smartphones per car or walktest team – one per operator. Their radio access technology was set to LTE preferred mode.

For the web tests, they accessed web pages according to the widely recognised Alexa ranking. In addition, the static Kepler test web page as specified by ETSI (European Telecommunications Standards Institute) was used. In order to test the data service performance, files of 3 MB and 1 MB for download and upload were transferred from or to a test server located on the Internet. In addition, the peak data performance was tested in uplink and downlink directions by assessing the amount of data that was transferred within a seven-second time period. This KPI aims to show the network capability, i.e. the maximum achievable data throughput, similar to what speed test apps would show. Such applications typically use multiple TCP sockets to overcome possible limitations to the maximum throughput of a single TCP connection. Such limits are caused by the combination of a variety of network parameters. Our measurements were executed using three parallel sockets for all operators to ensure fairness. The carrier aggregation capabilities also play a role: the more carrier frequencies are combined, the higher the throughput can be, provided that the network layout and other parameters do not prevent higher data rates. In such cases, higher throughput could be achieved with more sockets.
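The arithmetic behind the peak data KPI can be sketched as follows: the bytes transferred over all parallel sockets within the seven-second window are summed and converted to megabits per second. The function name, the decimal megabit convention and the byte counts are assumptions for illustration; the text does not describe how P3 computes the value internally.

```python
WINDOW_SECONDS = 7  # fixed measurement window named in the text

def peak_throughput_mbps(bytes_per_socket, window_seconds=WINDOW_SECONDS):
    """Aggregate throughput in Mbit/s over all parallel sockets."""
    total_bits = sum(bytes_per_socket) * 8
    return total_bits / window_seconds / 1_000_000

# Hypothetical byte counts for three parallel sockets (not measured data)
sample = [35_000_000, 28_000_000, 24_500_000]
print(peak_throughput_mbps(sample))  # 100.0
```

Summing over the sockets before dividing by the window is what allows several connections together to exceed the limit of a single TCP connection.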

The evaluation of YouTube playback takes into account that YouTube dynamically adapts the video resolution to the available bandwidth. So, in addition to success ratios, start times and playouts without interruptions, we also determined average video resolution. All tests were conducted with the best-performing mobile plan of each operator. Data scores account for 51 per cent of the total results.

CROWDSOURCING

Additionally, P3 conducted crowd-based analyses of the Dutch networks, which contribute 15 per cent to the end result. They are based on data that were gathered in December 2018 as well as in January and February 2019. For the collection of crowd data, P3 has integrated background diagnosis processes into 800+ diverse Android apps. If one of these applications is installed on the end user’s phone and the user authorizes the background analysis, data collection takes place 24/7, 365 days a year. Reports are generated for every quarter of an hour and sent daily to P3’s cloud servers. Such reports contain just a small number of bytes per message and do not include any personal user data.

NETWORK COVERAGE
For the assessment of network coverage, P3 lays a grid of 2 by 2 km over the whole test area. The “evaluation areas” generated this way are then sub-divided into 16 smaller tiles. To ensure statistical relevance, P3 requires a certain number of users and measurement values per operator for each tile and each evaluation area. If these thresholds are not met by one of the operators, this part of the map will not be considered in the assessment for the sake of fairness.
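The grid logic can be sketched as follows, assuming positions are already given in projected metres and that the 16 tiles form a 4 by 4 layout of 500 m squares. The function names, the tile numbering and the min_samples threshold are hypothetical, and the sketch checks only a sample count although the text also requires a minimum number of users; P3 does not publish the actual thresholds here.

```python
AREA_SIZE_M = 2000                            # 2 km x 2 km evaluation area
TILES_PER_SIDE = 4                            # assumed 4 x 4 = 16 tiles
TILE_SIZE_M = AREA_SIZE_M // TILES_PER_SIDE   # 500 m

def locate(x_m, y_m):
    """Return ((area_col, area_row), tile_index) for a point in metres."""
    area = (int(x_m // AREA_SIZE_M), int(y_m // AREA_SIZE_M))
    tile_col = int((x_m % AREA_SIZE_M) // TILE_SIZE_M)
    tile_row = int((y_m % AREA_SIZE_M) // TILE_SIZE_M)
    return area, tile_row * TILES_PER_SIDE + tile_col

def area_is_valid(samples_per_operator, min_samples):
    """An area is considered only if every operator meets the threshold."""
    return all(n >= min_samples for n in samples_per_operator.values())

print(locate(2600, 900))                                    # ((1, 0), 5)
print(area_is_valid({"A": 120, "B": 95, "C": 40}, 50))      # False
```

In the second call, operator C falls short of the assumed threshold, so the whole area is dropped for all operators, which mirrors the fairness rule described above.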

“Quality of Coverage” reveals whether voice and data services actually work in an evaluation area. P3 assesses this because mobile services cannot actually be used in every area that nominally provides network reception. We specify these values for the coverage of voice services (3G and 4G combined), data (3G and 4G combined) and 4G only.

DATA THROUGHPUTS 
Additionally, P3 investigates the data rates that were actually available to each user. For this purpose, we determine the best obtained data rate for each user during the evaluation period and then calculate the average of these values. In addition, we determine the so-called P90 values for the top throughput of each evaluation area as well as of each user’s best throughput. P90 values specify the threshold in a statistical distribution below which 90 per cent of the gathered values fall, and depict how fast the network is under favourable conditions.
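The first step described above, taking each user's best data rate and then averaging across users, can be sketched as follows. The input format of (user_id, Mbit/s) pairs, the function name and the sample values are assumptions for illustration only.

```python
def average_of_user_bests(samples):
    """samples: iterable of (user_id, mbps) pairs.
    Returns the mean of each user's best observed data rate."""
    best = {}
    for user, mbps in samples:
        best[user] = max(mbps, best.get(user, mbps))
    if not best:
        raise ValueError("no samples")
    return sum(best.values()) / len(best)

# Hypothetical crowd samples (not measured data)
crowd = [("u1", 20.0), ("u1", 50.0), ("u2", 30.0), ("u3", 10.0), ("u3", 40.0)]
print(average_of_user_bests(crowd))  # 40.0
```

Averaging per-user maxima rather than all samples means that a user who reports many measurements does not weigh more than a user who reports few.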

DATA SERVICE AVAILABILITY 
Formerly called “operational excellence”, this parameter indicates the number of outages or service degradations – events where data connectivity is impacted by a number of cases that significantly exceeds the expectation level. To judge this, the algorithm looks at a sliding window around the hour of interest. This ensures that we only consider actual degradations as opposed to a simple loss of network coverage due to prolonged indoor stays or similar reasons. In order to ensure statistical relevance, each operator must have sufficient statistics for trend and noise analyses for each evaluated hour. The exact number depends on the market size and number of operators. A valid assessment month must comprise at least 90 per cent valid assessment hours. Deviating from the other crowd score elements, Data Service Availability is rated based on a nine-month observation period – in this case from June 2018 to February 2019.
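The 90 per cent rule for a valid assessment month can be expressed as a simple check. This is a sketch only: the function name and the hour counts are hypothetical, and the criteria that make an individual hour valid are not specified in the text.

```python
def month_is_valid(valid_hours, total_hours):
    """True if at least 90 per cent of the month's hours are valid.
    Integer arithmetic avoids floating-point rounding at the boundary."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return valid_hours * 10 >= total_hours * 9

# A 30-day month has 720 hours, so at least 648 must be valid.
print(month_is_valid(648, 720))  # True
print(month_is_valid(647, 720))  # False
```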

Two boxes were mounted into the rear and side windows of each measurement car in order to support six smartphones per car.

One Samsung Galaxy S9 per operator took the voice measurements and one additional S9 per operator was used for the data tests.

All test phones were operated and supervised by P3’s unique control system.

 
Scorebreakdown_Drive_Walk_Crowd_englisch.png
 
P3 has integrated background diagnosis processes into 800+ diverse Android apps. If one of these applications is installed on the user’s phone and the user authorizes background analysis, data collection takes place 24/7, 365 days a year.

 

CONCLUSION

T-MOBILE IS THE OVERALL WINNER – FOR THE FOURTH TIME IN A ROW. KPN SHOWS THE BIGGEST SCORE IMPROVEMENT IN COMPARISON TO THE PREVIOUS YEAR, WHILE VODAFONE KEEPS ITS HIGH PERFORMANCE LEVEL.

For the fourth time in a row, T-Mobile is the clear winner of the P3 connect Mobile Benchmark Netherlands – and it ranks at the highest score level ever achieved in a P3 connect Mobile Benchmark.

KPN and Vodafone follow behind the winner at some distance. With their scores only two points apart, they show practically the same level of performance. The fact that all three Dutch operators achieve the grade “outstanding” clearly emphasises the very high performance level of the Netherlands’ mobile networks.

At this very high level, the benchmark at hand reveals some differences: T-Mobile leads in all three assessment categories. KPN performs slightly better than Vodafone in the Voice category, while the order is reversed in the Data category. In the Crowdsourced evaluations, Vodafone loses a few points due to a slightly elevated level of service degradations in August, September and October 2018. This time, KPN showed the biggest score improvement over its result from the previous year. However, next year the race is open again.

NL2019_bar-charts_TotalScore.png
NL2019_table_Overall-Results_withRailway_v2.png
NL2019_Conclusion.png