TESTING METHODOLOGY 

The methodology of the umlaut connect Mobile Benchmark is the result of more than 15 years of testing mobile networks.
Today, network tests are conducted in more than 80 countries. Our methodology was carefully designed to evaluate and objectively compare the performance and service quality of mobile networks from the users’ perspective.

The umlaut connect Mobile Benchmark in the United Kingdom comprises the results of extensive voice and data drive tests and walk tests as well as a sophisticated crowdsourcing approach.

DRIVE TESTS AND WALK TESTS

The drive tests and walk tests in the UK took place between November 6th and November 26th, 2019. All samples were collected during the day, between 8.00 a.m. and 10.00 p.m. The network tests covered inner-city areas, outer metropolitan and suburban areas. Measurements were also taken in smaller towns and cities along connecting highways. The connecting routes between the cities alone covered about 1,520 kilometres per car – 6,080 kilometres for all four cars. In total, the four vehicles together covered about 10,700 kilometres.

The combination of test areas has been selected to provide representative test results across the UK’s population. The areas selected for the 2019 test account for 17 million people, or roughly 27 percent of the total population of the United Kingdom. The test routes are shown here; all visited cities and towns are listed in the box below.

Cities and Towns UK 2019.png

The four drive-test cars were equipped with arrays of Samsung Galaxy S9 smartphones for the simultaneous measurement of voice and data services.

VOICE TESTING

One smartphone per operator in each car was used for the voice tests, setting up test calls from one car to another. The walk test team also carried one smartphone per operator for the voice tests. In this case, the smartphones called a stationary counterpart. The audio quality of the transmitted speech samples was evaluated using the HD-voice capable, ITU-standardised POLQA wideband algorithm. All smartphones used for the voice tests were set to VoLTE preferred mode. In networks or areas where this modern 4G-based voice technology was not available, they would perform a fallback to 3G or 2G.

In the assessment of call setup times, we also rate the so-called P90 value. This value specifies the threshold in a statistical distribution below which 90 percent of the gathered values lie. For speech quality, we publish the P10 value (10 percent of the values are lower than the specified threshold), because in this case higher values are better.
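To illustrate how these percentile thresholds work, here is a brief Python sketch that computes a P90 and a P10 with numpy. The sample values are invented for demonstration; they are not measurement results from this benchmark.

import numpy as np

# Illustrative call setup times (seconds) and POLQA speech quality
# scores (MOS) - invented values, not benchmark data.
setup_times = np.array([0.9, 1.1, 1.0, 1.4, 2.3, 1.2, 1.0, 3.1, 1.3, 1.1])
polqa_scores = np.array([4.1, 3.8, 4.3, 2.9, 4.0, 3.6, 4.2, 3.9, 4.1, 3.3])

# P90: 90 percent of the gathered setup times lie below this threshold
# (lower is better, so the slow tail of call setups is what gets rated).
p90_setup = np.percentile(setup_times, 90)

# P10: 10 percent of the speech quality scores lie below this threshold
# (higher is better, so the poor-quality tail is what gets rated).
p10_polqa = np.percentile(polqa_scores, 10)

print(f"P90 call setup time: {p90_setup:.2f} s")
print(f"P10 speech quality:  {p10_polqa:.2f} MOS")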

In order to account for typical smartphone-use scenarios during the voice tests, background data traffic was generated in a controlled way through the injection of 100 KB of data traffic (HTTP downloads). As a new KPI in our 2019 setup, we also evaluate the so-called Multirab (Multi Radio Access Bearer) Connectivity. This value indicates whether data connectivity is available during phone calls. The voice scores account for 32 percent of the total results.
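The following Python sketch shows one plausible way to inject such background traffic while a test call runs. The server URL is hypothetical and the timing is an assumption; the article does not disclose umlaut’s actual injection tooling.

import threading
import urllib.request

# Hypothetical endpoint serving a ~100 KB payload (not umlaut's real
# test server).
BACKGROUND_URL = "https://testserver.example.com/payload_100kb.bin"

def inject_background_traffic(stop_event, interval_s=10.0):
    """Repeatedly download ~100 KB over HTTP while a test call runs,
    emulating typical smartphone background data traffic."""
    while not stop_event.is_set():
        try:
            with urllib.request.urlopen(BACKGROUND_URL, timeout=5) as resp:
                resp.read()  # pull the full payload
        except OSError:
            pass  # in this sketch, a failed injection is simply skipped
        stop_event.wait(interval_s)  # pause between injections

stop = threading.Event()
worker = threading.Thread(target=inject_background_traffic, args=(stop,))
worker.start()
# ... the test call would be set up and evaluated here ...
stop.set()
worker.join()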

DATA TESTING

Data performance was measured using four more Galaxy S9 smartphones in each car – one per operator. Their radio access technology was also set to LTE preferred mode.

For the web tests, they accessed web pages according to the widely recognised Alexa ranking. In addition, the static “Kepler” test web page as specified by ETSI (European Telecommunications Standards Institute) was used. In order to test the data service performance, files of 5 MB (download) and 2.5 MB (upload) were transferred from or to a test server located in the cloud.

In addition, the peak data performance was tested in uplink and downlink directions by assessing the amount of data that was transferred within a seven-second time period.
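As a rough illustration of such a windowed peak test, the sketch below counts the bytes received from a bulk-sending server within seven seconds. The server address is hypothetical and the socket approach is an assumption, not umlaut’s actual measurement stack.

import socket
import time

WINDOW_S = 7.0  # fixed measurement window from the test description

def peak_downlink_bytes(host: str, port: int) -> int:
    """Count the bytes that arrive from a bulk-sending test server
    within a seven-second window."""
    total = 0
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.settimeout(0.5)
        deadline = time.monotonic() + WINDOW_S
        while time.monotonic() < deadline:
            try:
                chunk = sock.recv(65536)
            except socket.timeout:
                continue  # no data right now; keep waiting until deadline
            if not chunk:
                break  # server closed the connection
            total += len(chunk)
    return total

# Hypothetical server that streams data as fast as possible:
nbytes = peak_downlink_bytes("testserver.example.com", 5001)
print(f"Peak downlink: {nbytes * 8 / WINDOW_S / 1e6:.1f} Mbit/s")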

The evaluation of YouTube playback takes into account that YouTube dynamically adapts the video resolution to the available bandwidth. So, in addition to success ratios and start times, the measurements also determined the average video resolution.

All the tests were conducted with the best-performing mobile plan available from each operator. Data scores account for 48 percent of the total results.

CROWDSOURCING

Additionally, umlaut conducted crowd-based analyses of the UK’s networks, which contribute 20 percent to the end result. They are based on data gathered between early June and mid-November 2019.
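Taken together with the voice and data weights stated above, the three categories combine into the total score as a simple weighted sum. A worked example in Python with invented category scores:

# Category weights from this benchmark: voice 32 %, data 48 %, crowd 20 %.
WEIGHTS = {"voice": 0.32, "data": 0.48, "crowd": 0.20}

# Hypothetical per-category scores on a 0-100 scale (not real results).
scores = {"voice": 88.0, "data": 91.0, "crowd": 76.0}

total = sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)
print(f"Total score: {total:.1f}")  # 28.16 + 43.68 + 15.20 = 87.04 -> 87.0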

For the collection of crowd data, umlaut has integrated a background diagnosis process into more than 800 diverse Android apps. If one of these applications is installed on the end-user’s phone and the user authorizes the background analysis, data collection takes place 24/7, 365 days a year. Reports are generated for every hour and sent daily to umlaut’s cloud servers. Such reports occupy just a small number of bytes per message and do not include any personal user data. Interested parties can voluntarily take part in the data gathering with the specific “U get” app (see box on the right).

This unique crowdsourcing technology allows umlaut to collect data about real-world experience wherever and whenever customers use their smartphones.

NETWORK COVERAGE

For the assessment of network coverage, umlaut lays a grid of 2 by 2 kilometres over the whole test area. The “evaluation areas” generated this way are then subdivided into 16 smaller tiles. To ensure statistical relevance, umlaut requires a certain number of users and measurement values per operator for each tile and each evaluation area.
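A minimal sketch of this gridding step, assuming projected coordinates in metres; umlaut’s exact projection and grid origin are not public, so the numbers below are purely illustrative.

AREA_M = 2000        # evaluation areas are 2 by 2 kilometres
TILES_PER_SIDE = 4   # 16 tiles per area -> a 4 x 4 grid of 500 m tiles
TILE_M = AREA_M // TILES_PER_SIDE

def locate(easting_m: float, northing_m: float):
    """Map a projected coordinate (metres) to its evaluation area and
    to one of the 16 tiles inside that area."""
    area = (int(easting_m // AREA_M), int(northing_m // AREA_M))
    tile = (int(easting_m % AREA_M // TILE_M),
            int(northing_m % AREA_M // TILE_M))
    return area, tile

area, tile = locate(531250.0, 179800.0)
print(area, tile)  # (265, 89) (2, 3) for this sample point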

In our 2019 benchmark framework, we differentiate between a “Benchmark View” and an “Own Network View” in the crowd results: For the Benchmark View, only those evaluation areas are considered for which we have determined valid results for all operators who are incorporated in the benchmark. In the “Own Network View” this exclusion is not made – an evaluation area will be considered if there are valid samples for the assessed operator, regardless of the presence of competitors.
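In code, the two views amount to two different filters over the same set of evaluation areas. A Python sketch with invented areas and the four UK operators:

# Operators with statistically valid samples per evaluation area
# (illustrative values, not actual crowd data).
valid = {
    "area_A": {"EE", "Vodafone", "O2", "Three"},
    "area_B": {"EE", "Vodafone", "O2"},
    "area_C": {"EE"},
}
ALL_OPERATORS = {"EE", "Vodafone", "O2", "Three"}

# Benchmark View: only areas where every benchmarked operator has
# valid results are compared.
benchmark_view = {a for a, ops in valid.items() if ALL_OPERATORS <= ops}

# Own Network View: an area counts for an operator whenever that
# operator alone has valid samples there.
def own_network_view(operator):
    return {a for a, ops in valid.items() if operator in ops}

print(benchmark_view)          # {'area_A'}
print(own_network_view("EE"))  # all three areas qualify for EE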

Beyond that, we now distinguish urban and non-urban areas in our crowd evaluations, reflecting that coverage with mobile services is usually higher in urban areas than in rural surroundings. We specify corresponding coverage values for voice services (2G, 3G and 4G combined), data services (3G and 4G combined) and 4G only.
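A minimal sketch of how such technology groupings could be derived from crowd samples; the groupings mirror those named in the text, while the sample data and the per-sample counting are illustrative assumptions.

# Radio technology reported by each crowd sample in one tile
# (invented data).
samples = ["4G", "4G", "3G", "2G", "4G", "3G", "none", "4G"]

def coverage(samples, techs):
    """Share of samples taken on one of the given technologies."""
    return 100.0 * sum(1 for s in samples if s in techs) / len(samples)

print(f"voice (2G/3G/4G): {coverage(samples, {'2G', '3G', '4G'}):.0f} %")
print(f"data (3G/4G):     {coverage(samples, {'3G', '4G'}):.0f} %")
print(f"4G only:          {coverage(samples, {'4G'}):.0f} %")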

 

DATA THROUGHPUTS 

Additionally, umlaut investigates the data rates that were actually available to each user. For this purpose, we determine maximum download and upload data rates per user within 15-minute slices. These values are then aggregated per evaluation area in 4-week time slices, for each of which we determine the P90 value. For the final calculation of this KPI, we then calculate the average of the results of the six time slices.
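The aggregation chain described here can be sketched in a few lines of Python; the data layout (user, timestamp, rate) is an assumption for illustration, not umlaut’s actual pipeline.

import numpy as np

SLICE_15MIN = 15 * 60             # seconds in a 15-minute slice
SLICE_4WEEKS = 4 * 7 * 24 * 3600  # seconds in a 4-week time slice

def throughput_kpi(samples, t0):
    """samples: (user_id, timestamp_s, mbps) tuples for one evaluation
    area over the 24-week observation period."""
    # Step 1: maximum rate per user within each 15-minute slice.
    per_user_max = {}
    for user, ts, mbps in samples:
        key = (user, int((ts - t0) // SLICE_15MIN))
        per_user_max[key] = max(per_user_max.get(key, 0.0), mbps)

    # Step 2: group those maxima into 4-week time slices and take
    # the P90 of each slice.
    slices = {}
    for (user, idx15), mbps in per_user_max.items():
        idx4w = idx15 * SLICE_15MIN // SLICE_4WEEKS
        slices.setdefault(idx4w, []).append(mbps)
    p90s = [np.percentile(v, 90) for v in slices.values()]

    # Step 3: the KPI is the average of the P90 values of the
    # (typically six) time slices.
    return float(np.mean(p90s))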

DATA SERVICE AVAILABILITY 

Also called “operational excellence”, this parameter indicates the number of “service degradations” – events where data connectivity is impacted by a number of identified anomalies with sufficient severity. To judge this, the algorithm compares similar time frames on similar days in a window around the day and time of interest. The algorithm looks at large-scale anomalies on a network-wide level and ensures that individual users’ degradations, such as a simple loss of coverage due to an indoor stay or similar reasons, cannot affect the result.

In order to ensure statistical relevance, valid assessment weeks and hours must fulfil distinct requirements. Each operator must have sufficient statistics for trend and noise analyses in each evaluated time window. The exact number depends on the market size and number of operators. Data Service Availability is based on the same 24-week observation period as our other crowd results.

Two boxes were mounted into the rear and side windows of each measurement car in order to support eight smartphones per car.

One Samsung Galaxy S9 per operator took the voice measurements and one additional S9 per operator was used for the data tests.

All test phones were operated and supervised by umlaut’s unique control system.

 
Scorebreakdown_Drive_Walk_Crowd_english V2.png
 
U-get-Mockup-Homescreen.jpg

PARTICIPATE IN OUR CROWDSOURCING

Everybody interested in being part of our global crowdsourcing panel and obtaining insights into the reliability of the mobile network that his or her smartphone is logged into can most easily participate by installing and using the “U get” app. This app concentrates exclusively on network analyses and is available at http://uget-app.com.

“U get” checks and visualises the current mobile network performance and contributes the results to our crowdsourcing platform. Join the global community of users who understand their personal wireless performance, while contributing to the world’s most comprehensive picture of mobile customer experience.


 

CONCLUSION

EE wins for the sixth time. Vodafone maintains second place and shows clear score improvements over last year’s results. O2 and Three swap places, with O2 ranking third and Three fourth.

The overall winner of the 2019 umlaut connect Mobile Benchmark in the UK is EE – for the sixth time (in 2016, EE shared first place with Vodafone). EE’s lead over the second-placed Vodafone is narrow in the voice discipline, but more distinct in the data and crowdsourcing categories. Overall, EE defends its position and deserves the grade very good.

As in 2017 and 2018, Vodafone holds second place and shows a good performance level. The operator maintains a distinct score gap to the third-placed contender, which separates the UK market into two stronger and two less powerful providers.

On the lower ranks, we see a swap of places: O2 manages to overtake Three and reaches third place, outperforming the Hutchison brand in all disciplines of our Benchmark and achieving the overall grade satisfactory. This is also confirmed by a distinct improvement, especially in the crowd score.

Three ranks last, falling below its performance levels from our previous Benchmark and achieving the overall grade sufficient. However, this operator shows some improvements in the results of our crowdsourcing.

UK2020_bar-charts_TotalScore_V2.png
UK2020_table_Overall-Results.png
Single Reviews UK 2019.png

Reactions UK2019.png