TESTING METHODOLOGY

The methodology of the umlaut connect Mobile Benchmark is the result of more than 15 years of testing mobile networks. Today, network tests are conducted in more than 80 countries. Our methodology was carefully designed to evaluate and objectively compare the performance and service quality of mobile networks from the users’ perspective.

The umlaut connect Mobile Benchmark Spain comprises the results of extensive voice and data drivetests and walktests as well as a sophisticated crowdsourcing approach.

DRIVETESTS AND WALKTESTS

The drivetests and walktests in Spain took place between late September and mid-October 2019. All samples were collected during the day, between 8.00 a.m. and 10.00 p.m. The network tests covered inner-city areas as well as outer metropolitan and suburban areas. Measurements were also taken in smaller towns and cities along connecting highways. The connecting routes between the cities alone covered about 1,900 kilometres per car – 7,630 kilometres for all four cars. In total, the four vehicles together covered about 11,570 kilometres.

The combination of test areas has been selected to provide representative test results across the Spanish population. The areas selected for the 2019 test account for 11.7 million people, or roughly 25.2 per cent of the total population of Spain. The test routes and all visited cities and towns are shown here.

The four drivetest cars were equipped with arrays of Samsung Galaxy S9 smartphones for the simultaneous measurement of voice and data services.

VOICE TESTING

One smartphone per operator in each car was used for the voice tests, setting up test calls from one car to another. The walktest team also carried one smartphone per operator for the voice tests. In this case, the smartphones called a stationary counterpart. The audio quality of the transmitted speech samples was evaluated using the HD-voice capable, ITU-standardised POLQA wideband algorithm.

All smartphones used for the voice tests were set to VoLTE preferred mode. In networks or areas where this modern 4G-based voice technology was not available, they would perform a fallback to 3G or 2G.

In the assessment of call setup times, we also rate the P90 value. This value specifies the threshold in a statistical distribution below which 90 per cent of the gathered values lie. For speech quality, we publish the P10 value (10 per cent of the values are lower than the specified threshold), because in this case higher values are better.
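As an illustration of these percentile thresholds, here is a minimal sketch using the nearest-rank method; the exact percentile method used in the benchmark and all sample values below are assumptions for illustration only:

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the threshold below which p per cent
    of the gathered values lie (a sketch, not the benchmark's formula)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical call setup times in seconds: P90 means
# "90 per cent of calls were set up at least this fast".
setup_times = [0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.6, 1.9, 3.5]
p90 = percentile(setup_times, 90)

# Hypothetical POLQA speech-quality scores: P10 means
# "90 per cent of calls sounded better than this".
polqa_scores = [2.1, 3.0, 3.2, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4.0]
p10 = percentile(polqa_scores, 10)
```

Note how the single outlier of 3.5 seconds barely affects the P90 value, which is why such percentiles are more robust rating criteria than plain averages.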

In order to account for typical smartphone-use scenarios during the voice tests, background data traffic was generated in a controlled way through the injection of 100 KB of data traffic (HTTP downloads). As a new KPI in 2019, we also evaluate the so-called Multirab (Multi Radio Access Bearer) Connectivity. This value indicates whether data connectivity is available during phone calls. The voice scores account for 32 per cent of the total results.

DATA TESTING

Data performance was measured using four more Galaxy S9 smartphones in each car – one per operator. Their radio access technology was also set to LTE preferred mode.

For the web tests, they accessed web pages according to the widely recognised Alexa ranking. In addition, the static “Kepler” test web page as specified by ETSI (European Telecommunications Standards Institute) was used. In order to test the data service performance, files of 5 MB and 2.5 MB for download and upload were transferred from or to a test server located in the cloud. In addition, the peak data performance was tested in uplink and downlink directions by assessing the amount of data that was transferred within a seven-second time period.
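For illustration, the amount of data moved in such a fixed seven-second window converts directly into a throughput figure; the use of decimal megabits and the example value are assumptions:

```python
def peak_throughput_mbps(bytes_transferred, seconds=7):
    """Convert the data volume moved in a fixed time window into a
    throughput in Mbit/s (decimal megabits assumed for illustration)."""
    return bytes_transferred * 8 / seconds / 1_000_000

# e.g. a hypothetical 175 MB transferred within the 7-second window
peak = peak_throughput_mbps(175_000_000)
```

Measuring over a fixed window rather than a fixed file size ensures that very fast networks are not rated merely on how quickly they finish a small transfer.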

The evaluation of YouTube playback takes into account that YouTube dynamically adapts the video resolution to the available bandwidth. So, in addition to success ratios and start times, the measurements also determined the average video resolution.

All the tests were conducted with the best-performing mobile plan available from each operator. Data scores account for 48 per cent of the total results.

CROWDSOURCING

Additionally, umlaut conducted crowd-based analyses of the Spanish networks, which contribute 20 per cent to the end result. They are based on data gathered between mid-April and the end of September 2019.

For the collection of crowd data, umlaut has integrated background diagnosis processes into more than 800 diverse Android apps. If one of these applications is installed on the end-user's phone and the user authorizes the background analysis, data collection takes place 24/7, 365 days a year. Reports are generated for every hour and sent daily to umlaut's cloud servers. Such reports occupy just a small number of bytes per message and do not include any personal user data. Interested parties can deliberately take part in the data gathering with the specific “U get” app (see below).

This unique crowdsourcing technology allows umlaut to collect data about real-world experience wherever and whenever customers use their smartphones.

NETWORK COVERAGE

For the assessment of network coverage, umlaut lays a grid of 2 by 2 kilometres over the whole test area. The “evaluation areas” generated this way are then subdivided into 16 smaller tiles. To ensure statistical relevance, umlaut requires a certain number of users and measurement values per operator for each tile and each evaluation area.
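Since 16 tiles within a 2-by-2-kilometre area imply a 4 × 4 split into 500-metre tiles, the mapping from a position to its evaluation area and tile can be sketched as follows; the metric grid origin and the coordinate handling are simplifying assumptions, not umlaut's actual projection:

```python
AREA_SIZE_M = 2000   # evaluation areas measure 2 by 2 kilometres
TILES_PER_SIDE = 4   # 16 tiles per area => a 4 x 4 split of 500 m tiles
TILE_SIZE_M = AREA_SIZE_M // TILES_PER_SIDE

def locate(x_m, y_m):
    """Map a projected position (metres east/north of an assumed grid
    origin) to its evaluation-area index and its tile index within
    that area. A simplified sketch for illustration."""
    area = (x_m // AREA_SIZE_M, y_m // AREA_SIZE_M)
    tile = ((x_m % AREA_SIZE_M) // TILE_SIZE_M,
            (y_m % AREA_SIZE_M) // TILE_SIZE_M)
    return area, tile
```

Aggregating samples per tile before judging a whole evaluation area prevents a cluster of users in one corner from masking coverage gaps elsewhere in the same area.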

In our 2019 benchmark framework, we differentiate between a “Benchmark View” and an “Own Network View” of the crowd results: For the Benchmark View, only those evaluation areas are considered for which we have determined valid results for all operators included in the benchmark. In the “Own Network View” this exclusion is not made – an evaluation area is considered if there are valid samples for the assessed operator, regardless of the presence of competitors.

In addition, we now distinguish urban and non-urban areas in our crowd evaluations – reflecting that the coverage with mobile services is usually higher in urban areas than in rural surroundings. We specify corresponding values for the coverage of voice services (2G, 3G and 4G combined), data services (3G and 4G combined) and 4G only.

DATA THROUGHPUTS

Additionally, umlaut investigates the data rates that were actually available to each user. For this purpose, we determine the maximum download and upload data rates per user within 15-minute slices. These values are then aggregated per evaluation area in 4-week timeslices, for each of which we determine the P90 value. For the final calculation of this KPI, we then calculate the average of the results of the six timeslices.
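A minimal sketch of this aggregation pipeline, with invented sample data and simplified identifiers; the nearest-rank P90 method is an assumption about the exact percentile calculation:

```python
import math
from collections import defaultdict

def p90(values):
    """Nearest-rank P90: the threshold below which 90% of values lie."""
    ordered = sorted(values)
    return ordered[max(1, math.ceil(0.9 * len(ordered))) - 1]

def throughput_kpi(samples):
    """samples: (timeslice, area_id, user_max_rate_mbps) tuples, where
    user_max_rate_mbps is one user's maximum rate within a 15-minute
    slice and timeslice indexes the 4-week period it falls into.
    Returns {area_id: KPI}: the P90 per area and timeslice, averaged
    over the timeslices. A simplified sketch of the method above."""
    per_area_slice = defaultdict(list)
    for ts, area, rate in samples:
        per_area_slice[(area, ts)].append(rate)
    per_area = defaultdict(list)
    for (area, ts), rates in per_area_slice.items():
        per_area[area].append(p90(rates))
    return {area: sum(v) / len(v) for area, v in per_area.items()}

# Invented samples: one area, two 4-week timeslices
samples = [(0, "A", 10), (0, "A", 20), (1, "A", 30), (1, "A", 40)]
kpi = throughput_kpi(samples)
```

Taking the maximum per user first keeps heavy users from dominating the statistic, while averaging the six timeslices smooths out seasonal fluctuations in the observation period.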

DATA SERVICE AVAILABILITY

Also called “operational excellence”, this parameter indicates the number of “service degradations” – events where data connectivity is impacted by identified anomalies of sufficient severity. To judge this, the algorithm compares similar timeframes on similar days in a window around the day and time of interest. The algorithm looks at large-scale anomalies on a network-wide level and ensures that individual users' degradations, such as a simple loss of coverage due to an indoor stay, cannot affect the result.
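In heavily simplified form, such a comparison can be sketched as a baseline check against the same hour on neighbouring days; the window length, the drop threshold and the success-ratio input are assumptions for illustration, not umlaut's actual algorithm:

```python
from statistics import median

def degradations(hourly_success, window_days=3, drop_threshold=0.2):
    """hourly_success[d][h]: network-wide connectivity success ratio
    for day d, hour h. Flags (day, hour) pairs whose value falls more
    than drop_threshold below the median of the same hour on nearby
    days - a heavily simplified sketch of the described comparison."""
    flagged = []
    days = len(hourly_success)
    for d in range(days):
        lo, hi = max(0, d - window_days), min(days, d + window_days + 1)
        for h in range(24):
            baseline = [hourly_success[o][h] for o in range(lo, hi) if o != d]
            if baseline and median(baseline) - hourly_success[d][h] > drop_threshold:
                flagged.append((d, h))
    return flagged
```

Because the baseline is built from network-wide ratios rather than individual users, a single subscriber losing coverage indoors barely moves the median and is never flagged, while a broad outage at one hour stands out clearly.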

In order to ensure statistical relevance, valid assessment weeks and hours must fulfil distinct requirements. Each operator must have sufficient statistics for trend and noise analyses in each evaluated time window. The exact number depends on the market size and the number of operators. Data Service Availability is based on the same 24-week observation period as our other crowd results.

Two boxes were mounted into the rear and side windows of each measurement car in order to support eight smartphones per car.


One Samsung Galaxy S9 per operator took the voice measurements and one additional S9 per operator was used for the data tests. All test phones were operated and supervised by umlaut's unique control system.



PARTICIPATE IN OUR CROWDSOURCING

Everybody interested in being part of our global crowdsourcing panel and obtaining insights into the reliability of the mobile network their smartphone is logged into can participate most easily by installing and using the “U get” app. This app concentrates exclusively on network analyses and is available at http://uget-app.com.

“U get” checks and visualises the current mobile network performance and contributes the results to our crowdsourcing platform. Join the global community of users who understand their personal wireless performance, while contributing to the world’s most comprehensive picture of mobile customer experience.


CONCLUSION

Vodafone wins for the fifth time in a row. Orange manages to defend its second rank and shows the most distinct score improvements over last year's results. Movistar ranks third and Yoigo fourth, both with overall “good” results.

The clear winner of the umlaut connect Mobile Benchmark Spain is Vodafone – for the fifth time in a row. Orange succeeds in defending the second rank, which it took last year from its constant rival Movistar. Achieving a substantial improvement over its 2018 score, Orange solidifies its position. Like the winner, Orange also achieves the overall grade “very good”.

Movistar ranks third overall but still holds the second position in the data discipline. As in the previous year, the Telefónica brand achieves the grade “good”. The same applies to the smallest Spanish operator, Yoigo. In the overall assessment, Yoigo loses some points but also manages to improve in some areas such as the reliability of data connections on rural roads.

In our crowdsourced assessment, which is designed to augment and verify the drivetest and walktest results, Movistar takes the lead from Orange. In the Data Service Availability category, we could not identify any degradations in any Spanish network during the whole 24-week evaluation period. And our first trial measurements for 5G in the Vodafone network emphasize the distinct advantages of this future mobile communications standard.


1. VODAFONE

For the fifth time in a row, Vodafone is the winner of our Mobile Benchmark in Spain. The operator clearly leads in both the voice and data categories. Also, providing us with an outlook on the capabilities of 5G connectivity, Vodafone proves that it is well positioned for the future.


2. ORANGE

Orange keeps the second rank, which it took from its constant rival Movistar in the previous year. Showing an overall “very good” performance, the operator ranks second in the voice and crowdsourcing disciplines and third by a close margin in the data assessment. It managed to distinctly improve its score over the previous year.


3. MOVISTAR

With strong data results and an overall good voice score, the largest Spanish operator ranks third. It achieved the best result in our crowdsourcing evaluations. Compared to its 2018 scores, the Telefónica brand has lost some points, but it still managed to achieve the overall grade “good”.

4. YOIGO

Spain's smallest operator ranks fourth, but still achieves the overall grade “good”. In comparison to last year's results, Yoigo lost some points. But the operator manages to improve in some areas such as the reliability of data connectivity evaluated in our drivetests on rural roads.