It seems like Telstra hasn’t had the best of luck when it comes to network problems. I like to joke about it as “The Network without Equal (outages)”, at least for these past few months. They had further issues with voice service outages, and a lesser-known pre-paid service outage, which hit me.
On 2nd March, they claimed that I was out of credit despite having over 10GB of a data pack remaining – and they couldn’t actually retrieve any information about my credit. I relied on my back-up data service from Virgin/Optus to make it through the day, but it’s not the sort of thing anyone expects from the most premium network in Australia.
To make up for it, in a way that many users wouldn’t actually be able to take advantage of, they held another free data day on Sunday 3rd April, making this the second this year.
It was rather clear from their announcements that they were apologetic, and that they were ultimately facing an unprecedented and unacceptable level of performance, which had resulted in waves of social media “discontent”.
An e-mail was sent the day before to remind eligible users of the event. Again, as I didn’t have much to do, I decided to repeat the experiment conducted on the last free data day – a quest to try to graph the network performance from my location throughout the day.
The same Xiaomi Redmi Note 2 phone was used in the same location with the same service. This time, the sampling was done more frequently, and the day had one extra hour due to the transition off daylight saving, so the plots are done in UTC+10 for consistency. I set alarms to wake myself up every 30 minutes / 1 hour in the early morning to get some samples in traditionally “sleep” times, and sampled every 15 minutes / 5 minutes during the waking hours. Please note that the graph this time is plotted differently to the previous one due to the extremes in the data.
My connection commenced around midnight (UTC+11), and the trend showed very little congestion, as usual, up until 7am when the speeds began to slow. The latency increased to 100-120ms as before, but unlike the previous free data day, the services quickly became overloaded, resulting in throughput dipping to near 10Mbit/s on LTE and latencies skyrocketing into the 200ms range.
This trend seemed rather peculiar to me. I also decided that I couldn’t dedicate all of my phone to the experiment, so I took time to upgrade the ROM of the phone, which took half an hour. The result? Speed improvement. Sampling into the evening was increased to 5 minute intervals, and clearly shows the load easing almost linearly past 8pm, recovering to the unloaded ~100Mbit/s service we normally get.
Let me explain – after the upgrade and several reboots, the packet data session with the terminating router/access point was broken, and re-established with a different terminating router/access point. This is seen by the different IP blocks I was allocated (above graph). It seems their load balancing system was at play here – it knew that the former system was under higher load and allowed new sessions to establish onto the less loaded system, resulting in better throughput. This suggests that congestion was happening at network core elements, rather than the last-mile.
The skyrocketing latencies also suggest that packets were “queueing” in a buffer somewhere in the network. After re-establishment of the session on the new block, latencies were consistent, indicating that the new access point was not approaching capacity.
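To illustrate why latency can climb so sharply as a node approaches capacity, here is a minimal sketch using the textbook M/M/1 single-queue model. This is purely an assumption for illustration: nothing here is known about how Telstra’s core elements actually queue traffic, and the function name `mm1_delay_factor` is made up for this example. In that model, the mean time a packet spends in the system is the bare service time multiplied by 1 / (1 - utilisation).

```python
def mm1_delay_factor(utilisation: float) -> float:
    """Mean time in system, as a multiple of the bare service time,
    for an idealised M/M/1 queue (an illustrative assumption only)."""
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be in the range [0, 1)")
    return 1.0 / (1.0 - utilisation)


if __name__ == "__main__":
    # Show how the delay multiplier grows as the load approaches 100%.
    for rho in (0.5, 0.8, 0.9, 0.95):
        print(f"load {rho:.0%}: delay x{mm1_delay_factor(rho):.1f}")
```

The point is only qualitative: delay stays modest at moderate load and grows very quickly near saturation, which is consistent with steady latencies on the lightly loaded access point and ballooning latencies on the overloaded one.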
The likely cause is that people got up earlier to exploit the free data day and connected to the network well in advance of actually needing to use it. As a result, their sessions were loaded onto one particular access point, which was overwhelmed when they all started to use the service. Why sessions were not broken to force re-establishment may be related to the fact that, if they get that wrong, they are likely to cause the sort of cascading failure which led to them having to offer this free data day in the first place!
This second access point also has an unusual characteristic where upload rates dip to about half the expected rate for long periods. Maybe this indicates a local user who had started large uploads, or some peculiarity of the traffic which this node handles. Maybe it’s a primary node for other services (business, M2M) which had been pressed into overflow service.
As a result, if I were to estimate based on data rate, I’d say that the data consumption probably peaked around mid-day at about 60% increased loading over the previous free data day.
The speed improvement was ultimately not due to anything related to the ROM upgrade itself, as the baseband version and the performance in unloaded circumstances remained identical.
Customers, being the unreasonable people they are, took to social media to complain. In this article, it was claimed that 2686 terabytes of data was used (~245Gbit/s average) and that 46% more data was downloaded. Some complainants were seeing 4.8Mbit/s.
They claimed that the data rate peak of the previous free data day was exceeded by 8am and stayed above that level until midnight – coincidentally, 8am was when the latency increased to the same levels as previously measured.
I suspect that if they had rebooted their phones, they may have been lucky enough to be diverted to a different access point like I was – which immediately cleared up the core congestion. If their local base station’s last-mile connection was saturated, though, there’s no real helping that aside from moving somewhere else.
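As a sanity check on that average rate, the conversion from total volume to average throughput can be sketched as below. The assumptions are mine: decimal terabytes (10^12 bytes) and a 24-hour window, with a 25-hour window as an alternative given the extra hour from the daylight saving change. The function name `average_rate_gbit_s` is just for this example.

```python
def average_rate_gbit_s(terabytes: float, hours: float = 24.0) -> float:
    """Average throughput in Gbit/s for a data volume given in decimal
    terabytes (10**12 bytes) spread evenly over the given number of hours."""
    bits = terabytes * 1e12 * 8      # bytes -> bits
    seconds = hours * 3600           # hours -> seconds
    return bits / seconds / 1e9      # bit/s -> Gbit/s


if __name__ == "__main__":
    print(f"24 hours: {average_rate_gbit_s(2686):.1f} Gbit/s")
    print(f"25 hours: {average_rate_gbit_s(2686, hours=25):.1f} Gbit/s")
```

Under these assumptions the figure comes out at roughly 249Gbit/s over 24 hours and roughly 239Gbit/s over 25 hours, so the ~245Gbit/s estimate is in the right ballpark either way.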
No one solution satisfies everyone. No free giveaway is ever going to be completely satisfying. Someone’s got to whinge. That being said, I think the Telstra network again performed relatively well under the strain, although parts of their network were definitely overloaded to the brink of turning a 4G service into a 3G service. Better load-balancing probably would have helped, but doing so without terminating existing sessions isn’t possible, and terminating them can inconvenience the customer. While I didn’t really benefit from the free data in the sense of obtaining/downloading something useful, I did manage to get an experiment out of it – so thanks Telstra!