The Broadband Monitoring Programme seeks to provide real-world data sampled from consumer Internet services so that Retail Service Provider (RSP) products can be compared by an independent body (the ACCC) and the results published. The ACCC has commissioned SamKnows to conduct this data collection using ‘whitebox’ devices that are pre-configured and simply connected to the end user’s router.
As one of the participants in this programme, I quickly noticed that the data collected and reported by my whitebox showed implausibly low connection performance. I analysed the traffic to and from the whitebox on my network using Wireshark, then inspected the route to each of the servers the device was communicating with. The results suggest the whitebox is measuring network performance in a way that cannot be accurate or meaningful within the defined context – it’s not measuring the Internet performance between my premises and the RSP, it’s measuring my connection’s usable bandwidth to servers on the other side of the planet.
If SamKnows have configured all of their whitebox devices like this, the entire Broadband Monitoring Programme’s data is useless. Even if this is only true for some of the participating devices, it still poisons the data set and makes the pooled results unusable.
On the face of it, the ACCC’s Broadband Monitoring Programme has a laudable objective – providing independent bandwidth test data with which we can compare the real-world performance of Australian ISPs/RSPs. The reality, however, is that the method employed to collect this test data may be flawed, rendering the data useless; instead of providing clarity, it may just be further muddying the waters.
When I first heard about the monitoring programme I signed up to participate. Working in the IT industry, the state of broadband in Australia has been something of a bugbear my whole career, and improving it is something I’m passionate about. So I was keen to be able to add data from my own connection to the overall pool of end user experience.
My house has a new HFC cable, connected in July 2017 as there wasn’t a pre-existing Telstra Cable line to the premises. We’ve had NBN over HFC since October 2017. Initially we had performance issues that couldn’t be accounted for by our RSP, Aussie Broadband. NBN explicitly denied their network could be the cause of the issue, then two weeks later they admitted there were widespread issues with HFC across the country due to NBN services operating at a higher frequency than the existing Telstra/Foxtel services, which made them more susceptible to poor quality cable, taps and signal noise. Remediation work was performed on the HFC in our area on several occasions, and (other than the occasional ongoing network maintenance work that’s happened two or three times) since January 2018 we’ve had very reliable, stable performance.
What do I mean by reliable, stable performance? I have a 100mbps downstream, 40mbps upstream NBN service (100/40). After my initial issues (with connection performance as low as 54/16 at times) I got used to regularly checking my bandwidth stats. Over the past 16 months I’ve seen consistent performance measurement at over 95/36mbps using Aussie Broadband’s speed test site.
I have come to use ABB’s own test server because it gives the best indication of the connection performance between my house and ABB’s network. Anything beyond that is, unfortunately, beyond ABB’s control. I have of course also tested against alternatives like speedtest.net, but the further you get from your own RSP’s network, the more variables you introduce, and the test results become anywhere from unreliable to essentially useless. For example, here are some tests I ran while writing this – I first tested with ABB’s internal test, then three alternative speedtest.net servers (Vodafone, Superloop and Telstra), as well as Whistleout, OzSpeedTest, LightningBroadband and finally the SamKnows broadbandperformance.co.uk:
Just from this example, you can instantly see that there’s a large variation in performance results from the same connection tested just minutes apart. Let me assure you, the lower performance figures aren’t a result of the connection fluctuating wildly – tests against the ABB server during and immediately after these tests assured me that the results were due to either the test servers themselves, or some part of the network in-between.
I also performed another couple of speed tests later that evening at 8:09pm, which should be right in the middle of the peak period:
Yes, the ‘OzBroadband Speed Test’ is using East-Coast time.
So we know from doing tests like this that speed test measurements aren’t necessarily a reliable way to tell how well our broadband connection is performing, unless we can control the parameters somewhat. This is, of course, the same with all scientific testing and measurement – if you can’t control the variables in the test environment, you can’t rely on the measurement data to be accurate.
In order to rule out problems caused by external network latency and possible bandwidth issues that are beyond the control of our RSP, we try to perform bandwidth tests using a test server on the RSP’s network. But wait, what if our RSP is cheating somehow? How could we possibly know if this data is reliable? We need some other way to test performance in the real world.
There are numerous professional network performance analysis tools, but most people don’t have access to these, so I’ll give you an example of a simple way to reliably test real world network performance. Take a single large file, like a video or disc image, and copy it from one computer to another. Ensure the media you’re using is capable of much faster performance than the network you’re testing (such as a high quality SSD). For example, copying a single file from a Samsung 970 Pro m.2 SSD on one computer to another 970 Pro m.2 SSD on another computer ensures that a 1gbps network can be fully saturated, as the network can only transfer data at a maximum of 125MB/s, while the SSDs are capable of reading and writing sequentially at several thousand MB/s. The 970 Pro is severe overkill here, as any SSD capable of sustained sequential writes of at least 200MB/s will be more than capable of saturating a 1gbps link.
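To make the arithmetic behind that concrete, here is a minimal Python sketch. The function names are my own invention for illustration, and it ignores protocol overhead entirely; it only shows where the 125MB/s ceiling comes from and why a 200MB/s disk is enough to hit it.

```python
def link_capacity_mb_per_s(link_mbps):
    """Convert a link speed in megabits per second to megabytes per second."""
    return link_mbps / 8  # 8 bits per byte


def can_saturate(storage_mb_per_s, link_mbps):
    """True if the slowest disk in the transfer is faster than the network link."""
    return storage_mb_per_s > link_capacity_mb_per_s(link_mbps)


# A 1gbps link is 1000 megabits per second, i.e. 125 megabytes per second.
gigabit_ceiling = link_capacity_mb_per_s(1000)
```

If `can_saturate` comes back false, the disk is the bottleneck and the copy tells you nothing about the network.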
So this is the sort of real world testing I do, because transferring files between my home network and remote servers is a task I perform routinely. Here’s an example, copying a Virtual Machine’s data file from a remote host on 100/100 business fibre to my home PC over an SSTP VPN connection:
10.7MB/s is 85.6mbps. An SSTP VPN tunnel adds network overhead (around 13%) due to the header and control packets of the PPP protocol, so performance will always be lower than the network’s theoretical maximum bandwidth. Even so, 85.6mbps isn’t too far from the 96-97mbps ballpark achieved in the ABB bandwidth tests. In fact, if we remove the 13% overhead we see performance is around 98.4mbps. You can see from that graph that performance remained fairly consistent over the ~28 minutes of the file transfer, which wasn’t bad considering someone else was watching YouTube from another PC in my house at the same time.
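For anyone who wants to check those numbers, here is a small Python sketch of the calculation. The function names are hypothetical, and the 13% figure is just my estimate from above; "removing" the overhead here means dividing by the fraction of the link left for payload (1 - 0.13), which is what gives 98.4mbps.

```python
def mb_per_s_to_mbps(mb_per_s):
    """Convert megabytes per second to megabits per second."""
    return mb_per_s * 8


def without_overhead(measured_mbps, overhead_fraction):
    """Estimate raw link throughput, assuming the measured payload rate is
    what remains after the tunnel overhead has taken its share."""
    return measured_mbps / (1 - overhead_fraction)


measured = mb_per_s_to_mbps(10.7)             # file copy rate seen over the VPN
estimated = without_overhead(measured, 0.13)  # assumed ~13% SSTP/PPP overhead

print(f"{measured:.1f}mbps measured, ~{estimated:.1f}mbps before overhead")
# prints: 85.6mbps measured, ~98.4mbps before overhead
```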
All of this tells us that the performance of my domestic NBN connection with Aussie Broadband is pretty close to theoretical maximum, and it’s also fairly consistent and reliable over time. So imagine my surprise when I received my ‘Monthly Report Card’ from SamKnows, showing a summary of my NBN connection’s performance over time, and it looked like this:
Woah, what is that?!
Hmm, that doesn’t seem right…
I only connected the whitebox on the 10th of May, so that’s when it started collecting data. According to the device, my connection’s performance has hovered between around 42 and 59mbps. That doesn’t correlate with my own testing, whether it’s speed tests or real-world experience.
Even the average day performance isn’t particularly illuminating – how can I see testable performance at 95+mbps that the SamKnows whitebox is measuring at ~53mbps? Let’s take a look at the SamKnows dashboard:
This is the monthly summary screen. You can see average download is 51.6mbps while upload is 16.1mbps, both of which look terrible. Jitter isn’t great either, but the things that caught my eye here were the latency of 343ms and DNS response of 136ms. Both of those are really high. For DNS requests, you want your computers and devices to be resolving DNS using the fastest possible server, which usually means the closest. Most home networks are essentially auto-configured – they tell devices inside the network to use the router as the DNS source, and the router passes those requests on to the ISP’s DNS server (the closest possible server with the fastest response times). So when your browser asks for the IP address of google.com the DNS server responds as quickly as possible, and you’re not sitting there waiting for your browser to access the google.com search page.
DNS responses of 136ms are well over 10 times longer than I’d expect a DNS request to take – it should be close to 10ms. Here’s a trace route test against the Aussie Broadband DNS server (18.104.22.168):
Now it’s quite possible for DNS response tests to show poor numbers by either testing slow DNS servers, or testing DNS servers a long way away that have lots of networks in-between. This isn’t a fair or reasonable DNS test though, because it has absolutely no material relevance to performance of the network you’re actually meant to be testing.
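If you want a rough feel for your own DNS response times, here is a minimal Python sketch. It is not how the whitebox measures DNS (I don’t know how it does that); it simply times lookups through whatever resolver your operating system is configured to use, so repeat lookups may be answered from a local cache and come back almost instantly. The function names and the `resolver` parameter are my own, added so the timing logic can be exercised without touching the network.

```python
import socket
import statistics
import time


def time_lookups(hostname, attempts=5, resolver=socket.getaddrinfo):
    """Time repeated name lookups and return each duration in milliseconds."""
    timings_ms = []
    for _ in range(attempts):
        start = time.perf_counter()
        resolver(hostname, 80)  # the port is arbitrary; only the lookup matters
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return timings_ms


def summarise(timings_ms):
    """Reduce a list of timings to min, median and max."""
    return {
        "min_ms": min(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "max_ms": max(timings_ms),
    }


# Example use (performs real lookups, so results depend on your network):
#   print(summarise(time_lookups("google.com")))
```

The first lookup of a name you haven’t visited recently is usually the most telling figure, since it is the least likely to have been cached along the way.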
Then we come to the latency chart. Here’s what it looks like in a bit more detail:
That latency looks pretty consistent over time, but it’s ridiculously high. I know for a fact that there is nothing like that sort of latency between my premises and ABB’s systems, or any local (i.e. in Australia) network systems – I would have to be trying to connect to servers on the other side of the planet to see latency figures like that. Here’s a trace route to an address I just randomly thought of that’s in the UK, bt.co.uk (British Telecom is the UK’s equivalent to Telstra):
You can see that it takes 13 hops before I’ve cracked 300ms in latency, where it sits fairly consistently as it navigates European and UK networks. There’s something not right with the way the SamKnows whitebox is measuring my network performance.
The only way to figure out what the SamKnows whitebox is talking to is to take a look at its network traffic. I can do this using Wireshark, a network analysis tool. I ran a wireshark trace and logged all the data for about 80 minutes, only paying attention to traffic flowing from or to the SamKnows box (filtered by MAC address). Sifting through the list of IP addresses the device was communicating with gave me this list (with notes I made while looking at the Wireshark data):
Once I had this list of servers, I could figure out their relative connection topology with respect to my residential NBN service, again using trace route:
So what does all that tell us? Well, to draw detailed, accurate conclusions I would have to analyse the amount of data sent and received from each server, and from there I could estimate how much testing it is doing to each one. That would take quite a lot of work though, and I’m not sure at this stage that it’s entirely warranted.
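For what it’s worth, the tallying itself is not complicated once the capture has been exported into rows of source address, destination address and packet length. Here is a minimal Python sketch under that assumption; the function names and the addresses in the example are made up for illustration (they are reserved documentation addresses, not the servers from my capture).

```python
from collections import defaultdict


def bytes_per_remote(packets, device_ip):
    """Sum bytes sent to and received from each remote address.

    `packets` is an iterable of (source_ip, destination_ip, length) tuples,
    and `device_ip` is the local address of the device being analysed.
    """
    totals = defaultdict(lambda: {"sent": 0, "received": 0})
    for src, dst, length in packets:
        if src == device_ip:
            totals[dst]["sent"] += length
        elif dst == device_ip:
            totals[src]["received"] += length
        # Packets not involving the device are ignored.
    return dict(totals)


def ranked_by_volume(totals):
    """Order remote addresses from most to least total traffic."""
    return sorted(
        totals.items(),
        key=lambda item: item[1]["sent"] + item[1]["received"],
        reverse=True,
    )


# Hypothetical capture rows: (source, destination, length in bytes).
example_packets = [
    ("192.0.2.10", "198.51.100.1", 1500),
    ("198.51.100.1", "192.0.2.10", 64000),
    ("192.0.2.10", "203.0.113.5", 400),
    ("203.0.113.5", "192.0.2.10", 900),
    ("192.0.2.99", "203.0.113.5", 5000),  # a different local device, ignored
]
example_ranking = ranked_by_volume(bytes_per_remote(example_packets, "192.0.2.10"))
```

Byte counts alone won’t say which packets were bandwidth tests and which were control traffic, but they do show where the bulk of the data is going.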
What we can see is there is one server, 22.214.171.124, which is on the Aussie Broadband network, that the SamKnows whitebox seems to be actively testing with a certain amount of data. However, there are two servers that received the vast majority of network traffic during my packet capture, 126.96.36.199 and 188.8.131.52. These servers have latency in the high 300s, and they’re both located in the UK. The next most active server is 184.108.40.206. This actually has a DNS resolvable name of n8-the1.sameknows.com, and is also located in the UK.
We can see that DNS queries are performed against Google’s 220.127.116.11 server, using UK domains. There are tests performed against bbc.co.uk media servers and a netflix speed test against an Amazon AWS server in the USA.
We can also say with a fair bit of certainty that the performance data is weighted heavily in favour of testing performed against the UK servers, not the server located in the Aussie Broadband network. I can draw that conclusion because that’s what the Wireshark data shows me, but also because we have the latency figures, which would be far, far lower if those tests were being performed against a server based in Australia with a sub-50ms latency.
I can’t say with any degree of knowledge how widespread this configuration is – I have only performed this analysis on a single SamKnows whitebox that’s connected to my network. There is nothing about it that leads me to believe it is faulty – it seems to be communicating with the SamKnows service, testing and reporting data correctly. It is just configured in such a way that its data is absolutely useless for measuring the performance of my NBN internet connection, because the servers it is testing against are halfway around the planet.
Through discussion with others, I am aware that I’m not the only one seeing unexpected results from the SamKnows whitebox, though. I’ll be interested to see if others publicly reveal similar discoveries in the future.
There is one thing, however, that catches my attention. Remember the SamKnows speed test from earlier?
Isn’t it interesting that this test isn’t actually that bad, yet the data from the whitebox looks terrible? If tests on that page are being served by their UK based server (and they don’t have peers on other networks around the world), then how and why could speed tests from there be not too bad, while data from the whitebox is so terrible? The latency figure is interesting, though – it’s close to the latency of the server at 18.104.22.168, which seems to be within the ABB network. If that’s the case, why aren’t 100% of the whitebox’s bandwidth tests being performed against that server instead of against servers on the other side of the planet?
Assuming the whitebox itself isn’t faulty, there is one possible explanation for the poor performance, which is routine in network management – traffic prioritisation. If other network operators were to deliberately de-prioritise traffic to specific servers, or specific types of traffic from external sources, they could artificially prejudice results like this. You can see from the route traces above that this whitebox traffic passes over other provider networks. In the absence of enforceable Net Neutrality legislation, this is yet another reason that speed measurement target servers should be located inside the networks of their respective RSPs.
Whatever the cause behind what I’ve seen with the whitebox device at my premises, the data it’s generating is bad, but what’s most concerning is that this bad data is informing policy. To draw an analogy, it’s like making policy decisions on road safety in Sydney based on road accident data from Cairo. The overall bandwidth and responsiveness of the network from my premises to servers based in the UK and the USA is only relevant if you’re designing a global network and you’re interested in dramatic increases in bandwidth between continents. If you want to know what the performance of my NBN service is, you can only measure it from my premises back to my RSP.
Whether this traffic is subject to network prioritisation or not doesn’t matter too much. The reality is that there are far too many variables involved across multiple, disparate, uncontrolled environments to make such measurements useful and valid. The only way to measure the performance of retail services is to test their connection from the premises back to the ISP/RSP’s network, or back to some transparent, independent central network within Australia that is well understood and is guaranteed to provide reliable and unbiased access to all RSPs equally. It appears SamKnows has such a server already within ABB’s network, so we can assume they probably have something similar within the networks of other Australian RSPs. If that’s the case, why aren’t the whiteboxes performing bandwidth tests against their respective RSP-located servers exclusively?
The SamKnows whitebox that I have doesn’t do that, and the data it generates is added to the pool of data that is being reported to the ACCC, who are publishing it in reports that are then referred to by RSPs to sell their services, the ACCC to criticise what they believe is poor performance, and the Parliament to debate policy. Unless this monitoring programme is completely overhauled to ensure the whiteboxes are only measuring performance against RSP hosted servers (or at the very least network agnostic servers hosted within Australia) and there is complete transparency around the way these devices are operating in future, it should be scrapped. Because right now all we know is we have bad data that can’t be trusted, and when you use that in the belief that it’s reliable it’s worse than if you weren’t doing any measurement at all.