TPG is one of Australia’s most value-for-money ISPs. I have been a loyal customer of theirs, being on an old Naked ADSL2+ plan with VoIP bundled in, and generally speed and reliability have been nothing short of brilliant for the price paid (averaging 6 hours downtime per year). There are, however, many shortcomings in the arena of support – if you need it, this is not the ISP for you. For people who support themselves, it’s a perfectly fine choice. I have yet to pick up the phone and call them once in five years, although it turns out contacting them through the forums is a much better way (or you can Whingepool, but it can take a bit longer), which I have done several times.
But lately, things have been going a bit strange. Occasionally DNS requests fail the first time and succeed the second (after an ipconfig /flushdns to remove the cached lookup failure). Most days, the VoIP ATA fails to register with the SIP server around the middle of the day – pings to tpgvoip.tpg.com.au fail with a lookup failure. This has happened before, but the DNS record is valid – a check using other DNS lookup services online like this one says that it’s fine. Most DNS requests for other domains are fine, although there are a few which fail – so we know the DNS server is “alive”, just not really responding correctly.
We can observe the VoIP server being reverse DNS-ed to proxy-c5.soulaustralia.com.au, which resolves just fine when we plug it into the VoIP ATA. But as some people have realized, it’s an endless game of changing the SIP server back and forth: when one fails, the other *might* work. The other thing I noticed was that the IP address returned by the lookup changes all the time, with a short time to live of 60s – likely for load balancing.
The problem at hand is much more fundamental …
Today, it got worse – www.tpg.com.au was also a lookup failure.
Maybe it’s a large number of people signing up to their unlimited ADSL2+ offerings, network worms, or inconsiderately coded file sharing programs (e.g. defaulting to reverse-DNS every IP address in the peer list just to show the string and a “flag”, you know the sort). Whatever it is, TPG has definitely had trouble running a stable DNS offering over the past few days.
For the less tech savvy, DNS is short for Domain Name System; a DNS server is basically a computer which translates domain names (www.example.com) into IP addresses (184.108.40.206) for the computer to connect to. Everything which is entered as a domain name relies on the DNS to transform it to an address before any connection (and consequently data transfer) can be made.
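As a rough illustration of that translation step, here is a minimal Python sketch that asks the operating system's resolver for the IPv4 addresses behind a name. The function name resolve is my own, and it goes through whatever DNS the machine is currently configured with (including the gateway's relay), so it only shows whether a lookup succeeds, not which server is at fault.

```python
import socket


def resolve(hostname):
    """Return the sorted IPv4 addresses for hostname, or [] if the lookup fails."""
    try:
        infos = socket.getaddrinfo(
            hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
        )
    except socket.gaierror:
        # This is the "lookup failure" case described in this post.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Calling resolve("www.tpg.com.au") twice in a row and comparing the results is a crude way to catch the "fails the first time, works the second" behaviour, bearing in mind that the OS may cache a failure in between.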
Verifying the Problem
Before we can blame the DNS for the failures – there are several pitfalls in your average home network which you must check first.
First of all, most modem/router units with a DHCP server will act as a DNS relay. This means that all computers connected to the modem/router will query the modem/router, which will forward requests to whatever DNSes it knows of, if necessary.
The advantage of this setup is that the router can locally cache and remember previous DNS lookup results, avoiding needless queries and speeding up the process. This is enabled by DNS lookup results carrying long Time to Live values, which allow the result to be cached for the future. But if your router has incorrect information about the DNSes due to manual misconfiguration or a failure of autoconfiguration (DHCP) on the WAN side, then it would appear that your DNS queries are failing even though the fault is not the ISP’s.
Another advantage of a local DNS relay is that some of them will allow you to resolve LAN computer names to IP addresses, without resorting to NetBIOS discovery … none of this will work when using an ISP DNS directly.
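To make the caching idea concrete, here is a toy Python sketch of a TTL-based cache. It is not how any particular router firmware works; the class name DnsCache, the lookup callable and the injectable clock are all assumptions made for illustration.

```python
import time


class DnsCache:
    """Toy cache that remembers lookup results until their TTL expires."""

    def __init__(self, lookup, clock=time.monotonic):
        self._lookup = lookup      # callable: hostname -> (address, ttl_seconds)
        self._clock = clock        # injectable so the expiry logic can be exercised
        self._entries = {}         # hostname -> (address, expiry_time)
        self.upstream_queries = 0  # how many lookups actually went upstream

    def resolve(self, hostname):
        now = self._clock()
        cached = self._entries.get(hostname)
        if cached is not None and cached[1] > now:
            return cached[0]       # still fresh: answer straight from the cache
        # Missing or expired: ask upstream and remember the answer for ttl seconds.
        address, ttl = self._lookup(hostname)
        self.upstream_queries += 1
        self._entries[hostname] = (address, now + ttl)
        return address
```

With a 60 second TTL like the one seen on the VoIP record, a cache of this kind would go back upstream at least once a minute for that name, which is why a flaky upstream DNS shows through so quickly.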
So I suppose the first step is to verify settings, and reboot the gateway to ensure the caches are clear. Or better yet, not trust the gateway at all.
In this case, you would directly configure the TCP/IP settings on your machine to request DNS directly from the DNS server IPs, hence bypassing the gateway’s DNS relay/caching mechanism. It would also be wise to run ipconfig /flushdns to try and remove any cached records, although some applications also keep their own caches, complicating diagnosis.
Note that in the example above, I am using a static IP configuration – with TPG’s DNS servers plugged into the preferred slot. This would make all queries to TPG’s DNS. Unfortunately, at this *exact time*, www.tpg.com.au is not resolvable! Best ISP ever!
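If you would rather not touch the adapter settings at all, a query can also be sent straight to a chosen DNS server over UDP, bypassing both the gateway relay and the operating system's cache. The following Python sketch builds a bare-bones A-record query by hand; the function names are my own, it ignores TCP fallback and does not decode the answer records, and the server address you pass in is whatever your ISP has handed you.

```python
import socket
import struct


def build_query(hostname, query_id=0x1234):
    """Build a minimal DNS query packet asking for the A record of hostname."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label prefixed by its length, ended by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A record), QCLASS=1 (Internet).
    return header + qname + struct.pack("!HH", 1, 1)


def parse_header(response):
    """Return (query_id, rcode, answer_count) from the first 12 bytes of a response."""
    query_id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    return query_id, flags & 0x000F, ancount


def query_server(server_ip, hostname, timeout=3.0):
    """Send one A query to server_ip; return (rcode, answer_count), or None if no reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(hostname), (server_ip, 53))
            response, _ = sock.recvfrom(512)
        except OSError:
            # Covers timeouts as well as unreachable-network errors.
            return None
    _id, rcode, ancount = parse_header(response)
    return rcode, ancount
```

An rcode of 0 with at least one answer means the server resolved the name, 2 is a server failure and 3 means the server claims the name does not exist; None means it never answered within the timeout.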
Of course, if you need more confirmation about why the lookup failed – then Wireshark comes in handy. But I’m not so worried, because there is a way around it …
Alternative DNS Servers
Now that I’ve pretty much narrowed it down to TPG’s DNS (in 5 minutes), the solution is obvious. Leverage a better DNS service! While there used to be many open DNS resolver solutions, they also suffered from overloading and slow service due to their remote location (i.e. your request has to travel through more internet, reaching more potential congestion). Choosing to offload your DNS to someone else is a risk since you can be redirected unknowingly if your DNS of choice is hijacked.
Luckily, Google has a worldwide presence with great peering arrangements, and are in a great position to offer free DNS. And they do, with grand promises of reliability, speed and security. I’ll trust them on that one. They even have memorable addresses – 8.8.8.8 and 8.8.4.4. If you’re stuck on a network where you know the IP and gateway but have no DNS – give these a go!
A quick test of them …
… shows perfect results. My VoIP box came right up when I changed my preferred DNS server to them as well …
Of course, you don’t want to use them if they’re quite a bit more distant and take longer to reply to DNS queries … but luckily they’re just around the corner (ping-wise)! (Note, in the traceroute, the first two hops are internal to my network!)
In fact, my network gear is very old and antiquated – the ADSL2+ modem is a D-Link DSL-2740B wireless modem-router which was thrown out by someone else. It has faulty wireless, and its NAT table is easily filled, causing crashes at even very few connections and necessitating hard resets. No problem: disable the wireless, piggyback it to my old $30 (5 years ago) TP-Link TL-WR641G wireless G router as my wireless and routing gateway. Just set up an “intranet” in 10.0.x.x between the modem and the router, DMZ the router (thus no longer relying on the D-Link NAT and its crappiness), and set the TP-Link to go do all the port forwards, UPnP, DHCP, etc. Instantly, we’ve gone from 200 connections = a crash to about 3000 connections = a crash.
It was done this way because we formerly ran on a Telstra Bigpond 2wire ADSL2+ modem which had its power supply fail, but it can’t handle routing (sanely) either, due to limited CPU and NAT table size. It didn’t fix the NAT table issue, because that firmware kept everything in the NAT table despite being DMZplus-ed, but at least this setup makes it easy to switch on a modem failure. I’m basically squeezing everything from what is essentially unwanted equipment.
So, I’m actually quite a fan of my router’s DNS caching/relay services. And I want my internal lookups to work. And I don’t want to reconfigure every device on the network.
The solution is obvious – configure the router manually to use Google’s DNS as primary, and TPG’s DNS as secondary. Then whenever a request comes in from a client on the network, if it’s not cached, it will be forwarded to Google first – and I trust them for now. It also means less load on TPG’s (likely) overstressed or misconfigured DNS.
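The primary/secondary arrangement boils down to "try the servers in order and take the first answer". Here is a small Python sketch of that logic; real router firmware may pick upstream servers differently (some rotate or query in parallel), and the query function is a stand-in you supply yourself, so treat this as an illustration of the idea rather than what the TP-Link actually does.

```python
def resolve_with_fallback(hostname, servers, query):
    """Try each server in order; return (server, answer) for the first that answers.

    query(server, hostname) is a caller-supplied function returning an answer,
    or None if that server failed or timed out.
    """
    for server in servers:
        answer = query(server, hostname)
        if answer is not None:
            return server, answer
    # Every server in the list failed.
    return None, None
```

With a list such as Google's address first and the ISP's second, the ISP's server is only consulted when the first one gives nothing back, which is the behaviour I am after here.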
Some networks may actually block the use of other DNSes, as they are a potential way of evading data quota arrangements (think of the project called Iodine), but at least this works with TPG for now.