So after the discoveries made in earlier posts, I decided to do a bit of an investigation into the data itself. I chose the M50 towards UNSW as the candidate route. I scripted wget to poll every 30 seconds over a whole 24-hour period, and wrote a crappy C-based parser (still not 100% perfect) to extract the times and offsets.
I imported this data into Excel and used it to plot the maximum, minimum and average delay for each run on a chart (click for full size).
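For anyone wanting to reproduce the polling step without wget, here is a minimal sketch in Python. It is not the script I actually ran: `FEED_URL` is a made-up placeholder rather than the real endpoint, and `fetch` and `poll` are names invented for illustration. The clock and sleep functions are passed in so the loop can be tried out without waiting a full day.

```python
import time
from urllib.request import urlopen

# Hypothetical placeholder, NOT the real feed address.
FEED_URL = "http://example.invalid/m50-feed"

def fetch(url):
    """Download one snapshot of the feed as raw bytes."""
    with urlopen(url, timeout=10) as resp:
        return resp.read()

def poll(fetch_fn, interval_s=30, duration_s=24 * 60 * 60,
         sleep_fn=time.sleep, clock=time.time):
    """Call fetch_fn every interval_s seconds for duration_s seconds.

    Returns a list of (timestamp, payload) pairs. A failed fetch is
    recorded with payload None instead of aborting the whole run.
    """
    samples = []
    end = clock() + duration_s
    while clock() < end:
        now = clock()
        try:
            payload = fetch_fn()
        except OSError:  # urllib's URLError and socket timeouts are OSErrors
            payload = None
        samples.append((now, payload))
        sleep_fn(interval_s)
    return samples
```

A real run would look like `poll(lambda: fetch(FEED_URL))`, which gives roughly 2880 snapshots over 24 hours. In practice you would probably write each snapshot to disk as it arrives rather than hold them all in memory.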
- I selected the longest and latest transmitted row for each bus run number – although, some rows showed slight inconsistencies with regard to delays for some reason or other.
- The runs did not have the same number of offset figures, which makes it hard to assess what is really going on. Many buses with only 1 or 2 offsets were probably not being tracked correctly and were discarded.
- One bus with a delay of over 100 minutes was discarded.
- Delays of 60+ minutes may have been “rescheduled” services – the data is a bit icky.
- Surprisingly, a handful of buses are early, but most of them are up to 10 minutes late.
- Some have a large range, suggesting something bad happened along the way while the bus was running.
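The selection and filtering rules above were applied by hand in Excel, but they can be sketched in a few lines of Python. This is only an illustration under some assumptions: each parsed row is taken to be a `(run_id, timestamp, offsets)` tuple with offsets in minutes (positive meaning late), and `summarise_runs`, `min_offsets` and `max_delay_min` are hypothetical names, not anything from my parser.

```python
from statistics import mean

def summarise_runs(rows, min_offsets=3, max_delay_min=100):
    """Return {run_id: (min_delay, avg_delay, max_delay)} in minutes.

    rows is an iterable of (run_id, timestamp, offsets) tuples.
    """
    # Keep one row per run: the longest, and the latest among equally long rows.
    best = {}
    for run_id, ts, offsets in rows:
        key = (len(offsets), ts)
        if run_id not in best or key > best[run_id][0]:
            best[run_id] = (key, offsets)

    summary = {}
    for run_id, (_, offsets) in best.items():
        # Runs with only 1 or 2 offsets were probably not tracked properly.
        if len(offsets) < min_offsets:
            continue
        # Drop the outlier with a delay of over 100 minutes.
        if max(offsets) > max_delay_min:
            continue
        summary[run_id] = (min(offsets), mean(offsets), max(offsets))
    return summary
```

The three values per run are what feed the minimum, average and maximum series on the chart. The spread between the first and last is the "range" mentioned in the last point.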
Here’s another graph, showing the offset trend for each service as a function of the bus’s stop sequence. Notice that some have “creeping” increasing delays, but most are “wavy” around a mean, which suggests the route times are well designed.
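One rough way to put a number on "creeping" versus "wavy" is to fit a straight line through each run's offsets and look at the slope. I did not do this for the graph (it was judged by eye), so treat the following as a sketch. It assumes the offsets are listed in stop-sequence order with one value per stop, and `delay_slope` is a name made up for the example.

```python
def delay_slope(offsets):
    """Least-squares slope of delay against stop index, in minutes per stop."""
    n = len(offsets)
    if n < 2:
        raise ValueError("need at least two offsets to fit a trend")
    x_mean = (n - 1) / 2           # mean of the stop indices 0..n-1
    y_mean = sum(offsets) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(offsets))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den
```

A run whose delay grows steadily gives a clearly positive slope, while one that just wobbles around its mean gives a slope near zero. Where to draw the line between the two is a judgment call that I have not tried to calibrate.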
[To all readers: Apologies, MySQL on the Pi has been dying randomly, possibly due to RAM starvation when crawlers start hitting my site hard. As such, some downtime on this site is unavoidable until I upgrade to something bigger and better.]