The other day, I showed just how different transport applications query for and receive their live train data. Of course, this data begged to be appropriated – so I set myself a little challenge to visualize it.
The first step was to choose a data source. I eventually settled on the one used by TransitTimes+ by Zervaas Enterprises – partly for its compact replies, partly for its time-stamping of the position data, and partly for the overall ease of querying it.
Without the permission of the app developer, I queried the server every 15–30 seconds and appended the responses to a log file for later processing. As a result, I have probably cost the developer a bit of server time and bandwidth (sorry!) – the least I can do is encourage you all to give TransitTimes+ a go and see how you like it.
Anyway – the next step was to process the data. I won’t give away my code, because it is nasty and might encourage others to recklessly abuse the data sources, but the general methodology was to:
- Extract the timestamp, tripID, ID, latitude and longitude into files named by tripID.
- Process these tripID files to remove duplicate position/time rows (polling the server yields many identical replies) and limit the time range (to 24 hours in this case).
- Delete all tripID files with one row or fewer – they will not produce a trace.
- Further process the tripID files into a KML file by converting the data rows into gx:Track elements. This involves converting Unix time into the format the <when> tag expects, and writing the latitude and longitude into <gx:coord> tags. Each Placemark has to be given an ID – I chose to use the first four characters of the tripID, which appeared to be unique. To improve the output, one can also look at the time steps between positions and break long gaps into separate tracks within a <gx:MultiTrack>.
It sounds straightforward, but it took me about four hours to code it all (not being a seasoned programmer, using just straight plain C) and get it working properly. If you want to work this way, though, be aware that you will eat up SSD write cycles very quickly through the sheer number of file open/append/close operations a naive implementation entails. I used up about 2 TB of writes on my OCZ Vertex 3 doing this – but that’s not too bad.
By doing this, I managed to produce this video showing the network in operation over a 24-hour period.
[Best viewed in HD]
I find it rather elegant and hypnotic to watch certain parts of the network.
Some interesting things include the “jumps” in location in the data. While most trains follow the tracks, quite a few “jump” off the track to some other location and then “jump” back – maybe the data source mixed in another train’s data from time to time, or spurious positions appear when the data supply is lost for a while. As far as I can tell, these jumps are present in the recorded data itself (though I haven’t had the time to go through the file piece by piece to be sure), so I don’t know where they are being introduced. Unfortunately, that spoils the illusion of sitting in the control room, watching trains go around …
The paths don’t exactly follow the rail line – this is due to the granularity of the reporting interval, and to the positions being referenced to fixed rail reference points, likely related to the signalling.
While processing the data, I also found that the 4-digit numbers which seem to represent trains rotate around – after a while, you can see numbers from 1001 all the way up to 9999. Maybe they don’t have a fixed relationship to carriages, as was once suggested – but I can’t be certain.
If anything, I’ve definitely annoyed the suppliers of live transit data enough – from now on, I promise, I will not be hitting their servers apart from using their apps. If I’ve caused any inconvenience – I’m deeply sorry.