Everything You've Been Told About Data Retention Is Wrong

No, your online activity is not being monitored. Data retention does not work that way, goodnight.

Hello, my name is Lance E. McDonald. I spend most of my time on Twitter yelling about computers, anime, and video games, but I actually get paid to create and implement software solutions at an internet provider in Australia. The most recent project I had to spend time on was a script that scrapes through account logs and archives the information required to meet the government’s new data retention laws. You’ve probably heard a lot about these laws in the news lately, and I’m guessing almost the entirety of what you heard has been clickbait-fuelled trash. I thought I’d show you what the data actually looks like, and talk about how this whole thing works.

I’m going to just open with a photo of my own personal data that has been retained by my internet provider over the past 10 days. I have accessed thousands of websites, downloaded about 30GB of data and basically used the internet like a typical five-person family would over a ten-day period. I have manually rebooted my modem once in this time, while writing this piece. This is the entirety of the data that the government has access to about my usage over the past ten days:

(Disclosure: As well as removing identifying information, the IP address field and the “data volume” field have been removed from this screenshot. Data volume shows how much data measured in bytes I downloaded in the past 10 days; it was around 30GB).

Does this look a bit low on detail compared to what you would expect? There’s nothing here other than “Lance turned on his modem, and then turned it off 10 days later” and the next line is “Lance turned his modem back on a few seconds later.” But this is actually what the Attorney-General’s guidelines describe the data as being expected to look like for internet providers. Data items should be “hours to several days, weeks, or longer apart”.

I’ve seen lists online with titles like, “Here’s what you need to do to avoid the new data retention legislation” consisting of VPN services, recommendations to use Tor, and a bunch of other arbitrarily selected pieces of advice that have zero impact on the data that is actually being retained. I’ve even seen a few anti-virus companies leveraging the public fear to try to sell some kind of encryption services. Perhaps if all the garbage being spread about your ISP recording what you do online were actually true, then sure, using a VPN would definitely hide that. But using a VPN to avoid the new data retention laws is like tinting your car windows to stop speed-cameras from hearing conversations inside your car: cars don’t work that way, conversations don’t work that way, and speed cameras don’t work that way; you’re not even close.

Recently we’ve been flooded by popular news reports making claims such as, “The government can tell you’ve been using Facebook Messenger, they just can’t read your conversations”. This simply isn’t true; your internet provider isn’t retaining anything about what services you use online as this isn’t part of the legislation. The ISP will only retain data about services they directly provide to you: they’re providing you a link to the internet, so they need to record the time that link was connected, and then the time it was disconnected. No data about what you’re using that link for is retained, no metadata, nothing.

The government’s new data retention laws require internet providers to remember, for two years, what IP address is assigned to a customer every time that customer’s modem is turned on, and what times that same IP address is released from the customer when their modem is turned off again. Also recorded is the location of whatever radio tower/telephone exchange/fibre node to which your modem is actually connected. If you’ve ever looked at a Telstra Detailed Bill, you can see how your internet sessions typically say what town you were in when you were using the internet on your phone. This shows the tower to which your phone’s modem was connected at the time.
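
To make that concrete, here is a rough sketch of how connect and disconnect events could be paired into the kind of session record described above. It is not the script mentioned earlier; the event format, the field names (account_id, access_point) and the sample values are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class SessionRecord:
    account_id: str                    # hypothetical customer identifier
    ip_address: str                    # address assigned for this session
    access_point: str                  # tower / exchange / fibre node (hypothetical label)
    started: datetime                  # when the link came up
    ended: Optional[datetime] = None   # None while the link is still up


def build_sessions(events):
    """Pair 'start' and 'stop' events into session records.

    Each event is a tuple:
    (timestamp, kind, account_id, ip_address, access_point)
    """
    open_sessions = {}   # account_id -> record still in progress
    finished = []
    for timestamp, kind, account_id, ip_address, access_point in sorted(events):
        if kind == "start":
            # A start with no matching stop keeps the old record, left open-ended.
            if account_id in open_sessions:
                finished.append(open_sessions.pop(account_id))
            open_sessions[account_id] = SessionRecord(
                account_id, ip_address, access_point, timestamp
            )
        elif kind == "stop" and account_id in open_sessions:
            record = open_sessions.pop(account_id)
            record.ended = timestamp
            finished.append(record)
    return finished + list(open_sessions.values())


# Modem on, modem off ten days later, modem back on a few seconds after that.
events = [
    (datetime(2015, 10, 1, 8, 0, 0), "start", "acct-1001", "203.0.113.7", "exchange-A"),
    (datetime(2015, 10, 11, 21, 30, 0), "stop", "acct-1001", "203.0.113.7", "exchange-A"),
    (datetime(2015, 10, 11, 21, 30, 20), "start", "acct-1001", "203.0.113.8", "exchange-A"),
]
sessions = build_sessions(events)
for record in sessions:
    print(record)
```

Note what the record does not contain: there is no field for websites, services or destinations, only who held which address, where they connected, and when.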

The other aspects of telecommunication data retention revolve around telephone calls, SMS messaging, and email transmission. Not much is changing with regard to phone calls and SMS; your provider will continue to keep a list of every phone number you call and how long you speak to those people, as has always been the case. The same goes for SMS: every time you send a message, the number to which you sent it is retained in a database. The only new requirement is that the data is now kept for two years. Previously it did not need to be retained, and providers only did so for billing purposes.

I will say this, though: email data retention is changing quite a lot, and is far more aggressive. The legislation hasn’t been completely clear on the matter, but it’s likely that it will be treated similarly to SMS, and every time you send an email your provider will record the transaction for two years, albeit discarding the body of the email. Please don’t use your internet provider’s email service if you have privacy concerns. Use Gmail or Outlook.com if you’re not using business class services already.

A huge part of the misconception about data retention equating to internet surveillance is the fact that the legislation requires that your telecommunications service provider retain data on “the destination of a communication”, and this is indeed one of the key data points being recorded by all service providers… except internet providers:

(I like the mysterious extra bracket before the question-mark at the start, professional.)

So, as is mentioned above, it’s worth taking a quick look at section 187A of the recently amended Telecommunications (Interception and Access) Act 1979 where we can see that the intention has never been to perform surveillance.

For internet providers, the “destination of a communication” (which can be argued to mean “the websites you visit” or “people to whom you send messages”) is strictly not required to be monitored or retained. If an internet provider does choose to retain this information, that is their own prerogative, and the government would require a warrant to access that kind of information (again, this is if it was even being stored in the first place, as it is outside of this legislation). Most of this data is impossible to retain, though, as most communication services online now are encrypted with SSL, through which your provider can’t see.

The whole thing might bring to mind recent cases where end-users have downloaded copyrighted materials and the rights-holders have managed to subpoena customer information from the internet provider. How does this work? Well, rights-holders tend to hang out in public torrent swarms watching people seeding their intellectual property, and they take note of every IP address engaging in the illegal activity. Then they send annoying emails to the ISP who owns those IP addresses, insisting they forward email warnings to their customers.

Most ISPs put these emails in the trash, the logic being that if the rights-holder wants legal action they should be speaking to the police, not an internet provider. The rights-holders aren’t approaching internet providers and saying, “Tell us everyone who pirated our movie”, because the internet provider doesn’t retain data about what their customers do online; they’re saying, “We saw these people pirating our movie and we want you to tell them to stop.” As has always been the case, if you’re seen breaking the law, you’ll probably be identified. If you break the law but no one sees it happen, data retention won’t help anyone catch you (the moral grounds for pirating Game of Thrones are obviously a whole different kettle of fish).

Eventually, one rights-holder, someone to do with the movie Dallas Buyers Club, got sick of internet providers throwing their emails in the trash and took the providers to court. The court decided that, in this case, the rights-holder should be allowed to speak to the customers directly.

In the end, nothing much came of it. Things might be changing on this matter in the near future as providers will likely soon be required to send customer details directly to the rights-holder on a three-strike system so the rights-holder can send a scary email directly to the customer. This is unrelated legislation, though. And besides, you probably use private trackers anyway, don’t you?

Data retention is like the TAC/VicRoads knowing what your licence plate is, and how long you’ve had that licence plate, but not where or when you drive each day. If your licence plate is spotted at the scene of a crime, the police can ask VicRoads, “Who owned this licence plate on this day?” But the police can’t go to VicRoads and say, “Here is a list of illegal car crimes, please tell me every driver who did these crimes in the past two years.” It’s just not possible to catalogue or index the data that way. The police need to find the crime, then VicRoads can help identify the criminals. The information kept under the new legislation can’t be used to proactively fight crime; it can only be used to react to a crime after it’s already been committed, and only if the crime was witnessed by someone or captured in a server log somewhere.

So what’s the point of this data that’s being retained? Does it have anything to do with terrorism? Probably not. In my experience, the data is only used in child pornography cases. Typically the process goes that the police will raid an illegal pornography server and get physical access to the machine. Inside the machine, they find a list of every IP address that has ever connected to it, thus they have a list of every IP address that committed the crime of accessing that pornography server. The police contact the internet providers that own those IP addresses, and the internet providers look in their data retention logs to see which customers were assigned those IP addresses at those times. The internet provider then hands that list of customers to the police.
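
The lookup described above can be sketched in a few lines. This is only an illustration under assumed data: the tuple layout, the account names and the who_had_ip helper are hypothetical, and a real provider would run the same question against a database rather than a list.

```python
from datetime import datetime

# Retained sessions: (account_id, ip_address, started, ended).
# ended is None for a session that is still connected.
SESSIONS = [
    ("acct-1001", "203.0.113.7", datetime(2015, 10, 1, 8, 0), datetime(2015, 10, 11, 21, 30)),
    ("acct-2002", "203.0.113.7", datetime(2015, 10, 11, 21, 45), None),
    ("acct-3003", "198.51.100.9", datetime(2015, 10, 3, 12, 0), datetime(2015, 10, 4, 12, 0)),
]


def who_had_ip(sessions, ip_address, seen_at):
    """Return the accounts that held ip_address at the moment seen_at."""
    matches = []
    for account_id, session_ip, started, ended in sessions:
        if session_ip != ip_address:
            continue
        if started <= seen_at and (ended is None or seen_at <= ended):
            matches.append(account_id)
    return matches


# Entries taken from a seized server's log: (ip_address, timestamp).
server_log = [
    ("203.0.113.7", datetime(2015, 10, 5, 14, 0)),
    ("203.0.113.7", datetime(2015, 10, 12, 9, 0)),   # same address, different customer
    ("192.0.2.55", datetime(2015, 10, 5, 14, 0)),    # not one of this provider's sessions
]
identified = [who_had_ip(SESSIONS, ip, ts) for ip, ts in server_log]
print(identified)
```

The query only works in one direction. Given an address and a time taken from someone else's log, the provider can name an account; given a type of activity, it has nothing to search.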

This actually happens, and has been happening for years. Most internet providers have already been retaining this data the whole time.

You might have heard that a number of internet providers have been granted an 18-month extension on their data retention obligations. This is typically due to the bureaucratic process more than anything else. The majority of internet providers already met their data retention obligations years ago, and now we’re just seeing the government finally put a strict rule set on exactly how this is meant to be done.

It can be very exciting to imagine that the world works in a way where the government is some malevolent, all-powerful force capable of seeing and attempting to control what you do. But the internet is still primarily outside the government’s reach, despite what rival political parties will pin on each other or what the media will say to trick you into clicking on their ads. Even your provider doesn’t have the technology to control what you do with the internet. When was that internet filter coming, again? Was it six months ago, or seven years ago? There’s been a few now, hasn’t there? The government doesn’t understand the internet and is doing enough terrible things every day that we don’t have to make up any extra stuff.

And please stop saying “metadata”, this isn’t CSI: Cyber.

You can follow Lance E. McDonald on Twitter here.


    I can't believe the government is stealing all my metadata to arrest me for stuff they don't understand isn't illegal. All to try and provoke me into buying Foxtel at 90 bucks a month.

    If you read the story, what it's saying is that the Government ISN'T collecting all that data. Believe it or not, that's your choice, but when one of the programmers is showing the outcomes, at least some evidence is being given.

    This story popped up on one of the sister sites earlier, and for me it comes down to how many related databases link to this info. Is this a primary key (the session ID) with a whole bunch of data (e.g. websites, P2P IPs) being collected on secondary tables, or is this it?

    If this is all that's being collected, there's nothing to be worried about. It just says when your modem is turned off and on, and that's it. If you do any illegal stuff it's going to be hung on you with other info.

    Like IP farming from seeding torrents.

    "Even your provider doesn’t have the technology to control what you do with the internet." Is this in relation to Australian ISPs or ISPs in general? Because they very much do have the power to control what you do with the internet if they should so choose. This is why there is a Great Firewall of China and a hot debate about Net Neutrality as well as the other filtering that goes on internationally. There have been several times where black holes have opened up in the internet because someone either misconfigured a server and blocked traffic being routed through it or traffic was being routed to the wrong servers resulting in that country's filtering being applied to the traffic.

      ...and we definitely *should* be blackholing all IP blocks belonging to China as our default ISP configuration, that's for sure, until China agrees to police their hacker community to the same standards that such activities would be policed here.

    Now show us the data retention records for your mobile phone's data service.

    I suspect you'll see a much larger number of shorter data sessions as your phone either temporarily loses service, or switches from cellular data to wifi and back when you are at home/work/etc.

    Add in the coarse location data from the cell towers at the start and end of each session and you can see roughly where the person has travelled for the last two years.

      Yes, as has been the case since the birth of Cell technology.
      Maybe you can point to some concrete examples of any negatives associated with Australian law enforcement having had access to people's Cell registrations over the past 20 years?

    In isolation, the data collected by ISPs could be nothing. In combination with data collected by other organisations, it could mean something. That's why DBC wants to know the owners of certain IPs at certain times; they can then identify who pirated their movie.

      and pirating movies is illegal, right? All power to this legislation then, if it helps identify people engaging in online crime.

        Well, "illegal" and "crime" are really the wrong words to use here. It is not a criminal offense (which is the usual meaning of the word "crime"). And also not "illegal" in the common meaning of that word ("contrary to or forbidden by law, especially criminal law").

    So this article seems focused on retail ISPs. What about wholesale ISPs who provide backbone access?

      Their contract is with the retailers, not you. If they're collecting data on you, that's gonna be a breach of privacy. A pretty big one.

        The wholesale ISPs do not have the data to link a particular customer to what they collect. That's what retail ISPs must do (as described above).

        I haven't read the whole legislation, so I'm speculating on what is possible, but a wholesaler could log that this IP connected to this IP. Or this IP requested this DNS.

          It wouldn't do anything. As they aren't the retailer to the end user, I doubt they would need to even retain that level of information, but if they did, all it would do would be to point anyone to the final retailer, which is commonly available information anyway.

          When I check my work IP out, it shows the organisation (who are big enough they're effectively an ISP), not the wholesaler, and that's where the checking would start. Anyone getting a match of this IP against anything dodgy is going to show them as the first point of contact, not the wholesaler.

          This legislation isn't about monitoring users at an ISP level and punishing them, that's not their job. It's really just about who's using an IP at any given time so when third party data becomes available (servers, downloading claims), there's a record.

          The ISP doesn't need to know every website you visit, but they do need to know that your modem was connected to IP xxx.xxx.xxx.xxx at any given time. That's the point of the article, it's just showing the modem identifier of the IP at any given time.

    Do people turn their modems off?

      Yes, occasionally. Maybe once or twice a month if the ADSL seems to be more sucky than usual. Interestingly, when I signed up for getflix.com.au I realised that Bigpond were changing my IP every 10 minutes (or sooner) so I had to set up a cron job to poll the getflix website so it had my external IP and knew I was a subscriber.

      If we take the example given above, it only details the IP that the modem was assigned when it first connected, what happens when the IP changes and the modem hasn't been rebooted? I'm assuming the author has a fixed IP address, but what happens to someone on a dynamic IP address? My modem could have literally 100s of changes of IP address between reboots.

        The data isn't storing modem reboots. It's storing PPP sessions. Any time your modem negotiates a new session with the ISP, it is logged.
        When you reboot your modem, a new session is initiated, which is why it appears in the author's article. If you receive a new IP, your modem is actually negotiating a new session, so it will be logged.
        [As an aside, if you are getting a new IP every 10 minutes, I'd say that it's because your session has dropped out due to high noise on the line. If you have access to the configuration options, I'd recommend reducing your speed profile. A higher DSL sync speed usually leaves a smaller noise margin, which can result in frequent dropouts on noisier lines.]

    Head over to Delimiter, there is an article there regarding this piece. gizmodo-comes-to-false-conclusions-about-data-retention/

      I've read the article at Delimiter now, and it frankly sheds more heat than light.

      The issue it raises is essentially that your IP address as stored by the ISP can be cross-correlated with logs from the organisations providing services to make deeper inferences about your activities.

      However, to do so does actually require the organisation in question to have access to the logs on the server that you are accessing. Delimiter makes the reasonable point that the big sites like Google are generally happy to provide such data to law enforcement on request. It assumes that server logs will be accessible more or less on request, and that's a pretty strong assumption.

      For example, I could easily set up an Amazon EC2 instance as a web server (in fact I have two at the moment) and setting up web farms by way of such virtualisation farms is increasingly common. Once the VM is destroyed, any logs that it held also disappear unless the site owner chooses to retain them.

      Now, there was an FAQ released concerning the legislation earlier this year that suggested that OTT (Over The Top) data would need to be retained by ISPs. This appears to no longer be the case, and as such the possibility of law enforcement fishing expeditions by way of checking only one set of logs is enormously reduced. They would need to cross-correlate two sets of logs, and one of those sets may not be accessible.

      That's not to say that there should be no concern, and running via a VPN is definitely safer, but this article is not as outrageously incorrect as the Delimiter article would have you believe.

    I've had a morbid fascination to check out the comments over at the Gizmodo version of this article... and then I realised I value my brain cells too much to read arguments defending this piece =P
