Amazon, Facebook, Google, Microsoft, and Apple move more money than many medium-sized nations.
Their extraordinary profits are won through extraordinary reach—this is not a secret. That a few companies are afforded unprecedented and shamefully unregulated access into our homes is now an unremarkable fact of living with tiny computers everywhere.
When Gizmodo reporter Kashmir Hill, or Kash, as I call her, approached me about her desire to rid herself of these companies, I was excited. As consumers, we are afforded only a few avenues of acceptable dissent — the most reasonable of which is that, if you don’t like what a company is doing, you can move your money and data elsewhere.
But increasingly this option is unavailable to us. The tech giants are so thoroughly woven into our lives that their presence is difficult to even spot. This experiment was an opportunity to measure the reach of these companies and foreground the ways the world has become organised around them.
What I’m going to describe is how we collected and analysed data. I have also included links to scripts for macOS and OS X that will build firewall rules for your device so that you too can live a tech-giant-free existence, to the extent that such a thing is even possible while remaining online.
A caveat we’ve offered before at Gizmodo: This set-up was designed to work for us internally, so it is by no means the best or only way to do this. But hopefully it will give you some insights and starter code on how to approach this problem yourself.
Instead of monitoring and blocking the network traffic from all of Kash’s devices independently, it made sense for us to connect all of her devices to a central VPN that I could control. To do so, we purchased a droplet on DigitalOcean and installed an OpenVPN server on it. I used this handy tutorial from DigitalOcean.
We then installed an OpenVPN client on all of Kash’s devices. For her MacBook we used software called Tunnelblick, a free, open-source interface for OpenVPN on OS X and macOS. It is a plug-and-play application that ships with all the necessary binaries and drivers.
For her iPhone, we used OpenVPN’s own iOS client. For Kash’s smart home, we installed an OpenVPN client on her Raspberry Pi router and then connected her various Internet of Things devices to that. Here is a tutorial.
With all of her devices routing their network traffic through our VPN, we were able to capture her data flow using TCPDUMP on the tun0 interface set up by OpenVPN. All of our analysis relies on information about the destination network of each packet filtered through tun0.
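As a rough illustration of that capture step, the sketch below tallies bytes per destination address from text output. It assumes lines shaped like tcpdump's quiet TCP output (for example, from something like `tcpdump -i tun0 -nn -q -t`); the exact format can vary between versions, the sample addresses are documentation placeholders, and the function name is hypothetical rather than taken from our software.

```python
import re
from collections import Counter

# Assumed line shape (tcpdump quiet mode, TCP only), e.g.
#   IP 10.8.0.2.51234 > 192.0.2.10.443: tcp 120
# Lines that do not match (UDP, ARP, and so on) are skipped.
LINE_RE = re.compile(
    r"^IP (?P<src>\d+\.\d+\.\d+\.\d+)\.\d+ > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.\d+: \w+ (?P<length>\d+)"
)


def bytes_per_destination(lines):
    """Sum the reported payload length for each destination IP."""
    totals = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        totals[match.group("dst")] += int(match.group("length"))
    return totals


# Placeholder capture lines; none of these are real measurements.
sample = [
    "IP 10.8.0.2.51234 > 192.0.2.10.443: tcp 120",
    "IP 10.8.0.2.51234 > 192.0.2.10.443: tcp 80",
    "IP 10.8.0.2.40000 > 198.51.100.7.5222: tcp 64",
    "ARP, Request who-has 10.8.0.1 tell 10.8.0.2, length 28",
]
totals = bytes_per_destination(sample)
```

Each destination address in `totals` can then be handed to whatever lookup decides which company owns it.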
In our initial analysis, we wanted a basic understanding of how much data was flowing to a tech giant during a specific behaviour, so I built network monitoring software to allow Kash and me to independently conduct experiments to gather data. For instance, when Kash wanted to go on a run, she would direct the software to begin monitoring her network traffic and then assign the data capture a label. This is how we linked network activity to specific behaviour. The software then used WHOIS lookups to categorise where each packet was headed.
“Whois” is a widely used internet record listing that identifies who has registered a particular domain name or IP address. To run a whois lookup you can simply open up Terminal and type in ‘whois 220.127.116.11’ (Gizmodo’s IP address).
WHOIS will provide you with a plethora of useful information about a given network. For our purposes, we were looking at the OrgName field. In the case of Gizmodo, we can see that Gizmodo uses a Fastly CDN to host its content.
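For readers who want to automate that lookup, here is a minimal sketch of pulling the OrgName field out of a WHOIS response. It assumes the `whois` command-line tool is installed and that the response uses an `OrgName:` line; registries outside North America often label this field differently, so treat the parser as a starting point. The function names and the sample response are illustrative, not part of our software.

```python
import subprocess


def extract_org_name(whois_text):
    """Return the value of the first OrgName field, or None if absent."""
    for line in whois_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "orgname":
            return value.strip()
    return None


def lookup_org(ip):
    """Run the system whois client for one IP (requires network access)."""
    result = subprocess.run(
        ["whois", ip], capture_output=True, text=True, check=False
    )
    return extract_org_name(result.stdout)


# A made-up response fragment, used only to exercise the parser.
sample_response = """\
NetRange:       192.0.2.0 - 192.0.2.255
OrgName:        Example CDN, Inc.
OrgId:          EXAMPLE-1
"""
org = extract_org_name(sample_response)
```

Calling `lookup_org` once per destination address is slow and may be rate limited, so caching results per address is probably worthwhile.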
By running WHOIS lookups on all of the destination IP addresses from packets captured by TCPDUMP, we were able to come to an understanding of the various services that any given app relies on. For instance, the graph below, generated by our software, depicts the breakdown of where data was sent at each minute of an Uber ride:
And this graph represents what network traffic is generated by repeatedly plugging my iPhone into its charger:
Counting and Blocking
The next step was to actually block outgoing traffic to each tech giant. To do that we first needed to identify the various IP networks that each company operates. Internet infrastructure relies on a certain level of transparency in order for data to be routed appropriately through the multitudes of networks that comprise it.
As such, we were able to utilise the public Autonomous System Numbers of the various tech giants to identify their IP networks.
You can see this in action if you run a whois query like: whois -h whois.radb.net -- '-i origin AS32934' | grep ^route. In this example, AS32934 is one of the Autonomous Systems belonging to Facebook.
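The output of a query like that is a list of `route:` lines, one per announced prefix. The sketch below shows one way to turn such text into a compact set of IPv4 networks using Python's standard `ipaddress` module. The sample prefixes are documentation ranges, not Facebook's, and `parse_routes` is a hypothetical helper name.

```python
import ipaddress


def parse_routes(radb_text):
    """Collect IPv4 prefixes from 'route:' lines and merge adjacent ones."""
    networks = []
    for line in radb_text.splitlines():
        key, sep, value = line.partition(":")
        # 'route6:' lines (IPv6) are skipped here, because
        # collapse_addresses cannot mix IPv4 and IPv6 networks.
        if sep and key.strip() == "route":
            networks.append(ipaddress.ip_network(value.strip(), strict=False))
    return list(ipaddress.collapse_addresses(networks))


# Placeholder response text using documentation prefixes.
sample_routes = """\
route:      192.0.2.0/25
route:      192.0.2.128/25
route:      198.51.100.0/24
route6:     2001:db8::/32
"""
prefixes = [str(network) for network in parse_routes(sample_routes)]
```

Merging adjacent prefixes keeps the eventual rule set shorter without changing which addresses it covers.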
Armed with a means to categorise IP addresses, we crafted firewall rules on our VPN to drop packets associated with the five tech giants. A firewall rule specifies criteria for how your computer should handle internet packets. For example, if our VPN spotted data travelling to 18.104.22.168 on port 5222, our packet filter would recognise it as traffic to WhatsApp, a Facebook-owned company, and drop it.
To make it easy for you to thwart the tech giants yourself, you can check out this GitHub repository. (Yes, GitHub is owned by Microsoft, one of the five companies we blocked.) The code is meant for macOS and OS X and relies on the operating system’s packet filter, a program succinctly called PF. PF has been shipped with Mac OS X since Lion. There are some quirks to getting it running; they are detailed in the “installation” portion of the readme.
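To give a sense of what such rules look like, here is a small sketch that renders PF configuration lines from a list of prefixes. It is not the repository's code or its actual output; the table name and prefixes are placeholders, and the generated text follows ordinary pf.conf conventions (a persistent table plus a rule that drops outbound traffic to it).

```python
def pf_rules(table_name, cidrs):
    """Render a PF table and a rule that drops outbound traffic to it."""
    members = ", ".join(cidrs)
    return [
        # A named table holding every prefix attributed to one company.
        f"table <{table_name}> persist {{ {members} }}",
        # 'quick' stops rule evaluation as soon as this rule matches.
        f"block drop out quick from any to <{table_name}>",
    ]


# Placeholder prefixes from the documentation ranges.
rules = pf_rules("facebook", ["192.0.2.0/24", "198.51.100.0/24"])
rules_text = "\n".join(rules)
```

Written to a file, lines like these would typically be loaded with `pfctl -f` and activated with `pfctl -e`, both of which need root privileges.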
When designing our system, we did not consider how the prevalence of content delivery networks, or CDNs, would affect our blockade. Many websites and apps are not actually sent to your browser directly from their hosting provider. Instead, there is often a middleman, a CDN, that acts as a buffer between your browser and the company’s servers.
The reason for this is speed and security. A CDN will store versions of a company’s content in multiple geographical locations in order to deliver it to the end user faster. If you think of the internet as a bunch of wires, instead of as a kind of omnipotent cloud-like thing, the reason for this is quite obvious: the closer you are to your content, physically, the faster you will get it.
For our purposes, what this meant was that when we were blocking the companies that do web hosting as a service — such as Amazon, Google, and Microsoft — websites they host would evade our firewall if they used a CDN from a third party because it didn’t look like the website was being sent from a tech giant. That’s how Airbnb and Gizmodo itself, which are both hosted by AWS, broke through our Amazon blockade.
If you really want to be certain that you have properly thwarted a tech giant, you will need to block the IP addresses of all major CDNs. The downside of this strategy is that there will be a lot of false positives. You will block innocent services whose only known crime is their use of a CDN. The upside is that your web browser will be mostly useless. For the brave souls that want to go down this route, you can run the code in the aforementioned repository with the ‘--fascist’ flag.
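The trade-off can be made concrete with a toy decision function. This is a sketch under stated assumptions, not the repository's logic: the networks are documentation ranges, and `should_drop` and its `strict` parameter are hypothetical names standing in for a mode that also blocks CDN address space.

```python
import ipaddress


def should_drop(dst_ip, giant_networks, cdn_networks=(), strict=False):
    """Decide whether a packet to dst_ip would be dropped."""
    address = ipaddress.ip_address(dst_ip)
    blocked = list(giant_networks)
    if strict:
        # Strict mode also treats every known CDN prefix as blocked.
        blocked += list(cdn_networks)
    return any(address in network for network in blocked)


# Placeholder address space: one "giant" prefix and one third-party CDN prefix.
giants = [ipaddress.ip_network("192.0.2.0/24")]
cdns = [ipaddress.ip_network("198.51.100.0/24")]

origin_ip = "192.0.2.10"   # a server hosted directly on the giant's network
edge_ip = "198.51.100.7"   # a CDN edge serving that same site's content

direct_blocked = should_drop(origin_ip, giants)
cdn_evades = not should_drop(edge_ip, giants)
cdn_blocked_strict = should_drop(edge_ip, giants, cdns, strict=True)
```

Because a single edge address may serve many unrelated sites, strict mode blocks all of them at once, which is where the false positives come from.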
Dhruv Mehrotra is an activist and engineer who thinks about networks, power, and policy. His work on the Goodbye Big Five series was supported by a grant from the Eyebeam Center for the Future of Journalism.