What's in a Browser

I was recently asked to provide some instructions on how one could easily monitor one’s own web traffic. In the past I have written about how to use my one true love, Wireshark, for testing encryption on the Commotion project. But a packet-by-packet stream of data is not easy to grok for those new to the field. So, this short piece is on how to use the Chrome browser, and a single extension, to get an idea of what traffic you are sending to the internet. If you follow along you will learn the basics of how to see what data the websites you visit are requesting, and where they are sending it to. We will focus on how to best search for information about the data you are looking at, so that you are better prepared to start your own excursions into the internals of the internet.

Before you run off and download Chrome and the extension I am about to show you, I want you to take part in an experiment to see just how unique you are on the internet. Go to https://panopticlick.eff.org/ to see how unique your current browser setup is. When you click Test Me on this page you will be taken to an overview page. There is a lot of information on this page, but the most important piece is the first line. The more generic the browser, the more you blend in to the crowd. This page uses data your computer sends out to EVERY website it visits to measure how unique your specific computer’s configuration is. This includes the plug-ins you have installed, your browser, your computer, and its language settings, among other things. I will not go into depth about browser fingerprinting here. The article attached to the Panopticlick page, https://panopticlick.eff.org/static/browser-uniqueness.pdf, is a great overview that I recommend you read if you are interested. Feel free to come back to the Panopticlick site as you continue along this post. When we start looking at the data that your computer sends to websites, you should try to find where this website is getting all its data from.

Now, if you are not already using it, download the Chrome browser and we can begin. You are going to start by installing Ghostery: https://chrome.google.com/webstore/detail/ghostery/mlomiejdfkolichcflejclcbmpeaniij. This extension is a tracker blocker that alerts you to well-known data collection sites and blocks your browser from sending them any data. There are a few tools for doing this, and I don’t recommend Ghostery over any others, but the way it displays alerts is going to be useful for our purposes today.

Make sure to enable Alert Bubbles and all of the possible trackers when you set up Ghostery. This will allow you to see the full extent of partner sites that are tracking you.

You will notice that if you load up seamustuohy.com with Ghostery enabled it does not show any trackers. That is because I don’t care who comes to my site or how much traffic it gets. Sites that rely on ad revenue, or that heavily use analytics to track what content users like, rely on various trackers to do the work of collecting that data. We are going to go to BuzzFeed (which I absolutely do not recommend at all for any purpose) to see trackers in action.

When I went to BuzzFeed, Ghostery showed me a pop-up with 10 different trackers listed. Clicking on the blue ghost in the upper right-hand corner will allow you to explore each of the trackers to get an idea of the purpose of the tracker and what data it collects. Clicking on the crossed-out link below a tracker’s name will show you where the data it sends ends up.

I chose Audience Amplify at random for the rest of this post. The data it sent went on a journey from the ads section of audienceamplify.com to adnxs.com, which belongs to the advertising data company AppNexus, and ended at Facebook. Even though BuzzFeed may not directly send your data to Facebook, through a few partnering agreements and some redirected packets Facebook is being sent some data. And you didn’t even like anything yet. We don’t know what data Facebook is receiving because we have not looked into this packet. And, because a full packet analysis would be dreadfully long, we will not cover the full purpose of this packet in this post.

Let’s inspect that packet now to see what actually got sent. First, turn off the filtering of Audience Amplify on BuzzFeed by using the menu in the blue ghost. There will be a toggle next to Audience Amplify that you want to turn green. Second, click on the menu (three stacked horizontal bars) next to the ghost and go to Tools –> JavaScript Console. This will open up a menu on the bottom of the screen. This menu allows you to see what Chrome is doing behind the scenes.

We are interested in the data sent over the network, so we are going to click on the Network tab at the top of this menu. On the upper left-hand side of this menu is another set of icons. Click on the Filter icon to get the option to filter our traffic.

Now that we have our inspection tools set up, let’s refresh this page and find the tracker from Audience Amplify. Once you have refreshed you will see the Network section fill up with items. These are all the photos, HTML, cookies, scripts, etc. that this page uses. Type audience into the search bar at the top of the JavaScript console. This will filter out all of the data except for packets that have the word audience in them. If you click on one of the pieces of data, the view that opens to the right shows the actual contents of the data that was sent.
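If it helps to think about what that search bar is doing, it is essentially a substring match against each item in the list. Here is a minimal Python sketch of the idea. The URLs are made up for illustration (they are not what BuzzFeed actually loads), the function name is hypothetical, and the case-insensitive matching is my assumption about how the filter behaves.

```python
def filter_requests(urls, term):
    """Keep only the URLs that contain the search term, ignoring case."""
    term = term.lower()
    return [url for url in urls if term in url.lower()]


# Hypothetical request URLs standing in for the rows in the Network tab.
requests_seen = [
    "https://www.example.com/index.html",
    "https://www.example.com/static/logo.png",
    "https://ads.audienceamplify.example/track.js",
]

# Only the rows that mention "audience" survive the filter.
matches = filter_requests(requests_seen, "audience")
print(matches)
```

Everything that does not mention the search term drops out of view, which is exactly why a good search term saves you from scrolling through dozens of unrelated items.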

We are looking at one piece of data among dozens that were sent to and from this site. Many of the 10 ‘malicious’ data streams that Ghostery blocked have multiple pieces of data that were blocked. Each of these has a bevy of information within it. Because there is so much variation in the traffic you send on the internet, we are going to explore how to search for a specific value within a packet so that you can explore more on your own. For this example I am going to pick a well-documented value that has a good amount of history and is rarely used anymore. We are going to discover what this value is and how it works. Down under Response Headers there is a value called P3P. The next 900ish words of this article explore only this one line. This is why simple explorations of internet traffic are so hard to come by. Looking at P3P there are two sets of values: policyref and CP. I can identify the names by the fact that each one is followed by an equals sign and then the text of its value. I can tell there are two separate values because they are separated by a comma. This is how the Chrome network viewer separates these values, so it should be common across other values you look at.
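The same reading of the header (names before an equals sign, values in quotes, a comma between pairs) can be sketched in a few lines of Python. This is only an illustration under assumptions: the policyref URL is a placeholder I made up, the function name is hypothetical, and the regular expression assumes every value is wrapped in double quotes, which is how the header I saw was laid out.

```python
import re


def parse_p3p(header_value):
    """Split a P3P header value into a dict of name -> value.

    Assumes each pair looks like name="value" (an assumption about layout).
    """
    pairs = re.findall(r'(\w+)\s*=\s*"([^"]*)"', header_value)
    return {name: value for name, value in pairs}


# Placeholder policyref URL; the CP string is the one discussed in this post.
example = (
    'policyref="http://example.com/w3c/p3p.xml", '
    'CP="NOI DSP COR ADM PSAo PSDo OURo SAMo UNRo OTRo BUS COM NAV DEM STA PRE"'
)

parsed = parse_p3p(example)
print(parsed["policyref"])   # where the policy reference file lives
print(parsed["CP"].split())  # the individual compact policy tokens
```

Splitting the CP value on spaces gives a list of short tokens, and each of those can then be looked up one at a time.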

Identifying what each value means is the meat of understanding your network traffic. Search engines have made this process far easier. While I would usually not suggest the use of Google for searching, they currently have the best results for more obscure technical content. Because every piece of data counts when you are tacking it onto every piece of traffic, it is common to use acronyms like P3P to label values. Since an acronym can be used in dozens of fields, it is always a good idea to put it in the context of networking and the header section. P3P is found in the Response Header section. Searching for Response Header P3P hits the jackpot. I not only get a Wikipedia article about P3P; the second link is a concise background from some company selling P3P headers as a service. While I work in the non-profit tech world, there are some beautifully concise explanations of complex technical topics to be had in sales pitches. Using these two pages I can tell that P3P data is used to show a website’s data management practices.

To identify the specific values we saw, I will go back to our search and tack the name of one of them on to our query: “Response Header P3P policyref.” Wow! The first link is a W3C internet-draft and the second is their Platform for Privacy Preferences 1.0 (P3P1.0) Specification. A brief aside about these types of documents. In your time looking at your browser’s network traffic you are going to come across documents put out by the World Wide Web Consortium (W3C). This is the group of experts who develop the standards for the web. It is these open standards that allow the internet to work even while it is distributed across millions of devices. These standards start as a “draft,” like our first link, and then become a “standard” for the web in a process that makes the UN look efficient.

I am going to use the P3P Specification to find out what the values we found are, because that is the final document. I am going to start with the policyref field. One of the great things about these specification documents is that they are usually all on one page. This lets us use Ctrl-F to open the word-search tool in Chrome and search for policyref. This search takes me to Section 2.2.2 HTTP Headers. The search also highlights all instances of the word policyref. Looking through the sentences that have a highlighted policyref, I see the basic definition. A policyref is a URI which specifies the location of a policy reference file which may reference the P3P policy covering the document that pointed to the reference file, and possibly others as well. Beyond the legalese, it sounds like it is a file with the reference to a site’s policy document. When I open the link that was in the policyref field in a new window, I get taken to an XML tree that Chrome styles into a foldable list for me. Looking over the content in the document, it looks like contact information for the company in case you want to opt out or have some other dispute with them about this service.

Now that we know what policyref is for, I am going to search for the other, more cryptic value we found. I am going to continue to use the W3C P3P specification we found in our search. Searching for CP, I first get sent to one instance of the word TCP/IP and then, after continuing, find myself in section 4.1 Referencing compact policies. Looking further down, I saw that section 4.2 had a key that contained all of the symbols listed in the CP value from our network traffic. The string we found contained the following values (NOI DSP COR ADM PSAo PSDo OURo SAMo UNRo OTRo BUS COM NAV DEM STA PRE). I found the first value, NOI, in section 4.2.1 Compact access. According to that section this value means that compact-access exists for nonident. I have no idea what that means. So, I press Ctrl-F and type in nonident. After clicking through a few different uses of nonident that don’t seem to contain any definition, I found section 3.2.5 The Access element. Since we were looking at compact access, it only makes sense that the full access section would contain the values I need. Through this I found out that nonident is just shorthand for a website that does not collect identified data. Well, this audienceamplify data is looking better already. I continue searching for the next element in the same way. I find that DSP is the shorthand for the start of the dispute resolution section. The next item, COR, specifies that I can use their dispute resolution service to remedy wrongful actions and errors. The rest of the values can also be found using the same steps.
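As you look tokens up, it helps to keep notes. Here is a small Python sketch of that bookkeeping, holding only the three tokens decoded above; the descriptions are my paraphrases of what I read in the specification, not its exact wording, and the names in the code are hypothetical. Everything else is deliberately left marked as not yet looked up rather than guessed.

```python
# Notes on the tokens decoded so far (paraphrased, not the spec's wording).
KNOWN_TOKENS = {
    "NOI": "access is nonident: the site does not collect identified data",
    "DSP": "start of the dispute resolution section",
    "COR": "wrongful actions and errors can be remedied through dispute resolution",
}


def describe(cp_value):
    """Pair every token in a CP string with a note, or flag it as unknown."""
    return [
        (token, KNOWN_TOKENS.get(token, "not looked up yet"))
        for token in cp_value.split()
    ]


cp = "NOI DSP COR ADM PSAo PSDo OURo SAMo UNRo OTRo BUS COM NAV DEM STA PRE"
notes = describe(cp)
for token, meaning in notes:
    print(f"{token:5} {meaning}")
```

Each time you find another definition in section 4.2, add it to the dictionary and the list of unknowns shrinks.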

I chose a well-documented item to explore in this post. Sadly, not every header or piece of data is written about in an open standard. You will encounter vague shorthand items with seeming gibberish as their values as you explore more network traffic. Searching for these online will open up a world of other internet spelunkers dissecting and explaining these strange glyphs in an attempt to understand their traffic. Monitoring one’s own internet traffic is an exercise in deciphering shorthand and distilling intent from it. This act will teach you more about the structure of the internet than you could believe. There are other methods of looking at your internet traffic. Many of them, like Wireshark, require you to install applications that will monitor all the data your computer is sending to the world. I hope this short dive into how to explore your browser traffic leads you down a rabbit hole you enjoy. I sure do.