The NSA is somewhat infamous for collecting “just metadata”. Supposedly there wasn’t any personal information and no calls or texts were examined. A recent study shows that this is still cause for serious worry.
The classic example of this is signals intelligence as used by the military. For example, if one radio source sends out a message and, within 2 hours of that message, 5 military units change their movements… it’s a good guess that the first station has a high-ranking officer nearby.
First of all, metadata are the things that relate to the data. Metadata may describe the data or be an underlying definition of the data, and it makes working with data easier. Instead of looking for a single thing out of a field of millions, you can filter by the metadata. Think of all the pictures on your hard drive. Finding a particular picture would be a pain if you had to look at each one individually. But if you know the date (even within a range of months or a year), if you have sorted the pictures into bins for birthdays, camping trips, etc., or if you even know the size of the picture (a picture taken with a 20 megapixel camera would tend to have a larger file size than one taken with a 5 megapixel camera), you can reduce the amount of data you have to search through.
The same is true of cell phones and credit cards. Cell phone metadata includes things like the time of the call, the number called, duration of the call, and the cell towers used in a call. Credit card metadata may include the type of the store, the purchase price, transaction time and date, etc.
The recent research suggests that this information is more than enough to tie specific people to specific cards (or phones) given just a few more pieces of information… the type of information that governments have, or can easily get access to.
The authors acquired a large sample of credit card transactions from a bank. The data set was from a single country and contained 1.1 million individuals’ transactions over a three month period. There was no personally identifying information in the data set. Still, it was pretty easy to identify people.
For example, let’s say that we are searching for Scott in a simply anonymized credit card data set (Fig. 1). We know two points about Scott: he went to the bakery on 23 September and to the restaurant on 24 September. Searching through the data set reveals that there is one and only one person in the entire data set who went to these two places on these two days. |S(Ip)| is thus equal to 1, Scott is reidentified, and we now know all of his other transactions, such as the fact that he went shopping for shoes and groceries on 23 September, and how much he spent.
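To make that search concrete, here is a minimal sketch in Python. The transactions, the anonymous IDs, and the `matching_ids` helper are all made up for illustration; this is not the authors’ code or data, just the same idea on a toy scale: keep every anonymous ID whose history contains all of the (shop, date) points we already know.

```python
from collections import defaultdict

# Toy, invented data: (anonymous_id, shop, date, price). No names anywhere.
transactions = [
    ("u1", "bakery", "09-23", 5.50),
    ("u1", "restaurant", "09-24", 42.00),
    ("u1", "shoe store", "09-23", 79.99),
    ("u1", "grocery", "09-23", 31.20),
    ("u2", "bakery", "09-23", 3.25),
    ("u2", "cinema", "09-24", 12.00),
    ("u3", "restaurant", "09-24", 18.75),
    ("u3", "grocery", "09-22", 54.10),
]

def matching_ids(transactions, known_points):
    """Return the anonymous IDs whose history contains every known (shop, date) point."""
    visits = defaultdict(set)
    for uid, shop, date, _price in transactions:
        visits[uid].add((shop, date))
    # Set subset test: all known points must appear in this ID's visits.
    return {uid for uid, seen in visits.items() if known_points <= seen}

# What we know about Scott from outside the data set.
known = {("bakery", "09-23"), ("restaurant", "09-24")}
candidates = matching_ids(transactions, known)

if len(candidates) == 1:
    scott = next(iter(candidates))
    # Once the ID is pinned down, every other transaction comes along with it.
    history = [t for t in transactions if t[0] == scott]
```

In this toy set, two people went to the bakery on 23 September, but only one of them also went to the restaurant on the 24th. The second known point is what shrinks the candidate set to a single ID.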
The results are pretty frightening. By knowing just 4 pieces of outside information, 98% of all individuals could be identified and tracked over the three month period.
Replacing the exact prices with price ranges only lowered the percentage of identifiable people to 90% for 4 pieces of data.
Four pieces of data are pretty easy to come by for most people. Think about how often you post on Facebook where you are. Combine that with a data set that has no identifiers, and it’s likely that you could be identified and tracked from then on… by someone who was interested in doing so.
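Here is a rough sketch of what swapping exact prices for ranges does, again in Python with invented data. The bucket width of 10 and the helper names `price_bucket` and `who_matches` are my assumptions for illustration; the ranges used in the study may well be different.

```python
def price_bucket(price, width=10):
    """Map an exact price to a (low, high) range; the width is an assumed value."""
    low = int(price // width) * width
    return (low, low + width)

# Toy, invented purchases: (anonymous_id, shop, date, price).
purchases = [
    ("u1", "bakery", "09-23", 5.50),
    ("u2", "bakery", "09-23", 3.25),
    ("u3", "bakery", "09-23", 14.00),
]

def who_matches(purchases, shop, date, price, coarse):
    """IDs consistent with one known purchase, using exact prices or price ranges."""
    def key(p):
        return price_bucket(p) if coarse else p
    return {
        uid
        for uid, s, d, p in purchases
        if s == shop and d == date and key(p) == key(price)
    }

# With the exact price, one purchase is enough to single someone out.
exact = who_matches(purchases, "bakery", "09-23", 5.50, coarse=False)
# With ranges, two people fall into the same bucket and can't be told apart yet.
ranged = who_matches(purchases, "bakery", "09-23", 5.50, coarse=True)
```

The ranges blur one purchase into a small crowd, but each additional known purchase cuts that crowd down again, which fits with the percentage dropping only a little.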
So yes, metadata is vitally important and can be used to determine and track individuals. My favorite example of this comes from Target being able to identify pregnant women and target (get it?) them with coupons. A man came into the store complaining that his high school age daughter was getting coupons for cribs and diapers. She hadn’t told him she was pregnant yet.
Netflix tracks the movies I watch and Amazon tracks my purchases, both to make recommendations that I might like. Target and a couple of other places know my birthday. Google knows where I am and alerts me to traffic problems. My car sends me a text to let me know it needs an oil change.
It’s the trade-off we have to live with for being in a connected world. I like getting a coupon from my pizza place every Friday… they know I will be ordering pizza over the weekend. I like being able to pull up their website, click “my favorites” and get what I want delivered. I like being able to ask my phone to plot me a route that avoids traffic.
I just hope that businesses start to see the benefits in basic security and use it.