Hours after The Intercept dropped a leaked report detailing Russian efforts to hack the 2016 election, federal government investigators announced the arrest of the suspected leaker, 25-year-old NSA employee Reality Leigh Winner. According to a Department of Justice affidavit, Winner left a trail of physical and digital evidence identifying her as the leaker.
After reading the affidavit, Ted Han, director of technology at DocumentCloud, became curious about what incriminating information could have been hidden in the document. Zooming in on the PDF, he found what he was looking for — tiny yellow dots on the pages.
“Microdots basically provide a smoking gun,” Han said. “It sure makes it really easy to look up specifically what print job that document came from.” The pattern of dots, when decoded with a guide posted by the Electronic Frontier Foundation, reveals the date and time the document was printed and the printer that produced it.
It might be surprising to learn that most documents printed by color printers contain these tracking dots. (You can look up your own printer on the Electronic Frontier Foundation’s list, which was started more than a decade ago.)
But it’s not just printers that embed hidden information into the media we make and share. Metadata — or “data about data”, like timestamps and location tags — is everywhere.
Harlo Holmes, director of newsroom digital security at the Freedom of the Press Foundation, says metadata has always been around, though it often goes unnoticed.
“Throughout the history of media, there have always been examples of programming those technologies with the capability of generating metadata that lends to attribution,” Holmes said.
Here are five examples of metadata breadcrumbs our everyday actions leave behind.
Photo metadata — also called exchangeable image file format, or EXIF data — is rich in information that can easily identify the photographer.
In 2012, Vice accidentally revealed the location of millionaire John McAfee, who was hiding from the Belize government, by failing to remove geolocation metadata from photos.
Metadata is attached to each and every photograph we take, including the exact time and date. If the camera or phone has GPS enabled, as nearly every smartphone does, it will also record an exact location. The location is not simply a city and state, but a specific latitude and longitude. The metadata can also include a unique identifier or serial number of the device that took the photo.
“Kind of like how Polaroids have their image part, and then they have this thick frame that actually contains a lot of the chemical substrate of the actual image that you see,” Holmes said. “Enough metadata can reveal the who, what, when, where, why and how of any particular image.”
And that metadata stays with the photo, unless you use software that specifically scrubs it — regardless of whether you embed it in a PDF, insert it in a Word document, or text it to a friend.
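To make the latitude-and-longitude point concrete, here is a minimal Python sketch of how GPS coordinates stored in a photo can be turned into a point on a map. EXIF records each coordinate as degrees, minutes and seconds, written as fractions, plus a N/S or E/W reference. The tag values below are hypothetical and not taken from any real photo, and `dms_to_decimal` is a helper name made up for this illustration.

```python
from fractions import Fraction


def dms_to_decimal(dms, ref):
    """Convert EXIF-style (degrees, minutes, seconds) to signed decimal degrees."""
    # Each component is a (numerator, denominator) pair.
    degrees, minutes, seconds = (Fraction(n, d) for n, d in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative.
    if ref in ("S", "W"):
        value = -value
    return float(value)


# Hypothetical GPS tags, shaped the way EXIF stores them.
gps = {
    "GPSLatitudeRef": "N",
    "GPSLatitude": ((40, 1), (44, 1), (5490, 100)),
    "GPSLongitudeRef": "W",
    "GPSLongitude": ((73, 1), (59, 1), (840, 100)),
}

lat = dms_to_decimal(gps["GPSLatitude"], gps["GPSLatitudeRef"])
lon = dms_to_decimal(gps["GPSLongitude"], gps["GPSLongitudeRef"])
print(f"{lat:.5f}, {lon:.5f}")
```

Five decimal places of latitude correspond to roughly a meter on the ground, which is why a single unscrubbed photo can be enough to give away a location.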
The single most damning thing that might have led to Winner’s arrest? Using a personal email account to contact The Intercept.
Email headers — the lines of text that accompany every message to help its delivery — are a sort of metadata that can be very revealing.
In the email apps we use, the only lines of the header we usually see are “to” and “from” fields, but that’s not all that’s being included. Depending on the email client being used, the local IP address and the IP address of your internet connection are also conveyed.
“You should just assume when you send an email that the IP address of your computer is being sent, which can be used to identify you,” said Jonathan Rudenberg, who researches cell phone metadata and location privacy. “It can be used to correlate things, to get an approximate location, down to the city you’re in.”
In short, think of email like snail mail — with a coded return address.
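For readers who want to see that coded return address, the sketch below uses Python’s standard `email` module to pull IP addresses out of the `Received` lines that mail servers add as a message travels. The message is invented for illustration, with reserved example addresses, and real headers vary by email client and provider, so treat this as a rough picture rather than a description of any particular service.

```python
import re
from email import message_from_string

# A made-up message. Servers prepend "Received" lines, so the
# bottom one is the hop closest to the sender.
raw = """\
Received: from mail.example.org (mail.example.org [203.0.113.7])
\tby mx.example.com with ESMTP; Mon, 5 Jun 2017 10:00:02 -0400
Received: from [192.168.1.23] (unknown [198.51.100.42])
\tby mail.example.org with ESMTPSA; Mon, 5 Jun 2017 10:00:01 -0400
From: source@example.org
To: tips@example.com
Subject: hello

Body text.
"""

msg = message_from_string(raw)
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

hops = msg.get_all("Received", [])
ips = [ip for hop in hops for ip in IPV4.findall(hop)]

print("Visible fields:", msg["From"], "->", msg["To"])
print("Addresses in the hidden headers:", ips)
print("Closest to the sender:", IPV4.findall(hops[-1]))
```

In this invented example, the last `Received` line carries both the sender’s local network address and the address of their internet connection, even though the mail app would show only the “to” and “from” fields.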
A survey last fall found that more than half of online shoppers begin their search on Amazon.
That’s good news for online advertisers, but bad news for those who want to stay incognito across the web.
When a shopper goes to Amazon and looks for a product, cookies (not the sweet kind) can be used to log that information across several websites.
That means the exact pair of shoes you were looking at on Amazon can now show up in ads on the front page of The New York Times. This method is called behavioral retargeting.
“Advertising tracking is one of the most pervasive forms of tracking,” said Rudenberg.
Installing an ad blocker can prevent this sort of tracking, but, without it, companies are able to build extensive profiles of a person’s activity across several sites.
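How does one cookie turn into a profile? The toy Python model below is a simplified sketch, not a description of how any real ad network operates: a hypothetical `AdTracker` hands a browser an ID the first time it serves an ad, then logs every later page where the same ID shows up. The site names are placeholders.

```python
import uuid
from collections import defaultdict


class AdTracker:
    """Toy third-party ad server: one cookie ID seen across many sites."""

    def __init__(self):
        self.profiles = defaultdict(list)

    def serve_ad(self, cookie_id, site, page):
        # First visit: the browser has no cookie yet, so issue a fresh ID.
        if cookie_id is None:
            cookie_id = uuid.uuid4().hex
        # Every later request carries the same ID, so visits pile up.
        self.profiles[cookie_id].append((site, page))
        return cookie_id


tracker = AdTracker()
cookie = tracker.serve_ad(None, "shop.example", "/shoes/red-sneakers")
cookie = tracker.serve_ad(cookie, "news.example", "/front-page")

print(tracker.profiles[cookie])
```

Because both sites load ads from the same tracker, the news site visit lands in the same profile as the shoe search. That shared ID is the link that lets an ad follow a shopper from one site to another.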
Demand for encrypted messages and calls has increased in the recent political climate — but it’s important to keep metadata about those conversations private, too.
Martin Shelton, a security user researcher who specializes in working with at-risk groups, says this is often overlooked in debates about encrypting and securing conversations.
Even without wiretapping, or listening in on conversations, call records can be just as sensitive and revealing as the contents of the calls themselves.
For example, in 2013, the Department of Justice seized two months of phone records from 20 reporters for the Associated Press.
“The Department of Justice wasn’t listening into phone calls, but they were accessing information about who was calling who, when, and for how long, and this is super revealing information in the context of a sensitive source,” Shelton said.
In cases where sources may be in danger, simply revealing that contact was made with a journalist can be compromising. Call records can also reveal patterns of communication between individuals who may not want to be publicly associated.
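A few lines of Python show how little it takes to find such a pattern. The call records below are entirely made up and use fictional phone numbers. Each one holds only who called whom, when, and for how long, with no audio and no content.

```python
from collections import Counter

# Hypothetical call records: (caller, callee, start time, seconds).
records = [
    ("555-0101", "555-0199", "2013-04-02T21:14", 620),
    ("555-0101", "555-0199", "2013-04-03T22:05", 1140),
    ("555-0101", "555-0150", "2013-04-04T09:30", 45),
    ("555-0199", "555-0101", "2013-04-05T23:41", 900),
]

# Add up talk time for each pair of numbers, ignoring who dialed.
pair_seconds = Counter()
for caller, callee, start, seconds in records:
    pair_seconds[frozenset((caller, callee))] += seconds

top_pair, total = pair_seconds.most_common(1)[0]
print(sorted(top_pair), f"{total // 60} minutes")
```

Sorting by total talk time immediately surfaces the two numbers that keep calling each other late at night, which is the kind of relationship an investigator looking for a source would want to find.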
Similarly, even if a message sent on an app is fully and securely encrypted, metadata about a message, such as to whom and where it was sent, can provide a picture of your activities.
The messaging app WhatsApp rolled out end-to-end encryption in April 2016, which is good news for keeping the content of conversations secure. But it doesn’t hide the user’s metadata.
Because of data-sharing policies, this means that WhatsApp’s parent company, Facebook, has access to revealing metadata, including the user’s phone number, contact list and usage data — when WhatsApp was last opened and what kind of device it was used on.
Free Wi-Fi is a major draw to cafes, restaurants, and parks. But did you know that connecting to Wi-Fi can also be used to track your movements and activities?
When Wi-Fi is enabled on a smartphone, the phone constantly scans for networks to connect to and, as part of the process, sends out a media access control (MAC) address associated with the device.
Traditionally, those addresses would stay constant for each device — and lead to an easy way for others to track your whereabouts.
“Every time you go to the mall, for example, your phone is looking for a Wi-Fi access point, it would broadcast that serial number,” Rudenberg said. “And if there was a tracking device installed, it would be able to see that you were visiting the mall every weekend.”
In recent years, randomization has been used to help prevent tracking — meaning that MAC addresses sent out by a device are constantly switched up when you’re trying to connect to Wi-Fi.
Still, in 2017, only 6 percent of Android phones were found to properly randomize MAC addresses. iPhones handle randomization correctly, but updates in iOS 10 make phones easy to identify nonetheless.
So, when a device does end up connecting to a Wi-Fi hotspot, the unique MAC address of your phone or laptop is revealed.
In short, it’s still very easy for a device to be tracked when it’s on a public Wi-Fi network.
“Wi-Fi was not designed with privacy in mind,” Rudenberg said.
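What does a randomized address look like? A MAC address is six bytes, and one bit in the first byte marks whether the address was assigned by the manufacturer or made up locally. The Python sketch below generates a random address with that “locally administered” bit set, which is the general idea behind randomization. It is an illustration only, not the procedure any particular phone follows, and the function names are invented for this example.

```python
import secrets


def is_locally_administered(mac):
    """True if the address is flagged as locally assigned, not factory-set."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0b10)


def random_mac():
    """Generate a random MAC address marked as locally administered."""
    octets = bytearray(secrets.token_bytes(6))
    # Set the locally administered bit and clear the multicast bit.
    octets[0] = (octets[0] | 0b10) & 0b11111110
    return ":".join(f"{b:02x}" for b in octets)


print(random_mac())
print(is_locally_administered("00:1a:2b:3c:4d:5e"))
```

A tracker that sees a different random address on every scan has a much harder time recognizing a returning visitor. The protection lapses once the phone falls back to its fixed, factory-set address.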
For the average consumer, metadata is harmless — meant to be useful, not intrusive.
It helps sort photos into tidy albums on your cell phone and shows when your text message was sent and delivered to a friend.
But when individuals have a need to stay anonymous, or governments and corporations push privacy limits, it can be dangerously revealing.
“At this point the only thing that can happen is education,” Rudenberg said. “And we can also build technology to be better.”
For example, the Signal messaging app strips metadata from photos sent in the app, as well as fully encrypting conversations.
Shelton, however, emphasizes that it is incredibly hard for anyone to remain anonymous.
“I try not to think of it in terms of untraceability, traceability,” Shelton said. “But rather making it a lot more difficult, or prohibitively expensive for an attacker or a network monitor, or an investigator, whoever might be trying to figure out the source of the leak, to conduct their investigation.”
Winner failed to follow The Intercept’s guidelines for leakers, emailing the news outlet from a personal email account earlier this year.
“Obviously in this case she could have done a lot better, but even people who are deliberately trying and are being much more calculating about these things, it’s easy to slip up,” Shelton said.
The telltale giveaway in this case?
A crease in the leaked document, which tipped off the NSA that the papers had been physically printed. Government records showed that only six individuals in the agency had printed the document.
Even in our increasingly digital world, the most damning evidence can be man-made.