Probably, the best place to start is with the 9/11 attacks. Two of the terrorists made reservations on American Airlines Flight 77. Their names were also on a CIA watch list. But, we didn’t connect those two pieces of information.
If we had, we could have identified their home addresses from the information they provided to the airlines. And, by a simple cross-check, we would’ve discovered that three other individuals associated with those addresses, one of them named Mohamed Atta, had also made flight reservations for September 11.
And had we cross-checked the call-back phone number that Atta gave the airline, we would’ve discovered that five other individuals had also provided the same phone number to reservation agents.
And—wait for it—had we looked in one more place in the airline database, we would’ve discovered the name of yet one more individual who used the same frequent flyer number as had one of the men on the CIA watch list. And then, if we’d branched out to public sources, we’d have found that two more individuals had the same address.
Finally, the remaining six individuals associated with the attacks could have been identified through a routine review of the U.S. Immigration and Naturalization Service’s records, that is, the INS’s list of expired visas and illegal entries.
One terrorist was on that list, and five others had public records of having lived with him or among each other. And all, of course, shared the common characteristic of making reservations on flights for the morning of September 11.
In short, as a Department of Defense review committee concluded, with just seven clicks of the mouse through existing databases, all 19 terrorists could have been identified and linked to one another.
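The cross-checking described above amounts to following shared values (an address, a phone number, a frequent flyer number) from one record to the next. As a rough illustration only, here is a minimal Python sketch of that idea. It is not the review committee’s method or any agency’s actual system; the function name `linked_people`, the record layout, and every value in the sample data (W1, P3, A1, and so on) are made-up placeholders.

```python
from collections import defaultdict, deque


def linked_people(records, seeds):
    """Map each person reachable from the seeds to a hop count.

    records: iterable of (person, field, value) tuples, for example
             ("P3", "address", "A1"). All sample values are invented.
    seeds:   people already known, such as names on a watch list.
    """
    # Index the records both ways: person -> keys and key -> people.
    keys_of = defaultdict(set)
    people_with = defaultdict(set)
    for person, field, value in records:
        key = (field, value)
        keys_of[person].add(key)
        people_with[key].add(person)

    # Breadth-first search: each hop is one cross-check on a shared value.
    hops = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        person = queue.popleft()
        for key in keys_of[person]:
            for other in people_with[key]:
                if other not in hops:
                    hops[other] = hops[person] + 1
                    queue.append(other)
    return hops


# Hypothetical data: W1 and W2 stand in for watch-listed names.
records = [
    ("W1", "address", "A1"),
    ("W2", "address", "A2"),
    ("P3", "address", "A1"),         # shares an address with W1
    ("P3", "phone", "T1"),
    ("P4", "phone", "T1"),           # shares a phone number with P3
    ("W1", "frequent_flyer", "F1"),
    ("P5", "frequent_flyer", "F1"),  # shares a frequent flyer number with W1
    ("P6", "address", "A9"),         # shares nothing with anyone
]

hops = linked_people(records, ["W1", "W2"])
for person in sorted(hops):
    print(person, hops[person])
```

In this toy data, P3 and P5 are one cross-check away from a watch-listed name, P4 is two away, and P6 never appears because nothing ties that record to the others. Real records are far messier (misspelled names, reused phone numbers, shared apartment buildings), so a match of this kind is a lead to examine, not a conclusion.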
Two years later, the story of Ra’ed al-Banna—a Jordanian who attempted to enter the United States at Chicago’s O’Hare Airport on June 14, 2003—provided another powerful illustration of how big data might be used. And this time, it was a success.
Al-Banna was probably a clean skin, a terrorist with no known record. He was carrying a valid business visa in his Jordanian passport. He was pulled aside from the main line of entrants at the airport and questioned.
His answers were inconsistent and evasive; so much so that the U.S. Customs and Border Protection officer denied his application for entry, and ordered him returned to his point of origin. As a matter of routine, al-Banna’s photograph and fingerprints were collected before he was sent on his way.
More than a year later, in February 2005, a car filled with explosives rolled into a crowd in the town of Hillah, Iraq. More than 125 people died. The suicide bomber’s hand and forearm were found chained to the steering wheel of the car. When U.S. forces took fingerprints from the hand, they matched the prints collected from al-Banna in Chicago 20 months earlier.
Now, the Department of Homeland Security operates a sophisticated data analysis program called the Automated Targeting System, or ATS, to assess the comparative risks of arriving passengers.
In a typical year, approximately 350 million people cross U.S. borders, and over 85 million of them arrive by air. Since it’s not practical to subject all of these travelers to intense scrutiny, some form of assessment and analysis must be used to make more rapid choices about how and when to conduct inspections. ATS is that system, and ATS flagged al-Banna for heightened scrutiny.
David McCandless is a data journalist. Recently, he created a chart using a sophisticated computer program that scoured the web and scraped bits of data from lots of sources.
McCandless’s chart displays hundreds of thousands of data points graphically. It represents an annual human activity, and it shows a large peak in the spring and another one toward the end of the year. This is not the frequency with which we watch sporting events. It’s also not greeting card buying. So what is it?
Here’s the answer: Facebook breakup data. This is a graph about how love ends, at least among Facebook users. The big peaks are for spring break and the holidays. It’s really knowledge discovery, a pattern we would not see without big data. It is exciting—or, from a different view, maybe kind of disturbing. And it’s also why Facebook is worth billions of dollars.
Also, isn’t it a little spooky how accurate the ads are when we go online?
What you probably don’t know is that when you go to a particular website—say, Google—that website shares your visit with lots of other websites. It colludes with them to build a better picture of who you are.
A snapshot of your web browsing activity shows a field of dots, each one representing a different website. If you hover over the dot that represents Google, you can see that Google shares your browsing history with over 20 other websites: some you visit often, others never.
This picture is a graphic example of how our personal web browsing history is being converted into information about us.
One final example, perhaps a prosaic one. You might well have an E-ZPass in your car, one of those electronic devices that allow you to pass quickly through the toll plaza on the highway. Instead of waiting to pay a cash toll, you pass through a drive-through lane, your E-ZPass is electronically recorded, and your bank or credit account is automatically debited.
When your balance gets low, the system pings your credit card and recharges the balance automatically. It’s very simple, easy, and very convenient. It’s also a tracking system. The E-ZPass contains a near-permanent record of your car travels.
In recent years, many have come to see data as a treasure trove for tracking. It may be intrusive, but that’s where we are today. Big data is now part of the big picture of our lives, whether we like it or not.