Apr 13

Crowdsourcing the Manhunt

In 2002, Howard Rheingold wrote a book called Smart Mobs. At the beginning of that book he wrote about an anti-war protest held in San Francisco that year, noting that it was one of the first decentralized protests in human history. Instead of everyone meeting in one place to demonstrate, protesters passed simple instructions along to anyone who was interested: shut down businesses, block traffic, and cause peaceful mayhem to protest the war.

I was living in San Francisco at the time, and to get to work I had to walk through South of Market, where most of the protests were scheduled. That day was total chaos. People chained themselves together on the freeway on-ramp, blocking all inbound traffic on the Bay Bridge. Another group staged a big press scene by vomiting on the Federal Building. Rheingold pointed out that this was one of the moments that illustrated we had reached a tipping point as a connected society, and that it had changed our social actions in a major way. People had shared their goal for the protest over SMS and email, and the end result of the demonstration was far more effective than anything anyone could have orchestrated individually. And so it has gone with most things crowdsourced since then (so much so that this story doesn't seem all that novel ten years later).

Which brings me to the manhunt for the alleged Boston Marathon bomber that (thankfully) concluded tonight. If the San Francisco protest in 2002 was a watershed moment for how we organize and collaborate, I think this week carries real significance, both in how we collectively solve problems and in the impact of living with devices that are constantly capturing data (phones, tweets, geo-tags, sensors, etc.).

This week, we saw:

  • Reddit and 4chan turning into full-on detective agencies, using open platforms and their own style of communication to identify possible suspects.
  • News reports featuring photos of the boat the suspect was hiding in before reporters could get anywhere near the perimeter.
  • Analysis of the suspect's Twitter account to figure out his daily patterns.
  • Collaboratively created maps of anything and everything related to the series of events.
  • A collective fundraising effort to help one of the amputees pay for health insurance.
  • Police using data from the marathon to identify runners who would have been finishing at the time of the explosion, in order to possibly connect with their families in the search for the suspects.
  • Reddit ordering pizza for all the participating police departments.

Indeed, when we had very little to go on, we had the data created as a byproduct of the event and of the suspects' lives… and that turned out to be quite a lot to work with.

The Boston Marathon bombing and ensuing manhunt will probably go down as one of the biggest news events of this decade. But I actually think the way the internet voluntarily assembled to solve problems together, using all the residual data from the event, is a much bigger moment.

Apr 12

Big Data vs. A Lot of Data

The term Big Data is getting thrown around a lot lately. As is the case with buzzwords, people have begun to use it to describe a broad category of interest (similar things happened to "innovation," "social," and "Web 2.0"). If that weren't enough, add all the hype and marketing from hardware, software, and services firms driving the "importance of Big Data," and finding any real clarity becomes nearly impossible.

A lot of people seem to be using "big data" as a proxy for systems at scale and the data that comes with those systems. The general suggestion is that if you have a large system with lots of users, there must be patterns hidden in that data. And it follows that those hidden patterns must be worth something to somebody (right?)… so there's gold in them there digital hills. (So many references to prospecting in the data world: mining, sharding, etc.)

I had the good fortune of hearing Cesar Hidalgo speak this week at the Media Lab. He spends a lot of time thinking about networks and large data sets, and he had some great thoughts on the topic. In his talk, Hidalgo laid out a nice framework to distinguish Big Data from merely a lot of data, built around three simple qualifying questions.

– Do you have size? – This is relative to the problem you're working on, but it's usually in the hundreds of thousands or millions of records. You'll need enough to provide statistical significance across your population, and the larger the data set, the more edges you may be able to discover.

– Do you have resolution? – This brings some analysis to the data at hand. Just as all rock does not contain gold, all data does not contain (new) patterns. Low-fidelity data might be all customer transactions with order-level detail (total amount spent, etc.). High-fidelity data would be all customer transactions with item-level detail (the things the customer purchased to make up each transaction). Visa has the former and Amazon has the latter, and it's no surprise Amazon knows you better. High-resolution data will illuminate new patterns, as in Target's recent misstep of identifying a pregnant teen before she could tell her father.
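The resolution difference can be sketched with a couple of hypothetical records (all names, products, and amounts below are invented for illustration):

```python
from collections import Counter

# Low resolution: one row per transaction, totals only --
# roughly what a card network sees.
order_level = [
    {"customer": "c1", "merchant": "BabyMart", "total": 54.20},
    {"customer": "c1", "merchant": "BabyMart", "total": 12.99},
]

# High resolution: the same transactions with the items inside them --
# roughly what the retailer itself sees.
item_level = [
    {"customer": "c1", "items": [("prenatal vitamins", 18.50),
                                 ("unscented lotion", 35.70)]},
    {"customer": "c1", "items": [("cotton balls", 12.99)]},
]

# Order-level data only supports coarse questions:
total_spent = round(sum(t["total"] for t in order_level), 2)

# Item-level data supports pattern questions, e.g. which products
# a customer buys and how often they recur:
product_counts = Counter(
    name for t in item_level for (name, _price) in t["items"]
)

print(total_spent)  # 67.19
print(product_counts["prenatal vitamins"])  # 1
```

Same customer, same dollars; only the item-level version could ever surface a pattern like the Target example above.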

– Do you have scope? – This question starts to consider the reach of your data. Are you only gathering data against a very focused problem, or are you gathering data that will give you insight beyond your core business? Being able to understand patterns outside your immediate market will create new opportunities for understanding. As an example, Hidalgo spoke about telephone companies, who know your calling patterns, but can also infer mobility patterns because they know which cell towers you've used during your day.
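A minimal sketch of that scope idea, with invented call records: data collected for billing (who called, when) happens to carry tower IDs, which answer an entirely different question.

```python
from collections import defaultdict

# Hypothetical call records; tower names and hours are made up.
calls = [
    {"user": "u1", "tower": "T-home",   "hour": 8},
    {"user": "u1", "tower": "T-office", "hour": 10},
    {"user": "u1", "tower": "T-office", "hour": 15},
    {"user": "u1", "tower": "T-home",   "hour": 21},
]

# The core-business question: how many billable calls did u1 make?
call_count = sum(1 for c in calls if c["user"] == "u1")

# The scope question: where does u1 spend the day? Group towers by user,
# then keep distinct towers in order of first appearance -- a crude daily route.
towers_by_user = defaultdict(list)
for c in calls:
    towers_by_user[c["user"]].append(c["tower"])

route = list(dict.fromkeys(towers_by_user["u1"]))

print(call_count)  # 4
print(route)       # ['T-home', 'T-office']
```

Nothing new was collected; the mobility pattern was sitting in the billing data all along.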

So, though there's a lot of noise around this space, there's also a lot to be done here. And as the hardware, software, and services companies wind people up to capture more data, there will be more patterns to discover; this space is self-fulfilling like that. Along those lines, one stat came up during the talk: 70% of all data captured about people is gathered by machines. As we put more sensors into everything, we'll push that ratio even further.

Getting beyond the hype, I'm excited to see what new patterns emerge from deeper analysis of data. There's definitely space for data scientists to unearth patterns that help designers create new experiences. But to be sure, the real opportunity isn't in Big Data; it's in gaining better resolution on the problems we're trying to solve and the markets we're trying to serve.

(If you’re into this sort of thing, here’s another talk by Cesar Hidalgo. It’s really nice, definitely worth your time.)