The weekly newsletter for Fed2
by ibgames

EARTHDATE: April 1, 2012

Official News page 9


WINDING DOWN

An idiosyncratic look at, and comment on, the week's net and technology news
by Alan Lenton

Happy April Fool’s Day everyone. I decided to give you all a break from jokes and discuss something serious (actually, I forgot it was 1st April today). Instead I decided it was long overdue for me to take a look at what is known as ‘Big Data’, which after something like five years of spreading through the technostructure is now coming of age.

There will be no Winding Down next week, because it’s Easter. I, like all red-blooded UK males, will be spending Easter doing DIY — in this case finishing painting our new apartment. How do I know that everyone else will be doing that too? Because there are more household accidents in the UK during these statutory holidays than at any other time of the year!

And so to Big Data...


Story: Big Data — big stick or big bonus?

Usually when politicians get their hands on anything techie it’s a sure sign that the technology is either past its prime, a no-hoper, or it has obvious applications for keeping an eye on what their electorate are doing. Thus it was with some interest that I read that the Obama administration is launching a US$200 million ‘Big Data’ initiative.

So, what is Big Data? Let me give you an example of Big Data. Twitter has somewhere over 200 million accounts, which generate over 230 million messages a day. That’s about 84 billion messages a year. If you could analyze these messages you could produce a lot of information about the habits of Twitter users. Information which would be very valuable to advertisers, law enforcement, politicians, and Twitter itself. The problem is that’s an awful lot of data to analyze, particularly when it is scattered over tens of thousands of servers. And, above all, it has to be analyzed fast, to find, and capitalize on, the trends before anyone else does.

With the size of the data collected being many orders of magnitude larger than before, and its scattered nature, the old methods of analyzing data were not going to work — well not this side of the heat death of the universe, anyway. New methods were needed.
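
As a rough illustration of the general idea, and not a description of any particular company's system, one widely used pattern is to let each server summarise its own share of the messages and then combine only the small summaries. The sketch below assumes three made-up "servers" holding a handful of invented messages; the function names `count_hashtags` and `merge_counts` are hypothetical.

```python
from collections import Counter

# Invented sample data: each inner list stands in for the messages held on one server.
servers = [
    ["#easter plans", "painting again #diy", "#diy disaster"],
    ["happy #easter", "#bigdata is everywhere"],
    ["more #diy", "#bigdata and #easter"],
]

def count_hashtags(messages):
    """Count hashtags in one server's messages (the local step)."""
    counts = Counter()
    for message in messages:
        for word in message.split():
            if word.startswith("#"):
                counts[word.lower()] += 1
    return counts

def merge_counts(partial_counts):
    """Combine the per-server counts into one overall tally."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total

# Each server is counted independently; only the small tallies are merged.
partials = [count_hashtags(messages) for messages in servers]
trending = merge_counts(partials)
print(trending.most_common(3))
```

Because each server's count is independent of the others, the local step could in principle run everywhere at once, which is what makes this shape of analysis practical for scattered data.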

It really started with Google. Google is, essentially, an advertising broker. That’s how it makes all the money that it uses to do other, more interesting things. But Google needed some way of processing all the data it obtained about what its users (not customers — its customers are the people who pay to place ads on its services) were searching for, and how they were searching. There were many advantages to being able to analyze this data so that it could provide information to its customers on the efficiency of their advertising, and how to improve it. As a bonus, it would also provide information on how to make the searches more relevant to the users, so they would continue to come back.

Google developed a method of dealing with this sort of material. This is not the place to go into the technical details; suffice it to say that the idea was taken up, and variants and extensions are now in use anywhere there is a mass of data available. And when I say ‘anywhere’, I mean anywhere, because it’s not just the internet giants that use it. Many big companies, especially retail outlets and multi-nationals, have collected vast amounts of data about their sales and customers over the years, and are now finding ways to analyze and exploit that data. Other organizations that are starting to apply these methods to existing data include governments, research organizations, and charities. And, of course, now the potential has been revealed — even to politicians — it is a rapidly expanding area for research. Hence the Obama initiative.

But analyzing the data is only part of the story. To be useful, the data needs to be displayed in ways which make it possible to immediately understand the implications of the analysis if you’re not a techie. Just to give you an example, look at this: it’s a standard map of wind conditions in the USA. Now look at this. It’s a real-time display of wind in the USA using data from the National Digital Forecast Database. I doubt if I need to ask which of these visualizations gives non-meteorologists the better grasp of what’s going on! (Incidentally, you can click on the moving wind map to zero in on a location.)

But what does it mean for ordinary people? Well, it’s not all bad. On the other hand, some of the implications are pretty grim. I guess we should look at the downside first... In order to do that I’m going to break my no more ‘New York Times’ rule, because they have an article on how companies learn your secrets which is unmatched by anything else I came across while researching this piece. It seems that the Target group of shops, and by extension other retail chains, keep a vast amount of material about their customers. Target assigns each shopper a unique code — the Guest ID number — that records everything they buy. They also keep info if a customer uses a credit card or a coupon, mails in a refund, calls the customer help line, opens an e-mail they are sent by Target or visits the Target web site. All this is recorded and linked to the customer’s Guest ID.

I can only quote the article to show how much information this is: ‘Also linked to your Guest ID is demographic information like your age, whether you are married and have kids, which part of town you live in, how long it takes you to drive to the store, your estimated salary, whether you’ve moved recently, what credit cards you carry in your wallet and what web sites you visit. Target can buy data about your ethnicity, job history, the magazines you read, if you’ve ever declared bankruptcy or got divorced, the year you bought (or lost) your house, where you went to college, what kinds of topics you talk about online, whether you prefer certain brands of coffee, paper towels, cereal or applesauce, your political leanings, reading habits, charitable giving and the number of cars you own.’ That’s a lot of information. Link it up with big data tools and visualization and you start to get some idea of some pretty unpleasant possibilities for anyone who values their privacy.

On a more individual level, take a look at this nasty little app, which is a product of adding (‘mashing’ as it’s called in the trade) a number of different big data sets.

Of course, the real bugbear is what the government is going to do when it has the tools of Big Data in its grubby little paws. Governments hold enormous amounts of data, but most of it is in different but overlapping databases. This fact makes serious government manipulation of its citizens more difficult than it might otherwise have been, hence the assorted projects (mostly failures) over the last 20-30 years to combine these databases into one big uber-database. Big Data tools, though, might as well be tailor-made for the business of analyzing information housed in scattered databases, making the looming specter of the society portrayed in George Orwell’s book 1984 come ever closer.

But is there a ‘good’ side to Big Data? Yes, there definitely is. Like most technologies, Big Data is neither inherently good nor bad. One of the areas where Big Data is showing its benevolent side is in medicine.

Take for instance Louisville. More than 100,000 people in the city suffer from asthma. Now the city is, with the help of IBM, launching research into what triggers asthma in its citizens, by giving a sample of them inhalers which record when they are used. That data is collected and will be matched up to city information on traffic, air pollution, pollen levels and a host of other possible triggering factors. Although the sample of asthma sufferers is relatively small, the city condition data is not, and without the tools and techniques developed for Big Data, it would not have been possible to consider this sort of study.
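
The matching step described above can be pictured as a simple join on date. This is only a sketch with invented numbers; the field names (`pollen`, `air_quality`) and the function `uses_by_condition` are hypothetical and are not taken from the Louisville study.

```python
from collections import defaultdict
from datetime import date

# Invented inhaler log: (participant id, day the inhaler was used).
inhaler_uses = [
    ("p1", date(2012, 3, 1)),
    ("p2", date(2012, 3, 1)),
    ("p1", date(2012, 3, 2)),
    ("p3", date(2012, 3, 3)),
    ("p2", date(2012, 3, 3)),
    ("p1", date(2012, 3, 3)),
]

# Invented city readings, one record per day.
city_conditions = {
    date(2012, 3, 1): {"pollen": "high", "air_quality": "poor"},
    date(2012, 3, 2): {"pollen": "low", "air_quality": "good"},
    date(2012, 3, 3): {"pollen": "high", "air_quality": "good"},
}

def uses_by_condition(uses, conditions, factor):
    """Count inhaler uses grouped by the value of one city factor on that day."""
    totals = defaultdict(int)
    for _participant, day in uses:
        reading = conditions.get(day)
        if reading is None:
            continue  # no city data for that day, so the use cannot be matched
        totals[reading[factor]] += 1
    return dict(totals)

print(uses_by_condition(inhaler_uses, city_conditions, "pollen"))
print(uses_by_condition(inhaler_uses, city_conditions, "air_quality"))
```

A real study would also need to allow for how many days fell into each category before reading anything into the totals.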

Other medical uses of big data include a study of the side effects of taking different drugs at the same time. This was done by analyzing the data from hundreds of thousands of ‘adverse events’ reported to the US Food and Drug Administration every year. The result showed up thousands of previously unknown side effects.

So what can be done to mitigate the ‘bad’ effects of Big Data? The techniques, for good or evil, are out of the box, and there is no way of stuffing them back in, even if we wanted to. Some of the more disturbing effects may well be mitigated by sociological changes — for instance people becoming more aware of privacy issues. Some of it may well be handled by technical breakthroughs, such as easy to use encryption (but don’t hold your breath). And some of the problems may be handled by political action.

But for anything useful to happen, we all need to be aware of what’s going on and prepared to take a stand against what we see as abuses of these techniques. That’s not going to be easy, if only because of inertia and the undoubted fact that everyone has a different idea of what is unacceptable on the part of government and businesses. Still, that’s all part of what democracy is about.

http://www.cccblog.org/2012/03/29/obama-administration-unveils-200m-big-data-rd-initiative/
http://www.arnnet.com.au/article/418396/big_promise_big_data/
http://www.theregister.co.uk/2012/03/19/clearstory_big_data_viz_stealth/
http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html?_r=1&pagewanted=all
http://www.forbes.com/sites/jerrymichalski/2012/03/10/big-data-stalker-economy/
http://www.ted.com/talks/jer_thorp_make_data_more_human.html
http://www.ted.com/talks/david_mccandless_the_beauty_of_data_visualization.html
http://www.courier-journal.com/article/20120315/NEWS01/303150008/Louisville-launch-data-driven-asthma-study
http://www.forbes.com/sites/danwoods/2012/03/09/expanded-data-access-for-c-level/
http://www.nature.com/news/drug-data-reveal-sneaky-side-effects-1.10220


Acknowledgements

Thanks to readers Barb, Fi, and to Slashdot's daily newsletter for drawing my attention to material used in this issue.

Please send suggestions for stories to alan@ibgames.com and include the words Winding Down in the subject line, unless you want your deathless prose gobbled up by my voracious Spamato spam filter...

Alan Lenton
alan@ibgames.com
1 April 2012

Alan Lenton is an on-line games designer, programmer and sociologist, the order of which depends on what he is currently working on! His web site is at http://www.ibgames.net/alan.

Past issues of Winding Down can be found at http://www.ibgames.net/alan/winding/index.html.

