I recently had the pleasure of being asked to do an opening speech at a Digital Leaders session on “People Versus Data – open data and the erosion of privacy” at Techhub Swansea.
Nowadays there’s no shortage of talk around data being the new natural resource, the ‘oil of the digital economy’ or how ‘open data’ is driving growth, innovation and ingenuity. Marketing and consultancy rhetoric or reality?
I would argue that it's now very much reality. As data-journalist David McCandless nicely puts it, data is the new soil, "a fertile, creative medium" for journalists, one we can "irrigate with networks and connectivity". But it has value way beyond journalism…
Clearly the world is changing, and at pace. Witness any digital disruption talk and there's a good chance that Uber or similar applications will get a mention. And I make no apologies for doing the same here. Stop and think:
While it's true that the main focus for these disruptive players is the 'Customer Interface', they are, at their heart, primarily data-driven businesses.
And 2013 saw the first ‘Billion Dollar’ open data company when Monsanto acquired the Climate Corporation, a business built upon analysing information already in the public domain.
Clearly data has become a natural resource.
We could fill many a blog article on the gnarly subject of the role of Government.
But for now let's take Tim O'Reilly's (founder of O'Reilly Media) 2009 definition:
“The original vision of the role of government: a convener of things that we as individuals and companies can’t do alone….”
In this model, it is reasonable that Government's role is to act as the arbiter, the keeper of data: data that is ultimately owned by the citizen. This is very much the approach being followed by the UK government. However, as I explain later on, surely Government will, in the future, have more to do in safeguarding the privacy of the individual? And if not Government, then who?
The one certainty is that the explosion in data that we have seen over the last few years will continue and quicken. We are witnessing the great datafication (the instrumenting and measurement of everything) and digitization of our world.
In 2013 it was stated that 90% of the world's data had been produced in the previous two years. What does that figure stand at in 2016?
Innovative companies are emerging such as 23andMe, who will digitise your genetic profile, allowing you to understand genetic risk factors, the probability of genetic traits such as lactose intolerance, and even discover family members you never knew you had! Alongside this, the growing market in 'wearables' (such as Fitbits) and smartwatches is generating a rich stream of personal health information. Data, data and more data.
Many businesses are already looking at innovative ways of making use of these new data sources, with firms such as Vitality Health offering discounts based upon fitness activity recorded through digital devices.
We are on the cusp of an explosion in vehicle-to-vehicle, vehicle-to-driver and vehicle-to-infrastructure communication. Again, this produces rich data sets around individuals' routines, driving and travel habits.
Similarly, the Internet of Things 'Markitechture' is getting more airtime and becoming a reality; it refers to a loose collection of technologies and systems that gather ever more data from an ever-increasing number of connected devices.
The growth in machine learning technologies, delivered via ‘the cloud’ opens up all kinds of possibilities for more advanced analysis and matching of data. For example, IBM, Google and Microsoft all now offer image matching functionality.
In Microsoft's case this includes the ability to test the probability that two facial images are of the same individual, with 30,000 comparisons per month free of charge and very low costs for additional volumes. Suddenly advanced technologies are accessible to everyone, and the impact this will have on an individual's ability to remain anonymous will be profound.
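To make the idea concrete, here is a minimal sketch of what sits at the heart of such a face-verification service (this is not Microsoft's actual API; the embeddings and threshold are toy values standing in for what a face-recognition model would produce):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def same_person(emb1, emb2, threshold=0.8):
    """Crude verification: embeddings closer than a threshold count as a match."""
    return cosine_similarity(emb1, emb2) >= threshold

# Toy embeddings; a real service derives these vectors from the face images.
alice_photo_1 = [0.9, 0.1, 0.3]
alice_photo_2 = [0.85, 0.15, 0.35]
bob_photo = [0.1, 0.9, 0.2]

print(same_person(alice_photo_1, alice_photo_2))  # similar vectors -> True
print(same_person(alice_photo_1, bob_photo))      # dissimilar -> False
```

The commercial services wrap exactly this kind of comparison behind an API call, which is why pricing is per comparison.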
The ubiquity of computing, connectivity and cloud technologies (which have served to commoditise more advanced technologies) has allowed the increased datafication, digitization and analysis of ever more elements of our everyday lives.
There are now staggering amounts of information relating to all aspects of our lives being gathered and analysed.
But so what? What does this data show and what are the risks and implications?
Big data has become a very popular and overloaded term in the last few years. It refers to the computational analysis of large data sets to identify patterns, trends and associations. There is a key point to note here, though: as Viktor Mayer-Schönberger, Professor of Internet Governance at Oxford University, points out, the data sets are not just large in that they cover, increasingly, all of the things under the sun; they capture all of the details of all of those things.
The datafication of the world, plus the falling cost of computation and data storage, makes it practical and feasible to capture 'everything about everything' and then analyse it to reveal patterns, trends and associations. But as we shall see, sometimes we share more than we intended or realised.
Data, when analysed at scale, reveals patterns, trends and associations that may not be obvious at all when looking at smaller data sets.
Something as seemingly innocuous as a Tweet, when analysed at scale, reveals a surprising amount of information, since Twitter records the location and language of each Tweet.
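A single tweet illustrates the point. The payload below is heavily simplified, but geotagged tweets returned by Twitter's API do carry language and coordinate fields of roughly this shape:

```python
import json

# Trimmed-down tweet payload; real API responses carry many more fields.
raw = '''
{
  "text": "Lovely morning for a coffee",
  "lang": "en",
  "coordinates": {"type": "Point", "coordinates": [-3.94, 51.62]}
}
'''

tweet = json.loads(raw)
lon, lat = tweet["coordinates"]["coordinates"]
print(f"Language: {tweet['lang']}, posted near lat={lat}, lon={lon}")
```

One such record is harmless; millions of them, analysed together, map out where and when a population wakes, works and travels.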
Similarly, analysis of Facebook relationship status changes allows the determination of ‘peak break up times’. While this provides some entertainment value, perhaps more concerning is the use of social media to reveal the hidden wealth in divorce cases!
Slowly and somewhat unwittingly we end up revealing more and more data about ourselves, our relationship with others and the world around us. I refer to this phenomenon as ‘data leakage’.
Take the site Strava.com, which is popular with runners and cyclists alike. For the cyclist, it allows you to plot your rides and then compare your time to others. But the GPS coordinates uploaded as a record of your ride give detailed start and finish locations, making it easy to pinpoint where you live.
Then there’s the option to record what bike you own and, if you are really proud, share a few photos of it.
Then, to top it all of you can share with the world your riding routine, showing the times when you are typically out on your bike and when it is at home.
It goes without saying that all of this becomes a bike thief’s dream. Strava has responded by introducing controls that address some of these risks, but the user needs to be educated to use them effectively.
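To see how little analysis this requires, here is a sketch (with made-up coordinates and a hypothetical `likely_home` helper) of how repeated ride uploads betray a home address:

```python
from collections import Counter

# Hypothetical ride start points (lat, lon) taken from uploaded GPS traces.
ride_starts = [
    (51.6214, -3.9436),
    (51.6212, -3.9438),
    (51.6213, -3.9441),
    (52.2053, 0.1218),   # a one-off ride while travelling
]

def likely_home(points, precision=3):
    """Round coordinates into ~100 m cells and pick the most common cell.
    Repeated ride starts cluster around where the rider lives."""
    cells = Counter((round(lat, precision), round(lon, precision))
                    for lat, lon in points)
    return cells.most_common(1)[0][0]

print(likely_home(ride_starts))  # -> (51.621, -3.944)
```

A few lines of counting, no machine learning required: this is why features such as Strava's privacy zones (which blur activity start and end points) matter.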
LG Electronics made the headlines in 2013 when it was revealed that their Smart TVs were recording their owners' viewing habits and sending the data back to the manufacturer and third parties for analysis, whether or not the user had given their consent.
These are just a few examples where the user (and data owner) is more than happy to share information for one purpose, but unwittingly that information then gets used for another purpose, one which may be sinister or may, at the very least, be outside what the user had considered or believed they had agreed to.
The problem is the unintended sharing of data, along with the use of data for purposes other than those for which it was originally intended.
In a world where an increasing amount of data is being recorded and stored about 'everything about everything', linking or aggregating these data sets provides further, enormously valuable insights.
So the data sharing continues, and the amount of data being aggregated grows unabated, indeed accelerates, as the tools and knowledge required become more accessible.
One intriguing example of the impact of aggregated data can be found at watchdogs.com. Although the site was put together as a promotion for a computer game, all of the data is genuine and semi real-time.
Whether it's setting up your new Smart TV or posting an update on social media, we click 'I agree' with no thought of the consequences, only of our convenience.
So we have made our digital beds, and we must willingly lie in them. Surely we have all agreed to this? Yes, there's a cost (to my privacy), but it's worth it because it makes my life easier?
But do we really understand what the future looks like? We know what the past looks like, we have just experienced it and we know things are changing at pace and in ways we could never have envisioned.
So we can only begin to think about what the future looks like. Will we be happy with 'Public by Default', or will there be some backlash when people start to realise the full impact of the increased sharing and analysis of their personal data, and the resultant erosion of their privacy?
What about the data that I have unintentionally leaked, or the innovative secondary use of my data, a use I never signed up to? And what about the increased use of predictive systems: will I, just as in Minority Report, find myself judged guilty because the sheer volume of aggregated data and predictive models suggests I am likely to commit some misdemeanour?
I would argue that we are in a cosy space now where we are seeing the upside (the benefits) of 'putting our data out there', but we have yet to experience many of the downsides (the costs). There's a natural lag in the system.
Maybe the 'Millennials' or 'Digital Natives' (folks born post-Internet, as opposed to Digital Immigrants) will realise at their first job interview, or when they enter their first serious relationship, that sharing everything on social media was not the wisest of moves.
These millennials will be the policy makers of tomorrow, so it goes without saying that what happens over the coming years will fundamentally shape policy in the period that follows.
I would argue that several things will endure: organisations will continue seeking to realise the value in personal data, while individuals will want to maintain some level of privacy, though the bar will vary across different groups. Clearly something will need to change.
It must fall to policy makers and technology innovators to create a secure, trusted and standardised 'privacy-rights infrastructure'. This will need to allow business innovation and the monetisation of data, while also offering individuals choices for protecting their personal information. Government has a key role to play in this.
To realise this we need, amongst many other things, logical models to decompose the problem, so that we can answer questions such as "just what bits of data about me are valuable, and which elements am I willing to share?"
Technological developments such as 'blockchain' show great promise in decentralising privacy, with the potential to enable more controlled data sharing at almost any level of granularity.
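As a toy illustration of the underlying idea (not any specific blockchain platform; the record fields and helper names here are invented for the example), a tamper-evident consent log can be built simply by chaining each record to the hash of the one before it:

```python
import hashlib
import json

def record_hash(record):
    """Stable SHA-256 hash of a consent record."""
    payload = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def append_consent(chain, subject, data_item, granted_to):
    """Append a consent grant linked to the previous entry's hash."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    record = {"subject": subject, "data": data_item,
              "granted_to": granted_to, "prev": prev}
    record["hash"] = record_hash({k: v for k, v in record.items() if k != "hash"})
    chain.append(record)
    return chain

def verify(chain):
    """Any retrospective edit breaks the hash links."""
    for i, rec in enumerate(chain):
        body = {k: v for k, v in rec.items() if k != "hash"}
        if rec["hash"] != record_hash(body):
            return False
        if i > 0 and rec["prev"] != chain[i - 1]["hash"]:
            return False
    return True

chain = []
append_consent(chain, "alice", "fitness-data", "insurer")
append_consent(chain, "alice", "location-history", "retailer")
print(verify(chain))                   # True: the chain is intact
chain[0]["granted_to"] = "advertiser"  # tamper with an old grant
print(verify(chain))                   # False: tampering is detected
```

The point is not the twenty lines of Python but the property they demonstrate: once consent grants are chained and replicated, no single party can quietly rewrite what an individual agreed to.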
And that's just the easy bit. Then there are the policy, legal and practical considerations of sharing more and more personal information, covering a myriad of topics such as the secondary and future use of data (beyond the initial explicit use cases), the rise of prediction algorithms, and the overall 'policing' of privacy control systems.
But we shouldn't lose sight of the benefits either. While there are challenges to be met around privacy, the datafication and digitisation of the world is already having a massive positive impact, from improving business efficiency and creating new business opportunities through to tackling disease and climate change.
"In times like these when unemployment rates are up to 13%, income has fallen by 5% and suicide rates are climbing, I get so angry that the government is wasting money on things like collection of statistics!" – From Hans Rosling's The Joy of Stats