The Age of Data - 6 min read

We live in the Age of Data: according to credible estimates, up to 90% of all data ever produced by humans has been created in the last five years. As we live our lives increasingly through technology, that volume will only grow. Every aspect of what we do can now leave a digital trace. Take a typical morning commute in London: whether you use the city’s electronic ‘Oyster’ payment card on the Underground, drive an ‘intelligent’ car, or simply wear a fitness monitor, the data is accruing. The resulting volume of data staggers comprehension. IDC, a leading market intelligence firm, predicts that the ‘digital universe’ will reach 180 zettabytes of data in 2025. That is 180 followed by 21 zeros. Let that figure settle in. 21 zeros.

As volume increases, so too does variety. Textual content remains a major stream: some of it structured into the well-defined categories of personal data – age, sex, income and so on – but the vast majority semi-structured or wholly unstructured. Other growing streams – many of them dynamic – include electronic financial flows, real-time traffic management systems and the multimedia offerings of social networking sites, to name but a few. Of particular current interest are the increasing amounts of data generated by machine-to-machine communications, known colloquially as the ‘Internet of Things.’ The time is coming when the majority of electronic devices, domestic and otherwise, will produce and share data. As Paul Sonderegger of Oracle describes it, ‘data will be the ultimate externality: we will generate them whatever we do.’

But data is a positive externality: the more data we produce, the more utility it should have. In fact, access to data is the key driver behind current innovation. In the early years of the internet, business use of data was narrow: it was about better-targeted marketing and advertising. But now, with advances in machine learning algorithms, the possibilities have exploded. Machine learning can feed off huge amounts of data to identify patterns, combinations and correlations that humans could never spot on their own.

We see their impact in many aspects of our lives: online retail ‘recommenders’; voice, facial and handwriting recognition; filtering tools such as online search engines and spam filters; language translation; and outlier detection in anti-money-laundering and anti-fraud software. The machine learning expert Pedro Domingos has highlighted how essential massive amounts of data are to making the best use of these tools: if ‘learning algorithms are the seeds, [then] data is the soil.’

So why have we not yet seen more value from data? Those who have worked for years in the field will tell you that making use of even a single data stream is hard. There are complex technological, managerial and legal considerations to take into account. Data needs to be acquired, warehoused and maintained in a secure environment. It has to be cleansed of errors, omissions and duplications, aggregated and integrated, analysed and modelled. Then – finally – it has to be visualised and explained, so that the non-technically-minded can make better-informed decisions. All this requires sophisticated and expensive resources: data warehouses, platform architecture and, above all, competent data engineers and scientists – and such people are few and far between.

Tougher still is sharing or trading data streams. For a start, organisations often feel that collaboration runs against their self-interest. By collecting, exploiting and protecting what it has, a firm can enhance its products and services and generate competitive advantage. And even if that commercial barrier is overcome, there are legal and reputational issues surrounding the ownership and security of data. As a result, most data trading and sharing is bilateral, ad hoc and takes an age to arrange. The legal transaction costs and friction – drawing up contracts that seek to specify the ground rules for many different situations – are often enough to put off even the most devoted data sharer. And the penalties for getting it wrong are growing. There are legal, ethical and reputational risks that go with data sharing, as the recent Cambridge Analytica/Facebook allegations show. With new government regulations, such as the EU’s General Data Protection Regulation (GDPR), due to come into force in May 2018, the legal penalties for mishandling data are set to increase greatly. In short, ‘trading data is tedious,’ according to Alexander Linden of Gartner.

But – and this is a big, important but – there are strong, positive trends that give reason for hope. The technological problems around the acquisition, processing and management of data are not what they once were. Open-source software projects such as Hadoop now support the analysis of vast amounts of data through scalable, distributed computing in the cloud. Alongside this, innovative new legal approaches are being considered to better enable sharing and trading. A UK government-sponsored report on artificial intelligence recently recommended the creation of ‘data trusts’: repeatable framework agreements that would make the sharing and trading of data commonplace. Switzerland already has ‘data cooperatives’ for health data, where patients can decide whether they want their data to be included in research projects.

The real key will be encouraging data monopolies – especially global companies – to trade and share their data. Some governments have started to lead by example, sharing some of their data for free: India already does this with its digital-identity system, Aadhaar. Others have legislated to ensure mandatory sharing in key industries. Germany, for example, requires car insurers to share statistics on car accidents, without which smaller insurers would be at a disadvantage.

But legislation and exhortation only go so far. Businesses need positive reasons to share too; and, being businesses, that means enhancing value. As a result, the next step needs to come not from governments but from business itself.

Which is why we have created HARBR. Our team have worked in all aspects of the world of data over many years. We have lived the problems, but we can also see the vast opportunities that access to data will bring, both in innovation and in commercial benefit. That is why our vision is so ambitious: all data, one platform, effortlessly accessible. We believe this is the only way to fully realise the possibilities of the ‘Age of Data.’

There is much more to be said about how we make this happen. Over the coming months, we will be sharing our thinking in more detail on the issues touched upon in this paper. We want to talk with you about how we:

  • Leverage the invisible ubiquity of data in modern life, and its role in AI and wider innovation;
  • Value data, and consider its role as an economic asset to be shared or, perhaps, donated;
  • Tackle legal and ethical concerns about the use (and potential abuse) of personal information;
  • Develop data expertise and democratise it for the man and woman ‘on the street’; and
  • Use the power of data to enrich public debate in a time of ‘alternative facts.’

As we post our thinking on these issues, we will be eager to hear your views and ideas, too. Collaboration is at the heart of what we do, so we’re looking forward to talking.

Martin Yong