[promoted] Democratizing Data: Bringing Data Harvesting and Analytics to the Masses

Some people like to say that “data is king.”
The team at Flux prefers a slightly different term: It believes that “data is a
kingmaker.” The ability to collect, store and analyze data is what gives
individuals and organizations power. 

To date, access to this kingmaking power has been very unequal. It has been
concentrated in the
hands of a select few companies, most of which were already quite powerful to
begin with. They alone possess the enormous resources required to collect
quality data and turn it into value. 

Meanwhile, the rest of the world is drowning in data, but that data and the
power it confers remain out of our reach. By extension, our ability to use that
data in ways that serve the collective good of society is limited.

Fortunately, that is now changing. New means of providing access to data, as
well as novel
tools for correlating data, contextualizing data and analyzing data in real
time, promise to usher in an age of democratized data. 

In this article, Flux provides a look at how data is being democratized. It
focuses on
the specific use case of collecting and analyzing environmental data, although
the trends discussed below have the potential to play out anywhere that data
leads to insight. 

The Challenges of Data Democratization

In theory, anyone can collect data or access the various open data sets that
are
collected by government agencies and other organizations committed to open
data.

Anyone can theoretically analyze and process data, too; after all, most of the
major
frameworks for big data processing, like Hadoop and Spark, are open source.
There’s no technical or legal barrier stopping someone from downloading and
running them. 

At a practical level, however, actually collecting, transforming, storing
and/or
analyzing data on a large scale is unfeasible for most individuals and
organizations today. That’s true for a number of reasons:

  • Data is harvested at high, broad levels. Most open-access environmental
    data sets focus on broad regions. If you want to study a particular place
    or microclimate, it can be hard to find the data you need. And while you
    could theoretically collect the data yourself, deploying and managing your
    own sensors or other data collectors is often not realistic because
    affordable, open data sensors have traditionally been scarce.
  • Data may be biased. Even if you have access to raw open-access data, you
    can’t be certain that the data is not presented in ways that skew your
    ability to interpret it accurately or fairly.
  • Data storage is expensive. Sure, you can now store data in the public
    cloud for just pennies per gigabyte. But when your data sets reach
    terabytes in size, those costs add up and few organizations have the
    budget to sustain them over the long term. (And if you don’t collect data
    over the long term, you are likely to miss out on important insights,
    especially in contexts like environmental data where change typically
    results from infrequent, sudden events.)
  • Pre-collected data is outdated data. If you rely on data that was
    collected by someone else, chances are that the data will be stale by the
    time you access it. Plus, it will take you more time to transform the
    data, clean up data-quality issues and run analysis. By the time all of
    this is done, the insights you can glean from the data may be outdated.
    The only way to solve these problems is to collect and analyze data in
    real time, but again, most organizations lack the resources to do this on
    their own.
  • Lack of data interpretation and artificial intelligence (AI) tools. Again,
    many frameworks for collecting and processing large data sets are open
    source. But advanced tools for making sense of data, like AI algorithms,
    tend to be proprietary. The companies that develop these tools invest huge
    amounts of money in them and rarely make them available to third parties.
  • Poor incentives for data and AI sharing. Part of the reason for the
    challenge described in the preceding point is that few organizations have
    strong incentives to share their data and proprietary AI tools. To date,
    most companies that benefit from data monetize it through advertising or
    internal research; there has been little reward for them in sharing their
    data and data tools with third parties.

What all of the above means is that data has tended, so far, to be very
undemocratic. It
increases the power of organizations that are powerful to begin with and
therefore have the resources to undertake large-scale, proprietary data
collection and analysis programs. It leaves everyone else struggling to make
sense of the tidbits of data that are available from open data sets, which are
usually of limited value for gaining real-time insights. And even if you find
access to meaningful, relevant data, you may not have the advanced AI tools
that are necessary to turn the data into value.

This is why we struggle to maximize the value of all of the data that is being
generated around us. As Microsoft’s Lucas Joppa noted in Nature:

“Today, we know more than ever about human activity. More than one-quarter of
the 7.6
billion people on Earth post detailed information about their lives on Facebook
at least once a month. Nearly one-fifth do so daily. … Yet we are flying
blind when it comes to understanding the natural world.” 

Joppa continued by pointing to some of the reasons why we do such a poor job of
transforming all of the environmental data surrounding us into insight. The
problem is not only that scientists “don’t have the kinds of data needed to
make such predictions” but also that they “lack the algorithms to convert data
into useful information.” 

When only a handful of organizations have the power to glean meaningful
insight from
environmental data, and they are not actually doing it, people who interact in
critical ways with the environment — like foresters and builders — cannot make
data-driven decisions that are in the best interests of all stakeholders. Nor
can anyone hold them to account.

What It Takes to Democratize Data

It doesn’t have to be this way. Data can be democratized in ways that make it
practical for any person or group to derive insights from the data surrounding
us.

Doing so requires:

  • The ability to store and share data in an open, decentralized way. Shared
    data would not only make more data available to people who need it but
    would also (and this is a really important part of data democratization)
    allow us to place data from many different sources on the same plane and
    in the same context, so that we maximize visibility and insights.
  • An incentives system that rewards people for sharing data with each other
    and makes it feasible to monetize data in ways that are not purely
    self-serving.
  • Access to AI-powered data analysis tools that anyone can use.
  • Open, affordable data harvesters that anyone can deploy.

These solutions are all part of the platform that Flux is building. Flux is the
antidote to the natural tendency of big data and AI to monopolize power rather
than democratize it. The platform leverages blockchain technology for open,
affordable data storage, provides advanced AI tools, and rewards organizations
for collecting and sharing data via open-sourced hardware sensors called MICOs.
In summary, Flux is creating a new environmental standard for data collection,
storage and intelligence.

Learn more by downloading the Flux white paper.

This promoted article originally appeared on Bitcoin Magazine.