The implications of Big Data for BTS and COS
George Kershoff
Bureau for Economic Research (BER), Stellenbosch University, South Africa
Presented at the 7th joint EC-OECD workshop on “Recent developments in Business and Consumer
Surveys” held in Paris on 30 November and 1 December 2015
Abstract
In the analogue era information was scarce and came from questionnaires and
sampling. Since the dawn of the digital age in 2012 far more data than ever before is
stored, and it is mainly collected passively, i.e. while people go about doing what they
normally do, such as running their businesses, using their cell phones and conducting internet
searches.
Analysts, policy makers and business people value business tendency surveys (BTS)
and consumer opinion surveys (COS) specifically because the survey results are
available before the corresponding (official) quantitative data. However, Big Data has
begun to make inroads into areas traditionally covered by BTS and COS. It has a
competitive edge over BTS and COS, as it is available in real-time, is based on all
observations and does not rely on the active participation of respondents.
Furthermore, Big Data has low direct production costs, because it is merely a
by-product of business processes. In contrast, putting together and maintaining a sample
of active respondents and collecting information through questionnaires, as in the case
of BTS and COS, require the upkeep of a costly infrastructure and the employment of
people with scarce, specialised skills.
However, BTS and COS also have a competitive edge over Big Data in certain aspects.
These aspects could broadly be put into two groups, namely 1) BTS and COS offer
information that Big Data cannot supply and 2) BTS and COS do not suffer from some
of the shortcomings of Big Data. The biggest competitive advantage of BTS and COS
is that they measure phenomena that Big Data does not cover. Big Data records only
actual outcomes, while BTS and COS also cover unquantifiable expectations and
assessments. Although Big Data often claims to cover the whole population
universe (and not only a selection), this does not necessarily prevent bias. For
example, Twitter feeds could be biased, because certain demographic groups or less activist
groups are under-represented. In contrast, the research design and random sampling
of BTS and COS limit their selection bias.
To remain relevant and survive, producers of BTS and COS will have to adapt and
publicise their unique competitive advantage vis-à-vis Big Data in the future. The
biggest shift will probably require that producers of BTS and COS make users more
aware of the value of the unique forward-looking information of BTS and COS (i.e.
their recording of expectations about the future).
Introduction
Awareness of Big Data has risen particularly sharply over the last two years or so. The publication
of Mayer-Schönberger and Cukier’s book “Big Data” and its shortlisting for the Financial Times’
Business Book of the Year award in 2013 introduced the phenomenon to many more people.
The leaking of classified information from the U.S. National Security Agency (NSA) by Edward
Snowden, a former CIA employee, in 2013, which revealed the existence of numerous global
surveillance programs, also made the general public more aware of Big Data.
Official statistical agencies have been investigating how to better incorporate more administrative
data and Big Data in the production of official statistics for some time. In addition to the long-time
developers and users of Big Data (such as Google and the other search engines, Amazon, other
tech companies and pioneers in some other industries, such as the US retailer Target), of late
many more private sector firms have begun to realise and investigate the business potential of Big
Data.
This paper considers the implications of Big Data for business tendency surveys (BTS) and
consumer opinion surveys (COS). Although more people talk about Big Data nowadays, they often
mean different things. So, this paper starts off with a description of the different aspects and
applications of Big Data in order to define it better and apply it to BTS and COS. This will, for
instance, show that some forms of Big Data, such as internet search terms, have turned out to be
unstable and therefore a less reliable source of information than previously expected. In contrast,
if formidable logistical and analytical challenges can be overcome and it becomes possible to link
a variety of private and administrative data sets (i.e. integrate heterogeneous data resources), the
promise of providing real-time information about a very big part of the population universe could
become a reality (Big Data, 2015). Given such developments, what are the advantages and
disadvantages of BTS and COS compared to Big Data? How should BTS and COS adapt to
remain relevant, valuable and viable in the long run?
This paper should be treated as a discussion document rather than an expert opinion piece. It
aims to create awareness among those responsible for BTS and COS in countries
that have not yet been affected much by Big Data developments, and to get feedback from those that
have been affected more.
Technological developments and the emergence of Big Data
Over the past 15 years the world has seen the exponential growth of the size and speed of
computer processing power, networks and storage. At the same time, the increased use of the
internet, the digitalisation of business processes and the rise in the number of mobile devices and
sensors have led to an explosion in the volume of data generated.
Big Data has developed out of these two threads. At its core, “Big Data is about predictions. … it is
about applying math to huge quantities of data in order to infer probabilities: that an email
message is spam; that the typed letters ‘teh’ are supposed to be ‘the’ …” (Mayer-Schönberger &
Cukier, 2013). Big Data is not only used by internet search engines, but also by governments,
manufacturers, the media, retailers and other private firms to “spot business trends, prevent
diseases, combat crime and so on” and to better “target advertising and products at consumers”
(Big Data, 2015). According to Gartner "Big Data is high volume, high velocity, and/or high variety
information assets that require new forms of processing to enable enhanced decision making,
insight discovery and process optimization" (Big Data, 2015). Big Data refers to large, complex
data sets created and collected through technology (Big developments in Big Data: how
astronomy is driving data science in Africa, 2015).
According to Cukier (2013) datafication, i.e. “the ability to render into data many aspects of the
world that have never been quantified before”, led to the growth in the size of the available data.
“For example, location has been datafied, first with the invention of longitude
and latitude, and more recently with GPS satellite systems. Words are treated
as data when computers mine centuries’ worth of books. Even friendships and
‘likes’ are datafied, via Facebook”.
“Datafication is not the same as digitization, which takes analog content --
books, films, photographs -- and converts it into digital information, a sequence
of ones and zeros that computers can read. Datafication is a far broader
activity: taking all aspects of life and turning them into data. Google’s
augmented-reality glasses datafy the gaze. Twitter datafies stray thoughts.
LinkedIn datafies professional networks”.
This data has been put to new uses “with the assistance of inexpensive
computer memory, powerful processors, smart algorithms, clever software, and
math that borrows from basic statistics. Instead of trying to ‘teach’ a computer
how to do things, such as drive a car or translate between languages, which
artificial-intelligence experts have tried unsuccessfully to do for decades, the
new approach is to feed enough data into a computer so that it can infer the
probability that, say, a traffic light is green and not red or that, in a certain
context, lumière is a more appropriate substitute for ‘light’ than ‘léger’ ”.
Different data collection methods
In the analogue era information was scarce and came from questionnaires and sampling.
Mankind’s limited ability to measure meant that only the most important things were measured. Exactness was
crucial, because “when data was sparse, every data point was critical, and thus great care was
taken to avoid letting any point bias the results” (Mayer-Schönberger & Cukier, 2013). The current
BTS and COS method comes from this era, as it consists of the (active) questioning of a selection
(sample) of respondents.
Since the dawn of the digital age in 2012 far more data can be analysed – in some cases, all data
related to a particular phenomenon. Furthermore, the data is collected passively, i.e. while people
go about doing what they normally do, such as running their businesses, driving and walking around (and
thereby unknowingly triggering sensors), using their cell phones, conducting internet searches or using their
retail store loyalty cards.
According to Harford (2014), the Big Data “that interests many companies is what we might call
‘found data’, the digital exhaust of web searches, credit card payments and mobiles pinging the
nearest phone mast”. Big Data is “cheap to collect relative to [its] size, … [is] a messy collage of
datapoints collected for disparate purposes and … can be updated in real time”.
Ferreira (2015) points out that “telematics in car tracking and vehicle monitoring devices — and
even toll gantries — gather information about where and how [people] drive. Some businesses,
such as freight operators, use this technology to protect their assets while others, such as
insurance firms, use it to incentivise good driving or recover stolen vehicles.” The combination of
GPS devices, wearables (e.g. pedometers and training watches) and smartphones has the
potential to produce exponential growth in behavioural data on people’s favourite routes and
routines.
In contrast to most traditional survey methods, Big Data is collected as a by-product of normal
activity, rather than requiring individuals or firms to respond to survey questions after the event.
One could therefore say that Big Data mainly comes from “back-end operations” (Cukier, 2013).
Implications of Big Data for BTS and COS
Competitive edge of Big Data
Real time availability
The fact that BTS and COS survey results are available before the corresponding (official)
quantitative data makes these results particularly valuable to analysts, policy makers and business
people.
However, Big Data has begun to make inroads into areas traditionally covered by BTS and COS, and
its real-time availability gives it a competitive edge over BTS and COS. (See the box below.)
Selected examples of how Big Data is used to monitor the current performance of the macro-economy
Proprietary (private sector) data sources
In the United States, MarketPsych (www.marketpsych.com) is analysing the datafied text of tweets and works
together with Thomson Reuters to produce different indices across many countries, updated every minute, on
emotional states such as optimism and gloom, i.e. sentiment akin to consumer confidence. The Thomson Reuters MarketPsych
Indices (TRMI) analyse:
“news and social media in real-time to convert the volume and
variety of professional news and the internet into manageable
information” and “two key types of indicators are provided [1]
Emotional indicators, such as Gloom, Fear and Joy [2] Buzz metrics
that indicate how much something is being talked about in the news
and social media. These include macroeconomic themes such as the
level of talk about Litigation, Mergers and Volatility” (Data –
overview, 2015), and
“has shown itself to be predictive of the flash PMI and end-of-month
PMI values” (General Usage Patterns, 2015).
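To make the distinction between emotional indicators and buzz metrics concrete, the following is a minimal lexicon-based sketch. It is not the TRMI methodology, which is proprietary; the word lists, the `daily_indicators` function and the scoring rules are hypothetical assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical emotion word lists; commercial indices use far richer,
# proprietary dictionaries and models.
EMOTION_WORDS = {
    "gloom": {"bleak", "gloomy", "recession", "slump"},
    "joy": {"boom", "happy", "strong", "upbeat"},
}


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def daily_indicators(messages, topic):
    """Return (buzz, emotions) for one period's messages.

    buzz: share of messages that mention `topic` at least once.
    emotions: for each emotion, its word count as a share of all tokens.
    """
    token_lists = [tokenize(m) for m in messages]
    total_tokens = sum(len(tokens) for tokens in token_lists)
    buzz = sum(topic in tokens for tokens in token_lists) / len(messages)

    counts = Counter()
    for tokens in token_lists:
        for token in tokens:
            for emotion, words in EMOTION_WORDS.items():
                if token in words:
                    counts[emotion] += 1
    emotions = {e: counts[e] / total_tokens for e in EMOTION_WORDS}
    return buzz, emotions


messages = [
    "Retail sales look strong",
    "Talk of recession and a bleak outlook",
    "Merger activity is strong",
]
buzz, emotions = daily_indicators(messages, "recession")
print(buzz, emotions)
```

Computed repeatedly over a stream of messages, these shares form indicator time series. The sketch ignores negation, sarcasm and spam, which a production system would have to handle.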
SWIFT (Society for Worldwide Interbank Financial Telecommunication) (www.swift.com), the global interbank
payments processor, found that its data correlates well with global economic activity, the phenomenon that business
confidence surveys aim to capture. The SWIFT Index (SWIFT Index. An early fact-based leading indicator for short-term GDP
evolution, 2015):
“is available on a monthly basis and [provides] a reliable indication
of the economic activity evolution based on the volume of SWIFT
customer payments messages (MT103). The global coverage of this
traffic makes it a reliable barometer of the GDP evolution at Global,
Regional and sometimes National level. The SWIFT Index measures
the variation of the volumes of sent customer payments messages
excluding the impact of events not linked to economic activity”.
In South Africa, the monthly Bankserv Africa Economic Transaction Index (BETI) (www.economists.co.za) uses
interbank payments data to compile a tracker of current economic activity (BETI - June’s positive numbers off-set
previous poor month, 2015). Gill, Perera and Sunner (2012) applied this method in Australia.
INRIX (www.inrix.com), a provider of a variety of internet services and mobile applications for road traffic and
driver services in the United States, Europe and various other countries (INRIX, 2015), uses its data as a
proxy for economic activity. They argue that traffic data is a good proxy, as “it measures people going to work
and deliveries being made. While the capacity of a road changes infrequently, the commerce and people that
use them fluctuate” (Yanofsky, 2013).
CHEP, a company that provides the pallets used in industrial and retail supply chains, generates huge volumes
of data during their operations. They operate in countries such as Australia, the United States and South
Africa. In Australia, the AFGC (Australian Food and Grocery Council) CHEP Retail Index (www.afgc.org.au)
(2015) has turned out to be a reliable indicator of retail sales and is available ahead of the corresponding
number produced by the Australian Bureau of Statistics (ABS). The index is based on CHEP pallet movements
and covers 10 million data points and 10 000 customer accounts.
A San Francisco-based start-up, SpaceKnow, is using satellite imagery to track production (Sinclair, 2015).
They created an index of Chinese factory production using algorithms to monitor more than 6 000 industrial
facilities and hope to supplant survey-based purchasing managers’ indexes (PMIs) with software that identifies
signs of economic activity, such as transport vehicles in parking lots. They plan to eventually track all the
world’s trucks, ships, mines and warehouses (Kearns, 2015).
The “Billion Prices Project” (BPP) is an initiative of Alberto Cavallo and Robert Rigobon of the MIT Sloan School
of Management where they use software to crawl (scrape) the internet to collect the prices of products sold
online. The BPP daily inflation indices cover nearly 70 countries and use daily price fluctuations of over five
million items sold at over 300 online retailers and are available through a commercial venture, PriceStats
(www.pricestats.com) and State Street (Inflation Series, 2015; Our Global Reach, 2015; Mayer-Schönberger &
Cukier, 2013; Hartley, 2015). In South Africa, the Pre-CPI release of the Inflation Factory
(www.theinflationfactory.com) is based on the prices of products offered on the internet for sale in South
Africa.
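The sources cited here do not describe how these projects turn scraped prices into an index. One standard approach for online price data is a chained Jevons index: each day, take the geometric mean of the price changes of items observed on both days, and multiply the running index by it. The sketch below assumes that approach; the item names, prices and function names are made up for illustration.

```python
import math


def jevons_relative(prices_prev, prices_today):
    """Geometric mean of price relatives for items priced on both days."""
    matched = prices_prev.keys() & prices_today.keys()
    if not matched:
        raise ValueError("no items observed on both days")
    log_sum = sum(math.log(prices_today[i] / prices_prev[i]) for i in matched)
    return math.exp(log_sum / len(matched))


def chained_index(daily_prices, base=100.0):
    """Chain the day-to-day Jevons relatives into an index starting at base."""
    index = [base]
    for prev, today in zip(daily_prices, daily_prices[1:]):
        index.append(index[-1] * jevons_relative(prev, today))
    return index


days = [
    {"milk": 1.00, "bread": 2.00},
    {"milk": 1.10, "bread": 2.00, "eggs": 3.00},  # eggs appear online for the first time
    {"milk": 1.10, "bread": 2.20, "eggs": 3.30},
]
print(chained_index(days))
```

Because each link only uses items seen on both days, products entering or leaving the online assortment (the eggs above) do not cause spurious jumps in the index.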
The start-up Premise (www.premise.com) has an app through which paid individuals use their smartphones to take
pictures of the price tags of food and other local goods around the world and then record these prices. In 2015,
an arm of the United Nations found that Premise can compile a monthly food consumer price index for Brazil up
to 25 days before the official government agency (De La Merced, 2015). The biggest potential benefit of
Premise’s app is that it is being rolled out in countries with underdeveloped national statistics (Hartley, 2015).
A start-up, Real Time Macroeconomics (www.realtimemacroeconomics.com), is using online data to monitor
the labour market (such as job openings, layoff announcements, wage growth and employment) in real-time.
Their indices are built to closely follow the respective US Bureau of Labor Statistics indices, but have the
added flexibility of near real-time data delivery and more granularity (Hartley, 2015). Indeed
(www.indeed.com), a job search company, produces similar data (Sinclair, 2015).
Zillow (www.zillow.com), an online real-estate service provider in the United States, collects information about
home sales and mortgages (Sinclair, 2015). “At Zillow’s core is [their] living database of more than 100 million
U.S. homes, featuring both public and user-generated information including number of bedrooms and
bathrooms, tax assessments, home sales and listing data of homes for sale and for rent. This data allows
[them] to calculate, among other indicators, the Zestimate, a highly accurate, automated, estimated value of
almost every home in the country as well as the Zillow Home Value Index and Zillow Rent Index, leading
measures of median home values and rents”. (Zillow Real Estate and Rental Data: Why we are different,
2015). In South Africa, the PayProp Rental Index is compiled from real-time transaction data and provides a
comprehensive overview of the state of the residential rental market in the country (Rental Index, 2015).
Official Statistics and Administrative data sources
Official statistical agencies have recognised the potential value of administrative and other forms of Big Data
sets and have started investing heavily in searching for ways to harvest this information.
Statistics Netherlands successfully uses traffic loop data for transport statistics and mobile phone data for
daytime population statistics (Kroese, 2015).
Internet search data
Google Trends (www.google.com/trends) provides a real-time daily and weekly index of the volume of queries
that users enter into Google.
In their seminal work (first circulated in 2009), Choi and Varian (2012) showed how this search engine data could be used to
forecast near-term values of economic indicators. They included examples of motor car sales, unemployment
claims, travel destination planning and consumer confidence. Gill, Perera and Sunner (2012) found similar
positive results in Australia. Likewise, McLaren (2011) showed that internet search data could be applied to
estimate certain labour and housing trends in the UK.
Passive data collection of all the data related to a particular phenomenon
Another competitive advantage of Big Data vis-à-vis BTS and COS is that it is based on all
observations of a particular phenomenon and does not rely on the active participation of
respondents. Big Data, therefore, does not suffer from the same sampling and non-sampling
errors as BTS and COS.
This has become an even bigger advantage of late, as all surveys – including BTS and COS – that
actively collect data (i.e. a selection of respondents has to complete a questionnaire or
participate in a telephone interview) increasingly struggle to sustain participation. Response rates
have suffered, because it has become more difficult to attract people’s attention and encourage
participation, as more and more things lay claim to their finite time. This is largely the result of
technological developments that have led to a situation in which people are constantly bombarded
with e-mails, social media notifications, requests and other distractions.
So far, these developments have not affected all firm sizes and demographic groups equally. For
instance, in emerging economies respondents at micro and small sized firms were affected to a
lesser extent than those at medium sized and large firms. This is in part because the
former generally do not have broadband internet access and e-mail (having to use their cell
phones instead) and are therefore less distracted. Likewise, for a while younger people and high
income earners adopted the new technologies faster than the older generation and low income
earners and were therefore more distracted and less willing to participate in COS.
Some institutions that use market research firms to conduct the fieldwork for COS on their behalf
have also noticed how Big Data has adversely affected a part of the business model of these firms.
Big Data has eroded the demand for survey-based market research, as many of the large retailers
and banks nowadays do not require the services of a market research firm, because they can use
their own transaction records and other internal data (e.g. collected through loyalty cards) to track
their customers’ preferences.
Financial considerations
While BTS and COS are costly to conduct, Big Data has little direct production cost, because it is
merely a by-product of business processes or an input to deliver the actual product or service.
The financial survival of institutions that receive public funding to conduct BTS and COS is not at
risk as long as the governments in these countries continue to value and fund them. In contrast,
Big Data poses a more immediate risk to the funding of institutions that depend on private sector
support. Many of these institutions have adopted sponsorships to finance the production of BTS
and COS, because the income from the sale of survey result reports largely dried up when
internet use became more widespread about a decade ago. In terms of this funding model, private
firms support the surveys financially in exchange for the right to attach their name to the survey
(media rights) and have their people make the results public. These firms then get valuable
exposure and publicity through the media coverage of the survey results. The media coverage, in
turn, depends on the track record, early availability and market moving potential of the data.
Big Data has made it easier for far more people to produce indices that track the performance of
the macro-economy than before. Nearly all that is required to compile an index is the data (with
the personally identifiable information removed if applicable) and some elementary analysis. In
contrast, putting together and maintaining a sample of active respondents and collecting
information through questionnaires, require the upkeep of a costly infrastructure and the
employment of people with scarce, specialised skills.
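As an illustration of how little analysis such an index can require, the sketch below rebases a hypothetical count series (for example, quarterly pallet movements) so that a chosen base period equals 100, and computes period-on-period percentage changes. The figures and function names are assumptions for illustration; a published index would typically also need seasonal adjustment, which is omitted here.

```python
def rebase(series, base_period):
    """Express every value as an index with base_period = 100."""
    base = series[base_period]
    if base == 0:
        raise ValueError("base period value must be non-zero")
    return {period: 100.0 * value / base for period, value in series.items()}


def pct_change(series):
    """Percentage change of each period relative to the previous one (dict order)."""
    periods = list(series)
    return {
        cur: 100.0 * (series[cur] / series[prev] - 1.0)
        for prev, cur in zip(periods, periods[1:])
    }


# Hypothetical quarterly counts from a firm's operational records
movements = {"2015Q1": 2_000_000, "2015Q2": 2_100_000, "2015Q3": 2_058_000}
print(rebase(movements, "2015Q1"))
print(pct_change(movements))
```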
BTS and COS would face additional competition if it could be shown that an index compiled from Big Data
has the same or even better track record (relative to a widely accepted benchmark, which is
currently mostly the official quantitative data series). The competition becomes even fiercer if it is
possible to produce indices based on Big Data faster and at a more disaggregated (granular) level
than the BTS and COS ones. While policy makers will mostly remain interested in the aggregate
results, private firms would rather demand the disaggregated (e.g. the demand for a specific
product or service in a given geographical area) than the aggregate results, because from their
perspective the aggregate is only a proxy for the local demand of their product or service.
Competitive edge of BTS and COS
BTS and COS also have a competitive edge over Big Data in certain aspects. These aspects could
broadly be put into two groups, namely 1) BTS and COS offer information that Big Data cannot
supply and 2) BTS and COS do not suffer from some of the shortcomings of Big Data.
BTS and COS cover variables that Big Data cannot directly record
The biggest competitive advantage of BTS and COS vis-à-vis Big Data is that they measure
phenomena that Big Data does not cover. Big Data records only actual outcomes, while BTS and
COS also cover unquantifiable expectations and assessments. Fluctuations in these “feelings” of
business people and households have proven to be key determinants of their investment and
spending / saving decisions, which in turn propel the business cycle. The use and
value of BTS and COS will therefore likely shift from being primarily early indicators (surveys of
what happened in the current period) to measures of expectations (what respondents expect for
the next period), because the latter are not passively revealed and therefore cannot be tracked by Big Data. It
is hereby assumed that the survey expectation data perform better (i.e. provide additional
information) than a simple autoregressive lag of the data, i.e. that the best univariate forecast for $y_{t+1}$
is not simply $y_t$.
The length of historical time series will provide BTS and COS with another competitive edge, at
least until the time series from Big Data become sufficiently long at some time in the future. The
long time series of BTS and COS are particularly valued, as historical cycles and relationships
provide handy clues about the future.
The micro data (i.e. individual responses with the personally identifiable information removed, but
with the sector, size and region preserved) is another unique offering of BTS, as this makes it
possible to study the same respondent over time.
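Because anonymised micro data keep a stable respondent identifier, one can, for example, count how individual answers move between consecutive survey waves, something aggregate results cannot reveal. A minimal sketch, assuming each wave is stored as a mapping from a hypothetical anonymised respondent ID to a categorical answer ("up", "same", "down"):

```python
from collections import Counter


def transitions(wave_a, wave_b):
    """Count (earlier answer, later answer) pairs for respondents in both waves."""
    panel = wave_a.keys() & wave_b.keys()  # respondents who answered both waves
    return Counter((wave_a[r], wave_b[r]) for r in panel)


q1 = {"r1": "up", "r2": "same", "r3": "down", "r4": "up"}
q2 = {"r1": "same", "r2": "same", "r3": "up", "r5": "down"}
print(transitions(q1, q2))
```

Respondents who appear in only one wave (r4 and r5 above) are excluded, so the counts describe genuine within-firm changes rather than changes in sample composition.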
BTS and COS do not suffer from the same shortcomings as some Big Data applications
BTS and COS offer representative samples instead of potentially biased selections of all the data related to a particular phenomenon
Although Big Data often covers all users / data (compared to only a selection / sample as in the
case of BTS and COS), this does not necessarily prevent bias. For instance, Twitter feeds could be
biased, because certain demographic groups or less activist groups are under-represented. Another
example would be data stemming from vehicle monitoring devices. Even though all the vehicles
with such devices could be tracked, these devices are not fitted in all the vehicles in a country.
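The vehicle example can be illustrated with a small simulation. In the hypothetical population below, only commercial vehicles carry tracking devices, and they travel much further than private cars. The mean computed from all tracked vehicles (all the data the devices produce) is then far from the true population mean, while a modest simple random sample of all vehicles lands close to it. All numbers are assumptions chosen for illustration.

```python
import random
from statistics import mean

# Hypothetical annual distance travelled (km): 3,000 tracked commercial
# vehicles and 7,000 untracked private cars, with some spread in each group.
tracked = [40_000 + 500 * (i % 11 - 5) for i in range(3_000)]
untracked = [12_000 + 500 * (i % 11 - 5) for i in range(7_000)]
population = tracked + untracked

true_mean = mean(population)
device_mean = mean(tracked)  # complete coverage, but of tracked vehicles only
sample_mean = mean(random.Random(42).sample(population, 500))  # simple random sample

print(f"true mean:           {true_mean:,.0f} km")
print(f"all tracked data:    {device_mean:,.0f} km")
print(f"random sample (500): {sample_mean:,.0f} km")
```

More tracked vehicles would not fix the problem: the error comes from who is covered, not from how many observations there are.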
In contrast, the research design and random sampling limit the bias in the selection of
respondents in BTS and COS and make it possible to quantify the sampling error (for instance, to state with
95% confidence that the true value falls within a certain range). However, this benefit of impartiality of BTS and COS is not
unconditional and is only sustained if the response rate is relatively high and the research design
provides for the known under-representation in the sample of, for instance, young people, people
without landline telephone numbers or small firms. Furthermore, it needs to be acknowledged
that the increased struggle to achieve acceptable response rates discussed earlier tempers this
competitive edge of BTS and COS.
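For a simple random sample, the 95% range mentioned above can be computed with the usual normal approximation for a proportion, $p \pm 1.96\sqrt{p(1-p)/n}$. The sketch below assumes a hypothetical survey in which 220 of 400 respondents expect conditions to improve. It ignores stratification, weighting and non-response, which real BTS and COS designs must handle, and it measures sampling error only, not bias.

```python
from math import sqrt


def proportion_ci(successes, n, z=1.96):
    """Normal-approximation confidence interval for a sample proportion."""
    if n == 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


p, low, high = proportion_ci(220, 400)
print(f"{p:.1%} (95% CI {low:.1%} to {high:.1%})")
```

With these numbers the interval runs from roughly 50.1% to 59.9%; quadrupling the sample size would halve its width.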
Of the different kinds of Big Data, internet search data has attracted the largest share of criticism,
particularly about representativeness. According to Gill, Perera and Sunner (2012: 10), the
shortcomings of internet search data include its “relatively short history, the possibly
unrepresentative nature of the sample given the variation in internet use across different groups
by age and income, and the likelihood of considerable noise in the data (owing to factors such as
changes in the market share of firms like Amazon, and changes in search terms and behaviour)”.
Big Data could overcome this problem if more heterogeneous data sources are integrated.
However, at present this is more a promising new frontier than a reality, as the integration of
diverse data sources presents formidable logistical and analytical challenges (Big Data, 2015;
Einav & Levin, 2013: 25).
Producers and users of BTS and COS are more attuned to causation and do not only look for correlations
The emergence of Big Data presents a paradigm shift. “In 1990 data was scarce, but interpretation
was readily available. In 2013 data was everywhere, but interpretation was scarce” (Van der Veen,
2013). The focus, therefore, has to shift from “collecting to [the] filtering of data”. With Big Data,
you have to “ask yourself what you see” in contrast to “seeing what you asked for” in the case of
the conventional data collection and analysis methods. In the former case, “the data defines
the model”, and in the latter “the model defines the data you want” (Van der Veen, 2013).
The emergence of Big Data led to three shifts in how we think about data – “from some to all,
from clean to messy and from causation to correlation” (Cukier, 2013).
“After reliably providing a swift and accurate account of flu outbreaks for
several winters, the theory-free, data-rich model [of Google Flu Trends] had
lost its nose [in 2013] for where flu was going. Google’s model pointed to a
severe outbreak but when the slow-and-steady data from the CDC [Centers for
Disease Control and Prevention in the United States] arrived, they showed that Google’s
estimates of the spread of flu-like illnesses were overstated by almost a factor
of two.
The problem was that Google did not know – could not begin to know – what
linked the search terms with the spread of flu. Google’s engineers weren’t
trying to figure out what caused what. They were merely finding
statistical patterns in the data. They cared about correlation rather
than causation. This is common in Big Data analysis. Figuring out what causes
what is hard (impossible, some say). Figuring out what is correlated with what
is much cheaper and easier. That is why, according to Viktor Mayer-
Schönberger and Kenneth Cukier’s book, Big Data, ‘causality won’t be
discarded, but it is being knocked off its pedestal as the primary fountain of
meaning’.
But a theory-free analysis of mere correlations is inevitably fragile. If you have
no idea what is behind a correlation, you have no idea what might cause that
correlation to break down. One explanation of the Flu Trends failure is that the
news was full of scary stories about flu in December 2012 and that these stories
provoked internet searches by people who were healthy. Another possible
explanation is that Google’s own search algorithm moved the goalposts when it
began automatically suggesting diagnoses when people entered medical
symptoms” (Harford, 2014, own emphasis).
According to Morozov (2013)
“the goal of both data mining and predictive analytics is to generate useful
patterns that are far beyond the ability of the human mind to detect or even
explain. In other words, we don't need to inquire why things are the way
they are as long as we can affect them to be the way we want them to
be. …
[However], we can draw a distinction here between Big Data—the stuff of
numbers that thrives on correlations—and Big Narrative—a story-driven,
anthropological approach that seeks to explain why things are the way they are.
Big Data is cheap where Big Narrative is expensive. Big Data is clean where Big
Narrative is messy. Big Data is actionable where Big Narrative is paralyzing”.
(own emphasis)
Cukier (2013) cautions opponents of the use of Big Data not to raise the bar too high, as figuring
out the “why” is often not possible even with conventionally generated data. “Of course, knowing
the causes behind things is desirable. The problem is that causes are often extremely hard to
figure out, and many times, when we think we have identified them, it is nothing more than a self-
congratulatory illusion”. Sinclair (2015) points out that “in the age of Big Data, the challenge will
lie in carefully filtering and analyzing large amounts of information. It will not be enough simply to
gather data; in order to yield meaningful predictions, the data must be placed in an analytical
framework”.
Given their research design and hands-on collection method, BTS and COS perform better than Big
Data in providing the “why”. The competitive advantage of BTS and COS is currently enhanced by
the fact that economists, statisticians and analysts are still closely involved with their generation
and interpretation, whereas Big Data is mainly associated with data scientists and IT engineers. As more
economists, statisticians and analysts engage with Big Data over time, its explanatory power will
likely improve. For instance, Varian (2014: 3) notes that the “collaborations between computer
scientists and statisticians in the last decade or so [were] fruitful, and [he] expects [that]
collaborations between computer scientists and econometricians will also be productive in the
future”.
Final remarks
Analysts, policy makers and business people value BTS and COS specifically because the survey
results are available before the corresponding (official) quantitative data. However, Big Data has
begun to make inroads on areas traditionally covered by BTS and COS. It has a competitive edge
over BTS and COS, as it is available in real-time, is based on all observations, does not rely on the
active participation of respondents and has little direct production cost.
To remain relevant and survive, producers of BTS and COS will have to adapt and publicise their
unique competitive advantage vis-à-vis Big Data in the future. The biggest shift will probably
require that producers of BTS and COS make users more aware of the value of the unique
forward-looking information of BTS and COS (i.e. their recording of expectations about the future)
instead of focussing only on their ability to provide information about current phenomena before
the corresponding official data becomes available.
Users will also continue to value the long historical time series and the micro data that BTS and
COS can provide for some time to come. However, this value will erode over time as Big Data
series become longer; as more economists, statisticians and analysts, and not only data scientists
as is currently mostly the case, start to engage with Big Data; and as Big Data producers overcome
formidable logistical and analytical challenges, making it possible to link a variety of private and
administrative data sets.
References
AFGC CHEP Retail Index. 2015. [Online]. Available: http://www.afgc.org.au/media-centre/afgc-chep-retail-index/ [28 August 2015]
BETI - June’s positive numbers off-set previous poor month. 2015. [Web log post and Press Release]. 8 July. Available: http://www.economists.co.za/blog/index.php?/essays/2015/07/betii-junes-positive-numbers-off-set-previous-poor-month/ [25 August 2015]
Big Data. 2015. [Online]. Available: https://en.wikipedia.org/wiki/Big_data [22 June 2015]
Big developments in Big Data: how astronomy is driving data science in Africa. 2015. [Online]. Available: http://www.uct.ac.za/dailynews/?id=9358 [30 September 2015]
Choi, H. and Varian, H. 2012. Predicting the Present with Google Trends. Economic Record (issued by the Economic Society of Australia), vol. 88. June: 2-9.
Cukier, K.N. 2013. The Rise of Big Data. Foreign Affairs. May / June. Available: https://www.foreignaffairs.com/articles/2013-04-03/rise-big-data?page=show [26 August 2015]
Data – overview. 2015. [Online]. Available: https://www.marketpsych.com/guide/data [28 August 2015]
De La Merced, M. 2015. Lawrence Summers to join the board of “hyperdata” start-up. New York Times. 15 July. Available: http://www.nytimes.com/2015/07/16/business/dealbook/lawrence-summers-to-join-board-of-hyperdata-start-up.html?_r=1 [25 August 2015]
Einav, L. and Levin, J.D. 2013. The data revolution and economic analysis. NBER Working Paper 19035. May.
Ferreira, K. 2015. Navigating the tricky ethical terrain of Big Data. Business Day. 6 July. Available: http://www.bdlive.co.za/business/technology/2015/07/06/navigating-the-tricky-ethical-terrain-of-big-data [28 August 2015]
General Usage Patterns. 2015. [Online]. Available: https://www.marketpsych.com/guide/usage [26 August 2015]
Gill, T., Perera, D. and Sunner, D. 2012. Electronic Indicators of Economic Activity. Reserve Bank of Australia Bulletin. June: 1-11.
Harford, T. 2014. Big Data: Are we making a big mistake? Financial Times. 28 March. Available: http://www.ft.com/intl/cms/s/2/21a6e7d8-b479-11e3-a09a-00144feabdc0.html?ftcamp=published_links/rss/magazine/feed//product#axzz2xGX61fMS [26 August 2015]
Hartley, J. 2015. The success of monitoring the economy with Big Data. Huffington Post. 17 March. Available: http://www.huffingtonpost.com/jon-hartley/the-success-of-monitoring_b_6875126.html?ir=Business&utm_hp_ref=business [25 August 2015]
Inflation Series. 2015. [Online]. Available: http://www.pricestats.com/inflation-series [28 August 2015]
INRIX. 2015. [Online]. Available: https://en.wikipedia.org/wiki/INRIX [1 September 2015]
Kearns, J. 2015. Satellite images show Economies growing and shrinking in real time. BloombergBusiness. 8 July. Available: http://www.bloomberg.com/news/features/2015-07-08/satellite-images-show-economies-growing-and-shrinking-in-real-time [28 August 2015]
Kroese, B. 2015. Innovation and Big Data at Statistics Netherlands. UNSD lunch time seminar on “Big Data: How do we meet the expectations?” held on 4 March in New York. Available: http://unstats.un.org/unsd/statcom/statcom_2015/seminars/big_data/default.html [28 August 2015]
Mayer-Schönberger, V. and Cukier, K. 2013. Big Data: A revolution that will transform how we live, work and think. [Kobo version]. Available: www.kobo.com [22 June 2015]
McLaren, N. 2011. Using internet search data as economic indicators. Bank of England Quarterly Bulletin. Q2: 134-140.
Morozov, E. 2013. With Big Data surveillance, the government doesn’t need to know “why” anymore. Slate. 24 June. [Online]. Available: http://www.slate.com/articles/technology/future_tense/2013/06/with_big_data_surveillance_the_government_doesn_t_need_to_know_why_anymore.single.html [26 August 2015]
Our Global Reach. 2015. [Online]. Available: http://www.pricestats.com/approach/data-composition [28 August 2015]
Rental Index. 2015. [Online]. Available: https://za.payprop.com/cgi-bin/giga.cgi?cmd=rental_index [28 August 2015]
Sinclair, T.M. 2015. Economic Forecasts in the Age of Big Data. [Online]. Available: http://www.project-syndicate.org/commentary/big-data-economic-forecasts-by-tara-m--sinclair-2015-08 [28 August 2015]
SWIFT Index. An early fact-based leading indicator for short-term GDP evolution. 2015. [Online]. Available: http://www.swift.com/products_services/swift_index_details [28 August 2015]
Van der Veen, G. 2013. Big Data: Big Opportunity. UNSD Friday seminar on “Big Data for Policy, Development and Official Statistics” held on 22 February in New York. Available: http://unstats.un.org/unsd/statcom/statcom_2013/seminars/Big_Data/default.html [26 August 2015]
Varian, H.R. 2014. Big Data: New Tricks for Econometrics. Journal of Economic Perspectives. 28(2). Spring: 3-28.
Yanofsky, D. 2013. Why traffic jams are the sign of a healthy economy. Quartz. 30 July. [Online]. Available: http://qz.com/109859/why-traffic-jams-are-the-sign-of-a-healthy-economy/ [25 August 2015]
Zillow Real Estate and Rental Data: Why we are different. 2015. [Online]. Available: http://www.zillow.com/research/data/ [28 August 2015]
The implications of Big Data for BTS and COS

  • 1. 1 The implications of Big Data for BTS and COS George Kershoff Bureau for Economic Research (BER), Stellenbosch University, South Africa Presented at the 7th joint EC-OECD workshop on “Recent developments in Business and Consumer Surveys” held in Paris on 30 November and 1 December 2015 Abstract In the analogue era information was scarce and came from questionnaires and sampling. Since the dawn of the digital age in 2012 far more data than ever before is stored and it is mainly collected passively, i.e. while people go about doing what they normally do, such as run their businesses, use their cell phones and conduct internet searches. Analysts, policy makers and business people value business tendency surveys (BTS) and consumer opinion surveys (COS) specifically because the survey results are available before the corresponding (official) quantitative data. However, Big Data has begun to make inroads on areas traditionally covered by BTS and COS. It has a competitive edge over BTS and COS, as it is available in real-time, is based on all observations and does not rely on the active participation of respondents. Furthermore, Big Data has little direct production costs, because it is merely a by- product of business processes. In contrast, putting together and maintaining a sample of active respondents and collecting information through questionnaires as in the case of BTS and COS, require the upkeep of a costly infrastructure and the employment of people with scarce, specialised skills. However, BTS and COS also have a competitive edge over Big Data in certain aspects. These aspects could broadly be put into two groups, namely 1) BTS and COS offer information that Big Data cannot supply and 2) BTS and COS do not suffer from some of the shortcomings of Big Data. The biggest competitive advantage of BTS and COS is that they measure phenomenon that Big Data does not cover. 
Big Data records only actual outcomes, while BTS and COS also cover unquantifiable expectations and assessments. Although Big Data often claims that it covers the whole population universe (and not only a selection) this does not necessarily prevent bias. For example, twitter feeds could be biased, because certain demographic or less activist groups are under-represented. In contrast, the research design and random sampling of BTS and COS limit their selection bias. To remain relevant and survive, producers of BTS and COS will have to adapt and publicise their unique competitive advantage vis-à-vis Big Data in the future. The biggest shift will probably require that producers of BTS and COS make users more aware of the value of the unique forward looking information of BTS and COS (i.e. their recording of expectations about the future).
  • 2. 2 Introduction Awareness of Big Data has risen particularly sharply over the last two years or so. The publication of Mayer-Schönberger and Cukier’s book “Big Data” and its shortlisting for the Financial Times’ Business Book of the Year award in 2013 have introduced the phenomenon to many more people. Edward Snowden’s (a former CIA employee) leaking of classified information from the U.S. National Security Agency (NSA) in 2013, which revealed the existence of numerous global surveillance programs, also made the general public more aware of the existence of Big Data. Official statistical agencies have been investigating how to better incorporate more administrative data and Big Data in the production of official statistics for some time. In addition to the long-time developers and users of Big Data (such as Google and the other search engines, Amazon, other tech companies and pioneers in some other industries, such as the US retailer, Target), of late many more private sector firms have begun to realise and investigate the business potential of Big Data. This paper considers the implication of Big Data for business tendency surveys (BTS) and consumer opinion surveys (COS). Although more people talk about Big Data nowadays, they often mean different things. So, this papers starts off with a description of the different aspects and applications of Big Data in order to better define and apply it to BTS and COS. This will, for instance, show that some forms of Big Data, such as internet search terms, have turned out to be unstable and therefore a less reliable source of information than expected previously. In contrast, if formidable logistical and analytical challenges could be overcome and it becomes possible to link a variety of private and administrative data sets (i.e. integrate heterogeneous data resources), the promise of providing real-time information about a very big part of the population universe could become a reality (Big Data, 2015). 
Given such developments, what are the advantages and disadvantages of BTS and COS compared to Big Data? How should BTS and COS adapt to remain relevant, valuable and viable in the long run? This paper should be treated as a discussion document rather than an expert opinion piece. It intends to stimulate awareness among those responsible for BTS and COS in countries that have not yet been much affected by Big Data developments, and to get feedback from those that have been affected more.

Technological developments and the emergence of Big Data
Over the past 15 years the world has seen exponential growth in the size and speed of computer processing power, networks and storage. At the same time, the increased use of the internet, the digitalisation of business processes and the rise in the number of mobile devices and sensors have led to an explosion in the volume of data generated. Big Data has developed out of these two threads.

At its core, “Big Data is about predictions. … it is about applying math to huge quantities of data in order to infer probabilities: that an email message is spam; that the typed letters ‘teh’ are supposed to be ‘the’ …” (Mayer-Schönberger & Cukier, 2013). Big Data is used not only by internet search engines, but also by governments, manufacturers, the media, retailers and other private firms to “spot business trends, prevent diseases, combat crime and so on” and to better “target advertising and products at consumers” (Big Data, 2015). According to Gartner, “Big Data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization” (Big Data, 2015). Big Data refers to the large, complex data sets created and collected through technology (Big developments in Big Data: how astronomy is driving data science in Africa, 2015). According to Cukier (2013), datafication, i.e. “the ability to render into data many aspects of the world that have never been quantified before”, led to the growth in the size of the available data.
“For example, location has been datafied, first with the invention of longitude and latitude, and more recently with GPS satellite systems. Words are treated as data when computers mine centuries’ worth of books. Even friendships and ‘likes’ are datafied, via Facebook”. “Datafication is not the same as digitization, which takes analog content -- books, films, photographs -- and converts it into digital information, a sequence of ones and zeros that computers can read. Datafication is a far broader activity: taking all aspects of life and turning them into data. Google’s augmented-reality glasses datafy the gaze. Twitter datafies stray thoughts. LinkedIn datafies professional networks”. This data has been put to new uses “with the assistance of inexpensive computer memory, powerful processors, smart algorithms, clever software, and math that borrows from basic statistics. Instead of trying to ‘teach’ a computer how to do things, such as drive a car or translate between languages, which artificial-intelligence experts have tried unsuccessfully to do for decades, the new approach is to feed enough data into a computer so that it can infer the probability that, say, a traffic light is green and not red or that, in a certain context, lumière is a more appropriate substitute for ‘light’ than ‘léger’ ”.

Different data collection methods
In the analogue era information was scarce and came from questionnaires and sampling. Mankind’s ability to measure limited it to measuring only the most important things. Exactness was crucial, because “when data was sparse, every data point was critical, and thus great care was taken to avoid letting any point bias the results” (Mayer-Schönberger & Cukier, 2013). The current BTS and COS method comes from this era, as it consists of the (active) quizzing of a selection (sample) of respondents. Since the dawn of the digital age in 2012 far more data can be analysed, in some cases all the data related to a particular phenomenon.
Furthermore, the data is collected passively, i.e. while people go about doing what they normally do, such as running their businesses, driving and walking around (and thereby unknowingly triggering sensors), using their cell phones, conducting internet searches or using their retail store loyalty cards. According to Harford (2014), the Big Data “that interests many companies is what we might call ‘found data’, the digital exhaust of web searches, credit card payments and mobiles pinging the nearest phone mast”. Big Data is “cheap to collect relative to [its] size, … [is] a messy collage of datapoints collected for disparate purposes and … can be updated in real time”. Ferreira (2015) points out that “telematics in car tracking and vehicle monitoring devices — and even toll gantries — gather information about where and how [people] drive. Some businesses, such as freight operators, use this technology to protect their assets while others, such as insurance firms, use it to incentivise good driving or recover stolen vehicles.” The combination of GPS devices, wearables (e.g. pedometers and training watches) and smartphones has the potential to produce exponential growth in behavioural data on people’s favourite routes and routines. In contrast to most traditional survey methods, Big Data is collected as a by-product of normal activity, rather than requiring individuals or firms to respond to survey questions after the event. One could therefore say that Big Data mainly comes from “back-end operations” (Cukier, 2013).
Implications of Big Data for BTS and COS
Competitive edge of Big Data
Real-time availability
The fact that BTS and COS survey results are available before the corresponding (official) quantitative data makes these results particularly valuable to analysts, policy makers and business people. However, Big Data has begun to make inroads into areas traditionally covered by BTS and COS, and its real-time availability gives it a competitive edge over BTS and COS. (See the box below.)

Selected examples of how Big Data is used to monitor the current performance of the macro-economy
Proprietary (private sector) data sources
In the United States, MarketPsych (www.marketpsych.com) analyses the datafied text of tweets and works together with Thomson Reuters to produce different indices across many countries, updated every minute, on emotional states such as optimism and gloom, i.e. consumer confidence. The Thomson Reuters MarketPsych Indices (TRMI) analyse “news and social media in real-time to convert the volume and variety of professional news and the internet into manageable information”, and “two key types of indicators are provided [1] Emotional indicators, such as Gloom, Fear and Joy [2] Buzz metrics that indicate how much something is being talked about in the news and social media. These include macroeconomic themes such as the level of talk about Litigation, Mergers and Volatility” (Data – overview, 2015). The TRMI “has shown itself to be predictive of the flash PMI and end-of-month PMI values” (General Usage Patterns, 2015).

SWIFT (Society for Worldwide Interbank Financial Telecommunication) (www.swift.com), the global interbank payments processor, found that its data correlates well with global economic activity, i.e. business confidence. The SWIFT Index (SWIFT Index.
An early fact-based leading indicator for short-term GDP evolution, 2015) “is available on a monthly basis and [provides] a reliable indication of the economic activity evolution based on the volume of SWIFT customer payments messages (MT103). The global coverage of this traffic makes it a reliable barometer of the GDP evolution at Global, Regional and sometimes National level. The SWIFT Index measures the variation of the volumes of sent customer payments messages excluding the impact of events not linked to economic activity”.

In South Africa, the monthly Bankserv Africa Economic Transaction Index (BETI) (www.economists.co.za) uses interbank payments data to compile a tracker of current economic activity (BETI - June’s positive numbers off-set previous poor month, 2015). Gill, Perera and Sunner (2012) applied this method in Australia.

INRIX (www.inrix.com), a provider of a variety of internet services and mobile applications for road traffic and driver services in the United States, Europe and various other countries (INRIX, 2015), uses its data as a proxy for economic activity. The company argues that traffic data is a good proxy, as “it measures people going to work and deliveries being made. While the capacity of a road changes infrequently, the commerce and people that use them fluctuate” (Yanofsky, 2013).

CHEP, a company that provides the pallets used in industrial and retail supply chains, generates huge volumes of data during its operations. It operates in countries such as Australia, the United States and South Africa. In Australia, the AFGC (Australian Food and Grocery Council) CHEP Retail Index (www.afgc.org.au) (AFGC CHEP Retail Index, 2015) has turned out to be a reliable indicator of retail sales and is available ahead of the corresponding number produced by the Australian Bureau of Statistics (ABS). The index is based on CHEP pallet movements and covers 10 million data points and 10 000 customer accounts.
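Several of the indicators in this box (the SWIFT Index, BETI and the AFGC CHEP Retail Index) are built from counts or volumes of transactions rather than from prices or opinions. Their exact methodologies are not described here, so the following is only a minimal sketch, assuming monthly volume counts and assuming that the published indicator is the year-on-year percentage change. The helper names (`yoy_change`, `rebase`) and the figures are hypothetical, and the step of removing events not linked to economic activity is not shown.

```python
from typing import Sequence


def yoy_change(volumes: Sequence[float], lag: int = 12) -> list[float]:
    """Year-on-year percentage change of a monthly volume series.

    Comparing each month with the same month a year earlier strips out
    most of the regular seasonal pattern in transaction volumes.
    """
    if len(volumes) <= lag:
        raise ValueError("need more than `lag` observations")
    return [100.0 * (volumes[t] / volumes[t - lag] - 1.0)
            for t in range(lag, len(volumes))]


def rebase(values: Sequence[float], base: int = 0) -> list[float]:
    """Express a series as an index with values[base] = 100."""
    return [100.0 * v / values[base] for v in values]


# Hypothetical monthly message counts (millions): year 2 is 5% above year 1.
year1 = [100.0, 95.0, 110.0, 105.0, 108.0, 112.0,
         109.0, 111.0, 107.0, 115.0, 118.0, 130.0]
year2 = [v * 1.05 for v in year1]
print([round(x, 1) for x in yoy_change(year1 + year2)])  # twelve values of 5.0
```

Comparing like months removes most of the seasonal pattern without a formal seasonal-adjustment procedure; an operational indicator would also have to deal with calendar effects and with the one-off events that the SWIFT Index explicitly excludes.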
A San Francisco-based start-up, SpaceKnow, uses satellite imagery to track production (Sinclair, 2015). It created an index of Chinese factory production using algorithms that monitor more than 6 000 industrial facilities, and hopes to supplant survey-based purchasing managers’ indices (PMIs) with software that identifies signs of economic activity, such as transport vehicles in parking lots. It plans eventually to track all the world’s trucks, ships, mines and warehouses (Kearns, 2015).

The “Billion Prices Project” (BPP) is an initiative of Alberto Cavallo and Roberto Rigobon of the MIT Sloan School of Management, who use software to crawl (scrape) the internet to collect the prices of products sold online. The BPP daily inflation indices cover nearly 70 countries, use daily price fluctuations of over five million items sold by over 300 online retailers, and are available through a commercial venture, PriceStats (www.pricestats.com), and State Street (Inflation Series, 2015; Our Global Reach, 2015; Mayer-Schönberger & Cukier, 2013; Hartley, 2015). In South Africa, the Pre-CPI release of the Inflation Factory (www.theinflationfactory.com) is based on the prices of products offered for sale on the internet in South Africa.

An app developed by the start-up Premise (www.premise.com) pays individuals to use their smartphones to photograph the price tags of food and other local goods around the world and record these prices. In 2015, an arm of the United Nations found that Premise can compile a monthly food consumer price index for Brazil up to 25 days before the official government agency (De La Merced, 2015). The biggest potential benefit of Premise’s app is that it is being rolled out in countries with underdeveloped national statistics (Hartley, 2015).
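The price-based examples above (the BPP, the Inflation Factory’s Pre-CPI and Premise) all have to turn large numbers of item-level prices into a single index. How they do so is not described here; the sketch below uses a chained Jevons index (the geometric mean of price relatives for items priced on two consecutive days), a standard formula for unweighted elementary price indices, purely as an illustration. It is not claimed to be the method of any of these producers, and the function name and sample prices are hypothetical.

```python
import math
from typing import Mapping, Sequence


def chained_jevons(days: Sequence[Mapping[str, float]],
                   base: float = 100.0) -> list[float]:
    """Chained Jevons price index from daily snapshots of scraped prices.

    days[t] maps an item identifier to its price on day t. Each daily link
    uses only items priced on both day t-1 and day t, so products that
    appear or disappear online do not break the series.
    """
    index = [base]
    for prev, curr in zip(days, days[1:]):
        common = prev.keys() & curr.keys()
        if not common:
            # No matched items: carry the index forward unchanged.
            index.append(index[-1])
            continue
        # Geometric mean of price relatives = exp(mean of log relatives).
        mean_log = sum(math.log(curr[i] / prev[i]) for i in common) / len(common)
        index.append(index[-1] * math.exp(mean_log))
    return index


# Hypothetical scraped prices for three consecutive days.
days = [
    {"milk": 10.0, "bread": 20.0},
    {"milk": 11.0, "bread": 20.0, "eggs": 30.0},  # eggs are newly listed
    {"milk": 11.0, "eggs": 33.0},                 # bread is no longer listed
]
print([round(v, 2) for v in chained_jevons(days)])  # [100.0, 104.88, 110.0]
```

Because each link uses only matched items, a newly listed product enters the index from its second day of observation, and a discontinued one drops out without causing a jump in the series.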
A start-up, Real Time Macroeconomics (www.realtimemacroeconomics.com), uses online data to monitor the labour market (such as job openings, layoff announcements, wage growth and employment) in real time. Its indices are built to closely follow the respective US Bureau of Labor Statistics indices, but have the added flexibility of near real-time data delivery and more granularity (Hartley, 2015). Indeed (www.indeed.com), a job search company, produces similar data (Sinclair, 2015).

Zillow (www.zillow.com), an online real-estate service provider in the United States, collects information about home sales and mortgages (Sinclair, 2015). “At Zillow’s core is [their] living database of more than 100 million U.S. homes, featuring both public and user-generated information including number of bedrooms and bathrooms, tax assessments, home sales and listing data of homes for sale and for rent. This data allows [them] to calculate, among other indicators, the Zestimate, a highly accurate, automated, estimated value of almost every home in the country as well as the Zillow Home Value Index and Zillow Rent Index, leading measures of median home values and rents” (Zillow Real Estate and Rental Data: Why we are different, 2015). In South Africa, the PayProp Rental Index is compiled from real-time transaction data and provides a comprehensive overview of the state of the residential rental market in the country (Rental Index, 2015).

Official statistics and administrative data sources
Official statistical agencies have recognised the potential value of administrative and other forms of Big Data and have started investing heavily in ways to harvest this information. Statistics Netherlands successfully uses traffic loop data for transport statistics and mobile phone data for daytime population statistics (Kroese, 2015).
Internet search data
Google Trends (www.google.com/trends) provides a real-time daily and weekly index of the volume of queries that users enter into Google. In seminal work first circulated in 2009, Choi and Varian (2012) showed how this search engine data could be used to forecast near-term values of economic indicators. Their examples included motor car sales, unemployment claims, travel destination planning and consumer confidence. Gill, Perera and Sunner (2012) found similarly positive results for Australia. Likewise, McLaren (2011) showed that internet search data could be applied to estimate certain labour and housing market trends in the UK.

Passive data collection of all the data related to a particular phenomenon
Another competitive advantage of Big Data vis-à-vis BTS and COS is that it is based on all observations of a particular phenomenon and does not rely on the active participation of respondents. Big Data, therefore, does not suffer from the same sampling and non-sampling errors as BTS and COS.
This has become an even bigger advantage of late, as all surveys that actively collect data (i.e. where selected respondents have to complete a questionnaire or take part in a telephone interview), including BTS and COS, increasingly struggle to sustain participation. Response rates have suffered because it has become more difficult to attract people’s attention and encourage participation, as more and more things lay claim to their finite time. This is largely the result of technological developments that have led to a situation in which people are constantly bombarded with e-mails, social media notifications, requests and other distractions. So far, these developments have not affected all firm sizes and demographic groups equally. For instance, in emerging economies respondents at micro and small firms have been affected to a lesser extent than those at medium-sized and large firms. This is partly because the former generally do not have broadband internet access and e-mail (having to use their cell phones instead) and are therefore less distracted. Likewise, for a while younger people and high-income earners adopted the new technologies faster than the older generation and low-income earners, and were therefore more distracted and less willing to participate in COS.

Some institutions that use market research firms to conduct the fieldwork for COS on their behalf have also noticed how Big Data has adversely affected part of the business model of these firms. Big Data has eroded the demand for survey-based market research, as many large retailers and banks nowadays do not require the services of a market research firm: they can use their own transaction records and other internal data (e.g. collected through loyalty cards) to track their customers’ preferences.
Financial considerations
While BTS and COS are costly to conduct, Big Data has little direct production cost, because it is merely a by-product of business processes or an input to deliver the actual product or service. The financial survival of institutions that receive public funding to conduct BTS and COS is not at risk as long as the governments in these countries continue to value and fund them. In contrast, Big Data poses a more immediate risk to the funding of institutions that depend on private sector support. Many of these institutions have adopted sponsorships to finance the production of BTS and COS, because income from the sale of survey results reports largely dried up when internet use became more widespread about a decade ago. Under this funding model, private firms support the surveys financially in exchange for the right to attach their name to the survey (media rights) and to have their people make the results public. These firms then get valuable exposure and publicity through the media coverage of the survey results. The media coverage, in turn, depends on the track record, early availability and market-moving potential of the data.

Big Data has made it much easier than before for many more people to produce indices that track the performance of the macro-economy. Nearly all that is required to compile an index is the data (with personally identifiable information removed, if applicable) and some elementary analysis. In contrast, putting together and maintaining a sample of active respondents and collecting information through questionnaires requires the upkeep of a costly infrastructure and the employment of people with scarce, specialised skills. BTS and COS face additional competition if it can be shown that an index compiled from Big Data has the same or an even better track record (relative to a widely accepted benchmark, which is currently mostly the official quantitative data series).
The competition becomes even fiercer if indices based on Big Data can be produced faster and at a more disaggregated (granular) level than BTS and COS indices. While policy makers will mostly remain interested in the aggregate results, private firms would rather have disaggregated results (e.g. the demand for a specific product or service in a given geographical area) than aggregate ones, because from their perspective the aggregate is only a proxy for the local demand for their product or service.
Competitive edge of BTS and COS
BTS and COS also have a competitive edge over Big Data in certain respects. These could broadly be put into two groups, namely 1) BTS and COS offer information that Big Data cannot supply and 2) BTS and COS do not suffer from some of the shortcomings of Big Data.

BTS and COS cover variables that Big Data cannot directly record
The biggest competitive advantage of BTS and COS vis-à-vis Big Data is that they measure phenomena that Big Data does not cover. Big Data records only actual outcomes, while BTS and COS also cover unquantifiable expectations and assessments. Fluctuations in these “feelings” of business people and households have proven to be key determinants of their investment and spending / saving decisions, which in turn propel the business cycle. The use and value of BTS and COS will therefore likely shift from being primarily early indicators (surveys of what happened in the current period) to measures of expectations (what respondents expect for the next period), because expectations are not passively revealed and therefore cannot be tracked by Big Data. This assumes that the survey expectation data performs better (i.e. provides additional information) than a simple autoregressive lag of the data (i.e. the best univariate forecast for $y_{t+1}$ is not simply $y_t$).

The length of the historical time series gives BTS and COS another competitive edge, at least until Big Data time series become sufficiently long. The long time series of BTS and COS are particularly valued, as historical cycles and relationships provide useful clues about the future. The micro data (i.e. individual responses with personally identifiable information removed, but with sector, size and region preserved) is another unique offer of BTS, as it makes it possible to study the same respondent over time.
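Whether survey expectations really add information beyond a series’ own history can be checked by comparing out-of-sample forecast errors. The sketch below is illustrative only: the data are synthetic, the survey balance `s[t]` is assumed to be a noisy early signal of the shock that hits `y` in period t+1, and the helper names (`ols`, `holdout_rmse`, `naive_rmse`, `synthetic`) are hypothetical. It compares the no-change forecast, an AR(1) model and the AR(1) model augmented with the survey balance, all evaluated on the same hold-out periods.

```python
import math
import random
from typing import Optional, Sequence


def ols(X: list[list[float]], y: Sequence[float]) -> list[float]:
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def holdout_rmse(y: Sequence[float], s: Optional[Sequence[float]],
                 n_train: int) -> float:
    """Out-of-sample RMSE of one-step forecasts of y[t+1].

    With s=None the model is the AR(1) y[t+1] = a + b*y[t]; otherwise the
    survey expectation balance s[t] (recorded at t about t+1) is added.
    Coefficients are estimated on the first n_train periods only.
    """
    rows = [[1.0, y[t]] + ([s[t]] if s is not None else [])
            for t in range(len(y) - 1)]
    target = list(y[1:])
    coef = ols(rows[:n_train], target[:n_train])
    errors = [sum(c * x for c, x in zip(coef, row)) - obs
              for row, obs in zip(rows[n_train:], target[n_train:])]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def naive_rmse(y: Sequence[float], n_train: int) -> float:
    """RMSE of the no-change forecast y[t+1] = y[t] over the same periods."""
    errors = [y[t + 1] - y[t] for t in range(n_train, len(y) - 1)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def synthetic(n: int = 300, seed: int = 3):
    """Hypothetical data: y is an AR(1) series and the survey balance s[t]
    is a noisy early signal of the shock that hits y in period t+1."""
    rng = random.Random(seed)
    shocks = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [0.0]
    for t in range(1, n):
        y.append(0.5 * y[t - 1] + shocks[t])
    s = [shocks[t + 1] + rng.gauss(0.0, 0.5) for t in range(n - 1)] + [0.0]
    return y, s


y, s = synthetic()
print("no-change:", naive_rmse(y, 200))
print("AR(1):", holdout_rmse(y, None, 200))
print("AR(1) + survey:", holdout_rmse(y, s, 200))
```

Under these assumptions the survey-augmented model should show the smallest hold-out error, because the simulated survey carries information about next period’s shock that the series’ own history cannot supply. With real data, `y` would be the official series and `s` the published expectations balance; if the survey adds nothing, the augmented model will typically not beat the simpler forecasts out of sample. The normal-equation solver keeps the sketch dependency-free but is not numerically robust for larger models.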
BTS and COS do not suffer from the same shortcomings as some Big Data applications
BTS and COS offer representative samples instead of potentially biased selections of all the data related to a particular phenomenon
Although Big Data often covers all users or data (rather than only a selection or sample, as in the case of BTS and COS), this does not necessarily prevent bias. For instance, Twitter feeds could be biased because certain demographic groups, or less activist groups, are under-represented. Another example is data from vehicle monitoring devices: even though all the vehicles fitted with such devices can be tracked, the devices are not fitted in all the vehicles in a country. In contrast, the research design and random sampling of BTS and COS limit bias in the selection of respondents and make the remaining sampling error measurable (enabling one to state, for instance, with 95% confidence that the true value falls within a certain range). However, this benefit of impartiality is not unconditional: it is only sustained if the response rate is relatively high and the research design provides for the known under-representation in the sample of, for instance, young people, people without landline telephone numbers or small firms. Furthermore, it needs to be acknowledged that the increased struggle to achieve acceptable response rates, discussed earlier, tempers this competitive edge of BTS and COS.

Of the different kinds of Big Data, internet search data has attracted the largest share of criticism, particularly about representativeness. According to Gill, Perera and Sunner (2012: 10), the shortcomings of internet search data include its “relatively short history, the possibly unrepresentative nature of the sample given the variation in internet use across different groups by age and income, and the likelihood of considerable noise in the data (owing to factors such as changes in the market share of firms like Amazon, and changes in search terms and behaviour)”.
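The contrast between complete but unrepresentative coverage and a random sample can be illustrated with a small simulation. Everything in it is hypothetical: a population of 100 000 firms (80% small), assumed shares of optimistic firms by size, tracking devices assumed to be fitted mainly in large firms, and assumed response rates by size. The function names are illustrative only. The last step uses post-stratification, i.e. reweighting respondents by known population shares, as one example of a research design that "provides for the known under-representation" of small firms.

```python
import math
import random


def make_population(n, rng):
    """Hypothetical firms as (is_small, is_optimistic) pairs."""
    population = []
    for _ in range(n):
        small = rng.random() < 0.8  # assumed: 80% of firms are small
        optimistic = rng.random() < (0.3 if small else 0.6)  # assumed rates
        population.append((small, optimistic))
    return population


def share_with_ci(flags, z=1.96):
    """Share of True values and a normal-approximation 95% interval."""
    n = len(flags)
    p = sum(flags) / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p, (p - half, p + half)


def poststratified_share(sample, small_share):
    """Weight each size stratum's share by its known population weight."""
    small = [o for s, o in sample if s]
    large = [o for s, o in sample if not s]
    return (small_share * sum(small) / len(small)
            + (1 - small_share) * sum(large) / len(large))


def demo(seed=7):
    rng = random.Random(seed)
    population = make_population(100_000, rng)
    true_share = sum(o for _, o in population) / len(population)
    small_share = sum(s for s, _ in population) / len(population)

    # "Big Data": every firm with a tracking device is observed, but devices
    # are assumed to be fitted in 90% of large and only 10% of small firms.
    tracked = [o for s, o in population if rng.random() < (0.1 if s else 0.9)]
    big_data_share = sum(tracked) / len(tracked)

    # Survey: simple random sample of 1 000 firms, all of whom respond.
    srs = rng.sample(population, 1000)
    srs_share, srs_ci = share_with_ci([o for _, o in srs])

    # Survey with non-response: small firms are assumed to respond less often
    # (20% vs 60%), biasing the raw share; post-stratification corrects for it.
    drawn = rng.sample(population, 10_000)
    respondents = [(s, o) for s, o in drawn
                   if rng.random() < (0.2 if s else 0.6)]
    raw_share = sum(o for _, o in respondents) / len(respondents)
    weighted = poststratified_share(respondents, small_share)

    return {"true": true_share, "big_data": big_data_share,
            "srs": srs_share, "srs_ci": srs_ci,
            "raw_respondents": raw_share, "poststratified": weighted}


for name, value in demo().items():
    print(name, value)
```

With these assumed parameters the expected values are roughly 0.36 for the true share and about 0.51 for the tracked firms, even though the latter cover some 26 000 firms; the random sample lands near 0.36 with an interval of about ±0.03. The raw respondent share drifts up to about 0.43 and post-stratification brings it back close to 0.36, but only because firm size, the characteristic driving non-response here, is known for the whole population.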
Big Data could overcome the problem of unrepresentative coverage if more heterogeneous data sources are integrated. However, at present this is more a promising new frontier than a reality, as the integration of
diverse data sources presents formidable logistical and analytical challenges (Big Data, 2015; Einav & Levin, 2013: 25).

Producers and users of BTS and COS are more attuned to causation and do not look only for correlations
The emergence of Big Data presents a paradigm shift. “In 1990 data was scarce, but interpretation was readily available. In 2013 data was everywhere, but interpretation was scarce” (Van der Veen, 2013). The focus therefore has to shift from “collecting to [the] filtering of data”. With Big Data, you have to “ask yourself what you see”, in contrast to “seeing what you asked for” with conventional data collection and analysis methods. In the former case, “the data defines the model”; in the latter, “the model defines the data you want” (Van der Veen, 2013). The emergence of Big Data led to three shifts in how we think about data: “from some to all, from clean to messy and from causation to correlation” (Cukier, 2013).

“After reliably providing a swift and accurate account of flu outbreaks for several winters, the theory-free, data-rich model [of Google Flu Trends] had lost its nose [in 2013] for where flu was going. Google’s model pointed to a severe outbreak but when the slow-and-steady data from the CDC [Centers for Disease Control and Prevention in the United States] arrived, they showed that Google’s estimates of the spread of flu-like illnesses were overstated by almost a factor of two. The problem was that Google did not know – could not begin to know – what linked the search terms with the spread of flu. Google’s engineers weren’t trying to figure out what caused what. They were merely finding statistical patterns in the data. They cared about correlation rather than causation. This is common in Big Data analysis. Figuring out what causes what is hard (impossible, some say). Figuring out what is correlated with what is much cheaper and easier.
That is why, according to Viktor Mayer-Schönberger and Kenneth Cukier’s book, Big Data, ‘causality won’t be discarded, but it is being knocked off its pedestal as the primary fountain of meaning’. But a theory-free analysis of mere correlations is inevitably fragile. If you have no idea what is behind a correlation, you have no idea what might cause that correlation to break down. One explanation of the Flu Trends failure is that the news was full of scary stories about flu in December 2012 and that these stories provoked internet searches by people who were healthy. Another possible explanation is that Google’s own search algorithm moved the goalposts when it began automatically suggesting diagnoses when people entered medical symptoms” (Harford, 2014, own emphasis).

According to Morozov (2013), “the goal of both data mining and predictive analytics is to generate useful patterns that are far beyond the ability of the human mind to detect or even explain. In other words, we don't need to inquire why things are the way they are as long as we can affect them to be the way we want them to be. … [However], we can draw a distinction here between Big Data—the stuff of numbers that thrives on correlations—and Big Narrative—a story-driven, anthropological approach that seeks to explain why things are the way they are. Big Data is cheap where Big Narrative is expensive. Big Data is clean where Big Narrative is messy. Big Data is actionable where Big Narrative is paralyzing” (own emphasis).
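The failure mode Harford describes, where a correlation-only model breaks down once the mechanism behind the correlation changes, can be reproduced in a toy simulation. It is not a model of Google Flu Trends itself: the weekly flu cases, the assumption of two searches per sick person, and a media scare from week 78 that adds searches by healthy people are all hypothetical, as are the function names.

```python
import random


def fit_line(x, y):
    """Ordinary least-squares intercept and slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def simulate(weeks=104, scare_from=78, scare_size=200.0, seed=11):
    """Hypothetical weekly flu cases and flu-related searches.

    Each sick person is assumed to generate two searches. From week
    `scare_from`, a media scare adds `scare_size` searches a week by
    healthy people, which the search counts alone cannot distinguish.
    """
    rng = random.Random(seed)
    cases, searches = [], []
    for week in range(weeks):
        in_season = week % 52 < 12  # assumed 12-week flu season
        c = 100.0 + (50.0 if in_season else 0.0) + rng.gauss(0.0, 5.0)
        healthy = (40.0 + (scare_size if week >= scare_from else 0.0)
                   + rng.gauss(0.0, 5.0))
        cases.append(c)
        searches.append(2.0 * c + healthy)
    return cases, searches


def overstatement(scare_size=200.0, scare_from=78, seed=11):
    """Mean ratio of predicted to actual cases from `scare_from` onwards.

    The model is fitted on the weeks before the scare, when the
    search-flu correlation is stable, and then applied unchanged.
    """
    cases, searches = simulate(scare_from=scare_from,
                               scare_size=scare_size, seed=seed)
    a, b = fit_line(searches[:scare_from], cases[:scare_from])
    ratios = [(a + b * s) / c
              for s, c in zip(searches[scare_from:], cases[scare_from:])]
    return sum(ratios) / len(ratios)


print("no scare:", overstatement(scare_size=0.0))
print("media scare:", overstatement())
```

Without the scare the fitted model tracks actual cases closely (a ratio near 1); once healthy people start searching, it substantially overstates the number of cases, close to a factor of two with these particular parameters, which were chosen to make the effect visible rather than calibrated to the 2012/13 episode. An analyst who knew why people were searching could add a media variable to the model; a purely correlational model has no way of doing so.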
Cukier (2013) cautions opponents of the use of Big Data not to set the bar too high, as even with conventionally generated data it is often not possible to figure out the “why”. “Of course, knowing the causes behind things is desirable. The problem is that causes are often extremely hard to figure out, and many times, when we think we have identified them, it is nothing more than a self-congratulatory illusion”. Sinclair (2015) points out that “in the age of Big Data, the challenge will lie in carefully filtering and analyzing large amounts of information. It will not be enough simply to gather data; in order to yield meaningful predictions, the data must be placed in an analytical framework”.

Given their research design and hands-on collection method, BTS and COS are better than Big Data at providing the “why”. The competitive advantage of BTS and COS is currently enhanced by the fact that economists, statisticians and analysts are still closely involved in their generation and interpretation, whereas Big Data is mostly the domain of data scientists and IT engineers. As more economists, statisticians and analysts engage with Big Data over time, its explanatory power will likely improve. For instance, Varian (2014: 3) notes that the “collaborations between computer scientists and statisticians in the last decade or so [were] fruitful, and [he] expects [that] collaborations between computer scientists and econometricians will also be productive in the future”.

Final remarks
Analysts, policy makers and business people value BTS and COS specifically because the survey results are available before the corresponding (official) quantitative data. However, Big Data has begun to make inroads into areas traditionally covered by BTS and COS. It has a competitive edge over BTS and COS, as it is available in real time, is based on all observations, does not rely on the active participation of respondents and has little direct production cost.
To remain relevant and survive, producers of BTS and COS will have to adapt and publicise their unique competitive advantages vis-à-vis Big Data. The biggest shift will probably require producers of BTS and COS to make users more aware of the value of the unique forward-looking information of BTS and COS (i.e. their recording of expectations about the future), instead of focusing only on their ability to provide information about current phenomena before the corresponding official data becomes available. Users will also continue to value the long historical time series and the micro data that BTS and COS can provide for some time to come. However, this value will erode over time as Big Data series become longer; as more economists, statisticians and analysts, and not only data scientists as is currently mostly the case, start to engage with Big Data; and as Big Data producers overcome formidable logistical and analytical challenges, making it possible to link a variety of private and administrative data sets.

References
AFGC CHEP Retail Index. 2015. [Online]. Available: http://www.afgc.org.au/media-centre/afgc-chep-retail-index/ [28 August 2015]
BETI - June’s positive numbers off-set previous poor month. 2015. [Web log post and Press Release]. 8 July. Available: http://www.economists.co.za/blog/index.php?/essays/2015/07/betii-junes-positive-numbers-off-set-previous-poor-month/ [25 August 2015]
Big Data. 2015. [Online]. Available: https://en.wikipedia.org/wiki/Big_data [22 June 2015]
Big developments in Big Data: how astronomy is driving data science in Africa. 2015. [Online]. Available: http://www.uct.ac.za/dailynews/?id=9358 [30 September 2015]
Choi, H. and Varian, H. 2012. Predicting the Present with Google Trends. Economic Record (issued by the Economic Society of Australia), vol. 88. June: 2-9.
Cukier, K.N. 2013. The Rise of Big Data. Foreign Affairs. May / June. Available: https://www.foreignaffairs.com/articles/2013-04-03/rise-big-data?page=show [26 August 2015]
Data – overview. 2015. [Online]. Available: https://www.marketpsych.com/guide/data [28 August 2015]
De La Merced, M. 2015. Lawrence Summers to join the board of “hyperdata” start-up. New York Times. 15 July. Available: http://www.nytimes.com/2015/07/16/business/dealbook/lawrence-summers-to-join-board-of-hyperdata-start-up.html?_r=1 [25 August 2015]
Einav, L. and Levin, J.D. 2013. The data revolution and economic analysis. NBER Working Paper 19035. May.
Ferreira, K. 2015. Navigating the tricky ethical terrain of Big Data. Business Day. 6 July. Available: http://www.bdlive.co.za/business/technology/2015/07/06/navigating-the-tricky-ethical-terrain-of-big-data [28 August 2015]
General Usage Patterns. 2015. [Online]. Available: https://www.marketpsych.com/guide/usage [26 August 2015]
Gill, T., Perera, D. and Sunner, D. 2012. Electronic Indicators of Economic Activity. Reserve Bank of Australia Bulletin. June: 1-11.
Harford, T. 2014. Big Data: Are we making a big mistake? Financial Times. 28 March. Available: http://www.ft.com/intl/cms/s/2/21a6e7d8-b479-11e3-a09a-00144feabdc0.html?ftcamp=published_links/rss/magazine/feed//product#axzz2xGX61fMS [26 August 2015]
Hartley, J. 2015. The success of monitoring the economy with Big Data. Huffington Post. 17 March. Available: http://www.huffingtonpost.com/jon-hartley/the-success-of-monitoring_b_6875126.html?ir=Business&utm_hp_ref=business [25 August 2015]
Inflation Series. 2015. [Online]. Available: http://www.pricestats.com/inflation-series [28 August 2015]
INRIX. 2015. [Online]. Available: https://en.wikipedia.org/wiki/INRIX [1 September 2015]
Kearns, J. 2015. Satellite images show Economies growing and shrinking in real time. BloombergBusiness. 8 July. Available: http://www.bloomberg.com/news/features/2015-07-08/satellite-images-show-economies-growing-and-shrinking-in-real-time [28 August 2015]
Kroese, B. 2015. Innovation and Big Data at Statistics Netherlands. UNSD lunch time seminar on “Big Data: How do we meet the expectations?” held on 4 March in New York. Available: http://unstats.un.org/unsd/statcom/statcom_2015/seminars/big_data/default.html [28 August]
Mayer-Schönberger, V. & Cukier, K. 2013. Big Data: A revolution that will transform how we live, work and think. [Kobo version]. Available: www.kobo.com [22 June 2015]
McLaren, N. 2011. Using internet search data as economic indicators. Bank of England Quarterly Bulletin. Q2: 134-140.
Morozov, E. 2013. With Big Data surveillance, the government doesn’t need to know “why” anymore. Slate. 24 June. [Online]. Available: http://www.slate.com/articles/technology/future_tense/2013/06/with_big_data_surveillance_the_government_doesn_t_need_to_know_why_anymore.single.html [26 August 2015]
Our Global Reach. 2015. [Online]. Available: http://www.pricestats.com/approach/data-composition [28 August 2015]
Rental Index. 2015. [Online]. Available: https://za.payprop.com/cgi-bin/giga.cgi?cmd=rental_index [28 August 2015]
Sinclair, T.M. 2015. Economic Forecasts in the Age of Big Data. [Online]. Available: http://www.project-syndicate.org/commentary/big-data-economic-forecasts-by-tara-m--sinclair-2015-08 [28 August]
SWIFT Index. An early fact-based leading indicator for short-term GDP evolution. 2015. [Online]. Available: http://www.swift.com/products_services/swift_index_details [28 August 2015]
Van der Veen, G. 2013. Big Data: Big Opportunity. UNSD Friday seminar on “Big Data for Policy, Development and Official Statistics” held on 22 February in New York. Available: http://unstats.un.org/unsd/statcom/statcom_2013/seminars/Big_Data/default.html [26 August 2015]
Varian, H.R. 2014. Big Data: New Tricks for Econometrics. Journal of Economic Perspectives. 28 (2). Spring: 3-28.
Yanofsky, D. 2013. Why traffic jams are the sign of a healthy economy. Quartz. 30 July. [Online]. Available: http://qz.com/109859/why-traffic-jams-are-the-sign-of-a-healthy-economy/ [25 August 2015]
Zillow Real Estate and Rental Data: Why we are different. 2015. [Online]. Available: http://www.zillow.com/research/data/ [28 August 2015]