SlideShare une entreprise Scribd logo
1  sur  70
Télécharger pour lire hors ligne
Copyright 2013 by Data Blueprint
1
Demystifying Big Data
Yes, we face a data deluge and big data seems to
be largely about how to deal with it. But much of
what has been written about big data is focused on
selling hardware and services. The truth is that
until the concept of big data can be objectively
defined, any measurements, claims of success,
quantifications, etc. must be viewed skeptically
and with suspicion. While both the need for and
approaches to these new requirements are faced
by virtually every organization, jumping into the
fray ill-prepared has (to date) reproduced the
same dismal IT project results.
Date: May 14, 2013
Time: 2:00 PM ET/11:00 AM PT
Presenter:Peter Aiken, Ph.D.
• Every century, a new technology-steam power,
electricity, atomic energy, or microprocessors-has
swept away the old world with a vision of a new one.
Today, we seem to be entering the era of Big Data
– Michael Coren
• Every century, a new technology-steam power,
electricity, atomic energy, or microprocessors-has
swept away the old world with a vision of a new one.
Today, we seem to be entering the era of Big Data
– Michael Coren
Copyright 2013 by Data Blueprint
Get Social With Us!
Live Twitter Feed
Join the conversation!
Follow us:
@datablueprint
@paiken
Ask questions and submit
your comments: #dataed
2
Like Us on Facebook
www.facebook.com/
datablueprint
Post questions and comments
Find industry news, insightful
content
and event updates.
Join the Group
Data Management &
Business Intelligence
Ask questions, gain insights
and collaborate with fellow
data management
professionals
Demystifying Big Data
It's about separating the signal from the noise
Presented by Peter Aiken, Ph.D.
Copyright 2013 by Data Blueprint
Meet Your Presenter:
Peter Aiken, Ph.D.
• 30 years of experience in data
management
– Multiple international awards
– Founder, Data Blueprint
(http://datablueprint.com)
• 8 books and dozens of articles
• Experienced w/ 500+ data management
practices in 20 countries
• Multi-year immersions with organizations
as diverse as the US DoD, Deutsche Bank,
Nokia, Wells Fargo, and the
Commonwealth of Virginia
4
• Data Analysis
- Origins
• Challenges
- Faced by virtually everyone
• Compliment
- Existing data management practices
• Pre-requisites
- Necessary to exploit big data techniques
• Prototyping
- Iterative means of practicing big data techniques
• Take Aways and Q&A
Demystifying Big Data
Copyright 2013 by Data Blueprint
Tweeting now:
#dataed
5
• Data Analysis
- Origins
• Challenges
- Faced by virtually everyone
• Compliment
- Existing data management practices
• Pre-requisites
- Necessary to exploit big data techniques
• Prototyping
- Iterative means of practicing big data techniques
• Take Aways & Q&A
Demystifying Big Data
Tweeting now:
#dataed
• Data Analysis
- Origins
• Challenges
- Faced by virtually everyone
• Compliment
- Existing data management practices
• Pre-requisites
- Necessary to exploit big data techniques
• Prototyping
- Iterative means of practicing big data techniques
• Take Aways and Q&A
Demystifying Big Data
Copyright 2013 by Data Blueprint
Tweeting now:
#dataed
6
Copyright 2013 by Data Blueprint
7
Bills of Mortality by Captain John Graunt
Copyright 2013 by Data Blueprint
8
Copyright 2013 by Data Blueprint
9
Mortality Geocoding
Where is it happening?
Copyright 2013 by Data Blueprint
10
Plague Peak
When is it happening?
("Whereas	
  of	
  the	
  Plague")
Copyright 2013 by Data Blueprint
11
Black Rats or Rattus Rattus
Why is it happening?
Black Rats or Rattus Rattus
Why is it happening?
Copyright 2013 by Data Blueprint
12
What will happen?
Copyright 2013 by Data Blueprint
13
John Snow's 1854 Cholera Map of London
• Data Analysis
- Origins
• Challenges
- Faced by virtually everyone
• Compliment
- Existing data management practices
• Pre-requisites
- Necessary to exploit big data techniques
• Prototyping
- Iterative means of practicing big data techniques
• Take Aways and Q&A
Demystifying Big Data
Copyright 2013 by Data Blueprint
Tweeting now:
#dataed
14
Copyright 2013 by Data Blueprint
Data Inflation
Unit Size What	
  it	
  means
Bit	
  (b) 1	
  or	
  0
Short	
  for	
  “binary	
  digit”,	
  aAer	
  the	
  binary	
  code	
  (1	
  or	
  0)	
  
computers	
  use	
  to	
  store	
  and	
  process	
  data
Byte	
  (B) 8	
  bits
Enough	
  informaFon	
  to	
  create	
  an	
  English	
  leGer	
  or	
  number	
  in	
  
computer	
  code.	
  It	
  is	
  the	
  basic	
  unit	
  of	
  compuFng
Kilobyte	
  (KB) 1,000,	
  or	
  210,	
  bytes From	
  “thousand”	
  in	
  Greek.	
  One	
  page	
  of	
  typed	
  text	
  is	
  2KB
Megabyte	
  (MB) 1,000KB;	
  220	
  bytes
From	
  “large”	
  in	
  Greek.	
  The	
  complete	
  works	
  of	
  Shakespeare	
  total	
  5MB.	
  A	
  typical	
  pop	
  song	
  is	
  
about	
  4MB
Gigabyte	
  (GB) 1,000MB;	
  230	
  bytes From	
  “giant”	
  in	
  Greek.	
  A	
  two-­‐hour	
  film	
  can	
  be	
  compressed	
  into	
  1-­‐2GB
Terabyte	
  (TB) 1,000GB;	
  240	
  bytes
From	
  “monster”	
  in	
  Greek.	
  All	
  the	
  catalogued	
  books	
  in	
  America’s	
  Library	
  of	
  Congress	
  total	
  
15TB
Petabyte	
  (PB) 1,000TB;	
  250	
  bytes
All	
  leGers	
  delivered	
  by	
  America’s	
  postal	
  service	
  this	
  year	
  will	
  amount	
  to	
  around	
  5PB.	
  
Google	
  processes	
  around	
  1PB	
  every	
  hour
Exabyte	
  (EB) 1,000PB;	
  260	
  bytes Equivalent	
  to	
  10	
  billion	
  copies	
  of	
  The	
  Economist
ZeGabyte	
  (ZB) 1,000EB;	
  270	
  bytes The	
  total	
  amount	
  of	
  informaFon	
  in	
  existence	
  this	
  year	
  is	
  forecast	
  to	
  be	
  around	
  1.2ZB
YoGabyte	
  (YB) 1,000ZB;	
  280	
  bytes Currently	
  too	
  big	
  to	
  imagine
The prefixes are set by an intergovernmental group, the International Bureau of Weights and Measures. Source: The Economist Yotta and Zetta were
added in 1991; terms for larger amounts have yet to be established
15
Copyright 2013 by Data Blueprint
What do they mean big?
16
"Every 2 days we create as much
information as we did up to 2003"
– Eric Schmidt
The	
  number	
  of	
  things	
  that	
  can	
  produce	
  data	
  is	
  
rapidly	
  growing	
  (smart	
  phones	
  for	
  example)
IP	
  traffic	
  will	
  quadruple	
  
by	
  2015
– Asigra	
  2012
Copyright 2013 by Data Blueprint
Google Search Results for the Term "Big Data"
17
Data from Google Trends Source: Gartner (October 2012)
Copyright 2013 by Data Blueprint
Number of Internet Pages Mentioning Big Data
18
Data from Google Trends Source: Gartner (October 2012)
Copyright 2013 by Data Blueprint
Increasingly individuals make use of
the things data producing capabilities
to perform services for them
19
• IBM
– 100 terabytes uploaded daily
– $2.1 billion spent on mobile ads in 2011
– 4.8 trillion ad impressions/daily
• WIPRO
– 2.9 million emails sent every second
– 20 hours of video uploaded to youtube every minute
– 50 million tweets per day
• Asigra
– 2.5 quintillion bytes created daily
– 90% of data was created in the last two years
– By 2015 8 zettabytes will have been created
Copyright 2013 by Data Blueprint
Examples
20
Copyright 2013 by Data Blueprint
21
• 60 GB of data/second
• 200,000 hours of big data
will be generated testing
systems
• 2,000 hours media
coverage/daily
• 845 million Facebook users
averaging 15 TB/day
• 13,000 tweets/second
• 4 billion watching
• 8.5 billion devices
connected
2012 London Summer Games
22
Sloan Management Review/Harvard Business Review
Copyright 2013 by Data BlueprintMIT Sloan Management Review Fall 2012 Page 22 By Thomas H. Davenport, Paul Barth And Randy Bean
Copyright 2013 by Data Blueprint
Big Data(has something to do with Vs - doesn't it?)
• Volume
– Amount of data
• Velocity
– Speed of data in and out
• Variety
– Range of data types and sources
• 2001 Doug Laney
• Variability
– Many options or variable interpretations confound analysis
• 2011 ISRC
• Vitality
– A dynamically changing Big Data environment in which analysis and predictive
models must continually be updated as changes occur to seize opportunities
as they arrive
• 2011 CIA
• Virtual
– Scoping the discussion to only include online assets
• 2012 Courtney Lambert
23
Copyright 2013 by Data Blueprint
Nanex 1/2 Second Trading Data(May 2, 2013 Johnson and Johnson)
24
http://www.youtube.com/watch?v=LrWfXn_mvK8
The European Union last year approved a new rule mandating that all trades must exist for at
least a half-second in this instance that is 1,200 orders and 215 actual trades
Copyright 2013 by Data Blueprint
25
Historyflow-Wikipedia entry for the word “Islam”
Copyright 2013 by Data Blueprint
26
Spatial Information Flow-New York Talk Exchange
Some Far-out Thoughts
on Computers by Orrin
Clotworthy
• Predicted use of
not just
computing in the
intelligence
community
• Also forecast
predictive
analytics
• Accompanying
privacy
challenges
Copyright 2013 by Data Blueprint
27
• Big Data are high-volume, high-velocity, and/or high-variety
information assets that require new forms of processing to
enable enhanced decision making, insight discovery
and process optimization.
– Gartner 2012
• Big data refers to datasets whose size is beyond the ability of
typical database software tools to capture, store, manage, and analyze.
– IBM 2012
• Big data usually includes data sets with sizes beyond the ability of
commonly-used software tools to capture, curate, manage, and process the
data within a tolerable elapsed time.
– Wikipedia
• Shorthand for advancing trends in technology that open the door to a new
approach to understanding the world and making decisions.
– NY Times 2012
• Big data is about putting the "I" back into IT.
– Peter Aiken 2007
• We have no objective definition of big data!
– Any measurements, claims of success, quantifications, etc. must be viewed
skeptically and with suspicion!
• Question: "Would it be more useful to refer to "big data techniques?"
Copyright 2013 by Data Blueprint
Defining Big Data
28
Copyright 2013 by Data Blueprint
Big Data Techniques
• New techniques available to impact the productivity (order of
magnitude) of any analytical insight cycle that compliment,
enhance, or replace conventional (existing) analysis methods
• Big data techniques are currently characterized by:
– Continuous, instantaneously
available data sources
– Non-von Neumann
Processing (defined later in the presentation)
– Capabilities approaching
or past human comprehension
– Architecturally enhanceable
identity/security capabilities
– Other tradeoff-focused data processing
• So a good question becomes "where in our existing architecture
can we most effectively apply Big Data Techniques?"
29
Computers
Human resources
Communication facilities
Software
Management
responsibilities
Policies,
directives,
and rules
Data
Copyright 2013 by Data Blueprint
What Questions Can Architectures Address?
30
• How and why do the
components interact?
• Where do they go?
• When are they needed?
• Why and how will the
changes be implemented?
• What should be managed
organization-wide and what
should be managed
locally?
• What standards should be
adopted?
• What vendors should be
chosen?
• What rules should govern
the decisions?
• What policies should guide
the process?
!

 !

!

 !

Copyright 2013 by Data Blueprint 31
Organizational Needs
become instantiated
and integrated into an
Data/Information
Architecture
Informa(on)System)
Requirements
authorizes and
articulates
satisfyspecificorganizationalneeds
Data Architectures produce and are made up of information models that are
developed in response to organizational needs
Copyright 2013 by Data Blueprint
Enterprises Spend $38 Million a Year on Data
32
• Worldwide spending on
business information is
now $1.1 trillion/year
• Enterprises spend an
average of $38 million
on information/year
• Small and Medium
sized Businesses on
average spend
$332,000
– http://www.cio.com.au/article/429681/
five_steps_how_better_manage_your_data/
• Data Analysis
- Origins
• Challenges
- Faced by virtually everyone
• Compliment
- Existing data management practices
• Pre-requisites
- Necessary to exploit big data techniques
• Prototyping
- Iterative means of practicing big data techniques
• Take Aways and Q&A
Demystifying Big Data
Copyright 2013 by Data Blueprint
Tweeting now:
#dataed
33
Copyright 2013 by Data Blueprint
Gartner Five-phase Hype Cycle
http://www.gartner.com/technology/research/methodologies/hype-cycle.jsp
34
Technology Trigger: A potential technology breakthrough kicks things off. Early proof-of-concept stories and media interest
trigger significant publicity. Often no usable products exist and commercial viability is unproven.
Trough of Disillusionment: Interest wanes as experiments and implementations fail to deliver. Producers of the
technology shake out or fail. Investments continue only if the surviving providers improve their products to the
satisfaction of early adopters.
Peak of Inflated Expectations: Early publicity produces a number of
success stories—often accompanied by scores of failures. Some
companies take action; many do not.
Slope of Enlightenment: More instances of how the technology can benefit the
enterprise start to crystallize and become more widely understood. Second- and third-
generation products appear from technology providers. More enterprises fund pilots;
conservative companies remain cautious.
Plateau of Productivity: Mainstream adoption starts to
take off. Criteria for assessing provider viability are more
clearly defined. The technology’s broad market
applicability and relevance are clearly paying off.
Copyright 2013 by Data Blueprint
Gartner Big Data Hype Cycle
35
"A focus on big data is not a substitute for the
fundamentals of information management."
Copyright 2013 by Data Blueprint
36
Big Data in Gartner’s Hype Cycle
Copyright 2013 by Data Blueprint
Technology Continues to Advance
37
• (Gordon) Moore's law
– Over time, the number of transistors on
integrated circuits doubles approximately
every two years
Copyright 2013 by Data Blueprint
Pick any two!
and there are
still tradeoffs
to be made!
38
• Faster processors outstripped
not only the hard disk, but main
memory
– Hard disk too slow
– Memory too small
• Flash drives remove both
bottlenecks
– Combined Apple and Yahoo
have spend more than $500
million to date
• Make it look like traditional
storage or more system
memory
– Minimum 10x improvements
– Dragonstone server is 3.2 tb
flash memory (Facebook)
• Bottom line - new capabilities!
Copyright 2013 by Data Blueprint
"There’s now a blurring between the storage world and the memory world"
39
• von Neumann
bottleneck
(computer science)
– "An inefficiency inherent in
the design of any von
Neumann machine that
arises from the fact that
most computer time is
spent in moving
information between
storage and the central
processing unit rather than
operating on it"
[http://encyclopedia2.thefreedictionary.com/von+Neumann+bottleneck]
• Michael Stonebraker
– Ingres (Berkeley/MIT)
– Modern database
processing is
approximately 4%
efficient
• Many "big data
architectures are
attempts to address
this, but:
– Zero sum game
– Trade characteristics
against each other
• Reliability
• Predictability
– Google/MapReduce/
Bigtable
– Amazon/Dynamo
– Netflix/Chaos Monkey
– Hadoop
– McDipper
• Big data exploits
non-von Neumann
processing
Copyright 2013 by Data Blueprint
Non-von Neumann Processing/Efficiencies
40
Copyright 2013 by Data Blueprint
Potential Tradeoffs:
CAP theorem: consistency, availability and partition-tolerance
41
Partition
(Fault)
Tolerance
AvailabilityConsistency
RDBMS NoSQL
Small datasets can be both consistent & available
Atomicity
Consistency
Isolation
Durability
Basic
Availability
Soft-state
Eventual consistency
Copyright 2013 by Data Blueprint
Potential either/or Tradeoffs
42
SQL Big Data
Privacy Big Data
Security Big Data
?
Massive High-
speed Flexible
• Patterns/objects,
hypotheses emerge
– What can be
observed?
• Operationalizing
– The dots can be
repeatedly connected
• Things are
happening
– Sensemaking
techniques address
"what" is happening?
• Patterns/objects,
hypotheses emerge
– What can be
observed?
• Operationalizing
– The dots can be
repeatedly connected
– "Big Data"
contributions are
shown in orange
	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  <-­‐Feedback
"Sensemaking"	
  
Techniques
!
Exis&ng!
Knowledge
/base
Copyright 2013 by Data Blueprint
Analytics Insight Cycle
43
Volume
Velocity
Variety
PotenFal/
actual	
  insights
Discernm
ent
Exploitable
Insight
PaGern/Object
	
  Emergence
Com
bined/
inform
ed	
  
insights
AnalyDcal	
  
boEleneck
Copyright 2013 by Data Blueprint
• Data analysis struggles with the social
– Your brain is excellent at social cognition - people can
• Mirror each other’s emotional states
• Detect uncooperative behavior
• Assign value to things through emotion
– Data analysis measures the quantity of social
interactions but not the quality
• Map interactions with co-workers you see during work days
• Can't capture devotion to childhood friends seen annually
– When making (personal) decisions about social
relationships, it’s foolish to swap the amazing machine
in your skull for the crude machine on your desk
• Data struggles with context
– Decisions are embedded in sequences and contexts
– Brains think in stories - weaving together multiple
causes and multiple contexts
– Data analysis is pretty bad at
• Narratives / Emergent thinking / Explaining
• Data creates bigger haystacks
– More data leads to more statistically significant
correlations
– Most are spurious and deceive us
– Falsity grows exponentially greater amounts of data
we collect
• Big data has trouble with big problems
– For example: the economic stimulus debate
– No one has been persuaded by data to switch sides
• Data favors memes over masterpieces
– Detect when large numbers of people take an instant
liking to some cultural product
– Products are hated initially because they are unfamiliar
• Data obscures values
– Data is never raw; it’s always structured according to
somebody’s predispositions and values
44
Some Big Data Limitations
Savings come from a variety
of agreed upon categories
and values:
• Reduced hospital re-
admissions
• Patient Monitoring:
Inpatient, out-patient,
emergency visits and ICU
• Preventive care for ACO
• Epidemiology
• Patient care quality and
program analysis
Copyright 2013 by Data Blueprint
$300 billion is the potential annual value to health care
45
$165
$108
$47
$9$5
Transparency in clinical data and clinical decision support
Research & Development
Advanced fraud detection-performance based drug pricing
Public health surveillance/response systems
Aggregation of patient records, online platforms, & communities
0
25
50
75
100
Current Improved
Copyright 2013 by Data Blueprint
Reversing The Measures
• Currently:
– Analysts spend 80% of their time manipulating data and 20% of their time
analyzing data
– Hidden productivity bottlenecks
• After rearchitecting:
– Analysts spend less time manipulating data and more of their time analyzing data
– Significant improvements in knowledge worker productivity
46
Manipulation Analysis
• Data Analysis
- Origins
• Challenges
- Faced by virtually everyone
• Compliment
- Existing data management practices
• Pre-requisites
- Necessary to exploit big data techniques
• Prototyping
- Iterative means of practicing big data techniques
• Take Aways and Q&A
Demystifying Big Data
Copyright 2013 by Data Blueprint
Tweeting now:
#dataed
47
Copyright 2013 by Data Blueprint
What do we teach business people about data?
48
What percentage of the deal with it daily?
Copyright 2013 by Data Blueprint
What do we teach IT professionals about data?
• 1 course
– How to build a new
database
– 80% if UT expenses
are used to improve
existing IT assets
• What impressions do
IT professionals get
from this education?
– Data is a technical skill
that is used to develop
new databases
• This is not the best
way to educate IT and
business
professionals - every
organization's
– Sole, non-depletable,
non-degrading,
durable, strategic
asset
49
Copyright 2013 by Data Blueprint
Application-Centric Development
Original articulation from Doug Bagley @ Walmart
50
t
t
Strategy
Goals/Objectives
Systems/Applications
Network/Infrastructure
Data/Information
t
• In support of strategy, the organization
develops specific goals/objectives
• The goals/objectives drive the
development of specific systems/
applications
• Development of systems/applications
leads to network/infrastructure
requirements
• Data/information are typically
considered after the systems/
applications and network/
infrastructure have been articulated
• Problems with this approach:
– Ensures that data is formed
around the application and not
the information requirements
– Process are narrowly
formed around applications
– Very little data reuse is possible
Einstein Quote
Copyright 2013 by Data Blueprint
51
"The significant
problems we face
cannot be solved
at the same level
of thinking we
were at when we
created them."
- Albert Einstein
Copyright 2013 by Data Blueprint
What does it mean to treat data as an organizational asset?
• Assets are economic resources
– Must own or control
– Must use to produce value
– Value can be converted into cash
• An asset is a resource controlled by
the organization as a result of past
events or transactions and from which
future economic benefits are expected
to flow to the organization [Wikipedia]
• With assets:
– Formalize the care and feeding of data
• Cash management - HR planning
– Put data to work in unique/significant
ways
• Identify data the organization will need
[Redman 2008]
52
Copyright 2013 by Data Blueprint
Data-Centric Development Flow
Original articulation from Doug Bagley @ Walmart
53
t
t
Strategy
Goals/Objectives
Data/Information
Network/Infrastructure
Systems/Applications
t
• In support of strategy, the organization
develops specific goals/objectives
• The goals/objectives drive the
development of specific data/
information assets with an eye to
organization-wide usage
• Network/infrastructure components are
developed to support organization-
wide use of data
• Development of systems/applications
is derived from the
data/network architecture
• Advantages of this approach:
– Data/information assets are
developed from an
organization-wide perspective
– Systems support organizational
data needs and compliment
organizational process flows
– Maximum data/information reuse
Top
Operations
Job
Copyright 2013 by Data Blueprint
CDO Reporting
54
Top Job
Top
Finance
Job
Top
Information
Technology
Job
Top
Marketing
Job
• There is enough work to justify the function
• There is not much talent
• The CDO provides significant input to the Top Information Technology Job
Data Governance Organization
Chief
Data
Officer
• Data Analysis
- Origins
• Challenges
- Faced by virtually everyone
• Compliment
- Existing data management practices
• Pre-requisites
- Necessary to exploit big data techniques
• Prototyping
- Iterative means of practicing big data techniques
• Take Aways and Q&A
Demystifying Big Data
Copyright 2013 by Data Blueprint
Tweeting now:
#dataed
55
Copyright 2013 by Data Blueprint
56
Traditional Systems Life Cycle Challenges
projectcartoon.com
• Original business concept
• As the consultant described it
• As the customer explained it
• How the project leader understood it
• How the programmer wrote it
• What the beta testers received
• What operations installed
• As accredited for operation
• When it was delivered
• How the project was documented
• How the help desk supported it
• How the customer was billed
• After patches were applied
• What the customer wanted
Copyright 2013 by Data Blueprint
"Waterfall" model of
Systems Development
57
Copyright 2013 by Data Blueprint
"Waterfall" model (with iteration possible)
58
Copyright 2013 by Data Blueprint
Spiral Model of
Systems
Development
59
Barry W. Boehm "A Spiral Model of
Software Development and Enhancement"
ACM SIGSOFT Software Engineering Notes
August 1986, 11(4):14-24
"The major
distinguishing
feature of the
spiral model is
that it creates a
risk-driven
approach ...
rather than a
primarily
document-driven
or code-driven
process"
Copyright 2013 by Data Blueprint
Prototyping & Big Data Technologies
60
Advanced
Data
Practices
• Cloud
• MDM
• Mining
• Big Data
• Analytics
• Warehousing
• SOA
Copyright 2013 by Data Blueprint
• 5 Data
management
practices areas /
data management
basics ...
• ... are necessary
but insufficient
prerequisites to
organizational data
leveraging
applications that is
self actualizing data
or advanced data
practices
Hierarchy of Data Management Practices (after Maslow)
Basic Data Management Practices
– Data Program Management
– Organizational Data Integration
– Data Stewardship
– Data Development
– Data Support Operations
http://3.bp.blogspot.com/-ptl-9mAieuQ/T-idBt1YFmI/AAAAAAAABgw/Ib-nVkMmMEQ/s1600/maslows_hierarchy_of_needs.png
Copyright 2013 by Data Blueprint
Experience Pebbles
62
• Data Analysis
- Origins
• Challenges
- Faced by virtually everyone
• Compliment
- Existing data management practices
• Pre-requisites
- Necessary to exploit big data techniques
• Prototyping
- Iterative means of practicing big data techniques
• Take Aways and Q&A
Demystifying Big Data
Copyright 2013 by Data Blueprint
Tweeting now:
#dataed
63
Copyright 2013 by Data Blueprint
Which Source of Data Represents the Most Immediate Opportunity?
64
Guidance
• The real change: cost-effectiveness
and timely delivery
• Business process optimization — not
technology
• High fidelity and quality information
• Big data technology, all is not new
• Keep your focus, develop skills
(Source: Gartner (January 2013)
Copyright 2013 by Data Blueprint
Gartner Recommendations
65
Impacts Top Recommendations
Some of the new analytics that are made
possible by big data have no precedence,
so innovative thinking will be required to
achieve value
Treat big data projects as innovation
projects that will require change
management efforts. The business will
take time to trust new data sources and
new analytics
Creative thinking can unearth valuable
information sources already inside the
enterprise that are underused
Work with the business to conduct an
inventory of internal data sources outside
of IT's direct control, and consider
augmenting existing data that is IT
'controlled.' With an innovation mindset,
explore the potential insight that can be
gained from each of these sources
Big data technologies often create the
ability to analyze faster, but getting value
from faster analytics requires business
changes
Ensure that big data projects that improve
analytical speed always include a process
redesign effort that aims at getting
maximum benefit from that speed
Gartner 2012
Copyright 2013 by Data Blueprint
Results: It is not always about money
• Solution:
– Integrate multiple databases into
one to create holistic view of data
– Automation of manual process
• Results:
– Data is passed safely and effectively
– Eliminate inconsistencies,
redundancies, and corruption
– Ability to cross-analyze
– Significantly reduced turnaround
time for matching patients with
potential donor -> increased
potential to make life-saving
connection in a manner that is
faster, safer and more reliable
– Increased safe matches from 3 out
of 10 to 6 out of 10
66
Copyright 2013 by Data Blueprint
Data versus Tools-A History Lesson
• Sophisticated tool without data are useless
• Mediocre tools with the data are frustrating
• Analysts will always opt for frustration over futility, if
that is their only option
– Ira "Gus" Hunt CIA/CTO
• http://www.huffingtonpost.com/2013/03/20/cia-gus-hunt-big-data_n_2917842.html
67
Copyright 2013 by Data Blueprint
Questions?
68
+ =
Unlock Business Value through
Data Quality Engineering
June 11, 2013 @ 2:00 PM ET/11:00 AM PT
Data Systems Integration & Business Value Pt. 1:
Metadata
July 9, 2013 @ 2:00 PM ET/11:00 AM PT
Sign up here:
www.datablueprint.com/webinar-schedule
or www.dataversity.net
Copyright 2013 by Data Blueprint
Upcoming Events
69
10124 W. Broad Street, Suite C
Glen Allen, Virginia 23060
804.521.4056

Contenu connexe

Tendances

Introduction to big data
Introduction to big dataIntroduction to big data
Introduction to big dataRichard Vidgen
 
Big Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should KnowBig Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should KnowBernard Marr
 
Mining Big Data to Predicting Future
Mining Big Data to Predicting FutureMining Big Data to Predicting Future
Mining Big Data to Predicting FutureIJERA Editor
 
Data Mining in the World of BIG Data-A Survey
Data Mining in the World of BIG Data-A SurveyData Mining in the World of BIG Data-A Survey
Data Mining in the World of BIG Data-A SurveyEditor IJCATR
 
A Review Paper on Big Data: Technologies, Tools and Trends
A Review Paper on Big Data: Technologies, Tools and TrendsA Review Paper on Big Data: Technologies, Tools and Trends
A Review Paper on Big Data: Technologies, Tools and TrendsIRJET Journal
 
US EPA OSWER Linked Data Workshop 1-Feb-2013
US EPA OSWER Linked Data Workshop 1-Feb-2013US EPA OSWER Linked Data Workshop 1-Feb-2013
US EPA OSWER Linked Data Workshop 1-Feb-20133 Round Stones
 
Everis big data_wilson_v1.4
Everis big data_wilson_v1.4Everis big data_wilson_v1.4
Everis big data_wilson_v1.4wilson_lucas
 
Guidance for Incorporating Big Data into Humanitarian Operations - 2015 - web...
Guidance for Incorporating Big Data into Humanitarian Operations - 2015 - web...Guidance for Incorporating Big Data into Humanitarian Operations - 2015 - web...
Guidance for Incorporating Big Data into Humanitarian Operations - 2015 - web...Katie Whipkey
 
Ppt for Application of big data
Ppt for Application of big dataPpt for Application of big data
Ppt for Application of big dataPrashant Sharma
 
Big data introduction by quontra solutions
Big data introduction by quontra solutionsBig data introduction by quontra solutions
Big data introduction by quontra solutionsQUONTRASOLUTIONS
 
From AI to Z: How AI is changing the relationship between people and data
From AI to Z: How AI is changing the relationship between people and dataFrom AI to Z: How AI is changing the relationship between people and data
From AI to Z: How AI is changing the relationship between people and dataiGenius
 
McKinsey Global Institute - Big data: The next frontier for innovation, compe...
McKinsey Global Institute - Big data: The next frontier for innovation, compe...McKinsey Global Institute - Big data: The next frontier for innovation, compe...
McKinsey Global Institute - Big data: The next frontier for innovation, compe...Path of the Blue Eye Project
 
Big data for official statistics @ Konferensi Big Data Indonesia 2016
Big data for official statistics @ Konferensi Big Data Indonesia 2016 Big data for official statistics @ Konferensi Big Data Indonesia 2016
Big data for official statistics @ Konferensi Big Data Indonesia 2016 Setia Pramana
 
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of TechnologyGuest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of TechnologyNishant Gandhi
 

Tendances (20)

Big data
Big dataBig data
Big data
 
Introduction to big data
Introduction to big dataIntroduction to big data
Introduction to big data
 
Big Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should KnowBig Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should Know
 
Mining Big Data to Predicting Future
Mining Big Data to Predicting FutureMining Big Data to Predicting Future
Mining Big Data to Predicting Future
 
Data Mining in the World of BIG Data-A Survey
Data Mining in the World of BIG Data-A SurveyData Mining in the World of BIG Data-A Survey
Data Mining in the World of BIG Data-A Survey
 
Big Data introduction - Café Numérique Bruxelles
Big Data introduction - Café Numérique BruxellesBig Data introduction - Café Numérique Bruxelles
Big Data introduction - Café Numérique Bruxelles
 
A Review Paper on Big Data: Technologies, Tools and Trends
A Review Paper on Big Data: Technologies, Tools and TrendsA Review Paper on Big Data: Technologies, Tools and Trends
A Review Paper on Big Data: Technologies, Tools and Trends
 
US EPA OSWER Linked Data Workshop 1-Feb-2013
US EPA OSWER Linked Data Workshop 1-Feb-2013US EPA OSWER Linked Data Workshop 1-Feb-2013
US EPA OSWER Linked Data Workshop 1-Feb-2013
 
Everis big data_wilson_v1.4
Everis big data_wilson_v1.4Everis big data_wilson_v1.4
Everis big data_wilson_v1.4
 
Final_Bigdata_pret
Final_Bigdata_pretFinal_Bigdata_pret
Final_Bigdata_pret
 
Privacy in the Age of Big Data
Privacy in the Age of Big DataPrivacy in the Age of Big Data
Privacy in the Age of Big Data
 
Internet of Things
Internet of ThingsInternet of Things
Internet of Things
 
Guidance for Incorporating Big Data into Humanitarian Operations - 2015 - web...
Guidance for Incorporating Big Data into Humanitarian Operations - 2015 - web...Guidance for Incorporating Big Data into Humanitarian Operations - 2015 - web...
Guidance for Incorporating Big Data into Humanitarian Operations - 2015 - web...
 
Ppt for Application of big data
Ppt for Application of big dataPpt for Application of big data
Ppt for Application of big data
 
Data Storytelling
Data StorytellingData Storytelling
Data Storytelling
 
Big data introduction by quontra solutions
Big data introduction by quontra solutionsBig data introduction by quontra solutions
Big data introduction by quontra solutions
 
From AI to Z: How AI is changing the relationship between people and data
From AI to Z: How AI is changing the relationship between people and dataFrom AI to Z: How AI is changing the relationship between people and data
From AI to Z: How AI is changing the relationship between people and data
 
McKinsey Global Institute - Big data: The next frontier for innovation, compe...
McKinsey Global Institute - Big data: The next frontier for innovation, compe...McKinsey Global Institute - Big data: The next frontier for innovation, compe...
McKinsey Global Institute - Big data: The next frontier for innovation, compe...
 
Big data for official statistics @ Konferensi Big Data Indonesia 2016
Big data for official statistics @ Konferensi Big Data Indonesia 2016 Big data for official statistics @ Konferensi Big Data Indonesia 2016
Big data for official statistics @ Konferensi Big Data Indonesia 2016
 
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of TechnologyGuest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
 

En vedette

Data-Ed Online: Practical Applications for Data Warehousing, Analytics, BI, a...
Data-Ed Online: Practical Applications for Data Warehousing, Analytics, BI, a...Data-Ed Online: Practical Applications for Data Warehousing, Analytics, BI, a...
Data-Ed Online: Practical Applications for Data Warehousing, Analytics, BI, a...Data Blueprint
 
Data-Ed: Monetizing Data Management
Data-Ed: Monetizing Data Management Data-Ed: Monetizing Data Management
Data-Ed: Monetizing Data Management Data Blueprint
 
Data-Ed Online: Let's Talk Metadata: Strategies and Successes
Data-Ed Online: Let's Talk Metadata: Strategies and Successes Data-Ed Online: Let's Talk Metadata: Strategies and Successes
Data-Ed Online: Let's Talk Metadata: Strategies and Successes Data Blueprint
 
Data-Ed: Unlock Business Value through Document & Content Management
Data-Ed: Unlock Business Value through Document & Content ManagementData-Ed: Unlock Business Value through Document & Content Management
Data-Ed: Unlock Business Value through Document & Content ManagementData Blueprint
 
Data-Ed: Show Me the Money: The Business Value of Data and ROI
Data-Ed: Show Me the Money: The Business Value of Data and ROIData-Ed: Show Me the Money: The Business Value of Data and ROI
Data-Ed: Show Me the Money: The Business Value of Data and ROIData Blueprint
 
Data-Ed Online: Your Documents and Other Content: Managing Unstructured Data
Data-Ed Online: Your Documents and Other Content: Managing Unstructured Data  Data-Ed Online: Your Documents and Other Content: Managing Unstructured Data
Data-Ed Online: Your Documents and Other Content: Managing Unstructured Data Data Blueprint
 
Data-Ed: Unlock Business Value through Data Quality Engineering
Data-Ed: Unlock Business Value through Data Quality Engineering Data-Ed: Unlock Business Value through Data Quality Engineering
Data-Ed: Unlock Business Value through Data Quality Engineering Data Blueprint
 
Data-Ed: Unlock Business Value through Data Governance
Data-Ed: Unlock Business Value through Data GovernanceData-Ed: Unlock Business Value through Data Governance
Data-Ed: Unlock Business Value through Data GovernanceData Blueprint
 
Data Systems Integration & Business Value Pt. 2: Cloud
Data Systems Integration & Business Value Pt. 2: CloudData Systems Integration & Business Value Pt. 2: Cloud
Data Systems Integration & Business Value Pt. 2: CloudData Blueprint
 
Data-Ed: Unlock Business Value Through Reference & MDM
Data-Ed: Unlock Business Value Through Reference & MDM Data-Ed: Unlock Business Value Through Reference & MDM
Data-Ed: Unlock Business Value Through Reference & MDM Data Blueprint
 
Data-Ed Online: Practical Data Modeling
Data-Ed Online: Practical Data ModelingData-Ed Online: Practical Data Modeling
Data-Ed Online: Practical Data ModelingData Blueprint
 
Data-Ed: Data Architecture Requirements
Data-Ed: Data Architecture Requirements Data-Ed: Data Architecture Requirements
Data-Ed: Data Architecture Requirements Data Blueprint
 

En vedette (12)

Data-Ed Online: Practical Applications for Data Warehousing, Analytics, BI, a...
Data-Ed Online: Practical Applications for Data Warehousing, Analytics, BI, a...Data-Ed Online: Practical Applications for Data Warehousing, Analytics, BI, a...
Data-Ed Online: Practical Applications for Data Warehousing, Analytics, BI, a...
 
Data-Ed: Monetizing Data Management
Data-Ed: Monetizing Data Management Data-Ed: Monetizing Data Management
Data-Ed: Monetizing Data Management
 
Data-Ed Online: Let's Talk Metadata: Strategies and Successes
Data-Ed Online: Let's Talk Metadata: Strategies and Successes Data-Ed Online: Let's Talk Metadata: Strategies and Successes
Data-Ed Online: Let's Talk Metadata: Strategies and Successes
 
Data-Ed: Unlock Business Value through Document & Content Management
Data-Ed: Unlock Business Value through Document & Content ManagementData-Ed: Unlock Business Value through Document & Content Management
Data-Ed: Unlock Business Value through Document & Content Management
 
Data-Ed: Show Me the Money: The Business Value of Data and ROI
Data-Ed: Show Me the Money: The Business Value of Data and ROIData-Ed: Show Me the Money: The Business Value of Data and ROI
Data-Ed: Show Me the Money: The Business Value of Data and ROI
 
Data-Ed Online: Your Documents and Other Content: Managing Unstructured Data
Data-Ed Online: Your Documents and Other Content: Managing Unstructured Data  Data-Ed Online: Your Documents and Other Content: Managing Unstructured Data
Data-Ed Online: Your Documents and Other Content: Managing Unstructured Data
 
Data-Ed: Unlock Business Value through Data Quality Engineering
Data-Ed: Unlock Business Value through Data Quality Engineering Data-Ed: Unlock Business Value through Data Quality Engineering
Data-Ed: Unlock Business Value through Data Quality Engineering
 
Data-Ed: Unlock Business Value through Data Governance
Data-Ed: Unlock Business Value through Data GovernanceData-Ed: Unlock Business Value through Data Governance
Data-Ed: Unlock Business Value through Data Governance
 
Data Systems Integration & Business Value Pt. 2: Cloud
Data Systems Integration & Business Value Pt. 2: CloudData Systems Integration & Business Value Pt. 2: Cloud
Data Systems Integration & Business Value Pt. 2: Cloud
 
Data-Ed: Unlock Business Value Through Reference & MDM
Data-Ed: Unlock Business Value Through Reference & MDM Data-Ed: Unlock Business Value Through Reference & MDM
Data-Ed: Unlock Business Value Through Reference & MDM
 
Data-Ed Online: Practical Data Modeling
Data-Ed Online: Practical Data ModelingData-Ed Online: Practical Data Modeling
Data-Ed Online: Practical Data Modeling
 
Data-Ed: Data Architecture Requirements
Data-Ed: Data Architecture Requirements Data-Ed: Data Architecture Requirements
Data-Ed: Data Architecture Requirements
 

Similaire à Data-Ed: Demystifying Big Data

Research issues in the big data and its Challenges
Research issues in the big data and its ChallengesResearch issues in the big data and its Challenges
Research issues in the big data and its ChallengesKathirvel Ayyaswamy
 
Big data PPT prepared by Hritika Raj (Shivalik college of engg.)
Big data PPT prepared by Hritika Raj (Shivalik college of engg.)Big data PPT prepared by Hritika Raj (Shivalik college of engg.)
Big data PPT prepared by Hritika Raj (Shivalik college of engg.)Hritika Raj
 
BIGDATAPrepared ByMuhammad Abrar UddinIntrodu.docx
BIGDATAPrepared ByMuhammad Abrar UddinIntrodu.docxBIGDATAPrepared ByMuhammad Abrar UddinIntrodu.docx
BIGDATAPrepared ByMuhammad Abrar UddinIntrodu.docxtangyechloe
 
ppt final.pptx
ppt final.pptxppt final.pptx
ppt final.pptxkalai75
 
Big data and Internet
Big data and InternetBig data and Internet
Big data and InternetSanoj Kumar
 
Bigdatappt 140225061440-phpapp01
Bigdatappt 140225061440-phpapp01Bigdatappt 140225061440-phpapp01
Bigdatappt 140225061440-phpapp01nayanbhatia2
 
Big Data By Vijay Bhaskar Semwal
Big Data By Vijay Bhaskar SemwalBig Data By Vijay Bhaskar Semwal
Big Data By Vijay Bhaskar SemwalIIIT Allahabad
 
Data mining with big data implementation
Data mining with big data implementationData mining with big data implementation
Data mining with big data implementationSandip Tipayle Patil
 
Introduction to Big Data & Big Data 1.0 System
Introduction to Big Data & Big Data 1.0 SystemIntroduction to Big Data & Big Data 1.0 System
Introduction to Big Data & Big Data 1.0 SystemPetr Novotný
 
Special issues on big data
Special issues on big dataSpecial issues on big data
Special issues on big dataVedanand Singh
 

Similaire à Data-Ed: Demystifying Big Data (20)

Big Data - Gerami
Big Data - GeramiBig Data - Gerami
Big Data - Gerami
 
Research issues in the big data and its Challenges
Research issues in the big data and its ChallengesResearch issues in the big data and its Challenges
Research issues in the big data and its Challenges
 
Big data PPT prepared by Hritika Raj (Shivalik college of engg.)
Big data PPT prepared by Hritika Raj (Shivalik college of engg.)Big data PPT prepared by Hritika Raj (Shivalik college of engg.)
Big data PPT prepared by Hritika Raj (Shivalik college of engg.)
 
BIGDATAPrepared ByMuhammad Abrar UddinIntrodu.docx
BIGDATAPrepared ByMuhammad Abrar UddinIntrodu.docxBIGDATAPrepared ByMuhammad Abrar UddinIntrodu.docx
BIGDATAPrepared ByMuhammad Abrar UddinIntrodu.docx
 
Kartikey tripathi
Kartikey tripathiKartikey tripathi
Kartikey tripathi
 
ppt final.pptx
ppt final.pptxppt final.pptx
ppt final.pptx
 
Big data and Internet
Big data and InternetBig data and Internet
Big data and Internet
 
Our big data
Our big dataOur big data
Our big data
 
Bigdatappt 140225061440-phpapp01
Bigdatappt 140225061440-phpapp01Bigdatappt 140225061440-phpapp01
Bigdatappt 140225061440-phpapp01
 
Big Data By Vijay Bhaskar Semwal
Big Data By Vijay Bhaskar SemwalBig Data By Vijay Bhaskar Semwal
Big Data By Vijay Bhaskar Semwal
 
bigdata.pptx
bigdata.pptxbigdata.pptx
bigdata.pptx
 
Big Data et eGovernment
Big Data et eGovernmentBig Data et eGovernment
Big Data et eGovernment
 
Big-Data-AryaTadbirNetworkDesigners
Big-Data-AryaTadbirNetworkDesignersBig-Data-AryaTadbirNetworkDesigners
Big-Data-AryaTadbirNetworkDesigners
 
Data mining with big data implementation
Data mining with big data implementationData mining with big data implementation
Data mining with big data implementation
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Introduction to Big Data & Big Data 1.0 System
Introduction to Big Data & Big Data 1.0 SystemIntroduction to Big Data & Big Data 1.0 System
Introduction to Big Data & Big Data 1.0 System
 
Big_Data_ppt[1] (1).pptx
Big_Data_ppt[1] (1).pptxBig_Data_ppt[1] (1).pptx
Big_Data_ppt[1] (1).pptx
 
Big data
Big dataBig data
Big data
 
Big data ppt
Big  data pptBig  data ppt
Big data ppt
 
Special issues on big data
Special issues on big dataSpecial issues on big data
Special issues on big data
 

Plus de Data Blueprint

Data Ed: Best Practices with the DMM
Data Ed: Best Practices with the DMMData Ed: Best Practices with the DMM
Data Ed: Best Practices with the DMMData Blueprint
 
Data-Ed: A Framework for no sql and Hadoop
Data-Ed: A Framework for no sql and HadoopData-Ed: A Framework for no sql and Hadoop
Data-Ed: A Framework for no sql and HadoopData Blueprint
 
Data-Ed: Monetizing Data Management
Data-Ed: Monetizing Data Management  Data-Ed: Monetizing Data Management
Data-Ed: Monetizing Data Management Data Blueprint
 
Data-Ed: Data Governance Strategies
Data-Ed: Data Governance StrategiesData-Ed: Data Governance Strategies
Data-Ed: Data Governance StrategiesData Blueprint
 
Data-Ed: Data Architecture Requirements
Data-Ed: Data Architecture Requirements  Data-Ed: Data Architecture Requirements
Data-Ed: Data Architecture Requirements Data Blueprint
 
Data-Ed: Business Value From MDM
Data-Ed: Business Value From MDM Data-Ed: Business Value From MDM
Data-Ed: Business Value From MDM Data Blueprint
 
Strategy and roadmap slides
Strategy and roadmap slidesStrategy and roadmap slides
Strategy and roadmap slidesData Blueprint
 
Data-Ed: Data Warehousing Strategies
Data-Ed: Data Warehousing StrategiesData-Ed: Data Warehousing Strategies
Data-Ed: Data Warehousing StrategiesData Blueprint
 
Data-Ed: Metadata Strategies
 Data-Ed: Metadata Strategies Data-Ed: Metadata Strategies
Data-Ed: Metadata StrategiesData Blueprint
 
Data-Ed: Trends in Data Modeling
Data-Ed: Trends in Data ModelingData-Ed: Trends in Data Modeling
Data-Ed: Trends in Data ModelingData Blueprint
 
Data-Ed: Data Governance Strategies
Data-Ed: Data Governance Strategies Data-Ed: Data Governance Strategies
Data-Ed: Data Governance Strategies Data Blueprint
 
Data-Ed: Best Practices with the Data Management Maturity Model
Data-Ed: Best Practices with the Data Management Maturity ModelData-Ed: Best Practices with the Data Management Maturity Model
Data-Ed: Best Practices with the Data Management Maturity ModelData Blueprint
 
Data-Ed: Design and Manage Data Structures
Data-Ed: Design and Manage Data Structures Data-Ed: Design and Manage Data Structures
Data-Ed: Design and Manage Data Structures Data Blueprint
 
Data-Ed: Emerging Trends in Data Jobs
Data-Ed: Emerging Trends in Data JobsData-Ed: Emerging Trends in Data Jobs
Data-Ed: Emerging Trends in Data JobsData Blueprint
 
Data-Ed: Data-centric Strategy & Roadmap
Data-Ed: Data-centric Strategy & RoadmapData-Ed: Data-centric Strategy & Roadmap
Data-Ed: Data-centric Strategy & RoadmapData Blueprint
 
Data-Ed: Demystifying Big Data
Data-Ed: Demystifying Big Data Data-Ed: Demystifying Big Data
Data-Ed: Demystifying Big Data Data Blueprint
 
Data-Ed: Show Me the Money: Monetizing Data Management
Data-Ed: Show Me the Money: Monetizing Data ManagementData-Ed: Show Me the Money: Monetizing Data Management
Data-Ed: Show Me the Money: Monetizing Data ManagementData Blueprint
 
Data Systems Integration & Business Value PT. 3: Warehousing
Data Systems Integration & Business Value PT. 3: Warehousing Data Systems Integration & Business Value PT. 3: Warehousing
Data Systems Integration & Business Value PT. 3: Warehousing Data Blueprint
 
Data-Ed: Data Systems Integration & Business Value PT. 1: Metadata
Data-Ed: Data Systems Integration & Business Value PT. 1: MetadataData-Ed: Data Systems Integration & Business Value PT. 1: Metadata
Data-Ed: Data Systems Integration & Business Value PT. 1: MetadataData Blueprint
 

Plus de Data Blueprint (20)

Data Ed: Best Practices with the DMM
Data Ed: Best Practices with the DMMData Ed: Best Practices with the DMM
Data Ed: Best Practices with the DMM
 
Data-Ed: A Framework for no sql and Hadoop
Data-Ed: A Framework for no sql and HadoopData-Ed: A Framework for no sql and Hadoop
Data-Ed: A Framework for no sql and Hadoop
 
Data-Ed: Monetizing Data Management
Data-Ed: Monetizing Data Management  Data-Ed: Monetizing Data Management
Data-Ed: Monetizing Data Management
 
Data-Ed: Data Governance Strategies
Data-Ed: Data Governance StrategiesData-Ed: Data Governance Strategies
Data-Ed: Data Governance Strategies
 
Data-Ed: Data Architecture Requirements
Data-Ed: Data Architecture Requirements  Data-Ed: Data Architecture Requirements
Data-Ed: Data Architecture Requirements
 
Data-Ed: Business Value From MDM
Data-Ed: Business Value From MDM Data-Ed: Business Value From MDM
Data-Ed: Business Value From MDM
 
Strategy and roadmap slides
Strategy and roadmap slidesStrategy and roadmap slides
Strategy and roadmap slides
 
Data-Ed: Data Warehousing Strategies
Data-Ed: Data Warehousing StrategiesData-Ed: Data Warehousing Strategies
Data-Ed: Data Warehousing Strategies
 
Data-Ed: Metadata Strategies
 Data-Ed: Metadata Strategies Data-Ed: Metadata Strategies
Data-Ed: Metadata Strategies
 
Data-Ed: Trends in Data Modeling
Data-Ed: Trends in Data ModelingData-Ed: Trends in Data Modeling
Data-Ed: Trends in Data Modeling
 
Data-Ed: Data Governance Strategies
Data-Ed: Data Governance Strategies Data-Ed: Data Governance Strategies
Data-Ed: Data Governance Strategies
 
Data-Ed: Best Practices with the Data Management Maturity Model
Data-Ed: Best Practices with the Data Management Maturity ModelData-Ed: Best Practices with the Data Management Maturity Model
Data-Ed: Best Practices with the Data Management Maturity Model
 
Data-Ed: Design and Manage Data Structures
Data-Ed: Design and Manage Data Structures Data-Ed: Design and Manage Data Structures
Data-Ed: Design and Manage Data Structures
 
2014 dqe handouts
2014 dqe handouts2014 dqe handouts
2014 dqe handouts
 
Data-Ed: Emerging Trends in Data Jobs
Data-Ed: Emerging Trends in Data JobsData-Ed: Emerging Trends in Data Jobs
Data-Ed: Emerging Trends in Data Jobs
 
Data-Ed: Data-centric Strategy & Roadmap
Data-Ed: Data-centric Strategy & RoadmapData-Ed: Data-centric Strategy & Roadmap
Data-Ed: Data-centric Strategy & Roadmap
 
Data-Ed: Demystifying Big Data
Data-Ed: Demystifying Big Data Data-Ed: Demystifying Big Data
Data-Ed: Demystifying Big Data
 
Data-Ed: Show Me the Money: Monetizing Data Management
Data-Ed: Show Me the Money: Monetizing Data ManagementData-Ed: Show Me the Money: Monetizing Data Management
Data-Ed: Show Me the Money: Monetizing Data Management
 
Data Systems Integration & Business Value PT. 3: Warehousing
Data Systems Integration & Business Value PT. 3: Warehousing Data Systems Integration & Business Value PT. 3: Warehousing
Data Systems Integration & Business Value PT. 3: Warehousing
 
Data-Ed: Data Systems Integration & Business Value PT. 1: Metadata
Data-Ed: Data Systems Integration & Business Value PT. 1: MetadataData-Ed: Data Systems Integration & Business Value PT. 1: Metadata
Data-Ed: Data Systems Integration & Business Value PT. 1: Metadata
 

Dernier

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 

Dernier (20)

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 

Data-Ed: Demystifying Big Data

  • 1. Copyright 2013 by Data Blueprint 1 Demystifying Big Data Yes, we face a data deluge and big data seems to be largely about how to deal with it. But much of what has been written about big data is focused on selling hardware and services. The truth is that until the concept of big data can be objectively defined, any measurements, claims of success, quantifications, etc. must be viewed skeptically and with suspicion. While both the need for and approaches to these new requirements are faced by virtually every organization, jumping into the fray ill-prepared has (to date) reproduced the same dismal IT project results. Date: May 14, 2013 Time: 2:00 PM ET/11:00 AM PT Presenter:Peter Aiken, Ph.D. • Every century, a new technology-steam power, electricity, atomic energy, or microprocessors-has swept away the old world with a vision of a new one. Today, we seem to be entering the era of Big Data – Michael Coren • Every century, a new technology-steam power, electricity, atomic energy, or microprocessors-has swept away the old world with a vision of a new one. Today, we seem to be entering the era of Big Data – Michael Coren
  • 2. Copyright 2013 by Data Blueprint Get Social With Us! Live Twitter Feed Join the conversation! Follow us: @datablueprint @paiken Ask questions and submit your comments: #dataed 2 Like Us on Facebook www.facebook.com/ datablueprint Post questions and comments Find industry news, insightful content and event updates. Join the Group Data Management & Business Intelligence Ask questions, gain insights and collaborate with fellow data management professionals
  • 3. Demystifying Big Data It's about separating the signal from the noise Presented by Peter Aiken, Ph.D.
  • 4. Copyright 2013 by Data Blueprint Meet Your Presenter: Peter Aiken, Ph.D. • 30 years of experience in data management – Multiple international awards – Founder, Data Blueprint (http://datablueprint.com) • 8 books and dozens of articles • Experienced w/ 500+ data management practices in 20 countries • Multi-year immersions with organizations as diverse as the US DoD, Deutsche Bank, Nokia, Wells Fargo, and the Commonwealth of Virginia 4
  • 5. • Data Analysis - Origins • Challenges - Faced by virtually everyone • Compliment - Existing data management practices • Pre-requisites - Necessary to exploit big data techniques • Prototyping - Iterative means of practicing big data techniques • Take Aways and Q&A Demystifying Big Data Copyright 2013 by Data Blueprint Tweeting now: #dataed 5 • Data Analysis - Origins • Challenges - Faced by virtually everyone • Compliment - Existing data management practices • Pre-requisites - Necessary to exploit big data techniques • Prototyping - Iterative means of practicing big data techniques • Take Aways & Q&A Demystifying Big Data Tweeting now: #dataed
  • 6. • Data Analysis - Origins • Challenges - Faced by virtually everyone • Compliment - Existing data management practices • Pre-requisites - Necessary to exploit big data techniques • Prototyping - Iterative means of practicing big data techniques • Take Aways and Q&A Demystifying Big Data Copyright 2013 by Data Blueprint Tweeting now: #dataed 6
  • 7. Copyright 2013 by Data Blueprint 7 Bills of Mortality by Captain John Graunt
  • 8. Copyright 2013 by Data Blueprint 8
  • 9. Copyright 2013 by Data Blueprint 9 Mortality Geocoding Where is it happening?
  • 10. Copyright 2013 by Data Blueprint 10 Plague Peak When is it happening? ("Whereas  of  the  Plague")
  • 11. Copyright 2013 by Data Blueprint 11 Black Rats or Rattus Rattus Why is it happening? Black Rats or Rattus Rattus Why is it happening?
  • 12. Copyright 2013 by Data Blueprint 12 What will happen?
  • 13. Copyright 2013 by Data Blueprint 13 John Snow's 1854 Cholera Map of London
  • 14. • Data Analysis - Origins • Challenges - Faced by virtually everyone • Compliment - Existing data management practices • Pre-requisites - Necessary to exploit big data techniques • Prototyping - Iterative means of practicing big data techniques • Take Aways and Q&A Demystifying Big Data Copyright 2013 by Data Blueprint Tweeting now: #dataed 14
  • 15. Copyright 2013 by Data Blueprint Data Inflation Unit Size What  it  means Bit  (b) 1  or  0 Short  for  “binary  digit”,  aAer  the  binary  code  (1  or  0)   computers  use  to  store  and  process  data Byte  (B) 8  bits Enough  informaFon  to  create  an  English  leGer  or  number  in   computer  code.  It  is  the  basic  unit  of  compuFng Kilobyte  (KB) 1,000,  or  210,  bytes From  “thousand”  in  Greek.  One  page  of  typed  text  is  2KB Megabyte  (MB) 1,000KB;  220  bytes From  “large”  in  Greek.  The  complete  works  of  Shakespeare  total  5MB.  A  typical  pop  song  is   about  4MB Gigabyte  (GB) 1,000MB;  230  bytes From  “giant”  in  Greek.  A  two-­‐hour  film  can  be  compressed  into  1-­‐2GB Terabyte  (TB) 1,000GB;  240  bytes From  “monster”  in  Greek.  All  the  catalogued  books  in  America’s  Library  of  Congress  total   15TB Petabyte  (PB) 1,000TB;  250  bytes All  leGers  delivered  by  America’s  postal  service  this  year  will  amount  to  around  5PB.   Google  processes  around  1PB  every  hour Exabyte  (EB) 1,000PB;  260  bytes Equivalent  to  10  billion  copies  of  The  Economist ZeGabyte  (ZB) 1,000EB;  270  bytes The  total  amount  of  informaFon  in  existence  this  year  is  forecast  to  be  around  1.2ZB YoGabyte  (YB) 1,000ZB;  280  bytes Currently  too  big  to  imagine The prefixes are set by an intergovernmental group, the International Bureau of Weights and Measures. Source: The Economist Yotta and Zetta were added in 1991; terms for larger amounts have yet to be established 15
  • 16. Copyright 2013 by Data Blueprint What do they mean big? 16 "Every 2 days we create as much information as we did up to 2003" – Eric Schmidt The  number  of  things  that  can  produce  data  is   rapidly  growing  (smart  phones  for  example) IP  traffic  will  quadruple   by  2015 – Asigra  2012
  • 17. Copyright 2013 by Data Blueprint Google Search Results for the Term "Big Data" 17 Data from Google Trends Source: Gartner (October 2012)
  • 18. Copyright 2013 by Data Blueprint Number of Internet Pages Mentioning Big Data 18 Data from Google Trends Source: Gartner (October 2012)
  • 19. Copyright 2013 by Data Blueprint Increasingly individuals make use of the things data producing capabilities to perform services for them 19
  • 20. • IBM – 100 terabytes uploaded daily – $2.1 billion spent on mobile ads in 2011 – 4.8 trillion ad impressions/daily • WIPRO – 2.9 million emails sent every second – 20 hours of video uploaded to youtube every minute – 50 million tweets per day • Asigra – 2.5 quintillion bytes created daily – 90% of data was created in the last two years – By 2015 8 zettabytes will have been created Copyright 2013 by Data Blueprint Examples 20
  • 21. Copyright 2013 by Data Blueprint 21 • 60 GB of data/second • 200,000 hours of big data will be generated testing systems • 2,000 hours media coverage/daily • 845 million Facebook users averaging 15 TB/day • 13,000 tweets/second • 4 billion watching • 8.5 billion devices connected 2012 London Summer Games
  • 22. 22 Sloan Management Review/Harvard Business Review Copyright 2013 by Data BlueprintMIT Sloan Management Review Fall 2012 Page 22 By Thomas H. Davenport, Paul Barth And Randy Bean
  • 23. Copyright 2013 by Data Blueprint Big Data(has something to do with Vs - doesn't it?) • Volume – Amount of data • Velocity – Speed of data in and out • Variety – Range of data types and sources • 2001 Doug Laney • Variability – Many options or variable interpretations confound analysis • 2011 ISRC • Vitality – A dynamically changing Big Data environment in which analysis and predictive models must continually be updated as changes occur to seize opportunities as they arrive • 2011 CIA • Virtual – Scoping the discussion to only include online assets • 2012 Courtney Lambert 23
  • 24. Copyright 2013 by Data Blueprint Nanex 1/2 Second Trading Data(May 2, 2013 Johnson and Johnson) 24 http://www.youtube.com/watch?v=LrWfXn_mvK8 The European Union last year approved a new rule mandating that all trades must exist for at least a half-second in this instance that is 1,200 orders and 215 actual trades
  • 25. Copyright 2013 by Data Blueprint 25 Historyflow-Wikipedia entry for the word “Islam”
  • 26. Copyright 2013 by Data Blueprint 26 Spatial Information Flow-New York Talk Exchange
  • 27. Some Far-out Thoughts on Computers by Orrin Clotworthy • Predicted use of not just computing in the intelligence community • Also forecast predictive analytics • Accompanying privacy challenges Copyright 2013 by Data Blueprint 27
  • 28. • Big Data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization. – Gartner 2012 • Big data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze. – IBM 2012 • Big data usually includes data sets with sizes beyond the ability of commonly-used software tools to capture, curate, manage, and process the data within a tolerable elapsed time. – Wikipedia • Shorthand for advancing trends in technology that open the door to a new approach to understanding the world and making decisions. – NY Times 2012 • Big data is about putting the "I" back into IT. – Peter Aiken 2007 • We have no objective definition of big data! – Any measurements, claims of success, quantifications, etc. must be viewed skeptically and with suspicion! • Question: "Would it be more useful to refer to "big data techniques?" Copyright 2013 by Data Blueprint Defining Big Data 28
  • 29. Copyright 2013 by Data Blueprint Big Data Techniques • New techniques available to impact the productivity (order of magnitude) of any analytical insight cycle that compliment, enhance, or replace conventional (existing) analysis methods • Big data techniques are currently characterized by: – Continuous, instantaneously available data sources – Non-von Neumann Processing (defined later in the presentation) – Capabilities approaching or past human comprehension – Architecturally enhanceable identity/security capabilities – Other tradeoff-focused data processing • So a good question becomes "where in our existing architecture can we most effectively apply Big Data Techniques?" 29
  • 30. Computers Human resources Communication facilities Software Management responsibilities Policies, directives, and rules Data Copyright 2013 by Data Blueprint What Questions Can Architectures Address? 30 • How and why do the components interact? • Where do they go? • When are they needed? • Why and how will the changes be implemented? • What should be managed organization-wide and what should be managed locally? • What standards should be adopted? • What vendors should be chosen? • What rules should govern the decisions? • What policies should guide the process?
  • 31. ! ! ! ! Copyright 2013 by Data Blueprint 31 Organizational Needs become instantiated and integrated into an Data/Information Architecture Informa(on)System) Requirements authorizes and articulates satisfyspecificorganizationalneeds Data Architectures produce and are made up of information models that are developed in response to organizational needs
  • 32. Copyright 2013 by Data Blueprint Enterprises Spend $38 Million a Year on Data 32 • Worldwide spending on business information is now $1.1 trillion/year • Enterprises spend an average of $38 million on information/year • Small and Medium sized Businesses on average spend $332,000 – http://www.cio.com.au/article/429681/ five_steps_how_better_manage_your_data/
  • 33. • Data Analysis - Origins • Challenges - Faced by virtually everyone • Compliment - Existing data management practices • Pre-requisites - Necessary to exploit big data techniques • Prototyping - Iterative means of practicing big data techniques • Take Aways and Q&A Demystifying Big Data Copyright 2013 by Data Blueprint Tweeting now: #dataed 33
  • 34. Copyright 2013 by Data Blueprint Gartner Five-phase Hype Cycle http://www.gartner.com/technology/research/methodologies/hype-cycle.jsp 34 Technology Trigger: A potential technology breakthrough kicks things off. Early proof-of-concept stories and media interest trigger significant publicity. Often no usable products exist and commercial viability is unproven. Trough of Disillusionment: Interest wanes as experiments and implementations fail to deliver. Producers of the technology shake out or fail. Investments continue only if the surviving providers improve their products to the satisfaction of early adopters. Peak of Inflated Expectations: Early publicity produces a number of success stories—often accompanied by scores of failures. Some companies take action; many do not. Slope of Enlightenment: More instances of how the technology can benefit the enterprise start to crystallize and become more widely understood. Second- and third- generation products appear from technology providers. More enterprises fund pilots; conservative companies remain cautious. Plateau of Productivity: Mainstream adoption starts to take off. Criteria for assessing provider viability are more clearly defined. The technology’s broad market applicability and relevance are clearly paying off.
  • 35. Copyright 2013 by Data Blueprint Gartner Big Data Hype Cycle 35 "A focus on big data is not a substitute for the fundamentals of information management."
  • 36. Copyright 2013 by Data Blueprint 36 Big Data in Gartner’s Hype Cycle
  • 37. Copyright 2013 by Data Blueprint Technology Continues to Advance 37 • (Gordon) Moore's law – Over time, the number of transistors on integrated circuits doubles approximately every two years
  • 38. Copyright 2013 by Data Blueprint Pick any two! and there are still tradeoffs to be made! 38
  • 39. • Faster processors outstripped not only the hard disk, but main memory – Hard disk too slow – Memory too small • Flash drives remove both bottlenecks – Combined Apple and Yahoo have spend more than $500 million to date • Make it look like traditional storage or more system memory – Minimum 10x improvements – Dragonstone server is 3.2 tb flash memory (Facebook) • Bottom line - new capabilities! Copyright 2013 by Data Blueprint "There’s now a blurring between the storage world and the memory world" 39
  • 40. • von Neumann bottleneck (computer science) – "An inefficiency inherent in the design of any von Neumann machine that arises from the fact that most computer time is spent in moving information between storage and the central processing unit rather than operating on it" [http://encyclopedia2.thefreedictionary.com/von+Neumann+bottleneck] • Michael Stonebraker – Ingres (Berkeley/MIT) – Modern database processing is approximately 4% efficient • Many "big data architectures are attempts to address this, but: – Zero sum game – Trade characteristics against each other • Reliability • Predictability – Google/MapReduce/ Bigtable – Amazon/Dynamo – Netflix/Chaos Monkey – Hadoop – McDipper • Big data exploits non-von Neumann processing Copyright 2013 by Data Blueprint Non-von Neumann Processing/Efficiencies 40
  • 41. Copyright 2013 by Data Blueprint Potential Tradeoffs: CAP theorem: consistency, availability and partition-tolerance 41 Partition (Fault) Tolerance AvailabilityConsistency RDBMS NoSQL Small datasets can be both consistent & available Atomicity Consistency Isolation Durability Basic Availability Soft-state Eventual consistency
  • 42. Copyright 2013 by Data Blueprint Potential either/or Tradeoffs 42 SQL Big Data Privacy Big Data Security Big Data ? Massive High- speed Flexible
  • 43. • Patterns/objects, hypotheses emerge – What can be observed? • Operationalizing – The dots can be repeatedly connected • Things are happening – Sensemaking techniques address "what" is happening? • Patterns/objects, hypotheses emerge – What can be observed? • Operationalizing – The dots can be repeatedly connected – "Big Data" contributions are shown in orange                                <-­‐Feedback "Sensemaking"   Techniques ! Exis&ng! Knowledge /base Copyright 2013 by Data Blueprint Analytics Insight Cycle 43 Volume Velocity Variety PotenFal/ actual  insights Discernm ent Exploitable Insight PaGern/Object  Emergence Com bined/ inform ed   insights AnalyDcal   boEleneck
  • 44. Copyright 2013 by Data Blueprint • Data analysis struggles with the social – Your brain is excellent at social cognition - people can • Mirror each other’s emotional states • Detect uncooperative behavior • Assign value to things through emotion – Data analysis measures the quantity of social interactions but not the quality • Map interactions with co-workers you see during work days • Can't capture devotion to childhood friends seen annually – When making (personal) decisions about social relationships, it’s foolish to swap the amazing machine in your skull for the crude machine on your desk • Data struggles with context – Decisions are embedded in sequences and contexts – Brains think in stories - weaving together multiple causes and multiple contexts – Data analysis is pretty bad at • Narratives / Emergent thinking / Explaining • Data creates bigger haystacks – More data leads to more statistically significant correlations – Most are spurious and deceive us – Falsity grows exponentially greater amounts of data we collect • Big data has trouble with big problems – For example: the economic stimulus debate – No one has been persuaded by data to switch sides • Data favors memes over masterpieces – Detect when large numbers of people take an instant liking to some cultural product – Products are hated initially because they are unfamiliar • Data obscures values – Data is never raw; it’s always structured according to somebody’s predispositions and values 44 Some Big Data Limitations
  • 45. Savings come from a variety of agreed upon categories and values: • Reduced hospital re- admissions • Patient Monitoring: Inpatient, out-patient, emergency visits and ICU • Preventive care for ACO • Epidemiology • Patient care quality and program analysis Copyright 2013 by Data Blueprint $300 billion is the potential annual value to health care 45 $165 $108 $47 $9$5 Transparency in clinical data and clinical decision support Research & Development Advanced fraud detection-performance based drug pricing Public health surveillance/response systems Aggregation of patient records, online platforms, & communities
  • 46. 0 25 50 75 100 Current Improved Copyright 2013 by Data Blueprint Reversing The Measures • Currently: – Analysts spend 80% of their time manipulating data and 20% of their time analyzing data – Hidden productivity bottlenecks • After rearchitecting: – Analysts spend less time manipulating data and more of their time analyzing data – Significant improvements in knowledge worker productivity 46 Manipulation Analysis
  • 47. • Data Analysis - Origins • Challenges - Faced by virtually everyone • Compliment - Existing data management practices • Pre-requisites - Necessary to exploit big data techniques • Prototyping - Iterative means of practicing big data techniques • Take Aways and Q&A Demystifying Big Data Copyright 2013 by Data Blueprint Tweeting now: #dataed 47
  • 48. Copyright 2013 by Data Blueprint What do we teach business people about data? 48 What percentage of the deal with it daily?
  • 49. Copyright 2013 by Data Blueprint What do we teach IT professionals about data? • 1 course – How to build a new database – 80% if UT expenses are used to improve existing IT assets • What impressions do IT professionals get from this education? – Data is a technical skill that is used to develop new databases • This is not the best way to educate IT and business professionals - every organization's – Sole, non-depletable, non-degrading, durable, strategic asset 49
  • 50. Copyright 2013 by Data Blueprint Application-Centric Development Original articulation from Doug Bagley @ Walmart 50 t t Strategy Goals/Objectives Systems/Applications Network/Infrastructure Data/Information t • In support of strategy, the organization develops specific goals/objectives • The goals/objectives drive the development of specific systems/ applications • Development of systems/applications leads to network/infrastructure requirements • Data/information are typically considered after the systems/ applications and network/ infrastructure have been articulated • Problems with this approach: – Ensures that data is formed around the application and not the information requirements – Process are narrowly formed around applications – Very little data reuse is possible
  • 51. Einstein Quote Copyright 2013 by Data Blueprint 51 "The significant problems we face cannot be solved at the same level of thinking we were at when we created them." - Albert Einstein
  • 52. Copyright 2013 by Data Blueprint What does it mean to treat data as an organizational asset? • Assets are economic resources – Must own or control – Must use to produce value – Value can be converted into cash • An asset is a resource controlled by the organization as a result of past events or transactions and from which future economic benefits are expected to flow to the organization [Wikipedia] • With assets: – Formalize the care and feeding of data • Cash management - HR planning – Put data to work in unique/significant ways • Identify data the organization will need [Redman 2008] 52
  • 53. Copyright 2013 by Data Blueprint Data-Centric Development Flow Original articulation from Doug Bagley @ Walmart 53 t t Strategy Goals/Objectives Data/Information Network/Infrastructure Systems/Applications t • In support of strategy, the organization develops specific goals/objectives • The goals/objectives drive the development of specific data/ information assets with an eye to organization-wide usage • Network/infrastructure components are developed to support organization- wide use of data • Development of systems/applications is derived from the data/network architecture • Advantages of this approach: – Data/information assets are developed from an organization-wide perspective – Systems support organizational data needs and compliment organizational process flows – Maximum data/information reuse
  • 54. Top Operations Job Copyright 2013 by Data Blueprint CDO Reporting 54 Top Job Top Finance Job Top Information Technology Job Top Marketing Job • There is enough work to justify the function • There is not much talent • The CDO provides significant input to the Top Information Technology Job Data Governance Organization Chief Data Officer
  • 55. • Data Analysis - Origins • Challenges - Faced by virtually everyone • Compliment - Existing data management practices • Pre-requisites - Necessary to exploit big data techniques • Prototyping - Iterative means of practicing big data techniques • Take Aways and Q&A Demystifying Big Data Copyright 2013 by Data Blueprint Tweeting now: #dataed 55
  • 56. Copyright 2013 by Data Blueprint 56 Traditional Systems Life Cycle Challenges projectcartoon.com • Original business concept • As the consultant described it • As the customer explained it • How the project leader understood it • How the programmer wrote it • What the beta testers received • What operations installed • As accredited for operation • When it was delivered • How the project was documented • How the help desk supported it • How the customer was billed • After patches were applied • What the customer wanted
  • 57. Copyright 2013 by Data Blueprint "Waterfall" model of Systems Development 57
  • 58. Copyright 2013 by Data Blueprint "Waterfall" model (with iteration possible) 58
  • 59. Copyright 2013 by Data Blueprint Spiral Model of Systems Development 59 Barry W. Boehm "A Spiral Model of Software Development and Enhancement" ACM SIGSOFT Software Engineering Notes August 1986, 11(4):14-24 "The major distinguishing feature of the spiral model is that it creates a risk-driven approach ... rather than a primarily document-driven or code-driven process"
  • 60. Copyright 2013 by Data Blueprint Prototyping & Big Data Technologies 60
  • 61. Advanced Data Practices • Cloud • MDM • Mining • Big Data • Analytics • Warehousing • SOA Copyright 2013 by Data Blueprint • 5 Data management practices areas / data management basics ... • ... are necessary but insufficient prerequisites to organizational data leveraging applications that is self actualizing data or advanced data practices Hierarchy of Data Management Practices (after Maslow) Basic Data Management Practices – Data Program Management – Organizational Data Integration – Data Stewardship – Data Development – Data Support Operations http://3.bp.blogspot.com/-ptl-9mAieuQ/T-idBt1YFmI/AAAAAAAABgw/Ib-nVkMmMEQ/s1600/maslows_hierarchy_of_needs.png
  • 62. Copyright 2013 by Data Blueprint Experience Pebbles 62
  • 63. • Data Analysis - Origins • Challenges - Faced by virtually everyone • Compliment - Existing data management practices • Pre-requisites - Necessary to exploit big data techniques • Prototyping - Iterative means of practicing big data techniques • Take Aways and Q&A Demystifying Big Data Copyright 2013 by Data Blueprint Tweeting now: #dataed 63
  • 64. Copyright 2013 by Data Blueprint Which Source of Data Represents the Most Immediate Opportunity? 64 Guidance • The real change: cost-effectiveness and timely delivery • Business process optimization — not technology • High fidelity and quality information • Big data technology, all is not new • Keep your focus, develop skills (Source: Gartner (January 2013)
  • 65. Copyright 2013 by Data Blueprint Gartner Recommendations 65 Impacts Top Recommendations Some of the new analytics that are made possible by big data have no precedence, so innovative thinking will be required to achieve value Treat big data projects as innovation projects that will require change management efforts. The business will take time to trust new data sources and new analytics Creative thinking can unearth valuable information sources already inside the enterprise that are underused Work with the business to conduct an inventory of internal data sources outside of IT's direct control, and consider augmenting existing data that is IT 'controlled.' With an innovation mindset, explore the potential insight that can be gained from each of these sources Big data technologies often create the ability to analyze faster, but getting value from faster analytics requires business changes Ensure that big data projects that improve analytical speed always include a process redesign effort that aims at getting maximum benefit from that speed Gartner 2012
  • 66. Copyright 2013 by Data Blueprint Results: It is not always about money • Solution: – Integrate multiple databases into one to create holistic view of data – Automation of manual process • Results: – Data is passed safely and effectively – Eliminate inconsistencies, redundancies, and corruption – Ability to cross-analyze – Significantly reduced turnaround time for matching patients with potential donor -> increased potential to make life-saving connection in a manner that is faster, safer and more reliable – Increased safe matches from 3 out of 10 to 6 out of 10 66
  • 67. Copyright 2013 by Data Blueprint Data versus Tools-A History Lesson • Sophisticated tool without data are useless • Mediocre tools with the data are frustrating • Analysts will always opt for frustration over futility, if that is their only option – Ira "Gus" Hunt CIA/CTO • http://www.huffingtonpost.com/2013/03/20/cia-gus-hunt-big-data_n_2917842.html 67
  • 68. Copyright 2013 by Data Blueprint Questions? 68 + =
  • 69. Unlock Business Value through Data Quality Engineering June 11, 2013 @ 2:00 PM ET/11:00 AM PT Data Systems Integration & Business Value Pt. 1: Metadata July 9, 2013 @ 2:00 PM ET/11:00 AM PT Sign up here: www.datablueprint.com/webinar-schedule or www.dataversity.net Copyright 2013 by Data Blueprint Upcoming Events 69
  • 70. 10124 W. Broad Street, Suite C Glen Allen, Virginia 23060 804.521.4056