1. MAKING BIG DATA COME ALIVE
Adding Hadoop to Your Analytics Mix:
Challenges and Strategies
Madina Kassengaliyeva
July 23, 2015
2. 2
Madina Kassengaliyeva
Director, Client Services, Think Big
Madina Kassengaliyeva is responsible for ensuring successful
delivery of Think Big’s service engagements. Madina has led
strategy, engineering and data science engagements in a variety
of areas, including recommendation engines, customer
interactions optimization, marketing analytics and compliance.
Madina holds an MBA from the University of Chicago and a BA in
International Studies from American University.
Presenters
© 2015 Think Big, a Teradata Company 8/3/2015
Paul Barsch
Director, Services Marketing, Think Big
Paul Barsch directs marketing programs for Think Big, a Teradata
Company. Paul has been in IT for 15+ years in a variety of roles for
Teradata, HP Enterprise Services and KPMG Consulting.
3. 3
Housekeeping
Use the widget bar below to…
Get valuable resources & complete exit survey
Ask Questions to the Presenters
Request online technical help
Go social….
…and follow the conversation
4. 4
• Hadoop Adoption Path
• Key Challenges – Data,
Organization, Capabilities
• Ideas for Solutions
Agenda
5. 5
Common Hadoop Adoption Path
1. Address Immediate Needs
• Hadoop used to relieve a technology pain point
• Reduce data warehouse costs
• Speed up ETL
• The only users are in technology teams
2. Establish a Data Repository
• More and more data gets added to Hadoop as a result of Phase 1
• Greater data variety, more raw data, deeper history
• Initial data transfer, security, and governance practices are established
• Still perceived as largely a technology platform
3. Initial Analytics Exploration
• Limited number of people or teams conduct POCs using Hadoop
• Analytics techniques not available on traditional platforms are applied
• Early wins indicate promising business impact and excitement builds
4. Integrate Hadoop into the Analytics Capabilities
• Multiple teams use Hadoop as part of the analytics infrastructure
• Techniques, methods, best practices and access patterns get codified
• Business begins to capture consistent value
Transition from Phase 3 to Phase 4 is when key challenges emerge
7. 7
Key Challenges
Data
• Impact of schema on read
• Consistent taxonomies and reference data
• Architecture - access patterns and flows
Organization
• Skills, roles and responsibilities
• Lack of common vocabulary
• Knowledge capture and sharing
Capabilities
• Foundational capabilities at the whim of changing business priorities
• Future that’s hard to envision is hard to build
8. 8
Organization – Key Challenges
• Skills, roles and responsibilities
o Significant skills gaps between what’s currently available and what is
needed
o Both business and technology do analytics and often engineering, blurring
lines of responsibility or ownership
o “Throw over the wall” doesn’t work
• Lack of common vocabulary
o Every BU (and every leader) has its own understanding of the same
words
o This is rarely discussed
• Knowledge capture and sharing
o Multiple teams work with the same data and similar techniques
o Organization silos do not naturally support broad knowledge transfer
9. 9
• Cross-BU committee to guide
organizational change, define
common vocabulary, defend the
effort to executive leadership and
share success
• Thorough, honest skills assessments to
identify gaps, training needs,
augmentation needs, map to roles
and responsibilities
• Documented tools requirements
based on current and projected skills
• Collaboration architecture
• Plug into existing knowledge transfer
practices and tools and allow for
informal information exchange based
on data access privileges
Organization – Ideas for Solutions
10. 10
Organization – Key Functions
Strategy
Data Management & Governance
Architecture
Tools
Market
Research
Roadmap
Planning
Value
Realization
Future Data
Sources
Services
Support
Visualization &
Reporting
Data SMEs
Core Platform
Development
Testing
Operations
Core Platform
Management
Metrics Tracking &
Reporting
Platform Integration
Program
Management
Roadmap
Execution
Cross Group
Coordination
Financial
Management
Small Project
Prioritization
Communication
& Change
Management
Application
Development
Analytic
Sandbox
Data Science
Integration,
Interfaces &
Ingestion
Training
Incident Management
Config, Change,
Release Management
Problem Management
Help Desk
Knowledge
Management
Technology
Governance
Data
Quality &
Metrics
Access
Controls
Data
Governance
Metadata
Management
11. 11
• Foundational capabilities at the whim of changing business priorities
o Lack of consensus on what the foundational capabilities are
o Let’s be honest, the “Top Project” changes often and the resources go
with it
o Foundational capabilities do not immediately impact the bottom line
• Future that’s hard to envision is hard to build
o Lack of shared vision
o Clarity needed at multiple levels – strategy, operational details, day to
day
Capabilities – Key Challenges
12. 12
• Consolidate ownership in a team that has
organizational influence and includes
representatives from the business, the
infrastructure, architecture, data, and
analytics
• Back to vocabulary – agree on what
capabilities mean for your business unit and
your technology partners
• Roadmaps are useful – visual representations
of high-level goals against a time line that
should define your projects
• Dedicate resources to capabilities and
protect them
• Check in with your roadmap – does it still
reflect your vision?
Capabilities – Ideas for Solutions
Photo courtesy of Flickr. Creative Commons.
By E.Bass.
14. 14
Capabilities: Roadmap Example
Analytics
standardized
methods,
code, tools,
team roles
Operations
standardized
processes,
tools, team
roles
Skills and roles
matrix
Data Ingestion, Transfer,
Structuring,
and Governance approach
Unified Model Management
Integrated
Data Science
Variables based on single source
structured data
Variable selection in
Hadoop
Integration with existing
scoring engine
Batch data processing in Hadoop
Integration
Cross-channel and intraday variables generation
Batch scoring in Hadoop
Natural language processing
to analyze text and voice
Initial real-time scoring
Execution Methodology and
project management
Data and
Models
Organization
and
Management
Analytics Knowledge
Management
Scoring Architectural
and Analytical design
Data Lifecycle Management
Real-time scoring design
Statistical and machine-learning-based
modeling
Data Exploration of unstructured data
components (e.g. URL, chat text)
Data Exploration of structured data
components (e.g. page views,
Cross-channel variables, variables from unstructured data +
intraday variables
15. 15
• Impact of schema on read
o Hadoop supports a variety of data structures, which simplifies data
ingestion and allows data users to define preferred schemas
o This shifts the burden of defining the schema to the data users
• Consistent taxonomies and reference data
o Meaningful data analysis requires known and consistent taxonomy
o New taxonomies can get created by individual teams
o Reference data changes
• Architecture - access patterns and flows
o Data flows across platforms, regular updates, physical and virtual
constraints
o Decisions on what should be done where
Data – Key Challenges
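As a minimal sketch of what schema on read means in practice (an illustration only, not taken from the presentation): the raw records below are stored once without any enforced structure, and each consuming team applies its own schema when reading. The event fields, the `read_with_schema` helper, and the two team schemas are all hypothetical.

```python
import json

# Raw events are stored as-is; no schema is enforced at ingestion time.
RAW_EVENTS = [
    '{"user": "u1", "url": "/home", "ts": "2015-07-23T10:00:00", "amount": "12.50"}',
    '{"user": "u2", "url": "/cart", "ts": "2015-07-23T10:05:00"}',
]

def read_with_schema(lines, schema):
    """Apply a reader-defined schema (field -> (converter, default)) at read time."""
    rows = []
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, (convert, default) in schema.items():
            # Missing fields fall back to the default chosen by the reader.
            row[field] = convert(record[field]) if field in record else default
        rows.append(row)
    return rows

# Hypothetical schemas: each team decides which fields it needs and how to type them.
MARKETING_SCHEMA = {"user": (str, None), "url": (str, None)}
FINANCE_SCHEMA = {"user": (str, None), "amount": (float, 0.0)}

marketing_rows = read_with_schema(RAW_EVENTS, MARKETING_SCHEMA)
finance_rows = read_with_schema(RAW_EVENTS, FINANCE_SCHEMA)
```

The choice to treat a missing `amount` as 0.0 is made by the reader, not by the ingestion process. That is the burden shift described above: two teams can read the same raw data and make different, equally plausible choices.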
16. 16
• Big issue with lots of opinions – see Data Lake
et al.
• Test and define common data manipulation
patterns for different use cases –
aggregations, reductions, basic statistical
derivations
• Centralize the responsibility for data
governance, data architecture, taxonomy,
and maintenance
• Establish knowledge sharing for data post-
analytics
Data – Ideas for Solutions
Photo courtesy of Flickr. Creative Commons.
By Renzo Ferrante
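One way to picture a tested, shared data manipulation pattern backed by centrally owned reference data is the sketch below. It is an assumption-laden illustration, not the presenters' method: the channel codes, the `CHANNEL_TAXONOMY` mapping, and both function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical shared reference data: raw channel codes -> agreed taxonomy.
CHANNEL_TAXONOMY = {"web": "online", "mobile_app": "online", "store": "offline"}

def normalize_channel(raw_code):
    """Map a raw channel code to the shared taxonomy; unmapped codes become 'unknown'."""
    return CHANNEL_TAXONOMY.get(raw_code.strip().lower(), "unknown")

def summarize_by_channel(records):
    """Aggregate amounts per normalized channel: count, total, and mean."""
    groups = defaultdict(list)
    for rec in records:
        groups[normalize_channel(rec["channel"])].append(rec["amount"])
    return {
        channel: {"count": len(vals), "total": sum(vals), "mean": mean(vals)}
        for channel, vals in groups.items()
    }

records = [
    {"channel": "Web", "amount": 10.0},
    {"channel": "mobile_app", "amount": 30.0},
    {"channel": "store", "amount": 5.0},
    {"channel": "kiosk", "amount": 1.0},  # not in the reference data
]
summary = summarize_by_channel(records)
```

Keeping the mapping in one place means every team aggregates against the same categories, and codes that fall into "unknown" show where the reference data needs an update.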
17. 17
• Data management,
knowledge, architecture, and
processing assurance
• Investment justification,
research, knowledge sharing
• Data aggregation and
enhancement
Client Example – Centralized Data Group
Data Source 1
Data Source 2
Data Source 3
Data Source 3
Business
Group
Product
Group
Central Tech
Group
18. 18
Conclusions
Data
Organization
Capabilities
• Centralize data management
• Knowledge of data = knowledge of business
• Technology is not enough – need the right
people and processes
• Executive commitment is key
• Tough conversations can yield much better
alignment
• Dedicate and protect resources to build
capabilities
19. 19
• 100% Big Data Focus
• Founded in 2010 with 100+ engagements across 70 clients
• Unlock value of big data with data science and data
engineering services
• Proven vendor-neutral open source integration expertise
• Agile team-based development methodology
• Think Big Academy for skills and organizational development
• Global delivery model
Who is Think Big?