© 2014
The Emerging Data Lake IT Strategy
An Evolving Approach for Dealing with Big Data & Changing Environments
SPEAKERS:
Thomas Kelly, Practice Director
Cognizant Technology Solutions
Sean Martin, Founder and CTO
Cambridge Semantics
bit.ly/DataLake
We’re living in an amazing world of information sharing,
connecting with family, neighbors, vendors, and customers
all over the world
Telling the world
about what we like
and don’t like
#HIMYMfinale
@MLB
… is now following Cognizant Technology Solutions
and Cambridge Semantics
What we’re doing and how we’re succeeding
We’re deciding what advertising we want to see…
… and what we don’t
Unsubscribe
Influencing
how business
and customers
engage
Many businesses have emerged that embrace this model of customer engagement,
and we’ve said goodbye to businesses that didn’t
10 million stays in 2013, without owning a hotel
Grew to nearly $75B in annual retail revenue in 2013, without opening a storefront
Shares over 40 million photos each day
Retail
Engaging in a more
personalized shopping
experience, retailers are
building a stronger
relationship with each
customer
Customer Service
Delivering a positive and
successful experience for
each customer
Life Sciences and Healthcare
Combining health, genetic,
clinical, and public sciences
data to bring effective
therapies to patients sooner
Financial Services
Delivering innovative
products and services,
based on a 360° view of
the Customer, across all
business lines, engaging
all available data assets,
internal and external
The Challenges That We're Addressing
Onboarding and Integrating Data is Slow and Expensive
• Transforming data from a growing variety of technologies
• Custom coded ETL
• Existing ETL processes are not reusable
• Optimization for analytics is time-consuming and costly
• Onboarding often waits until there is a defined need for a set of data,
delaying benefits realization
Data Provenance is Often Poorly Recorded
• Data meaning is “lost in translation”
• Data transformations tracked in spreadsheets
• Maintenance and analysis costs for onboarded data are high
• Recreating data lineage is manual, time-consuming, and error-prone
The Challenges That We're Addressing
Target Data is Difficult to Consume
• Optimization favors known analytics but is not well suited to new requirements
• A one-size-fits-all canonical view is used rather than fit-for-purpose views
• Or, the target data lacks a conceptual model that would make it easy to consume
• Difficult to identify what data is available, how to get access, and how to
integrate the data to answer a question
Industrializing the Big Data Environment is Difficult to Manage
• Proliferation of data silos leads to inconsistency/syncing issues
• Conflicting objectives of opening access to data assets while managing
security and privacy requirements
• Velocity of business change rapidly invalidates data organization and analytics
optimizations
• Managing the integration/interaction with the multiple data management
technologies that make up the Big Data environment
The Data Lake is made up of four key components
• Data Ingestion
• Data Lake Management
• Data Management
• Query Management
Delivering
• Low Cost, High Performance Storage
• Flexible, Easy-to-Use Data Organization
• Performance-Optimized Analytics
• Automation of most manual Development and
Query Activities
• Self-Service End-User Features
• Intelligent Processing
Data Ingestion
Data Sources: Linked Data, Internet of Things (IoT), Desktop and Mobile, Operational Systems, Social Media and Cloud
Data Ingestion: On-Demand Query, Streaming, Semantic Tagging, Scheduled Batch Load, Model-Driven Self-Service
Data Management
Data Management: Semantic Graph, Columnar, In Memory, NoSQL, Map Reduce, Provenance, Data Movement
HDFS Storage: Structured and Unstructured Data
Data Lake Management
Components: Data Assets Catalog, Workflow, Models, Access Management
Data Assets Catalog
• Internal and External Data Assets
• Defined Data Orgs (ontologies, taxonomies, thesauri)
Data Mappings
• Source-to-Target
• Transformations
Business-Focused
• Business Unit Data Organization and Terms
• Optimized to Assist Analytics
Data Governance
• Focus on Shared Data
• Standard Models
• Controlled Vocabulary
• Common Definitions
• Standards-based Data Views (FIBO, CDISC/RDF)
Workflow
• Processes
• Schedules
• Provenance Capture
Access Management
• Authorization and Access Rules
• Rule-based Security
• Group, Role, and User Level Authorization
• Auditable Access
Monitoring
• Monitor and Manage Data Lake Operations
Query Management
• Query Data, Metadata, and Provenance
• Capture and Share Analytics Expertise
• Semantic Search
• Analytics Directed to the Best Query Engine
• Data Discovery
Semantic Technology Delivers “Smart” Data
Integrates a network of internal and external data assets,
insulating end users from the details of the underlying
technologies
Captures expertise (logic, inferencing) and integrates it with
the data, delivering “smart” data to non-expert users
Manages a comprehensive inventory of the data assets
Secures access to the right data assets by the right users
Key W3C Standards in Semantic Technology
Resource Description Framework (RDF)
Framework for storing and integrating data and data definitions in the form of subject-predicate-object expressions, or “triples”. Relationships are organized in a logical graph model.
Reduced development time and cost; faster time-to-business value.
Web Ontology Language (OWL)
An ontology is a comprehensive model of data definitions and relationships that is human- and machine-readable. Ontologies are inheritable and extensible.
Improved application quality, flexible iterative / investigative approach, easily adapts to business change.
SPARQL Query Language
SQL-like query language for semantic data that can leverage the ontological relationships and constructs to execute smarter queries. Access multiple internal and external databases simultaneously in a single query.
Access and integrate data across business silos.
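To make the triple and query ideas concrete, here is a minimal sketch in plain Python rather than a real RDF store or SPARQL engine. The identifiers (`cust:42`, `ex:placed`, and so on) and both helper functions are hypothetical, chosen only for illustration. Each fact is a (subject, predicate, object) tuple, and a query is a chain of pattern matches that share a variable.

```python
# Each fact is a (subject, predicate, object) tuple, i.e. a "triple".
# All identifiers below are made-up sample data, not taken from the deck.
TRIPLES = [
    ("cust:42", "rdf:type", "ex:Customer"),
    ("cust:42", "ex:name", "Ada"),
    ("cust:42", "ex:placed", "order:7"),
    ("order:7", "rdf:type", "ex:Order"),
    ("order:7", "ex:total", "120.00"),
]


def match(triples, s=None, p=None, o=None):
    """Yield triples that fit the pattern; None acts as a wildcard."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t


def orders_by_customer(triples, name):
    """Roughly: SELECT ?order WHERE { ?c ex:name <name> . ?c ex:placed ?order }"""
    return [
        order
        for (customer, _, _) in match(triples, p="ex:name", o=name)
        for (_, _, order) in match(triples, s=customer, p="ex:placed")
    ]


print(orders_by_customer(TRIPLES, "Ada"))
```

The join between the two patterns happens on the shared `customer` value, which is the same role a shared variable plays in a SPARQL graph pattern.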
Inference
Reasoning over data through business rules. Expertise is captured and embedded in the ontology model, accessible through user queries. This is the “smart” in Smart Data.
Easier end user access to expertise; intelligent systems capabilities.
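As one illustration of inference, the sketch below applies a single rule over (subject, predicate, object) tuples: if something is an instance of a class, it is also an instance of every superclass. This is a simplified, hand-written stand-in for what an ontology reasoner does; the class names and the `infer_types` helper are assumptions for the example.

```python
def infer_types(triples):
    """Forward-chain one rule until nothing new is derived:
    (x rdf:type C) and (C rdfs:subClassOf D)  =>  (x rdf:type D)."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_of = [(s, o) for (s, p, o) in facts if p == "rdfs:subClassOf"]
        for (x, p, cls) in list(facts):
            if p != "rdf:type":
                continue
            for (child, parent) in subclass_of:
                derived = (x, "rdf:type", parent)
                if child == cls and derived not in facts:
                    facts.add(derived)
                    changed = True
    return facts


# Hypothetical ontology fragment and one instance.
FACTS = [
    ("ex:PremiumCustomer", "rdfs:subClassOf", "ex:Customer"),
    ("ex:Customer", "rdfs:subClassOf", "ex:Party"),
    ("cust:42", "rdf:type", "ex:PremiumCustomer"),
]

inferred = infer_types(FACTS)
print(("cust:42", "rdf:type", "ex:Party") in inferred)
```

A user who asks for all `ex:Party` instances then finds `cust:42` without knowing about the premium-customer subclass, which is the sense in which the expertise lives in the model rather than in each query.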
Linked Data
Connects data contained in different databases, allowing queries to find, share and combine data so insights can be identified across the Web.
Connect disparate databases to navigate and integrate data regardless of location or technology platform.
RDB to RDF Mapping Language (R2RML)
Preserving current investments in relational technology, R2RML maps relational data to an ontology. SPARQL can query RDF and relational databases simultaneously.
Low cost of entry to use Semantic Technology to deliver high-value solutions
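The idea behind a relational-to-RDF mapping can be sketched in a few lines of Python. This is not R2RML syntax (real R2RML mappings are written in Turtle); the dictionary layout, the `rows_to_triples` helper, and the table and column names are all assumptions made for illustration. The mapping names a subject template, a class, and a predicate for each column.

```python
# Hypothetical mapping for an "orders" table: how to build the subject,
# which class each row belongs to, and which predicate each column becomes.
ORDER_MAPPING = {
    "subject_template": "order:{id}",
    "class": "ex:Order",
    "columns": {"status": "ex:status", "total": "ex:total"},
}


def rows_to_triples(rows, mapping):
    """Turn relational rows (dicts) into (subject, predicate, object) triples."""
    triples = []
    for row in rows:
        subject = mapping["subject_template"].format(**row)
        triples.append((subject, "rdf:type", mapping["class"]))
        for column, predicate in mapping["columns"].items():
            value = row.get(column)
            if value is not None:  # this sketch emits nothing for missing values
                triples.append((subject, predicate, str(value)))
    return triples


rows = [{"id": 7, "status": "open", "total": None}]
for triple in rows_to_triples(rows, ORDER_MAPPING):
    print(triple)
```

Because the mapping is data rather than code, the relational tables stay where they are and only the view of them changes.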
The Common Model is the “Data Glue”
Source Systems: Lead (SFA system), Quote (Quote system), Order (OMS system), Contract (CMS system)
Common Model (“Data Glue”)
• Different business entities in physical systems actually share many of the same concepts, meanings, and relationships
• Semantic data science exposes common business concepts and connects them with their physical expression in production systems
• Data is “glued” together by its business meaning, rather than physical structures dictated by the underlying technologies
The conceptual model can be directly used by both business and IT users to operationalize data services, understand the data landscape, track data lineage, and conduct downstream analytics.
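A rough sketch of the "data glue" idea, assuming each source system names the same business concept differently: fields are mapped to common-model concepts, and records are then linked by concept value rather than by physical column name. The field names, concept names, and both helpers are hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from (source system, field) to a common-model concept.
FIELD_TO_CONCEPT = {
    ("SFA", "prospect_name"): "Party.name",
    ("Quote", "cust_nm"): "Party.name",
    ("OMS", "customer"): "Party.name",
    ("CMS", "counterparty"): "Party.name",
    ("Quote", "quoted_amt"): "Amount.value",
    ("OMS", "order_total"): "Amount.value",
}


def to_common(system, record):
    """Re-key a source record by concept; unmapped fields are dropped."""
    return {
        FIELD_TO_CONCEPT[(system, field)]: value
        for field, value in record.items()
        if (system, field) in FIELD_TO_CONCEPT
    }


def link_by_concept(records, concept):
    """Group source systems whose records share a value for one concept."""
    groups = defaultdict(list)
    for system, record in records:
        common = to_common(system, record)
        if concept in common:
            groups[common[concept]].append(system)
    return dict(groups)


records = [
    ("SFA", {"prospect_name": "Acme"}),
    ("OMS", {"customer": "Acme", "order_total": 100}),
    ("CMS", {"counterparty": "Globex"}),
]
print(link_by_concept(records, "Party.name"))
```

The lead in the SFA system and the order in the OMS system end up linked through `Party.name` even though neither system uses that name physically.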
Semantic Models Relate Data by Business Meaning
Customer: Life Events, Life Style, Preferences, Interests, Music, Purchasing, Personal Network, Entertainment, Profession
Implications to the Existing IT Architecture and Practices
• User Tools to Discover and Optimize Data Relationships
• Structured and Unstructured Data, Voice, and Video
• Data Analysis Automation
• Extends Existing Investments in IT Architecture
• Manages Secure Access
• Builds Out Enterprise Data Models, with Integration Hub Capabilities
• Self-Service Data Feeds and Analytics
• Infrastructure Capacity Elasticity
• Reduction of Data Mart Silos
• Easier Access to External Data
Data Lake Approach to Meeting Business Needs
Onboard New Data
Traditional Technologies and Practices
• Comprehensive analysis creates rigid structure that is difficult to change, or
• Minimal definition of data organization requires detailed understanding of data contents
Data Lake Technologies and Practices
• Flexible data model can be revised or extended without redesign of the database
• Agile, evolutionary refinement of the data organization, leveraging new insights as users work with the data
Connect External Data
Traditional Technologies and Practices
• External data is collected and loaded into the analytics repository.
• Data is streamed, or is refreshed on a scheduled frequency.
Data Lake Technologies and Practices
• External data can be sourced from databases, spreadsheets, Web pages, news feeds, and more; data is queried through common methods, without regard to location, with real-time values delivered at query time.
Integrate Data between Business Units or Business Partners
Traditional Technologies and Practices
• Governance activities establish common vocabulary and data definitions
• Shared data is copied to an integrated database.
• Organization-specific definitions may require duplicating certain data in marts
Data Lake Technologies and Practices
• In addition to governance activities, systems of record publish existing data specifications or an ontology model; each organization defines data in a manner that is best suited for its business.
• Federation and virtualization features provide choices in which data to copy and which data to retain in the system(s) of record
• All models can be supported through a single copy of the data, maintained in the data lake or system of record.
Capture and Embed Expertise
Traditional Technologies and Practices
• Expertise is often captured in the reporting and analytics; this creates a change management challenge when updates are required.
Data Lake Technologies and Practices
• Expertise is captured in the data definitions; a single, shared definition minimizes change management efforts
Lessons learned from early adopters
• Prioritize: Prioritize data onboarding by the data’s ability to contribute to customer engagement
• Onboard: Onboard data assets as they become available
• Connect: Connect to available internal and external data assets
• Load: Load the data unfiltered/untransformed
• Organize: Use models to provide organization to the data
• Customize: Create models that are tailored to the needs of the business groups
• Search: Make it easy to find data
• Secure: Manage security and privacy, but make it easy to authorize access to data that users need
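The Load, Organize, and Customize lessons can be pictured as a small Python sketch: data is stored exactly as received, and each business group applies its own model when reading. The in-memory list standing in for lake storage, the model dictionaries, and the JSON field names are all assumptions made for illustration.

```python
import json


def load_raw(store, source, payload):
    """Store the payload unfiltered and untransformed, tagged with its source."""
    store.append({"source": source, "raw": payload})


def read_with_model(store, model):
    """Apply a model (output field -> raw key) at read time.
    The stored payloads are never modified."""
    rows = []
    for entry in store:
        doc = json.loads(entry["raw"])
        rows.append({field: doc.get(key) for field, key in model.items()})
    return rows


# Hypothetical models tailored to two business groups over the same raw data.
MARKETING_MODEL = {"customer": "name", "channel": "src"}
FINANCE_MODEL = {"customer": "name", "revenue": "amt"}

lake = []
load_raw(lake, "web_orders", '{"name": "Ada", "src": "email", "amt": 12.5}')

print(read_with_model(lake, MARKETING_MODEL))
print(read_with_model(lake, FINANCE_MODEL))
```

Adding a third model later requires no reload of the data, which is the practical payoff of loading first and organizing afterward.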
Addressing Challenges
- Privacy vs Personal Value
- Granularity of customer understanding
- Delivering strategic objectives when projects tend
to have a technical focus
- Opening access to data
- Need for executive sponsorship
- Access to external data
- Establishing firewalls
- Persistent, pervasive data quality issues
Clues to better customer engagement will be
found in the ever-growing volume of data that
we’re creating
A Data Lake Strategy helps you to create a
personalized, engaging experience with each
customer
• Visibility
• Self-Service
• Smart
• Provenance
• Open, yet Secure
• Internet Scale
• Agile
• Adaptable
• Universal Data Access
Questions?
Thank you!

More Related Content

What's hot

The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation Caserta
 
Hadoop Big Data Lakes Keynote
Hadoop Big Data Lakes KeynoteHadoop Big Data Lakes Keynote
Hadoop Big Data Lakes KeynoteMark van Rijmenam
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on HadoopCaserta
 
Creating a Next-Generation Big Data Architecture
Creating a Next-Generation Big Data ArchitectureCreating a Next-Generation Big Data Architecture
Creating a Next-Generation Big Data ArchitecturePerficient, Inc.
 
Big Data's Impact on the Enterprise
Big Data's Impact on the EnterpriseBig Data's Impact on the Enterprise
Big Data's Impact on the EnterpriseCaserta
 
The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016
The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016
The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016StampedeCon
 
The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedDunn Solutions Group
 
Moving Past Infrastructure Limitations
Moving Past Infrastructure LimitationsMoving Past Infrastructure Limitations
Moving Past Infrastructure LimitationsCaserta
 
Enterprise Search: Addressing the First Problem of Big Data & Analytics - Sta...
Enterprise Search: Addressing the First Problem of Big Data & Analytics - Sta...Enterprise Search: Addressing the First Problem of Big Data & Analytics - Sta...
Enterprise Search: Addressing the First Problem of Big Data & Analytics - Sta...StampedeCon
 
Building the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architectureBuilding the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architecturemark madsen
 
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...Data Con LA
 
You're the New CDO, Now What?
You're the New CDO, Now What?You're the New CDO, Now What?
You're the New CDO, Now What?Caserta
 
The Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data HubThe Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data HubCloudera, Inc.
 
MapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Enterprise Data Hub Webinar w/ Mike FergusonMapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Enterprise Data Hub Webinar w/ Mike FergusonMapR Technologies
 
Developing a Strategy for Data Lake Governance
Developing a Strategy for Data Lake GovernanceDeveloping a Strategy for Data Lake Governance
Developing a Strategy for Data Lake GovernanceTony Baer
 
Mastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkMastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkCaserta
 
2012 10 bigdata_overview
2012 10 bigdata_overview2012 10 bigdata_overview
2012 10 bigdata_overviewjdijcks
 
The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data LakeCaserta
 
Data Lake, Virtual Database, or Data Hub - How to Choose?
Data Lake, Virtual Database, or Data Hub - How to Choose?Data Lake, Virtual Database, or Data Hub - How to Choose?
Data Lake, Virtual Database, or Data Hub - How to Choose?DATAVERSITY
 

What's hot (20)

The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation
 
Hadoop Big Data Lakes Keynote
Hadoop Big Data Lakes KeynoteHadoop Big Data Lakes Keynote
Hadoop Big Data Lakes Keynote
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on Hadoop
 
Creating a Next-Generation Big Data Architecture
Creating a Next-Generation Big Data ArchitectureCreating a Next-Generation Big Data Architecture
Creating a Next-Generation Big Data Architecture
 
Big Data's Impact on the Enterprise
Big Data's Impact on the EnterpriseBig Data's Impact on the Enterprise
Big Data's Impact on the Enterprise
 
The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016
The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016
The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016
 
The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They Need
 
Moving Past Infrastructure Limitations
Moving Past Infrastructure LimitationsMoving Past Infrastructure Limitations
Moving Past Infrastructure Limitations
 
Enterprise Search: Addressing the First Problem of Big Data & Analytics - Sta...
Enterprise Search: Addressing the First Problem of Big Data & Analytics - Sta...Enterprise Search: Addressing the First Problem of Big Data & Analytics - Sta...
Enterprise Search: Addressing the First Problem of Big Data & Analytics - Sta...
 
Building the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architectureBuilding the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architecture
 
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
Big Data Day LA 2015 - Data Lake - Re Birth of Enterprise Data Thinking by Ra...
 
You're the New CDO, Now What?
You're the New CDO, Now What?You're the New CDO, Now What?
You're the New CDO, Now What?
 
The Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data HubThe Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data Hub
 
MapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Enterprise Data Hub Webinar w/ Mike FergusonMapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Enterprise Data Hub Webinar w/ Mike Ferguson
 
Developing a Strategy for Data Lake Governance
Developing a Strategy for Data Lake GovernanceDeveloping a Strategy for Data Lake Governance
Developing a Strategy for Data Lake Governance
 
Mastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkMastering Customer Data on Apache Spark
Mastering Customer Data on Apache Spark
 
2012 10 bigdata_overview
2012 10 bigdata_overview2012 10 bigdata_overview
2012 10 bigdata_overview
 
The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data Lake
 
Data Lake, Virtual Database, or Data Hub - How to Choose?
Data Lake, Virtual Database, or Data Hub - How to Choose?Data Lake, Virtual Database, or Data Hub - How to Choose?
Data Lake, Virtual Database, or Data Hub - How to Choose?
 
Taming Big Data With Modern Software Architecture
Taming Big Data  With Modern Software ArchitectureTaming Big Data  With Modern Software Architecture
Taming Big Data With Modern Software Architecture
 

Similar to The Emerging Data Lake IT Strategy

Govern and Protect Your End User Information
Govern and Protect Your End User InformationGovern and Protect Your End User Information
Govern and Protect Your End User InformationDenodo
 
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Nathan Bijnens
 
Increasing Agility Through Data Virtualization
Increasing Agility Through Data VirtualizationIncreasing Agility Through Data Virtualization
Increasing Agility Through Data VirtualizationDenodo
 
Foundational Strategies for Trusted Data: Getting Your Data to the Cloud
Foundational Strategies for Trusted Data: Getting Your Data to the CloudFoundational Strategies for Trusted Data: Getting Your Data to the Cloud
Foundational Strategies for Trusted Data: Getting Your Data to the CloudPrecisely
 
Ensuring Data Quality and Lineage in Cloud Migration - Dan Power
Ensuring Data Quality and Lineage in Cloud Migration - Dan PowerEnsuring Data Quality and Lineage in Cloud Migration - Dan Power
Ensuring Data Quality and Lineage in Cloud Migration - Dan PowerMolly Alexander
 
DAS Slides: Metadata Management From Technical Architecture & Business Techni...
DAS Slides: Metadata Management From Technical Architecture & Business Techni...DAS Slides: Metadata Management From Technical Architecture & Business Techni...
DAS Slides: Metadata Management From Technical Architecture & Business Techni...DATAVERSITY
 
Modernizing Integration with Data Virtualization
Modernizing Integration with Data VirtualizationModernizing Integration with Data Virtualization
Modernizing Integration with Data VirtualizationDenodo
 
Is your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and ClouderaIs your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and ClouderaCloudera, Inc.
 
Accelerate Digital Transformation with Data Virtualization in Banking, Financ...
Accelerate Digital Transformation with Data Virtualization in Banking, Financ...Accelerate Digital Transformation with Data Virtualization in Banking, Financ...
Accelerate Digital Transformation with Data Virtualization in Banking, Financ...Denodo
 
Foundational Strategies for Trusted Data: Getting Your Data to the Cloud
Foundational Strategies for Trusted Data: Getting Your Data to the CloudFoundational Strategies for Trusted Data: Getting Your Data to the Cloud
Foundational Strategies for Trusted Data: Getting Your Data to the CloudPrecisely
 
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...MapR Technologies
 
Expert Panel: Overcoming Challenges with Distributed Data to Maximize Busines...
Expert Panel: Overcoming Challenges with Distributed Data to Maximize Busines...Expert Panel: Overcoming Challenges with Distributed Data to Maximize Busines...
Expert Panel: Overcoming Challenges with Distributed Data to Maximize Busines...Denodo
 
Data Virtualization for Compliance – Creating a Controlled Data Environment
Data Virtualization for Compliance – Creating a Controlled Data EnvironmentData Virtualization for Compliance – Creating a Controlled Data Environment
Data Virtualization for Compliance – Creating a Controlled Data EnvironmentDenodo
 
BAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneyBAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneySai Paravastu
 
SQL Server 2019 Data Virtualization
SQL Server 2019 Data VirtualizationSQL Server 2019 Data Virtualization
SQL Server 2019 Data VirtualizationMatthew W. Bowers
 
ADV Slides: Data Pipelines in the Enterprise and Comparison
ADV Slides: Data Pipelines in the Enterprise and ComparisonADV Slides: Data Pipelines in the Enterprise and Comparison
ADV Slides: Data Pipelines in the Enterprise and ComparisonDATAVERSITY
 
Reinvent Your Data Management Strategy for Successful Digital Transformation
Reinvent Your Data Management Strategy for Successful Digital TransformationReinvent Your Data Management Strategy for Successful Digital Transformation
Reinvent Your Data Management Strategy for Successful Digital TransformationDenodo
 

Similar to The Emerging Data Lake IT Strategy (20)

Govern and Protect Your End User Information
Govern and Protect Your End User InformationGovern and Protect Your End User Information
Govern and Protect Your End User Information
 
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)
 
Increasing Agility Through Data Virtualization
Increasing Agility Through Data VirtualizationIncreasing Agility Through Data Virtualization
Increasing Agility Through Data Virtualization
 
Foundational Strategies for Trusted Data: Getting Your Data to the Cloud
Foundational Strategies for Trusted Data: Getting Your Data to the CloudFoundational Strategies for Trusted Data: Getting Your Data to the Cloud
Foundational Strategies for Trusted Data: Getting Your Data to the Cloud
 
Ensuring Data Quality and Lineage in Cloud Migration - Dan Power
Ensuring Data Quality and Lineage in Cloud Migration - Dan PowerEnsuring Data Quality and Lineage in Cloud Migration - Dan Power
Ensuring Data Quality and Lineage in Cloud Migration - Dan Power
 
DAS Slides: Metadata Management From Technical Architecture & Business Techni...
DAS Slides: Metadata Management From Technical Architecture & Business Techni...DAS Slides: Metadata Management From Technical Architecture & Business Techni...
DAS Slides: Metadata Management From Technical Architecture & Business Techni...
 
Modernizing Integration with Data Virtualization
Modernizing Integration with Data VirtualizationModernizing Integration with Data Virtualization
Modernizing Integration with Data Virtualization
 
Is your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and ClouderaIs your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and Cloudera
 
Accelerate Digital Transformation with Data Virtualization in Banking, Financ...
Accelerate Digital Transformation with Data Virtualization in Banking, Financ...Accelerate Digital Transformation with Data Virtualization in Banking, Financ...
Accelerate Digital Transformation with Data Virtualization in Banking, Financ...
 
Foundational Strategies for Trusted Data: Getting Your Data to the Cloud
Foundational Strategies for Trusted Data: Getting Your Data to the CloudFoundational Strategies for Trusted Data: Getting Your Data to the Cloud
Foundational Strategies for Trusted Data: Getting Your Data to the Cloud
 
Big data
Big dataBig data
Big data
 
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
 
Expert Panel: Overcoming Challenges with Distributed Data to Maximize Busines...
Expert Panel: Overcoming Challenges with Distributed Data to Maximize Busines...Expert Panel: Overcoming Challenges with Distributed Data to Maximize Busines...
Expert Panel: Overcoming Challenges with Distributed Data to Maximize Busines...
 
Data Virtualization for Compliance – Creating a Controlled Data Environment
Data Virtualization for Compliance – Creating a Controlled Data EnvironmentData Virtualization for Compliance – Creating a Controlled Data Environment
Data Virtualization for Compliance – Creating a Controlled Data Environment
 
BAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneyBAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, Sydney
 
SQL Server 2019 Data Virtualization
SQL Server 2019 Data VirtualizationSQL Server 2019 Data Virtualization
SQL Server 2019 Data Virtualization
 
ADV Slides: Data Pipelines in the Enterprise and Comparison
ADV Slides: Data Pipelines in the Enterprise and ComparisonADV Slides: Data Pipelines in the Enterprise and Comparison
ADV Slides: Data Pipelines in the Enterprise and Comparison
 
Ask bigger questions
Ask bigger questionsAsk bigger questions
Ask bigger questions
 
Data Analytics.pptx
Data Analytics.pptxData Analytics.pptx
Data Analytics.pptx
 
Reinvent Your Data Management Strategy for Successful Digital Transformation
Reinvent Your Data Management Strategy for Successful Digital TransformationReinvent Your Data Management Strategy for Successful Digital Transformation
Reinvent Your Data Management Strategy for Successful Digital Transformation
 

More from Thomas Kelly, PMP

Semantic 'Radar' Steers Users to Insights in the Data Lake
Semantic 'Radar' Steers Users to Insights in the Data LakeSemantic 'Radar' Steers Users to Insights in the Data Lake
Semantic 'Radar' Steers Users to Insights in the Data LakeThomas Kelly, PMP
 
Enterprise Semantic Technology
Enterprise Semantic TechnologyEnterprise Semantic Technology
Enterprise Semantic TechnologyThomas Kelly, PMP
 
Rapid data integration and curation
Rapid data integration and curationRapid data integration and curation
Rapid data integration and curationThomas Kelly, PMP
 
Transforming Big Data into Big Value
Transforming Big Data into Big ValueTransforming Big Data into Big Value
Transforming Big Data into Big ValueThomas Kelly, PMP
 
Semantic Technology for the Data Warehousing Practitioner
Semantic Technology for the Data Warehousing PractitionerSemantic Technology for the Data Warehousing Practitioner
Semantic Technology for the Data Warehousing PractitionerThomas Kelly, PMP
 
Semantic Technology for Provider-Payer-Pharma Data Collaboration
Semantic Technology for Provider-Payer-Pharma Data CollaborationSemantic Technology for Provider-Payer-Pharma Data Collaboration
Semantic Technology for Provider-Payer-Pharma Data CollaborationThomas Kelly, PMP
 


The Emerging Data Lake IT Strategy

  • 1. The Emerging Data Lake IT Strategy: An Evolving Approach for Dealing with Big Data & Changing Environments. Speakers: Thomas Kelly, Practice Director, Cognizant Technology Solutions; Sean Martin, Founder and CTO, Cambridge Semantics. bit.ly/DataLake. © 2014
  • 2. We’re living in an amazing world of information sharing, connecting with family, neighbors, vendors, and customers all over the world.
  • 3. Telling the world about what we like and don’t like: #HIMYMfinale, @MLB, “… is now following Cognizant Technology Solutions and Cambridge Semantics”
  • 4. What we’re doing and how we’re succeeding
  • 5. We’re deciding what advertising we want to see … and what we don’t (Unsubscribe), influencing how business and customers engage.
  • 6. Many businesses have emerged that embrace this model of customer engagement, and we’ve said goodbye to businesses that didn’t. Examples: 10 million stays in 2013, without owning a hotel; grew to nearly $75B in annual retail revenue in 2013, without opening a storefront; shares over 40 million photos each day.
  • 7. Retail: Engaging in a more personalized shopping experience, retailers are building a stronger relationship with each customer.
  • 8. Customer Service: Delivering a positive and successful experience for each customer.
  • 9. Life Sciences and Healthcare: Combining health, genetic, clinical, and public sciences data to bring effective therapies to patients sooner.
  • 10. Financial Services: Delivering innovative products and services, based on a 360° view of the customer, across all business lines, engaging all available data assets, internal and external.
  • 11. The Challenges That We're Addressing
      Onboarding and Integrating Data is Slow and Expensive
      • Transforming data from a growing variety of technologies
      • Custom-coded ETL
      • Existing ETL processes are not reusable
      • Optimization for analytics is time-consuming and costly
      • Onboarding often waits until there is a defined need for a set of data, which delays benefits realization
      Data Provenance is Often Poorly Recorded
      • Data meaning is “lost in translation”
      • Data transformations are tracked in spreadsheets
      • Post-onboarding maintenance and analysis costs for onboarded data are high
      • Recreating data lineage is manual, time-consuming, and error-prone
  • 12. The Challenges That We're Addressing
      Target Data is Difficult to Consume
      • Optimization favors known analytics but is not well suited to new requirements
      • A one-size-fits-all canonical view is used rather than fit-for-purpose views
      • Or the target lacks a conceptual model that would make the data easy to consume
      • Difficult to identify what data is available, how to get access, and how to integrate the data to answer a question
      Industrializing the Big Data Environment is Difficult to Manage
      • Proliferation of data silos leads to inconsistency and syncing issues
      • Conflicting objectives of opening access to data assets while managing security and privacy requirements
      • Velocity of business change rapidly invalidates data organization and analytics optimizations
      • Managing the integration and interaction with the multiple data management technologies that make up the Big Data environment
  • 13. The Data Lake is made up of four key components: Data Ingestion, Data Lake Management, Data Management, and Query Management
      Delivering
      • Low Cost, High Performance Storage
      • Flexible, Easy-to-Use Data Organization
      • Performance-Optimized Analytics
      • Automation of most manual Development and Query Activities
      • Self-Service End-User Features
      • Intelligent Processing
  • 14. Data Ingestion [architecture diagram; frame shows Data Ingestion, Data Lake Management, Data Management, Query Management]
      Data Sources: Linked Data; Internet of Things (IoT); Operational Systems; Social Media and Cloud; Desktop and Mobile
      Data Ingestion: On-Demand Query; Streaming; Semantic Tagging; Scheduled Batch Load; Model-Driven; Self-Service
  • 15. Data Management [architecture diagram; same frame, data sources, and ingestion methods as the Data Ingestion slide]
      Data Management: Semantic Graph; Columnar; In Memory; NoSQL; Map Reduce; HDFS Storage (Structured and Unstructured Data)
      Also shown: Provenance; Data Movement
  • 16. Data Lake Management [architecture diagram; same frame, data sources, ingestion methods, and data management stores as the preceding slides]
      Components: Data Assets Catalog; Workflow; Models; Access Management; Monitoring
      Callouts, in slide order:
      • Data Mappings: Source-to-Target; Transformations
      • Internal and External Data Assets
      • Defined Data Orgs (ontologies, taxonomies, thesauri)
      • Authorization and Access Rules
      • Rule-based Security
      • Group, Role, and User Level Authorization
      • Auditable Access
      • Processes
      • Schedules
      • Provenance Capture
      • Business-Focused: Business Unit Data Organization and Terms; Optimized to Assist Analytics
      • Monitoring: Monitor and Manage Data Lake Operations
      • Data Governance: Focus on Shared Data; Standard Models; Controlled Vocabulary; Common Definitions; Standards-based Data Views (FIBO, CDISC/RDF)
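The access-management callouts on this slide (rule-based security; group, role, and user level authorization; auditable access) can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the `Rule` and `User` types, the precedence order (user rules override role rules, which override group rules), the default-deny behavior, and the sample asset names are hypothetical and do not come from the deck or from any particular product.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Rule:
    level: str      # "user", "role", or "group"
    principal: str  # the user, role, or group name the rule applies to
    asset: str      # data asset the rule covers
    allow: bool     # True grants access, False denies it


@dataclass(frozen=True)
class User:
    name: str
    roles: FrozenSet[str] = frozenset()
    groups: FrozenSet[str] = frozenset()


# Assumed precedence: the most specific level that has a matching rule wins.
PRECEDENCE = ("user", "role", "group")

AuditEntry = Tuple[str, str, bool, str]


def is_allowed(user: User, asset: str, rules: List[Rule],
               audit_log: List[AuditEntry]) -> bool:
    """Evaluate rules for one access request and record the decision."""
    principals = {
        "user": frozenset({user.name}),
        "role": user.roles,
        "group": user.groups,
    }
    decision, matched = False, "default-deny"
    for level in PRECEDENCE:
        hits = [r for r in rules
                if r.level == level and r.asset == asset
                and r.principal in principals[level]]
        if hits:
            # Within one level, any deny rule blocks access.
            decision = all(r.allow for r in hits)
            matched = level
            break
    # Every request is logged, whether granted or not (auditable access).
    audit_log.append((user.name, asset, decision, matched))
    return decision


# Hypothetical rules for a single data asset.
RULES = [
    Rule("group", "analysts", "sales_orders", True),
    Rule("role", "contractor", "sales_orders", False),
    Rule("user", "dana", "sales_orders", True),
]
```

In this sketch a contractor who belongs to the analysts group is denied, because the role-level rule is consulted before the group-level rule; a real deployment would choose its own precedence and conflict-resolution policy.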
  • 17. Query Management [architecture diagram; same frame, data sources, ingestion methods, and data management stores as the preceding slides]
      Query Management: Query Data, Metadata, and Provenance; Capture and Share Analytics Expertise; Semantic Search; Analytics Directed to the Best Query Engine; Data Discovery
  • 18. Semantic Technology Delivers “Smart” Data
      • Integrates a network of internal and external data assets, insulating end users from the details of the underlying technologies
      • Captures expertise (logic, inferencing) and integrates it with the data, delivering “smart” data to non-expert users
      • Manages a comprehensive inventory of the data assets
      • Secures access to the right data assets by the right users
  • 19. Key W3C Standards in Semantic Technology
      • Resource Description Framework (RDF): Framework for storing and integrating data and data definitions in the form of subject-predicate-object expressions, or “triples”. Relationships are organized in a logical graph model. Benefit: Reduced development time and cost; faster time-to-business value.
      • Web Ontology Language (OWL): An ontology is a comprehensive model of data definitions and relationships that is human- and machine-readable. Ontologies are inheritable and extensible. Benefit: Improved application quality, flexible iterative / investigative approach, easily adapts to business change.
      • SPARQL Query Language: SQL-like query language for semantic data that can leverage the ontological relationships and constructs to execute smarter queries. Access multiple internal and external databases simultaneously in a single query. Benefit: Access and integrate data across business silos.
      • Inference: Reasoning over data through business rules. Expertise is captured and embedded in the ontology model, accessible through user queries. This is the “smart” in Smart Data. Benefit: Easier end user access to expertise; intelligent systems capabilities.
      • Linked Data: Connects data contained in different databases, allowing queries to find, share and combine data so insights can be identified across the Web. Benefit: Connect disparate databases to navigate and integrate data regardless of location or technology platform.
      • RDB to RDF Mapping Language (R2RML): Preserving current investments in relational technology, R2RML maps relational data to an ontology. SPARQL can query RDF and relational databases simultaneously. Benefit: Low cost of entry to use Semantic Technology to deliver high-value solutions.
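To make the triple and inference ideas on this slide concrete, here is a minimal sketch in plain Python. It is not an RDF store or a SPARQL engine: the `match` function only imitates a single triple pattern with wildcards, and `infer_types` applies one RDFS-style rule (an instance of a subclass is also an instance of its superclass). The `ex:` identifiers and sample data are hypothetical.

```python
from typing import Iterable, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical sample data, written as subject-predicate-object triples.
TRIPLES: Set[Triple] = {
    ("ex:order42", "rdf:type", "ex:Order"),
    ("ex:order42", "ex:placedBy", "ex:alice"),
    ("ex:alice", "rdf:type", "ex:PremiumCustomer"),
    ("ex:PremiumCustomer", "rdfs:subClassOf", "ex:Customer"),
}


def match(triples: Iterable[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples that fit the pattern; None acts as a wildcard."""
    for t in triples:
        if ((s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)):
            yield t


def infer_types(triples: Iterable[Triple]) -> Set[Triple]:
    """Add inferred rdf:type triples until nothing new can be derived.

    Rule applied: if X has type C and C is a subclass of D, then X has type D.
    """
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        for (x, _, c) in list(match(closed, p="rdf:type")):
            for (_, _, d) in list(match(closed, s=c, p="rdfs:subClassOf")):
                new = (x, "rdf:type", d)
                if new not in closed:
                    closed.add(new)
                    changed = True
    return closed


def customers(triples: Iterable[Triple]) -> List[str]:
    """Return every subject typed as ex:Customer, sorted for stable output."""
    return sorted(s for (s, _, _) in match(triples, p="rdf:type", o="ex:Customer"))
```

Querying the raw triples for customers finds nothing, because `ex:alice` is only recorded as a premium customer; querying the inferred set finds her. That difference is the kind of embedded expertise the slide calls the “smart” in Smart Data.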
  • 20. The Common Model is the “Data Glue”
      Source Systems: Lead (SFA system); Quote (Quote system); Order (OMS system); Contract (CMS system), all connected through the Common Model (“Data Glue”)
      • Different business entities in physical systems actually share many of the same concepts, meanings, and relationships
      • Semantic data science exposes common business concepts and connects them with their physical expression in production systems
      • Data is “glued” together by its business meaning, rather than physical structures dictated by the underlying technologies
      The conceptual model can be directly used by both business and IT users to operationalize data services, understand the data landscape, track data lineage, and conduct downstream analytics.
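A rough sketch of the “data glue” idea: each source system keeps its own field names, and a per-system mapping relates those fields to shared business concepts so that records can be brought together by meaning. The system keys, field names, and concept names below are all hypothetical, chosen only to mirror the Lead, Quote, and Order examples on the slide.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical mappings from each system's physical fields to common concepts.
MAPPINGS: Dict[str, Dict[str, str]] = {
    "SFA": {"lead_id": "source_id", "lead_company": "customer_name"},
    "Quote": {"quote_no": "source_id", "acct_nm": "customer_name",
              "quote_total": "amount"},
    "OMS": {"order_id": "source_id", "cust": "customer_name",
            "order_amt": "amount"},
}


def to_common(system: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Re-express one source record in terms of the common model."""
    mapping = MAPPINGS[system]
    common: Dict[str, Any] = {"source_system": system}  # keep simple lineage
    for field_name, value in record.items():
        concept = mapping.get(field_name)
        if concept is not None:  # unmapped fields are left out of this view
            common[concept] = value
    return common


def by_customer(records: Iterable[Tuple[str, Dict[str, Any]]]
                ) -> Dict[Any, List[Dict[str, Any]]]:
    """Group records from all systems by the shared customer_name concept."""
    grouped: Dict[Any, List[Dict[str, Any]]] = {}
    for system, record in records:
        common = to_common(system, record)
        grouped.setdefault(common.get("customer_name"), []).append(common)
    return grouped


# Hypothetical records as they might look in each source system.
SAMPLE = [
    ("SFA", {"lead_id": "L-1", "lead_company": "Acme"}),
    ("Quote", {"quote_no": "Q-7", "acct_nm": "Acme", "quote_total": 1200}),
    ("OMS", {"order_id": "O-3", "cust": "Acme", "order_amt": 1150}),
]
```

Calling `by_customer(SAMPLE)` returns one entry for the customer with three records, each still tagged with its originating system, which is a simplified stand-in for the lineage tracking the slide mentions.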
  • 21. Semantic Models Relate Data by Business Meaning [diagram centered on Customer; related concepts: Life Events; Life Style; Preferences; Interests; Music; Purchasing; Personal Network; Entertainment; Profession]
  • 22. Implications to the Existing IT Architecture and Practices
      • User Tools to Discover and Optimize Data Relationships
      • Structured and Unstructured Data, Voice, and Video
      • Data Analysis Automation
      • Extends Existing Investments in IT Architecture
      • Manages Secure Access
      • Builds Out Enterprise Data Models, with Integration Hub Capabilities
      • Self-Service Data Feeds and Analytics
      • Infrastructure Capacity Elasticity
      • Reduction of Data Mart Silos
      • Easier Access to External Data
  • 23. Data Lake Approach to Meeting Business Needs (each business need compared across Traditional Technologies and Practices and Data Lake Technologies and Practices)
      Onboard New Data
      • Traditional: Comprehensive analysis creates rigid structure that is difficult to change, or minimal definition of data organization requires detailed understanding of data contents.
      • Data Lake: Flexible data model can be revised or extended without redesign of the database. Agile, evolutionary refinement of the data organization, leveraging new insights as users work with the data.
      Connect External Data
      • Traditional: External data is collected and loaded into the analytics repository. Data is streamed, or is refreshed on a scheduled frequency.
      • Data Lake: External data can be sourced from databases, spreadsheets, Web pages, news feeds, and more; data is queried through common methods, without regard to location, with real-time values delivered at query time.
      Integrate Data between Business Units or Business Partners
      • Traditional: Governance activities establish common vocabulary and data definitions. Shared data is copied to an integrated database. Organization-specific definitions may require duplicating certain data in marts.
      • Data Lake: In addition, systems of record publish existing data specifications or an ontology model; each organization defines data in a manner that is best suited for its business. Federation and virtualization features provide choices in which data to copy and which data to retain in the system(s) of record. All models can be supported through a single copy of the data, maintained in the data lake or system of record.
      Capture and Embed Expertise
      • Traditional: Expertise is often captured in the reporting and analytics, which creates a change management challenge when updates are required.
      • Data Lake: Expertise is captured in the data definitions; a single, shared definition minimizes change management efforts.
  • 24. Lessons learned from early adopters
      • Prioritize: Prioritize data onboarding by the data’s ability to contribute to customer engagement
      • Onboard: Onboard data assets as they become available
      • Connect: Connect to available internal and external data assets
      • Load: Load the data unfiltered/untransformed
      • Organize: Use models to provide organization to the data
      • Customize: Create models that are tailored to the needs of the business groups
      • Search: Make it easy to find data
      • Secure: Manage security and privacy, but make it easy to authorize access to data that users need
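The Load, Organize, and Customize lessons can be read together as a schema-on-read pattern: raw records are stored as received, and each business group applies its own model when it reads them. The sketch below is one possible interpretation, not the speakers' implementation; the raw records, the `finance` and `marketing` model names, and the field names are hypothetical.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator, Tuple

# Raw records are kept exactly as received (unfiltered, untransformed).
RAW_LAKE = [
    '{"cust": "Acme", "amt": "120.50", "region": "EU", "note": "rush"}',
    '{"cust": "Globex", "amt": "75", "region": "US"}',
]

# A model maps a business-facing name to (raw field, conversion function).
Model = Dict[str, Tuple[str, Callable[[Any], Any]]]

# Hypothetical models tailored to two business groups.
MODELS: Dict[str, Model] = {
    "finance": {"customer": ("cust", str), "amount": ("amt", float)},
    "marketing": {"customer": ("cust", str), "region": ("region", str)},
}


def read_view(raw_lines: Iterable[str], model: Model) -> Iterator[Dict[str, Any]]:
    """Apply a model at read time; the stored raw data is never modified."""
    for line in raw_lines:
        record = json.loads(line)
        yield {
            target: convert(record[source])
            for target, (source, convert) in model.items()
            if source in record  # tolerate records that lack a field
        }
```

Both groups read the same stored copy: `list(read_view(RAW_LAKE, MODELS["finance"]))` yields customer and numeric amount, while the marketing model yields customer and region. Adding a new view means adding a model, not reloading the data.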
  • 25. Addressing Challenges
      - Privacy vs Personal Value
      - Granularity of customer understanding
      - Delivering strategic objectives when projects tend to have a technical focus
      - Opening access to data
      - Need for executive sponsorship
      - Access to external data
      - Establishing firewalls
      - Persistent, pervasive data quality issues
  • 26. Clues to better customer engagement will be found in the ever-growing volume of data that we’re creating
  • 27. A Data Lake Strategy helps you to create a personalized, engaging experience with each customer: Visibility; Self-Service; Smart; Provenance; Open, yet Secure; Internet Scale; Agile; Adaptable; Universal Data Access