Paris FOD Meetup #5 Cognizant Presentation
1. Future of Data meetup #5
With Cognizant
Abdelkrim Hadjidj
Solution Engineer - Hortonworks
@ahadjidj
© Hortonworks Inc. 2011 – 2015. All Rights Reserved
2. Agenda
Thanks to Cognizant for hosting us and for the refreshments
We are 1,085 members today!
Don't hesitate to contact us if you would like to take part in upcoming editions (presentation, demo, hosting, etc.)
Agenda
– Real life use cases from across Europe (Walid Aoudi - Cognizant)
– Lessons learnt from running an enterprise data lake for 70 teams (Matthias Kluba - Société Générale)
– BI on Hadoop: which tool for which use case (Matthieu Lamaraisse - Hortonworks)
3. The Industry’s Premier Big Data Community Event
DataWorks Summit Berlin
April 16–19, 2018
Estrel Hotel, Sonnenallee, Berlin, Germany
4. © 2018 Cognizant
March 8th, 2018
Real life use cases from across Europe
Cognizant’s Presentation
Walid Aoudi – Ph.D., Big Data Architect & Data Scientist
5. Corporate Overview
Employees: ~244,300 (as of Jun 30, 2016)
Global Delivery Centers: 100+
Revenue: $12.42B in 2015 (21% YoY); $3.37B in Q2 2016
Market Capitalization: ~$35B
Fortune’s Most Admired Companies: 8 years in a row
Financial Times Global 500: 281
Forbes Fast Tech 25: 18
Newsweek’s 2016 World Green Rankings: 101
Fortune 500: 230
Forbes Global 2000: 529
6. AIM: The global leader in data and analytics
• Leader in Data Management and BI Service Providers; rated #2 on Strategy
• Global Top 4 in Business Analytics Services & Leaders in BA and SI service providers
• Top 5 Service Provider for Enterprise Data Management & Business Analytics Services
• Leader in Healthcare Payer, Global Banking and Global Insurance Big Data and Analytics IT Services
• Top 3 Service Provider in Enterprise Analytics Services
• Top Analytics IT Service Provider of Analytics IT Consulting Providers
• Leader in US Healthcare Payer BI Consulting and Outsourcing Services
• Top 3 among MDM System Integrators
• 20,000+* Consultants: one of the largest Analytics Practices in the industry
• 650+* Active customers, including several Fortune 500 companies
• 150+* IP-based Assets (Platforms, Solutions, Tools and Frameworks)
• 2000+* Specialists: Domain Experts, Master's Degree & PhD holders
• 800+* Trained Data Scientists
7. Big Data and AI Capability
70+ Clients
150+ Engagements
1600+ Consultants
150+ Use Cases Repository
Awards
• Won Chairman’s Award @ Leading Card & Travel Services Firm
• Innovation & Business Value Award @ Leading Managed Healthcare Company
• Informatica 2016 partner award
• MarkLogic 2016 – Partner Excellence Award in US & EMEA
Analyst recognition
• “Leader”, Gartner Magic Quadrant for Business Analytics Services, Worldwide 2014
• Rated as a Leader by Everest’s PEAK matrix in their “Big Data and Analytics assessment” for HC, Banking and Insurance domains
• Winners Circle – Top 7 Service Providers, BFS Analytics Services 2016, HfS Blueprint Report
Impacting Businesses through Big Data
• ~$30M churn reduction through machine learning
• 75% improved and real-time fraud prediction
• ~$1M YoY savings through Big Data offloading
• $2M increased cross-sell and up-sell opportunities
Innovation
• Data Ingestion Workbench
• BAVA
• iSMART
• BRAVO
Partnerships
SightPrisim
8. Data Lake on Hadoop for Downstream Systems
@ Leading Energy and Home Services provider (1/3)
Key Highlights
• 1 data repository for enterprise-wide access to customer data
• Data ingestion from 1300+ SAP tables, 10K tables in total
• 250 users, 150 applications, 100-node cluster
• Total data volume: 2 petabytes on Prod
Business Drivers
• Change data capture from an SAP–Oracle based source system
• Business SLA for delivering data for reporting and analytics was not being met
• Separate storage and processing for downstream tools like SAS ETL and reporting
Solution Highlights
• Sources such as SAP, CRM and SMART meter logs were integrated on the data lake
• Implemented SCD on a file system; existing SAS code was executed against Hadoop with no rework
• Used the ODI journal table mechanism to capture changes in batches and met the business SLA with the power of Hadoop parallelism
Business Outcome
• 25% decrease in query processing latency
• 20% lower cost for storage, plus a performance gain in data processing
• With the use of the SCD concept, no rework downstream
Technology Stack: HDP | Pig | Hive | Sqoop | SAS | ODI for CDC | Qlikview
9. Data Lake on Hadoop for Downstream Systems
@ Leading Energy and Home Services provider (2/3)
Solution Approach
• Push down all the SAS ETL code and reporting code directly onto Hadoop
• The Hadoop architecture, which worked seamlessly on commodity hardware, saved the cost of buying expensive Teradata upgrades
• Use of Hadoop ecosystem products saved the cost of buying expensive ETL licensing
• POC with a 10-node HDP 1.0 cluster (hosted in an HP data center) with credit risk data
• Added new data (Headend (XML), Pulse (field engineer mobile app), website)
• Added new use cases (campaign management, record matching)
Solution Architecture
Data Sources: SAP, CRM, Telephony, DYNO, QAMS, ISSAC, Salesforce, File Feeds, Other Feeds
Layers: Landing, History Capture, Transformation Layer (HiveQL), Reporting (Excel Reports)
• Tables required for the application are brought into the Data Lake using the Ingestion Framework
• Only tables that receive updates and deletes are added to History Capture (insert-only tables are added only to Landing)
• Views are created from the Landing and History Capture layers
• Any complex transformation and business logic is created as new Hive tables
• The Transformation Layer is a single interface for any reporting / data extracts
• Excel reports connect to the Data Lake using the ODBC driver for Hive
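One way to read the Landing / History Capture split is that a view rebuilds the current state of a table by replaying captured changes on top of the landed rows. The sketch below illustrates that idea in plain Python rather than Hive. The `Change` record, the `op` codes ("U" for update, "D" for delete), the `seq` ordering field and the sample rows are all assumptions made for illustration; the slides do not describe the actual table layout or view definitions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Change:
    """Hypothetical History Capture record (names are illustrative only)."""
    key: str   # business key of the affected row
    op: str    # "U" = update, "D" = delete (assumed codes)
    seq: int   # assumed ordering column, e.g. a batch or journal sequence
    row: dict  # new row image for updates, empty for deletes


def current_view(landing: Iterable[dict], history: Iterable[Change],
                 key_field: str = "id") -> List[dict]:
    """Rebuild the latest state: landed rows, then changes replayed in order."""
    state: Dict[str, dict] = {row[key_field]: dict(row) for row in landing}
    for change in sorted(history, key=lambda c: c.seq):
        if change.op == "D":
            state.pop(change.key, None)          # row no longer exists
        else:
            state[change.key] = dict(change.row)  # latest image wins
    return [state[k] for k in sorted(state)]


landing_rows = [{"id": "A", "tariff": "std"}, {"id": "B", "tariff": "std"}]
history_rows = [
    Change("A", "U", 2, {"id": "A", "tariff": "eco"}),
    Change("B", "D", 3, {}),
    Change("A", "U", 1, {"id": "A", "tariff": "flex"}),
]
view = current_view(landing_rows, history_rows)
print(view)
```

Insert-only tables need no replay at all under this reading, which would be consistent with them being added only to Landing.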
10. Data Lake on Hadoop for Downstream Systems (Smart Meter Analytics Implementation)
@ Leading Energy and Home Services provider (3/3)
Hortonworks Hadoop Data Lake Based Smart Meter Analytics
Project Highlights:
• Reduce TCO
• Operational Efficiencies
• Predictive Maintenance
• Drive Load Reduction Programs
• Early Settlements
• Customer Top-Up Prediction
• Error-Free Billing and half-hourly consumption breakdowns
Solution:
• Integrated 60,000 smart meters and 3 million half-hourly records on a daily basis
• Spark / R based statistical models
• Qlik based consumption dashboards
Business Drivers and Cognizant Solution:
• Tiered Pricing – Free Weekends
• Customer segmentation based on usage
• Proactively identify malfunctioning smart meters and repair them in a timely manner
• Improved forecasting for all smart meters, marketing and customer behavioral insight
• Prioritization of workforce planning for field staff who repair or replace smart meters, at 0.014% from 1% of 1.2 million meters in a 45-day period
• Improved visibility and transparency of home energy consumption for end customers, by providing usage breakdown, similar-homes comparison and insight
Benefits Delivered:
• 10% decrease in network audit costs
• Prediction of last-mile consumption spikes with an accuracy of 89%
• Increase of up to 80% in prediction of consumption leakage
• Computer Weekly Best big data, BI and analytics project of the year
• Business dashboards process files with over 4 billion records and fetch a staggering 45 million records daily
• 9 billion records available to the customer
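The volume figures on this slide can be sanity-checked with simple arithmetic. The sketch below assumes one reading per meter per half hour (48 per day), which the slide implies but does not state. It also computes what 1% and 0.014% of 1.2 million meters would be in absolute terms; how those two percentages relate to each other is not spelled out in the slide, so the numbers are shown without interpretation.

```python
# Assumption: "half-hourly" means one reading per meter every 30 minutes.
METERS = 60_000
READINGS_PER_DAY = 24 * 2

daily_records = METERS * READINGS_PER_DAY
print(daily_records)  # 2880000, consistent with "3 million" once rounded

# Absolute meter counts behind the two percentages quoted for 1.2 million meters.
FLEET = 1_200_000
meters_at_1_percent = FLEET * 1 // 100           # 1%
meters_at_0_014_percent = FLEET * 14 // 100_000  # 0.014% = 14 / 100,000
print(meters_at_1_percent, meters_at_0_014_percent)  # 12000 168
```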
11. © 2018 Cognizant
Key Highlights
• 100+ TB and 20 SoRs
• 3 countries and all Lines of Business
• Single-entry ingestion through Kafka
Business Drivers
• Three existing country-level legacy DW systems on disparate technologies, with multiple views of information, data quality issues and manual dependencies
• Challenges in moving to one unified, centralized data platform for deriving business insights
• Increase in business demand for data-driven applications that leverage the overwhelming growth of data and advances in technology
Solution Highlights
• Cloud-based environment for all processes in the software lifecycle (development environment, Git, cluster, data marts, etc.)
• Common framework for all reusable components to be used in the cloud
• Unlimited data storage in Azure Data Lake for a variety of data (structured, unstructured, semi-structured)
• Kafka and Spark Streaming for real-time data ingestion
• Confluent repository for all schema management
• ARM-based environment creation with all security, authentication and authorization, using GitLab and Chef
• Technology Stack: Microsoft Azure HDInsight, StreamSets, Kafka, Spark, Apache Beam, Chef, GitLab
Business Outcomes
• Integrated ecosystem addressing the full breadth of enterprise applications, analytics, users and use cases
• Expected positive shift in customer engagement, products, pricing and offers, claims management, etc.
• Faster and more accurate business insights to aid rapid and precise decision making
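The combination of single-entry ingestion and a central schema repository can be pictured as one gate that every incoming record passes through: a record is accepted only if it matches the registered schema for its subject. The sketch below is a plain-Python illustration of that idea and uses no Kafka or Confluent client at all. The `SCHEMAS` dictionary, the subject names, the field names and the envelope shape are hypothetical stand-ins, not details taken from the project.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a schema repository: subject -> field name -> type.
SCHEMAS: Dict[str, Dict[str, type]] = {
    "policy": {"policy_id": str, "country": str, "premium": float},
    "claim": {"claim_id": str, "policy_id": str, "amount": float},
}


def validate(subject: str, payload: dict) -> bool:
    """Accept a payload only if it has exactly the registered fields and types."""
    schema = SCHEMAS.get(subject)
    if schema is None:
        return False
    if set(payload) != set(schema):
        return False
    return all(isinstance(payload[name], typ) for name, typ in schema.items())


def ingest(envelopes: List[dict]) -> Tuple[Dict[str, List[dict]], List[dict]]:
    """Single entry point: route valid records by subject, set the rest aside."""
    accepted: Dict[str, List[dict]] = {}
    rejected: List[dict] = []
    for envelope in envelopes:
        subject = envelope.get("subject", "")
        payload = envelope.get("payload", {})
        if validate(subject, payload):
            accepted.setdefault(subject, []).append(payload)
        else:
            rejected.append(envelope)
    return accepted, rejected


accepted, rejected = ingest([
    {"subject": "policy",
     "payload": {"policy_id": "P1", "country": "FR", "premium": 120.0}},
    {"subject": "claim", "payload": {"claim_id": "C1", "amount": 50.0}},
    {"subject": "unknown", "payload": {}},
])
print(sorted(accepted), len(rejected))
```

Keeping rejected records instead of dropping them is a design choice of this sketch; it makes data quality issues visible at the entry point rather than downstream.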
15. Financial Data Lake Implementation @ Global Custodian
Business Drivers
• Enable fund managers to gain efficiency and focus on their core business
• Help customers respond to questions in 4 key areas (Fund Distribution, Financial Reporting, MIS KPI, Social Media)
• Enable analysis of investor behavior
Solution Highlights
• Designed and developed a business-specific Data Management Platform on Hadoop
• Enabled end-to-end data acquisition / storage / processing / visualization for end users in a fully secured mode, at the lowest possible granularity level for data access (cell level)
• Improved performance and robustness of the platform, enabling real-time in-memory analytics
Business Outcome
• A wide range of data in real time on all funds, including several years of history
• A complete dashboard allowing customers to zoom in on a specific element, such as a particular investor, a fund, a date, a geographical area or a currency
• A customization service allowing each client to adapt the tool to its own specific needs
Technology Stack: HDFS | HBase | Scaled Risk | Tableau | Kerberos | Ranger
Key Highlights
• 100+ Source Systems
• 4 Key Domains – Fund Distribution, Financial Reporting, MIS KPI, Social Media
• Total data volume 150+ TB and growing
• Internal and External Data
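Cell-level access granularity means that two users reading the same row may see different subsets of its columns. The sketch below shows the general idea in plain Python, with each cell carrying a visibility label that is checked against the reader's authorizations. The `Cell` type, the label names and the sample row are hypothetical; the slides do not say how the platform actually enforces cell-level access.

```python
from typing import Dict, FrozenSet, NamedTuple


class Cell(NamedTuple):
    """One stored value plus a hypothetical visibility label."""
    value: object
    label: str


def read_row(row: Dict[str, Cell], authorizations: FrozenSet[str]) -> Dict[str, object]:
    """Return only the cells whose label is among the reader's authorizations."""
    return {
        column: cell.value
        for column, cell in row.items()
        if cell.label in authorizations
    }


fund_row = {
    "fund": Cell("FUND-1", "public"),
    "nav": Cell(102.4, "reporting"),
    "investor": Cell("INV-9", "restricted"),
}

analyst_view = read_row(fund_row, frozenset({"public", "reporting"}))
print(analyst_view)
```

A reader with no authorizations gets an empty row rather than an error, so the existence of restricted cells is not revealed by the read itself.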
16. Financial Data Lake Implementation @ Global Custodian
Architecture (diagram)
Internal Data sources: MF, MP, IC, Fusion, AAA, gpms, CRD, URD, amadeus, RBC
External Data sources: Bisam RMX
Data Integration: Landing Zone, HDFS NFS Gateway
Central Data Repository (HDFS): Raw Data Reservoir, Certified Data Layer
Security Layer: SPNEGO, SSL
Data Consumption Business Layer:
• Financial Reporting: Standard Report, VAR Report, Accounting Report, Fund Factsheet, Solvency 2 View
• Business Views: BU_1 View … BU_N View
• Self Service BI
• Analytics
Downstream: Web Portal / Intranet, File Transfer, Email, API / Data as a service, custom