Three Big Data Case Studies
Great use cases of Big Data
• Big Data Exploration: Find, visualize, and understand all big data to improve decision making
• Enhanced 360° View of the Customer: Extend existing customer views (CRM, etc.) by incorporating additional internal and external information sources
• Security/Intelligence Extension: Lower risk, detect fraud, and monitor cyber security in real time
• Data Warehouse Augmentation: Integrate big data and data warehouse capabilities to increase operational efficiency
• Operations Analysis: Analyze a variety of machine data for improved business results
Why Big Data
• Greater efficiencies in business processes
• New insights from combining and analyzing data types in new ways
• Develop new business models, with resulting increases in market presence and revenue
[Diagram labels: File Systems, Relational Data, Content Mgmt, Email, CRM, Supply Chain, ERP, RSS Feeds, Cloud, Custom Sources; Data; Views; Applications/Users]
Atidan Approach
• Implement a Hadoop-centric reference architecture
• Move enterprise batch processing to Hadoop
• Make Hadoop the single point of truth
• Massively reduce ETL by transforming within Hadoop
• Move results and aggregates back to legacy systems for consumption
• Retain, within Hadoop, source files at the finest granularity for re-use
Top Criteria
• Allow users to use familiar consumption interfaces (web, mobile)
• Enable businesses to unlock previously unusable data
[Diagram: Big Data Architecture, High Level. Labels: Unlock Big Data; Simplify Your Warehouse; Preprocess Raw Data; Ingest]
Atidan Case Study
Usage Analysis using Hadoop
• Business Need
• A large conglomerate needed to analyze the last 10 years of usage of its web applications from IIS logs
• The IIS logs were stored in multiple files (e.g., daily logs)
• The data was unstructured free text and included irrelevant entries
• The exact analysis criteria, parameters, and desired outcomes were not known in advance
• Solution
• A traditional RDBMS could not handle the problem because of the type and volume of the data and the uncertainty around the ultimate analysis criteria
• Atidan delivered a Hadoop-based solution that transformed raw data into reports
• The solution tolerated data inconsistencies
• Hadoop accommodated incremental data additions
• Scalability in the petabyte range
• Depending on data size and complexity, processing can be scaled from one node to 100 nodes
• The schema-less architecture allowed the data model and analytics to change even at a late stage in the project
• The organization gained new and unexpected insights into employee, customer, and vendor/partner behavior
• Correlations were established between employees' usage patterns and both attrition and productivity
Atidan Case Study
Usage Analysis using Hadoop
[Chart: Request Types. Request counts by HTTP status: Accepted…, BadRequest…, Created (201), Forbidden…, Not…, NotFound…, OK (200), Unauthorise…]
[Chart: Monthly Requests. Requests per month, 2001 and 2002]
[Chart: Users. Request counts per user: Amare, Amit, Bhagat, Mukesh, Praneel, Sanjog, Vimal]
Big Query - Hive
• The volume of data collected and analyzed for business intelligence (BI) is growing rapidly, making traditional warehousing solutions prohibitively expensive
• MapReduce is low-level and complex to write
• Hive provides a high-level, SQL-like query language
• This allows for ad hoc analysis
• The business need not know in advance which patterns to look for
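To illustrate what ad hoc, SQL-style analysis looks like compared with hand-written MapReduce, here is a small Python sketch. Hive is not used here: Python's built-in `sqlite3` module stands in so the example runs anywhere, and the `requests` table and its rows are hypothetical. The point is only the shape of the queries, where each new question is one declarative statement rather than a new program.

```python
import sqlite3

# Hypothetical request records: (day, username, HTTP status).
ROWS = [
    ("2001-01-05", "amit", 200),
    ("2001-01-05", "amit", 404),
    ("2001-02-01", "vimal", 201),
    ("2001-02-01", "vimal", 200),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (day TEXT, username TEXT, status INTEGER)")
conn.executemany("INSERT INTO requests VALUES (?, ?, ?)", ROWS)


def ad_hoc(sql):
    """Run one ad hoc query and return all result rows."""
    return conn.execute(sql).fetchall()


# Question 1: which response codes are most common?
by_status = ad_hoc(
    "SELECT status, COUNT(*) AS hits FROM requests "
    "GROUP BY status ORDER BY hits DESC, status"
)

# Question 2, thought of later: how does traffic vary by month?
by_month = ad_hoc(
    "SELECT substr(day, 1, 7) AS month, COUNT(*) AS hits FROM requests "
    "GROUP BY substr(day, 1, 7) ORDER BY month"
)

print(by_status)
print(by_month)
```

The second question did not have to be known when the data was loaded, which is the sense in which the analysis is ad hoc.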
Atidan Case Study
Customer data collection (KYC) using Hadoop
• Business Need
• A financial institution had to collect customer data periodically
• Customers are very reluctant to provide updated data
• The customer data has to be cross-checked against the billions of transactions the institution receives per day
• The institution wanted to collate publicly available data from known social media sites
• The data was unstructured free text and included irrelevant entries
• Solution
• A graph database was constructed over the extracted social data to analyze transactions
• Atidan delivered a Hadoop-based solution that transformed raw data into a graph database
• Aggregated customer information from existing sources, social media, and government sources
• Analyzed transactions to find hidden patterns
• Enabled link analysis and risk monitoring
• Facilitated decision making (e.g., new products) and customer discovery
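The link analysis mentioned above can be pictured with a small in-memory graph. This Python sketch is not the graph database Atidan built; the account IDs, the transaction pairs, and the function names are hypothetical, and a plain breadth-first search stands in for a graph query.

```python
from collections import defaultdict, deque

# Hypothetical (payer, payee) pairs extracted from transaction records.
TRANSACTIONS = [("C1", "C2"), ("C2", "C3"), ("C3", "C4"), ("C5", "C6")]


def build_graph(edges):
    """Build an undirected adjacency map: account -> set of counterparties."""
    graph = defaultdict(set)
    for payer, payee in edges:
        graph[payer].add(payee)
        graph[payee].add(payer)
    return dict(graph)


def linked_accounts(graph, start, max_hops):
    """Breadth-first search: accounts within max_hops of start, with hop counts."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    del distances[start]
    return distances


graph = build_graph(TRANSACTIONS)
print(linked_accounts(graph, "C1", 2))
```

If account C1 were flagged as risky, the accounts returned here would be the first candidates for closer monitoring; accounts with no path to C1, such as C5 and C6, never appear.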
Atidan Case Study
Customer data collection (KYC) using Hadoop
[Diagram labels: Big Data Processing; Graph Database; Customer Clustering; Income/Expense changes; Corporate structure changes; AML; Peer group analysis; Pattern Analysis; Customer Information; Web; Social; Channel Partners; Utility Providers; Aadhaar (UIDAI)]
Advantages of Hadoop (KYC) Solution to Banks
• Lowers the cost of follow-up with users
• Reduces losses by highlighting risky users early
• Graph-database-based AML
• Insights into new products, new customers, new loans to existing customers, and new investment opportunities for customers
• Reduces operational errors
• Traceability of data source
[Diagram labels: AML, Graph Queries, Due Diligence, Risk, Credit Scoring, Mitigation, Analysis, Peer groups, New Prospects, Insights, New Products, New Customers]
Atidan Case Study
Email scanning and categorization using MongoDB
Business Need
Retrieve potentially millions of daily emails from a common webmail account, categorize them, and post them to each individual user's page for frontend access
The existing process had significant performance, reliability, and scalability issues. Users also received a lot of spam
Solution
Atidan proposed a MongoDB-Drupal based solution with the following approach:
• A scheduler was created to pull only the headers from the common webmail account shared by all users
• The headers were stored in an intermediate catalog in MongoDB
• The data was transformed based on recipient address and user preferences, and spam was removed. The email body was then fetched for the filtered records and saved into the final catalog in MongoDB
• Emails from the final catalog were pushed to the frontend platform (Drupal)
Key Takeaways
• MongoDB was used to process the "Big Data" of millions of daily emails. It is much faster, easy to scale, and very flexible
• The task was split into multiple sub-tasks, and a better algorithm was used for performance and efficiency
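The staged approach above can be sketched as follows. This is an illustrative Python outline, not Atidan's Node.js implementation: plain dictionaries stand in for the webmail account and the MongoDB catalogs, and the addresses, the `blocked_domains` preference, and all function names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Header:
    msg_id: str
    recipient: str
    sender: str
    subject: str


# Hypothetical shared mailbox: msg_id -> (header, body).
MAILBOX = {
    "m1": (Header("m1", "amit@example.com", "boss@example.com", "Quarterly report"),
           "Report attached."),
    "m2": (Header("m2", "amit@example.com", "promo@spam.example", "WIN A PRIZE"),
           "Click here."),
    "m3": (Header("m3", "vimal@example.com", "hr@example.com", "Leave policy"),
           "Policy update."),
}

# Hypothetical per-user preferences.
PREFERENCES = {
    "amit@example.com": {"blocked_domains": {"spam.example"}},
    "vimal@example.com": {"blocked_domains": set()},
}


def pull_headers(mailbox):
    """Stage 1: copy only the headers into the intermediate catalog."""
    return {msg_id: header for msg_id, (header, _body) in mailbox.items()}


def is_wanted(header, preferences):
    """Stage 2 filter: known recipient and sender domain not blocked."""
    prefs = preferences.get(header.recipient)
    if prefs is None:
        return False
    domain = header.sender.rsplit("@", 1)[-1]
    return domain not in prefs["blocked_domains"]


def build_final_catalog(intermediate, mailbox, preferences):
    """Stage 3: fetch bodies only for messages that survived filtering."""
    final = {}
    for msg_id, header in intermediate.items():
        if is_wanted(header, preferences):
            body = mailbox[msg_id][1]
            final.setdefault(header.recipient, []).append(
                {"subject": header.subject, "body": body}
            )
    return final


intermediate_catalog = pull_headers(MAILBOX)
final_catalog = build_final_catalog(intermediate_catalog, MAILBOX, PREFERENCES)
print(final_catalog)
```

Filtering on headers before fetching bodies means the larger payloads are downloaded only for messages a user will actually see, which is the reason for splitting the work into stages.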
Technologies used
• Node.js (data transformation)
• MongoDB (database)
• Schema-less
• RESTful service to access data from the browser
• Drupal (frontend)
• The basic unit of data storage and transfer was a JSON object
• Storage and querying
• NoSQL, simple, schema-less database
• Advantages
• Highly scalable, very flexible, simple
• Connectivity
• Node.js (server-side JavaScript)
Thank you!
www.atidan.com
social@atidan.com
