BIG DATA ANALYTICS: REFERENCE ARCHITECTURES AND CASE STUDIES
BY SERHIY HAZIYEV AND OLHA HRYTSAY
Agenda
▪ Big Data Challenges
▪ Big Data Reference Architectures
▪ Case Studies
▪ 10 Tips for Designing Big Data Solutions
Big Data Challenges
[Figure: data sources plotted from STRUCTURED to UNSTRUCTURED against a LOW / MEDIUM / HIGH scale; labels: Velocity, Variety, Volume, Complexity]
Data sources shown:
▪ Data Storages: RDBMS, NoSQL, Hadoop, file systems, etc.
▪ Machine Log Data: application logs, event logs, server data, CDRs, clickstream data, etc.
▪ Sensor Data: smart electric meters, medical devices, car sensors, road cameras, etc.
▪ Archives: scanned documents, statements, medical records, e-mails, etc.
▪ Docs: XLS, PDF, CSV, HTML, JSON, etc.
▪ Business Apps: CRM, ERP systems, HR, project management, etc.
▪ Social Networks: Twitter, Facebook, Google+, LinkedIn, etc.
▪ Public Web: Wikipedia, news, weather, public finance, etc.
▪ Media: images, video, audio, etc.
Big Data Analytics
Traditional Analytics (BI) vs Big Data Analytics
Focus on:
▪ Traditional Analytics (BI): descriptive analytics, diagnostic analytics
▪ Big Data Analytics: predictive analytics, data science
Data Sets:
▪ Traditional Analytics (BI): limited data sets, cleansed data, simple models
▪ Big Data Analytics: large-scale data sets, more types of data, raw data, complex data models
Supports:
▪ Traditional Analytics (BI): causation (what happened, and why?)
▪ Big Data Analytics: correlation (new insight, more accurate answers)
Big Data Analytics Use Cases
▪ Data Discovery: Data Scientists/Analysts; drivers: Volume, Performance
▪ Business Reporting: Business Users; drivers: Data Quality, Self Service
▪ Real Time Intelligence: Intelligent Agents, Consumers; drivers: Low Latency, Reliability
Big Data Analytics Reference Architectures
Architecture Drivers:
▪ Volume
▪ Sources
▪ Throughput
▪ Latency
▪ Extensibility
▪ Data Quality
▪ Reliability
▪ Security
▪ Self-Service
▪ Cost
Reference Architectures:
▪ Extended Relational
▪ Non-Relational
▪ Hybrid
Relational Reference Architecture
▪ Data Sources: Structured, Semi-Structured, Unstructured
▪ Integration: ETL, Messaging, API/ODBC, Replication
▪ Data Storages: Operational Data Stores, Data Marts, Data Warehouses
▪ Analytics: Query & Reporting, OLAP Cubes, Advanced Analytics
▪ Presentation: Web Browsers, Native Desktop, Mobile Devices, Web Services
Extended Relational Reference Architecture
▪ Data Sources: Structured, Semi-Structured, Unstructured
▪ Integration: ETL, Messaging, API/ODBC, Replication
▪ Data Storages: Operational Data Stores, Data Marts, Data Warehouses
▪ Analytics: Query & Reporting, OLAP Cubes, Advanced Analytics
▪ Presentation: Web Browsers, Native Desktop, Mobile Devices, Web Services
Key components affected by Big Data challenges
Non-Relational Reference Architecture
▪ Data Sources: Structured, Semi-Structured, Unstructured
▪ Integration: ETL, Messaging, API
▪ Data Storages: Distributed File Systems, NoSQL Databases
▪ Analytics: Query & Reporting, Map Reduce, Search Engines, Advanced Analytics
▪ Presentation: Web Browsers, Native Desktop, Mobile Devices, Web Services
Key components introduced with non-relational movement
Extended Relational vs. Non-Relational Architecture
Architecture Drivers compared (Extended Relational vs. Non-Relational):
▪ Large data volume
▪ Self-service (ad-hoc reporting)
▪ Unstructured data processing
▪ High data model extensibility
▪ High data quality and consistency
▪ Extensive security
▪ Reliability and fault-tolerance
▪ Low latency (near-real time)
▪ Low cost
▪ Skills availability
Relational vs. Non-Relational Architecture
▪ Relational: Rational, Predictable, Traditional
▪ Non-Relational: Agile, Flexible, Modern
Big Data Analytics Use Cases
▪ Use cases: Data Discovery, Business Reporting, Real Time Intelligence
▪ Users: Data Scientists, Business Users, Intelligent Agents, Consumers
▪ Drivers: Performance, Volume
Data Discovery: Non-Relational Architecture
▪ Data Sources: Structured, Semi-Structured, Unstructured
▪ Integration: ETL, Messaging, API
▪ Data Storages: Distributed File Systems, NoSQL Databases
▪ Analytics: Query & Reporting, Map Reduce, Search Engines, Advanced Analytics
▪ Presentation: Web Browsers, Native Desktop, Mobile Devices, Web Services
Big Data Analytics Use Cases
▪ Use cases: Data Discovery, Business Reporting, Real Time Intelligence
▪ Users: Data Scientists, Business Users, Intelligent Agents, Consumers
▪ Drivers: Data Quality, Self Service
Business Reporting: Hybrid Architecture
▪ Data Sources: Structured, Semi-Structured, Unstructured
▪ Integration: ETL, Messaging, API
▪ Data Storages: Distributed File Systems, Relational DWH/DM
▪ Analytics: SQL Query & Reporting, Map Reduce, Search Engines, Advanced Analytics
▪ Presentation: Web Browsers, Native Desktop, Mobile Devices, Web Services
Legend: Extended Relational components, Non-relational components
Big Data Analytics Use Cases
▪ Use cases: Data Discovery, Business Reporting, Real Time Intelligence
▪ Users: Data Scientists, Business Users, Intelligent Agents, Consumers
▪ Drivers: Low Latency, Reliability
Lambda Architecture
Source:
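The slide names the Lambda Architecture without spelling it out. As a minimal sketch, assuming the commonly described form (a batch layer that periodically recomputes a view from the full event log, a speed layer that covers events not yet absorbed by a batch run, and a query that merges both), the idea could look like this in Python. The class and method names are hypothetical and do not come from the deck.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LambdaCounter:
    """Toy Lambda Architecture that counts events per key (hypothetical example)."""
    master_log: list = field(default_factory=list)         # append-only raw data
    batch_view: Counter = field(default_factory=Counter)   # recomputed from scratch
    speed_view: Counter = field(default_factory=Counter)   # incremental, recent events only

    def ingest(self, key: str) -> None:
        # New data is appended to the master log and also updates the speed layer.
        self.master_log.append(key)
        self.speed_view[key] += 1

    def run_batch(self) -> None:
        # Batch layer: recompute the view from the entire log, then drop the
        # speed-layer state that the fresh batch view now covers.
        self.batch_view = Counter(self.master_log)
        self.speed_view.clear()

    def query(self, key: str) -> int:
        # Serving side: merge the batch view with the real-time delta.
        return self.batch_view[key] + self.speed_view[key]


counter = LambdaCounter()
for page in ["home", "cart", "home"]:
    counter.ingest(page)
before_batch = counter.query("home")   # 2, answered from the speed view alone
counter.run_batch()
counter.ingest("home")
after_batch = counter.query("home")    # 3 = 2 from the batch view + 1 from the speed view
print(before_batch, after_batch)
```

The low-latency driver is met by the speed view, while reliability comes from being able to rebuild the batch view from the raw log at any time.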
Case Study #1: Usage & Billing Analysis
Business Area:
Cloud-based platform for building, deploying, hosting and managing mobile applications
Business Goals:
▪ Provide a development environment for building custom mobile applications
▪ Charge customers for the platform they use with a pay-as-you-go model
Architectural Decisions
Architecture Drivers:
▪ Volume (> 10 TB)
▪ Sources (Semi-structured - JSON)
▪ Throughput (> 10K/sec)
▪ Latency (2 min)
▪ Extensibility (Custom metrics)
▪ Data Quality (Consistency)
▪ Reliability (24/7)
▪ Security (Multitenancy)
▪ Self-Service (Ad-Hoc reports)
▪ Cost (The less the better)
▪ Constraints (Public Cloud)
Trade-off:
Driver          Extended Relational   Non-Relational
Extensibility   -                     +
Data Quality    +                     -
Self-Service    +                     -
Decisions:
▪ Extended Relational Architecture
▪ Extensibility via Pre-allocated Fields pattern
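The deck does not show how the Pre-allocated Fields pattern was implemented. A minimal sketch of the general idea, assuming a fixed number of spare columns reserved in the fact table plus a metadata table that maps each tenant's custom metric to one of them, might look like this. It uses Python's built-in `sqlite3` rather than the case study's warehouse, and every table, column, and function name is hypothetical.

```python
import sqlite3

N_CUSTOM = 3  # spare columns reserved up front (arbitrary value for this sketch)

conn = sqlite3.connect(":memory:")
custom_cols = ", ".join(f"custom_{i} REAL" for i in range(1, N_CUSTOM + 1))
conn.execute(f"CREATE TABLE usage_events (tenant_id TEXT, api_calls INTEGER, {custom_cols})")
# Metadata: which spare column holds which tenant-defined metric.
conn.execute(
    "CREATE TABLE custom_field_map ("
    "tenant_id TEXT, metric TEXT, col TEXT, PRIMARY KEY (tenant_id, metric))"
)


def register_metric(tenant_id: str, metric: str) -> str:
    """Assign the next free spare column to a tenant's custom metric."""
    used = conn.execute(
        "SELECT COUNT(*) FROM custom_field_map WHERE tenant_id = ?", (tenant_id,)
    ).fetchone()[0]
    if used >= N_CUSTOM:
        raise ValueError("no spare columns left for this tenant")
    col = f"custom_{used + 1}"
    conn.execute("INSERT INTO custom_field_map VALUES (?, ?, ?)", (tenant_id, metric, col))
    return col


def insert_event(tenant_id: str, api_calls: int, custom: dict) -> None:
    """Store an event, routing custom metrics to their pre-allocated columns."""
    mapping = dict(
        conn.execute(
            "SELECT metric, col FROM custom_field_map WHERE tenant_id = ?", (tenant_id,)
        )
    )
    cols = ["tenant_id", "api_calls"] + [mapping[m] for m in custom]
    values = [tenant_id, api_calls] + list(custom.values())
    placeholders = ", ".join("?" for _ in cols)
    conn.execute(
        f"INSERT INTO usage_events ({', '.join(cols)}) VALUES ({placeholders})", values
    )


register_metric("tenant_a", "push_notifications")  # takes custom_1 for tenant_a
register_metric("tenant_b", "storage_gb")          # also custom_1, but for tenant_b
insert_event("tenant_a", 120, {"push_notifications": 45.0})
insert_event("tenant_b", 80, {"storage_gb": 2.5})
rows = conn.execute(
    "SELECT tenant_id, custom_1 FROM usage_events ORDER BY tenant_id"
).fetchall()
print(rows)  # [('tenant_a', 45.0), ('tenant_b', 2.5)]
```

The schema stays fixed, so SQL reporting tools keep working, while each tenant can still add metrics until the spare columns run out. That cap is the cost of choosing the relational side of the extensibility trade-off.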
Solution Architecture
Technologies:
• Amazon Redshift
• Amazon SQS
• Amazon S3
• Elastic Beanstalk
• Jaspersoft BI Professional
• Python
Case Study #2: Clickstream for retail website
Business Area:
Retail. A platform for e-commerce and collecting customer feedback
Business Goals:
▪ Build an in-house Analytics Platform for ROI measurement and performance analysis of every product and feature delivered by the e-commerce platform
▪ Provide the ability to understand how end users interact with service content, products, and features on sites
▪ Do clickstream analysis
▪ Perform A/B testing
Architectural Decisions
Architecture Drivers:
▪ Volume (45 TB)
▪ Sources (Semi-structured - JSON)
▪ Throughput (> 20K/sec)
▪ Latency (1 hour)
▪ Extensibility (Custom tags)
▪ Data Quality (Not critical)
▪ Reliability (24/7)
▪ Security (Multitenancy)
▪ Self-Service (Canned reports, Data science)
▪ Cost (The less the better)
▪ Constraints (Public Cloud)
Trade-off:
Driver               Extended Relational   Non-Relational
Volume/Scalability   +/-                   +
Throughput           +                     +
Self-Service         +                     +/-
Extensibility        -                     +
Decisions:
▪ Non-Relational Architecture
▪ Reporting via Materialized View pattern
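The deck does not show how the materialized views were built. As a minimal sketch of the general pattern, assuming a scheduled batch job that scans raw JSON clickstream events once and stores report-ready aggregates, the idea could be expressed as below. The event fields (`ts`, `page`, `variant`), sample records, and function names are all hypothetical.

```python
import json
from collections import defaultdict

raw_events = [  # hypothetical clickstream records, one JSON document per event
    '{"ts": "2024-01-01T10:00:00", "page": "/product/1", "variant": "A"}',
    '{"ts": "2024-01-01T10:05:00", "page": "/product/1", "variant": "B"}',
    '{"ts": "2024-01-01T11:00:00", "page": "/product/1", "variant": "A"}',
    '{"ts": "2024-01-02T09:30:00", "page": "/cart", "variant": "A"}',
]


def build_daily_page_views(events):
    """Batch job: scan raw events once and precompute counts per (day, page, variant)."""
    view = defaultdict(int)
    for line in events:
        event = json.loads(line)
        day = event["ts"][:10]  # YYYY-MM-DD prefix of the ISO timestamp
        view[(day, event["page"], event["variant"])] += 1
    return dict(view)


def report_page_views(view, day, page):
    """Report query: read only the precomputed view, never the raw events."""
    return {variant: n for (d, p, variant), n in view.items() if d == day and p == page}


daily_view = build_daily_page_views(raw_events)  # would be refreshed on a schedule
report = report_page_views(daily_view, "2024-01-01", "/product/1")
print(report)  # {'A': 2, 'B': 1}
```

Because the view is refreshed in batches, reports lag the raw data by up to one refresh interval, which fits a latency driver measured in hours rather than minutes.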
Solution Architecture
Technologies:
• Amazon S3
• Flume
• Hadoop/HDFS, MapReduce
• HBase
• Oozie
• Hive
Node 1
Node 2
Node N
10 Tips for Designing Big Data Solutions
▪ Understand data users and sources
▪ Discover architecture drivers
▪ Select proper reference architecture
▪ Do trade-off analysis, address cons
▪ Map reference architecture to technology stack
▪ Prototype, re-evaluate architecture
▪ Estimate implementation efforts
▪ Set up DevOps practices from the very beginning
▪ Advance in solution development through “small wins”
▪ Be ready for changes: big data technologies are evolving rapidly
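The trade-off analysis step can be sketched as a simple tally over the +, +/- and - ratings from the two case-study tables. The deck does not say how the ratings were weighed, so the scoring scale and equal default weights below are assumptions made for illustration, and the function name is hypothetical.

```python
SCORE = {"+": 1, "+/-": 0, "-": -1}  # assumed numeric scale for the slide's symbols

# Ratings copied from the case-study trade-off tables,
# as (Extended Relational, Non-Relational) pairs.
CASE_1 = {
    "Extensibility": ("-", "+"),
    "Data Quality": ("+", "-"),
    "Self-Service": ("+", "-"),
}
CASE_2 = {
    "Volume/Scalability": ("+/-", "+"),
    "Throughput": ("+", "+"),
    "Self-Service": ("+", "+/-"),
    "Extensibility": ("-", "+"),
}


def tally(ratings, weights=None):
    """Sum per-driver scores for each architecture; weights default to 1."""
    weights = weights or {}
    totals = {"Extended Relational": 0, "Non-Relational": 0}
    for driver, (ext_rel, non_rel) in ratings.items():
        w = weights.get(driver, 1)
        totals["Extended Relational"] += w * SCORE[ext_rel]
        totals["Non-Relational"] += w * SCORE[non_rel]
    return totals


case_1_totals = tally(CASE_1)
case_2_totals = tally(CASE_2)
print(case_1_totals)  # {'Extended Relational': 1, 'Non-Relational': -1}
print(case_2_totals)  # {'Extended Relational': 1, 'Non-Relational': 3}
```

With equal weights the tallies agree with the choices recorded in the case studies (Extended Relational for the first, Non-Relational for the second). The losing driver in each case is the "con" that then has to be addressed, for example through the Pre-allocated Fields or Materialized View patterns.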
▪ Leading global Product and Application Development partner founded in 1993
▪ 3,300+ employees across North America, Ukraine and Western Europe
▪ Thousands of successful outsourcing projects!
SaaS/Cloud Solutions . Mobility Solutions . UX/UI . BI/Analytics/Big Data . Software Architecture . Security
Clients include:
Thank You!
SoftServe US Office
One Congress Plaza, 111 Congress Avenue, Suite 2700, Austin, TX 78701
Tel: 512.516.8880
Contacts
Serhiy Haziyev: shaziyev@softserveinc.com
Olha Hrytsay: ohrytsay@softserveinc.com

Editor's Notes

  1. Big Data: data that is too large, complex, and dynamic for any conventional data tools to capture, store, manage, and analyze.
  2. More details can be found here: link to our case study