SlideShare une entreprise Scribd logo
1  sur  15
Télécharger pour lire hors ligne
Designing Big Data Pipelines
Claudio Ardagna, Paolo Ceravolo, Ernesto Damiani
Big Data
• A huge amount of data are generated and
collected every minute (sensors)
• 1.7 million billion bytes of data, over 6
megabytes for each human (2016)
• 2.5 quintillion bytes of data created each day
• The trend is rapidly accelerating with the
growth of the Internet of Things (IoT),
200 billions of connected devices by 2020
• Low latency access to huge distributed data
sources has become a value proposition
• Business intelligence applications require
proper big data analysis and management
functionalities
What
to do
with
these
data?
Aggregation and
Statistics
Data
warehouse
and OLAP
Indexing,
Searching, and
Querying
Keyword
based search
Pattern
matching
(XML/RDF)
Knowledge
discovery
Data Mining
Statistical
Modeling
Machine
Learning
The Big
Data
difference
• Classic analytics assume:
• Standard data
models/formats
• Reasonable volumes
• Loose deadlines
• Problem: The five Vs
jeopardise these assumptions
– (unless we sample or
summarize)
The data analytics pipeline
Processing Models: batch vs stream
• Batch
• Receive, accumulate then
compute (data lake)
• Stream
• Compute while receiving
(data flow)
• Same questions, different
algorithms
• Both different from “mouse”
computations
Hurdles in
Adoption of
Big Data
Technologies
• Complex Architecture
• Lack of Standardization
• Regulatory Barriers
• Violation of Data
Access
• Sharing & Custody
Regulation
• High Cost of Legal
Clearance
Big Data
As a
Service
• A set of automatic tools
and a methodology that
allows customers to design
and deploy a full Big Data
pipeline addressing their
goals
How to
design a Big
Data Pipeline
1. Define a Business Value
2. Identify the Data Sources
3. Define the Data Flow
4. Study Data Protection Directives
5. Define Visualization, Reporting
and Interaction
6. Select Data Preparation Stages
7. Identify Processing
Requirements
8. Select Analytics
9. Define the Data Processing Flow
Big Data Pipeline Areas
Ingestion and
representation
Preparation
Processing
Analytics
Display and reporting
Specify how data are represented: NoSQL, Graph-
based, Relational, Extended relational, Markup
based, Hybrid
Specify how data will be routed and
parallelized, and how the analytics will be
computed: parallel batch, stream, hybrid
Specify the expected outcome: descriptive,
prescriptive, predictive
Specify the display and reporting of the
results: scalar, multi-dimensional
Specify how to prepare data for analitycs:
anonymize, reduce dimensions, hash
• Abstract the typical
procedural models (e.g., data
pipeline) implemented in big
data frameworks
• Develop model
transformations to translate
modelling decisions into
actual provisioning
Model Driven Approach
Declarative
Models
Procedural
Models
Deployment
Models
(Non-)Functional
Goals: Service goals
of Big Data Pipeline
What the BDA should
achieve and how to
achieve objectives
How the BDA process
should work
Declarative
Model
• Specify non-functional/functional
goals
• Single model addressing all
aspects of big data pipelines:
preparation, representation,
analytics, processing, display
and reporting
• Aspects of different areas
may impact on the same
procedural model template
• Some goals map directly to Service
Level Agreements (SLAs), others
need a transformation function to
map to SLAs
Procedural
Model
• Contain all information needed for
running the analytics
• Simple to map declarative goals on
procedures
• Platform independent
• Specified procedural templates
(alternatives)
• Procedural templates correspond to
defined goals
• May need additional input from final
users of big data services
• Templates express competences of data
scientist and data technologist
• Declarative models used to select the
(set of) proper templates
Deployment
Model
• Specify how procedural
models are to be incarnated
in a ready-to-be-deployed
architecture
• Drive analytics execution in
real scenarios
• To be defined for each
application
• Platform dependent
Methodology again
Declarative
Model
Specification
Service
Selection
Procedural
Model
Definition
Workflow
Compiler
Deployment
Model
Execution
Declarative
Specifications
Service
Catalog
Service
Composition
Repository
Deployment
Configurations
MBDAaaS
Platform Big Data
Platform
Tocode-based
Torecipies

Contenu connexe

Tendances

Prcn 2019 stage 1264-question-presentation_poster file_id-15
Prcn 2019 stage 1264-question-presentation_poster file_id-15Prcn 2019 stage 1264-question-presentation_poster file_id-15
Prcn 2019 stage 1264-question-presentation_poster file_id-15
madynav
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
Vignesh Prajapati
 

Tendances (20)

Applied Data Science Part 3: Getting dirty; data preparation and feature crea...
Applied Data Science Part 3: Getting dirty; data preparation and feature crea...Applied Data Science Part 3: Getting dirty; data preparation and feature crea...
Applied Data Science Part 3: Getting dirty; data preparation and feature crea...
 
Evolution of big data
Evolution of big dataEvolution of big data
Evolution of big data
 
Business Innovations Through Big Data Analytics - 30th November 2017
Business Innovations Through Big Data Analytics - 30th November 2017Business Innovations Through Big Data Analytics - 30th November 2017
Business Innovations Through Big Data Analytics - 30th November 2017
 
Machine Learning in the Data Science Context
Machine Learning in the Data Science ContextMachine Learning in the Data Science Context
Machine Learning in the Data Science Context
 
Global IT Outsourcing case study
Global IT Outsourcing case studyGlobal IT Outsourcing case study
Global IT Outsourcing case study
 
Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...
Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...
Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...
 
BigData Analysis
BigData AnalysisBigData Analysis
BigData Analysis
 
Prcn 2019 stage 1264-question-presentation_poster file_id-15
Prcn 2019 stage 1264-question-presentation_poster file_id-15Prcn 2019 stage 1264-question-presentation_poster file_id-15
Prcn 2019 stage 1264-question-presentation_poster file_id-15
 
Data analytics
Data analyticsData analytics
Data analytics
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
Big data in Food sector
Big data in Food sectorBig data in Food sector
Big data in Food sector
 
001 More introduction to big data analytics
001   More introduction to big data analytics001   More introduction to big data analytics
001 More introduction to big data analytics
 
TUW-ASE Summer 2015: Advanced service-based data analytics: Models, Elasticit...
TUW-ASE Summer 2015: Advanced service-based data analytics: Models, Elasticit...TUW-ASE Summer 2015: Advanced service-based data analytics: Models, Elasticit...
TUW-ASE Summer 2015: Advanced service-based data analytics: Models, Elasticit...
 
Introduction to BigData
Introduction to BigData Introduction to BigData
Introduction to BigData
 
Sustainability Investment Research Using Cognitive Analytics
Sustainability Investment Research Using Cognitive AnalyticsSustainability Investment Research Using Cognitive Analytics
Sustainability Investment Research Using Cognitive Analytics
 
Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data FabricUsing a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
 
Big Data Modeling and Analytic Patterns – Beyond Schema on Read
Big Data Modeling and Analytic Patterns – Beyond Schema on ReadBig Data Modeling and Analytic Patterns – Beyond Schema on Read
Big Data Modeling and Analytic Patterns – Beyond Schema on Read
 
Big data
Big dataBig data
Big data
 
Modern Data Discovery and Integration in Retail Banking
Modern Data Discovery and Integration in Retail BankingModern Data Discovery and Integration in Retail Banking
Modern Data Discovery and Integration in Retail Banking
 
Experimental transformation of ABS data into Data Cube Vocabulary (DCV) form...
Experimental transformation of  ABS data into Data Cube Vocabulary (DCV) form...Experimental transformation of  ABS data into Data Cube Vocabulary (DCV) form...
Experimental transformation of ABS data into Data Cube Vocabulary (DCV) form...
 

Similaire à BDVe Webinar Series - Designing Big Data pipelines with Toreador (Ernesto Damiani)

Similaire à BDVe Webinar Series - Designing Big Data pipelines with Toreador (Ernesto Damiani) (20)

Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big data
 
WWV2015: Jibes Paul van der Hulst big data
WWV2015: Jibes Paul van der Hulst big dataWWV2015: Jibes Paul van der Hulst big data
WWV2015: Jibes Paul van der Hulst big data
 
ESSnet Big Data WP8 Methodology (+ Quality, +IT)
ESSnet Big Data WP8 Methodology (+ Quality, +IT)ESSnet Big Data WP8 Methodology (+ Quality, +IT)
ESSnet Big Data WP8 Methodology (+ Quality, +IT)
 
Big data by Mithlesh sadh
Big data by Mithlesh sadhBig data by Mithlesh sadh
Big data by Mithlesh sadh
 
Big data.ppt
Big data.pptBig data.ppt
Big data.ppt
 
Lecture1
Lecture1Lecture1
Lecture1
 
Analytics&IoT
Analytics&IoTAnalytics&IoT
Analytics&IoT
 
Big Data SE vs. SE for Big Data
Big Data SE vs. SE for Big DataBig Data SE vs. SE for Big Data
Big Data SE vs. SE for Big Data
 
eccenca CorporateMemory - Semantically integrated Enterprise Data Lakes
eccenca CorporateMemory - Semantically integrated Enterprise Data Lakeseccenca CorporateMemory - Semantically integrated Enterprise Data Lakes
eccenca CorporateMemory - Semantically integrated Enterprise Data Lakes
 
Big data unit 2
Big data unit 2Big data unit 2
Big data unit 2
 
Lecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in detailsLecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in details
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptx
 
Dw 07032018-dr pl pradhan
Dw 07032018-dr pl pradhanDw 07032018-dr pl pradhan
Dw 07032018-dr pl pradhan
 
Introduction to data mining and data warehousing
Introduction to data mining and data warehousingIntroduction to data mining and data warehousing
Introduction to data mining and data warehousing
 
Big data Analytics
Big data AnalyticsBig data Analytics
Big data Analytics
 
Big Data Analytics: From SQL to Machine Learning and Graph Analysis
Big Data Analytics: From SQL to Machine Learning and Graph AnalysisBig Data Analytics: From SQL to Machine Learning and Graph Analysis
Big Data Analytics: From SQL to Machine Learning and Graph Analysis
 
Eclipse day Sydney 2014 BIG data presentation
Eclipse day Sydney 2014 BIG data presentationEclipse day Sydney 2014 BIG data presentation
Eclipse day Sydney 2014 BIG data presentation
 
StreamCentral Technical Overview
StreamCentral Technical OverviewStreamCentral Technical Overview
StreamCentral Technical Overview
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric Approach
 
SoftServe BI/BigData Workshop in Utah
SoftServe BI/BigData Workshop in UtahSoftServe BI/BigData Workshop in Utah
SoftServe BI/BigData Workshop in Utah
 

Plus de Big Data Value Association

Plus de Big Data Value Association (20)

Data Privacy, Security in personal data sharing
Data Privacy, Security in personal data sharingData Privacy, Security in personal data sharing
Data Privacy, Security in personal data sharing
 
Key Modules for a trsuted and privacy preserving personal data marketplace
Key Modules for a trsuted and privacy preserving personal data marketplaceKey Modules for a trsuted and privacy preserving personal data marketplace
Key Modules for a trsuted and privacy preserving personal data marketplace
 
GDPR and Data Ethics considerations in personal data sharing
GDPR and Data Ethics considerations in personal data sharingGDPR and Data Ethics considerations in personal data sharing
GDPR and Data Ethics considerations in personal data sharing
 
Intro - Three pillars for building a Smart Data Ecosystem: Trust, Security an...
Intro - Three pillars for building a Smart Data Ecosystem: Trust, Security an...Intro - Three pillars for building a Smart Data Ecosystem: Trust, Security an...
Intro - Three pillars for building a Smart Data Ecosystem: Trust, Security an...
 
Three pillars for building a Smart Data Ecosystem: Trust, Security and Privacy
Three pillars for building a Smart Data Ecosystem: Trust, Security and PrivacyThree pillars for building a Smart Data Ecosystem: Trust, Security and Privacy
Three pillars for building a Smart Data Ecosystem: Trust, Security and Privacy
 
Market into context - Three pillars for building a Smart Data Ecosystem: Trus...
Market into context - Three pillars for building a Smart Data Ecosystem: Trus...Market into context - Three pillars for building a Smart Data Ecosystem: Trus...
Market into context - Three pillars for building a Smart Data Ecosystem: Trus...
 
BDV Skills Accreditation - Future of digital skills in Europe reskilling and ...
BDV Skills Accreditation - Future of digital skills in Europe reskilling and ...BDV Skills Accreditation - Future of digital skills in Europe reskilling and ...
BDV Skills Accreditation - Future of digital skills in Europe reskilling and ...
 
BDV Skills Accreditation - Big Data skilling in Emilia-Romagna
BDV Skills Accreditation - Big Data skilling in Emilia-Romagna BDV Skills Accreditation - Big Data skilling in Emilia-Romagna
BDV Skills Accreditation - Big Data skilling in Emilia-Romagna
 
BDV Skills Accreditation - EIT labels for professionals
BDV Skills Accreditation - EIT labels for professionalsBDV Skills Accreditation - EIT labels for professionals
BDV Skills Accreditation - EIT labels for professionals
 
BDV Skills Accreditation - Recognizing Data Science Skills with BDV Data Scie...
BDV Skills Accreditation - Recognizing Data Science Skills with BDV Data Scie...BDV Skills Accreditation - Recognizing Data Science Skills with BDV Data Scie...
BDV Skills Accreditation - Recognizing Data Science Skills with BDV Data Scie...
 
BDV Skills Accreditation - Objectives of the workshop
BDV Skills Accreditation - Objectives of the workshopBDV Skills Accreditation - Objectives of the workshop
BDV Skills Accreditation - Objectives of the workshop
 
BDV Skills Accreditation - Welcome introduction to the workshop
BDV Skills Accreditation - Welcome introduction to the workshopBDV Skills Accreditation - Welcome introduction to the workshop
BDV Skills Accreditation - Welcome introduction to the workshop
 
BDV Skills Accreditation - Definition and ensuring of digital roles and compe...
BDV Skills Accreditation - Definition and ensuring of digital roles and compe...BDV Skills Accreditation - Definition and ensuring of digital roles and compe...
BDV Skills Accreditation - Definition and ensuring of digital roles and compe...
 
BigDataPilotDemoDays - I BiDaaS Application to the Manufacturing Sector Webinar
BigDataPilotDemoDays - I BiDaaS Application to the Manufacturing Sector WebinarBigDataPilotDemoDays - I BiDaaS Application to the Manufacturing Sector Webinar
BigDataPilotDemoDays - I BiDaaS Application to the Manufacturing Sector Webinar
 
BigDataPilotDemoDays - I-BiDaaS Application to the Financial Sector Webinar
BigDataPilotDemoDays - I-BiDaaS Application to the Financial Sector WebinarBigDataPilotDemoDays - I-BiDaaS Application to the Financial Sector Webinar
BigDataPilotDemoDays - I-BiDaaS Application to the Financial Sector Webinar
 
Virtual BenchLearning - Data Bench Framework
Virtual BenchLearning - Data Bench FrameworkVirtual BenchLearning - Data Bench Framework
Virtual BenchLearning - Data Bench Framework
 
Virtual BenchLearning - DeepHealth - Needs & Requirements for Benchmarking
Virtual BenchLearning - DeepHealth - Needs & Requirements for BenchmarkingVirtual BenchLearning - DeepHealth - Needs & Requirements for Benchmarking
Virtual BenchLearning - DeepHealth - Needs & Requirements for Benchmarking
 
Virtual BenchLearning - I-BiDaaS - Industrial-Driven Big Data as a Self-Servi...
Virtual BenchLearning - I-BiDaaS - Industrial-Driven Big Data as a Self-Servi...Virtual BenchLearning - I-BiDaaS - Industrial-Driven Big Data as a Self-Servi...
Virtual BenchLearning - I-BiDaaS - Industrial-Driven Big Data as a Self-Servi...
 
Policy Cloud Data Driven Policies against Radicalisation - Technical Overview
Policy Cloud Data Driven Policies against Radicalisation - Technical OverviewPolicy Cloud Data Driven Policies against Radicalisation - Technical Overview
Policy Cloud Data Driven Policies against Radicalisation - Technical Overview
 
Policy Cloud Data Driven Policies against Radicalisation - Participatory poli...
Policy Cloud Data Driven Policies against Radicalisation - Participatory poli...Policy Cloud Data Driven Policies against Radicalisation - Participatory poli...
Policy Cloud Data Driven Policies against Radicalisation - Participatory poli...
 

Dernier

Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
AroojKhan71
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
JoseMangaJr1
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 

Dernier (20)

BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 

BDVe Webinar Series - Designing Big Data pipelines with Toreador (Ernesto Damiani)

  • 1. Designing Big Data Pipelines Claudio Ardagna, Paolo Ceravolo, Ernesto Damiani
  • 2. Big Data • A huge amount of data are generated and collected every minute (sensors) • 1.7 million billion bytes of data, over 6 megabytes for each human (2016) • 2.5 quintillion bytes of data created each day • The trend is rapidly accelerating with the growth of the Internet of Things (IoT), 200 billions of connected devices by 2020 • Low latency access to huge distributed data sources has become a value proposition • Business intelligence applications require proper big data analysis and management functionalities
  • 3. What to do with these data? Aggregation and Statistics Data warehouse and OLAP Indexing, Searching, and Querying Keyword based search Pattern matching (XML/RDF) Knowledge discovery Data Mining Statistical Modeling Machine Learning
  • 4. The Big Data difference • Classic analytics assume: • Standard data models/formats • Reasonable volumes • Loose deadlines • Problem: The five Vs jeopardise these assumptions – (unless we sample or summarize)
  • 6. Processing Models: batch vs stream • Batch • Receive, accumulate then compute (data lake) • Stream • Compute while receiving (data flow) • Same questions, different algorithms • Both different from “mouse” computations
  • 7. Hurdles in Adoption of Big Data Technologies • Complex Architecture • Lack of Standardization • Regulatory Barriers • Violation of Data Access • Sharing & Custody Regulation • High Cost of Legal Clearance
  • 8. Big Data As a Service • A set of automatic tools and a methodology that allows customers to design and deploy a full Big Data pipeline addressing their goals
  • 9. How to design a Big Data Pipeline 1. Define a Business Value 2. Identify the Data Sources 3. Define the Data Flow 4. Study Data Protection Directives 5. Define Visualization, Reporting and Interaction 6. Select Data Preparation Stages 7. Identify Processing Requirements 8. Select Analytics 9. Define the Data Processing Flow
  • 10. Big Data Pipeline Areas Ingestion and representation Preparation Processing Analytics Display and reporting Specify how data are represented: NoSQL, Graph- based, Relational, Extended relational, Markup based, Hybrid Specify how data will be routed and parallelized, and how the analytics will be computed: parallel batch, stream, hybrid Specify the expected outcome: descriptive, prescriptive, predictive Specify the display and reporting of the results: scalar, multi-dimensional Specify how to prepare data for analitycs: anonymize, reduce dimensions, hash
  • 11. • Abstract the typical procedural models (e.g., data pipeline) implemented in big data frameworks • Develop model transformations to translate modelling decisions into actual provisioning Model Driven Approach Declarative Models Procedural Models Deployment Models (Non-)Functional Goals: Service goals of Big Data Pipeline What the BDA should achieve and how to achieve objectives How the BDA process should work
  • 12. Declarative Model • Specify non-functional/functional goals • Single model addressing all aspects of big data pipelines: preparation, representation, analytics, processing, display and reporting • Aspects of different areas may impact on the same procedural model template • Some goals map directly to Service Level Agreements (SLAs), others need a transformation function to map to SLAs
  • 13. Procedural Model • Contain all information needed for running the analytics • Simple to map declarative goals on procedures • Platform independent • Specified procedural templates (alternatives) • Procedural templates correspond to defined goals • May need additional input from final users of big data services • Templates express competences of data scientist and data technologist • Declarative models used to select the (set of) proper templates
  • 14. Deployment Model • Specify how procedural models are to be incarnated in a ready-to-be-deployed architecture • Drive analytics execution in real scenarios • To be defined for each application • Platform dependent