SlideShare une entreprise Scribd logo
1  sur  15
Big Data Analytics Lifecycle
K.Sathyaseelan
Assistant Professor,
Department of Information Technology,
Sri Ramakrishna Institute of Technology,
Coimbatore.
Overview
• Discovery
• Data preparation
• Model planning
• Model execution
• Communicate results
• Operationalize
Source – EMC² - Data Science and Big data analytics
Key stakeholders of an analytics project.
• Since data science consist of various phases similarly it involves various
stake holders.
• Totally seven roles are specified in the process for successful completion of
the project.
• Business User: Someone who understands the domain area
• This person can consult and advise the project team on the context of the project,
the value of the results, and how the outputs will be operationalized.
– Usually a business analyst, line manager, or deep subject matter expert in the
project domain fulfills this role.
Source – EMC² - Data Science and Big data analytics
Key stakeholders of an analytics project.
• Project Sponsor: Responsible for the genesis of the project.
• Project Manager: Ensures that key milestones and objectives are met on
time and at the expected quality.
• Business Intelligence Analyst: Provides business domain expertise based
on a deep understanding of the data, key performance indicators (KPIs),
key metrics, and business intelligence from a reporting perspective.
• Database Administrator (DBA): Provisions and configures the database
environment to support the analytics needs of the working team.
• Data Engineer: Leverages deep technical skills to assist with tuning SQL
queries for data management and data extraction, and provides support
for data ingestion.
Source – EMC² - Data Science and Big data analytics
Data Analytics Lifecycle
• Phase 1— Discovery: Team learns the business domain, including relevant
history such as whether the organization or business unit has attempted
similar projects in the past from which they can learn.
• The team assesses the resources available to support the project in terms
of people, technology, time, and data.
• Important activities in this phase include framing the business problem as
an analytics challenge that can be addressed in subsequent phases and
formulating initial hypotheses (IHs) to test and begin learning the data.
Source – EMC² - Data Science and Big data analytics
Phase 1— Discovery:
• The team should perform five main activities during this step of
the discovery phase:
• Identify data sources: Make a list of data sources the team may
need to test the initial hypotheses outlined in this phase.
– Make an inventory of the datasets currently available and
those that can be purchased or otherwise acquired for the tests
the team wants to perform.
• Capture aggregate data sources: This is for previewing the data and
providing high-level understanding.
– It enables the team to gain a quick overview of the data and
perform further exploration on specific areas.
• Review the raw data: Begin understanding the interdependencies
among the data attributes.
– Become familiar with the content of the data, its quality, and
its limitations. Source – EMC² - Data Science and Big data analytics
Phase 1— Discovery:
• Evaluate the data structures and tools needed: The data type and
structure dictate which tools the team can use to analyze the data.
• Scope the sort of data infrastructure needed for this type of
problem: In addition to the tools needed, the data influences the
kind of infrastructure that's required, such as disk storage and
network capacity.
Source – EMC² - Data Science and Big data analytics
Data Analytics Lifecycle
• Phase 2— Data preparation: Phase 2 requires the presence of an
analytic sandbox (workspace), in which the team can work with data
and perform analytics for the duration of the project.
• The team needs to execute extract, load, and transform (ELT) or
extract, transform and load (ETL) to get data into the sandbox.
• The ELT and ETL are sometimes abbreviated as ETLT. Data
should be transformed in the ETLT process so the team can work
with it and analyze it.
Source – EMC² - Data Science and Big data analytics
Common Tools for the Data Preparation Phase
• Hadoop
• Alpine Miner
• OpenRefine
Source – EMC² - Data Science and Big data analytics
Data Analytics Lifecycle
• Phase 3— Model planning:
Phase 3 is model planning, where the
team determines the methods, techniques, and workflow it intends to
follow for the subsequent model building phase.
Source – EMC² - Data Science and Big data analytics
Common Tools for the Model Planning Phase
Here are several of the more common ones:
• R
• SQL Analysis
• SAS/ ACCESS
Source – EMC² - Data Science and Big data analytics
Data Analytics Lifecycle
• Phase 4 Model building:
In Phase 4, the team develops datasets for testing, training, and
production purposes.
– In addition, in this phase the team builds and executes models
based on the work done in the model planning phase.
Source – EMC² - Data Science and Big data analytics
Data Analytics Lifecycle
• Phase 5— Communicate results: In Phase 5, the team, in
collaboration with major stakeholders, determines if the results
of the project are a success or a failure based on the criteria
developed in Phase 1.
• Phase 6— Operationalize: In Phase 6, the team delivers final
reports, briefings, code, and technical documents. In addition,
the team may run a pilot project to implement the models in a
production environment.
Source – EMC² - Data Science and Big data analytics
Common Tools for the Model Building Phase
Commercial Tools:
• SAS Enterprise Miner
• SPSS Modeler (provided by IBM and now called IBM SPSS
Modeler)
• Matlab
• Alpine Miner.
• STATISTICA and Mathematica
Source – EMC² - Data Science and Big data analytics
Common Tools for the Model Building Phase
Free or Open Source tools:
• R and PL/ R
• Octave.
• WEKA
• Python
• SQL
• MADlib
Source – EMC² - Data Science and Big data analytics

Contenu connexe

Tendances

Datamining
DataminingDatamining
Datamining
sumit621
 
Data analysis – using computers for presentation
Data analysis – using computers for presentationData analysis – using computers for presentation
Data analysis – using computers for presentation
Noonapau
 

Tendances (20)

Datamining
DataminingDatamining
Datamining
 
Data science | What is Data science
Data science | What is Data scienceData science | What is Data science
Data science | What is Data science
 
Tutorial Knowledge Discovery
Tutorial Knowledge DiscoveryTutorial Knowledge Discovery
Tutorial Knowledge Discovery
 
Evolution of big data
Evolution of big dataEvolution of big data
Evolution of big data
 
Crisp dm
Crisp dmCrisp dm
Crisp dm
 
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
 
Big data Intro - Presentation to OCHackerz Meetup Group
Big data Intro - Presentation to OCHackerz Meetup GroupBig data Intro - Presentation to OCHackerz Meetup Group
Big data Intro - Presentation to OCHackerz Meetup Group
 
Data analytics
Data analyticsData analytics
Data analytics
 
Getting Started with Unstructured Data
Getting Started with Unstructured DataGetting Started with Unstructured Data
Getting Started with Unstructured Data
 
Data Analytics Life Cycle
Data Analytics Life CycleData Analytics Life Cycle
Data Analytics Life Cycle
 
Tutorial Data Management and workflows
Tutorial Data Management and workflowsTutorial Data Management and workflows
Tutorial Data Management and workflows
 
Data analysis
Data analysisData analysis
Data analysis
 
Unstructured data processing webinar 06272016
Unstructured data processing webinar 06272016Unstructured data processing webinar 06272016
Unstructured data processing webinar 06272016
 
Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big data
 
CRISP-DM - Agile Approach To Data Mining Projects
CRISP-DM - Agile Approach To Data Mining ProjectsCRISP-DM - Agile Approach To Data Mining Projects
CRISP-DM - Agile Approach To Data Mining Projects
 
Introduction to Data Analytics
Introduction to Data AnalyticsIntroduction to Data Analytics
Introduction to Data Analytics
 
Data analysis – using computers for presentation
Data analysis – using computers for presentationData analysis – using computers for presentation
Data analysis – using computers for presentation
 
The Big Metadata
The Big MetadataThe Big Metadata
The Big Metadata
 
11.challenging issues of spatio temporal data mining
11.challenging issues of spatio temporal data mining11.challenging issues of spatio temporal data mining
11.challenging issues of spatio temporal data mining
 
ESSnet Big Data WP8 Methodology (+ Quality, +IT)
ESSnet Big Data WP8 Methodology (+ Quality, +IT)ESSnet Big Data WP8 Methodology (+ Quality, +IT)
ESSnet Big Data WP8 Methodology (+ Quality, +IT)
 

Similaire à Bda life cycle slideshare

1. Overview_of_data_analytics (1).pdf
1. Overview_of_data_analytics (1).pdf1. Overview_of_data_analytics (1).pdf
1. Overview_of_data_analytics (1).pdf
Ayele40
 
DATA SCIENCE AND BIG DATA ANALYTICSCHAPTER 2 DATA ANA.docx
DATA SCIENCE AND BIG DATA ANALYTICSCHAPTER 2 DATA ANA.docxDATA SCIENCE AND BIG DATA ANALYTICSCHAPTER 2 DATA ANA.docx
DATA SCIENCE AND BIG DATA ANALYTICSCHAPTER 2 DATA ANA.docx
randyburney60861
 
Introducition to Data scinece compiled by hu
Introducition to Data scinece compiled by huIntroducition to Data scinece compiled by hu
Introducition to Data scinece compiled by hu
wekineheshete
 

Similaire à Bda life cycle slideshare (20)

MODULE 1_Introduction to Data analytics and life cycle..pptx
MODULE 1_Introduction to Data analytics and life cycle..pptxMODULE 1_Introduction to Data analytics and life cycle..pptx
MODULE 1_Introduction to Data analytics and life cycle..pptx
 
Frameworks provide structure. The core objective of the Big Data Framework is...
Frameworks provide structure. The core objective of the Big Data Framework is...Frameworks provide structure. The core objective of the Big Data Framework is...
Frameworks provide structure. The core objective of the Big Data Framework is...
 
1. Overview_of_data_analytics (1).pdf
1. Overview_of_data_analytics (1).pdf1. Overview_of_data_analytics (1).pdf
1. Overview_of_data_analytics (1).pdf
 
DATA SCIENCE AND BIG DATA ANALYTICSCHAPTER 2 DATA ANA.docx
DATA SCIENCE AND BIG DATA ANALYTICSCHAPTER 2 DATA ANA.docxDATA SCIENCE AND BIG DATA ANALYTICSCHAPTER 2 DATA ANA.docx
DATA SCIENCE AND BIG DATA ANALYTICSCHAPTER 2 DATA ANA.docx
 
Software Development Life Cycle
Software Development Life CycleSoftware Development Life Cycle
Software Development Life Cycle
 
Sdlc 4
Sdlc 4Sdlc 4
Sdlc 4
 
crisp.ppt
crisp.pptcrisp.ppt
crisp.ppt
 
crisp.ppt
crisp.pptcrisp.ppt
crisp.ppt
 
Introducition to Data scinece compiled by hu
Introducition to Data scinece compiled by huIntroducition to Data scinece compiled by hu
Introducition to Data scinece compiled by hu
 
unit 1 big data.pptx
unit 1 big data.pptxunit 1 big data.pptx
unit 1 big data.pptx
 
Data Science Training and Placement
Data Science Training and PlacementData Science Training and Placement
Data Science Training and Placement
 
Module 6 - Systems Planning bak.pptx.pdf
Module 6 - Systems Planning bak.pptx.pdfModule 6 - Systems Planning bak.pptx.pdf
Module 6 - Systems Planning bak.pptx.pdf
 
Software Analytics
Software AnalyticsSoftware Analytics
Software Analytics
 
Topic5 - IT Implementation & Challenges.pptx
Topic5 - IT Implementation & Challenges.pptxTopic5 - IT Implementation & Challenges.pptx
Topic5 - IT Implementation & Challenges.pptx
 
5.Developing IT Solution.pptx
5.Developing IT Solution.pptx5.Developing IT Solution.pptx
5.Developing IT Solution.pptx
 
Data science training in hyd ppt converted (1)
Data science training in hyd ppt converted (1)Data science training in hyd ppt converted (1)
Data science training in hyd ppt converted (1)
 
Data science training in hyd pdf converted (1)
Data science training in hyd pdf converted (1)Data science training in hyd pdf converted (1)
Data science training in hyd pdf converted (1)
 
Data science training in hydpdf converted (1)
Data science training in hydpdf  converted (1)Data science training in hydpdf  converted (1)
Data science training in hydpdf converted (1)
 
Best Selenium certification course
Best Selenium certification courseBest Selenium certification course
Best Selenium certification course
 
Data Science Lifecycle
Data Science LifecycleData Science Lifecycle
Data Science Lifecycle
 

Plus de SathyaseelanK1 (6)

CSS gradients
CSS gradientsCSS gradients
CSS gradients
 
SVG introduction
SVG   introductionSVG   introduction
SVG introduction
 
New tags in html5
New tags in html5New tags in html5
New tags in html5
 
Css Introduction
Css IntroductionCss Introduction
Css Introduction
 
Html5 examples
Html5 examplesHtml5 examples
Html5 examples
 
Html 5 Tags - Beginners
Html 5  Tags - BeginnersHtml 5  Tags - Beginners
Html 5 Tags - Beginners
 

Dernier

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Dernier (20)

MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 

Bda life cycle slideshare

  • 1. Big Data Analytics Lifecycle K.Sathyaseelan Assistant Professor, Department of Information Technology, Sri Ramakrishna Institute of Technology, Coimbatore.
  • 2. Overview • Discovery • Data preparation • Model planning • Model execution • Communicate results • Operationalize Source – EMC² - Data Science and Big data analytics
  • 3. Key stakeholders of an analytics project. • Since data science consist of various phases similarly it involves various stake holders. • Totally seven roles are specified in the process for successful completion of the project. • Business User: Someone who understands the domain area • This person can consult and advise the project team on the context of the project, the value of the results, and how the outputs will be operationalized. – Usually a business analyst, line manager, or deep subject matter expert in the project domain fulfills this role. Source – EMC² - Data Science and Big data analytics
  • 4. Key stakeholders of an analytics project. • Project Sponsor: Responsible for the genesis of the project. • Project Manager: Ensures that key milestones and objectives are met on time and at the expected quality. • Business Intelligence Analyst: Provides business domain expertise based on a deep understanding of the data, key performance indicators (KPIs), key metrics, and business intelligence from a reporting perspective. • Database Administrator (DBA): Provisions and configures the database environment to support the analytics needs of the working team. • Data Engineer: Leverages deep technical skills to assist with tuning SQL queries for data management and data extraction, and provides support for data ingestion. Source – EMC² - Data Science and Big data analytics
  • 5. Data Analytics Lifecycle • Phase 1— Discovery: Team learns the business domain, including relevant history such as whether the organization or business unit has attempted similar projects in the past from which they can learn. • The team assesses the resources available to support the project in terms of people, technology, time, and data. • Important activities in this phase include framing the business problem as an analytics challenge that can be addressed in subsequent phases and formulating initial hypotheses (IHs) to test and begin learning the data. Source – EMC² - Data Science and Big data analytics
  • 6. Phase 1— Discovery: • The team should perform five main activities during this step of the discovery phase: • Identify data sources: Make a list of data sources the team may need to test the initial hypotheses outlined in this phase. – Make an inventory of the datasets currently available and those that can be purchased or otherwise acquired for the tests the team wants to perform. • Capture aggregate data sources: This is for previewing the data and providing high-level understanding. – It enables the team to gain a quick overview of the data and perform further exploration on specific areas. • Review the raw data: Begin understanding the interdependencies among the data attributes. – Become familiar with the content of the data, its quality, and its limitations. Source – EMC² - Data Science and Big data analytics
  • 7. Phase 1— Discovery: • Evaluate the data structures and tools needed: The data type and structure dictate which tools the team can use to analyze the data. • Scope the sort of data infrastructure needed for this type of problem: In addition to the tools needed, the data influences the kind of infrastructure that's required, such as disk storage and network capacity. Source – EMC² - Data Science and Big data analytics
  • 8. Data Analytics Lifecycle • Phase 2— Data preparation: Phase 2 requires the presence of an analytic sandbox (workspace), in which the team can work with data and perform analytics for the duration of the project. • The team needs to execute extract, load, and transform (ELT) or extract, transform and load (ETL) to get data into the sandbox. • The ELT and ETL are sometimes abbreviated as ETLT. Data should be transformed in the ETLT process so the team can work with it and analyze it. Source – EMC² - Data Science and Big data analytics
  • 9. Common Tools for the Data Preparation Phase • Hadoop • Alpine Miner • OpenRefine Source – EMC² - Data Science and Big data analytics
  • 10. Data Analytics Lifecycle • Phase 3— Model planning: Phase 3 is model planning, where the team determines the methods, techniques, and workflow it intends to follow for the subsequent model building phase. Source – EMC² - Data Science and Big data analytics
  • 11. Common Tools for the Model Planning Phase Here are several of the more common ones: • R • SQL Analysis • SAS/ ACCESS Source – EMC² - Data Science and Big data analytics
  • 12. Data Analytics Lifecycle • Phase 4 Model building: In Phase 4, the team develops datasets for testing, training, and production purposes. – In addition, in this phase the team builds and executes models based on the work done in the model planning phase. Source – EMC² - Data Science and Big data analytics
  • 13. Data Analytics Lifecycle • Phase 5— Communicate results: In Phase 5, the team, in collaboration with major stakeholders, determines if the results of the project are a success or a failure based on the criteria developed in Phase 1. • Phase 6— Operationalize: In Phase 6, the team delivers final reports, briefings, code, and technical documents. In addition, the team may run a pilot project to implement the models in a production environment. Source – EMC² - Data Science and Big data analytics
  • 14. Common Tools for the Model Building Phase Commercial Tools: • SAS Enterprise Miner • SPSS Modeler (provided by IBM and now called IBM SPSS Modeler) • Matlab • Alpine Miner. • STATISTICA and Mathematica Source – EMC² - Data Science and Big data analytics
  • 15. Common Tools for the Model Building Phase Free or Open Source tools: • R and PL/ R • Octave. • WEKA • Python • SQL • MADlib Source – EMC² - Data Science and Big data analytics