Big Data Processing
Training, R&D
Powered by Data Cloud Lab
[Big data is a field that treats ways to analyze, systematically extract
information from, or otherwise deal with data sets that are too large
or complex to be dealt with by traditional data-processing
application software. Big data was originally associated with three
key concepts: volume, variety, and velocity.]
Data Set – 1M Data:
1. Healthcare_ [Record – 46935]
2. Weather-history - [Record – 4573]
3. World Demography - [Record – 5000]
4. Census Tracts 2010 - [Record -21
5. Animal_Services_Intake_Data - [Record -187594]
6. Average_Daily_Traffic_Counts - [Record -1280]
7. Acciental_Durg_Related_Death - [Record -5106]
8. Retails Store - [Record – 182728]
customer12435,category_59,Departments_7,orders_68883,products_1345,order_items_99999
9. Popular_Baby_Names - [Record – 46935]
10. SAT__College_Board__2010_School_Level_Results - Total Data [Record -461]
11. Sales_Tax_Rates - [Record -1911]
12. Restaurants [Record -1328]
13. Transportation: 34_drivers, 17076_truck_event_text_partition, 1768_timesheet - [Record - 18878]
14. Acciental_Durg_Related_Death - [Record -5106]
15. Census Tracts 2010 - [Record -216]
16. Employees_Salary - [Record – 824]
17. Customer_transactional_spending - [Record – 60000]
18. Customer_Order - [Record – 1000]
19. Employees_Salary - [Record – 824]
Powered by (software): Linux, Hadoop Big Data, Hive & Power BI
Case Study 01: Healthcare [Record – 46935]
Raw Data (Date, Sex, Diseases, Age) :
12/10/1950,M,Diabetes,78
12/10/1984,F,PCOS,67
12/11/1940,M,Fever,90
12/12/1950,F,Cold,88
12/13/1960,M,Blood Pressure,76
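A minimal Python sketch of how raw rows in this layout could be parsed and counted per disease, mirroring a group-by count at small scale. It uses four of the sample rows shown above. The function name `parse_records`, the dictionary keys, and the MM/DD/YYYY date reading are assumptions made for illustration, not part of the original pipeline.

```python
import csv
from collections import Counter
from datetime import datetime
from io import StringIO

# Four of the sample rows shown above (Date, Sex, Diseases, Age).
RAW = """12/10/1950,M,Diabetes,78
12/10/1984,F,PCOS,67
12/12/1950,F,Cold,88
12/13/1960,M,Blood Pressure,76
"""

def parse_records(text):
    """Parse comma-separated lines into dicts with typed fields."""
    records = []
    for date, sex, disease, age in csv.reader(StringIO(text)):
        records.append({
            # Assumption: dates are written as MM/DD/YYYY.
            "date": datetime.strptime(date, "%m/%d/%Y").date(),
            "sex": sex,
            "disease": disease,
            "age": int(age),
        })
    return records

records = parse_records(RAW)

# Count rows per disease, the in-memory equivalent of a group-by count.
counts = Counter(r["disease"] for r in records)
for disease, n in sorted(counts.items()):
    print(f"{disease},{n}")
```

On these four rows each disease appears once; the full data set would be needed to reproduce the reported totals.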
Result :
Blood Pressure,5215
Cold,5215
Diabetes,5215
Fever,15645
Malaria,5215
PCOS,5215
Swine Flu,5215
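As a quick consistency check, the per-disease counts can be summed and compared with the record count stated for the Healthcare data set (46935). The sketch below copies the numbers from the result above; the variable names are illustrative only.

```python
# Counts copied from the result listed above.
result = {
    "Blood Pressure": 5215,
    "Cold": 5215,
    "Diabetes": 5215,
    "Fever": 15645,
    "Malaria": 5215,
    "PCOS": 5215,
    "Swine Flu": 5215,
}
EXPECTED_RECORDS = 46935  # record count stated for the Healthcare data set

total = sum(result.values())
print(total, total == EXPECTED_RECORDS)

# Share of each disease in the total.
for disease, n in result.items():
    print(f"{disease}: {n / total:.1%}")
```

The counts add up to 46935, so every record falls into exactly one of the seven groups. Fever accounts for one third of the records and each of the other diseases for one ninth.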
Data Visualizations:
Backend Data Process by HiveQL command:
select diseases, count(*) from health group by diseases;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e.
spark, tez) or using Hive 1.X releases.
Query ID = hduser_20200125220715_338a065f-f176-4464-b03e-28fb18dc66f5
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2020-01-25 22:07:18,630 Stage-1 map = 100%, reduce = 100%
Ended Job = job_local171670995_0001
Moving data to local directory /home/hduser/Dataset
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 2336322 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 3.617 seconds
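The log reports that the number of reducers was estimated from the input data size. The Python sketch below shows a simplified version of that kind of estimate: one reducer per `bytes_per_reducer` of input, clamped between 1 and `max_reducers`. The function name and the default values (256,000,000 bytes and 1009 reducers) are assumptions for illustration; the actual defaults depend on the Hive version and configuration, and the HDFS Read figure is used here only as a stand-in for the input size.

```python
import math

def estimate_reducers(input_bytes,
                      bytes_per_reducer=256_000_000,  # assumed value of hive.exec.reducers.bytes.per.reducer
                      max_reducers=1009):             # assumed value of hive.exec.reducers.max
    """Simplified estimate: ceil(input / bytes per reducer), clamped to [1, max_reducers]."""
    wanted = math.ceil(input_bytes / bytes_per_reducer)
    return max(1, min(max_reducers, wanted))

# HDFS Read size from the log above, used as an approximation of the input size.
print(estimate_reducers(2_336_322))
```

With roughly 2.3 MB of input, the estimate is a single reducer, which agrees with the value shown in the log. Setting `mapreduce.job.reduces` to a fixed number would bypass an estimate of this kind.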

Data cloud-lab-version-v0012020
