Jongwook Woo
HiPIC
CalStateLA
Marketing Analytics Research Society
(M.A.R.S.)
Oct 7 2020
Jongwook Woo, PhD, jwoo5@calstatela.edu
Big Data AI Center (BigDAI)
California State University Los Angeles
Introduction to Big Data and AI
for Business Analytics and Prediction
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Contents
 Myself
 Introduction To Big Data
 Big Data AI Predictive Analysis
 Summary
Myself
Experience:
Since 2002, Professor at California State University Los Angeles
– PhD in 2001: Computer Science and Engineering at USC
Myself: S/W Development Lead
http://www.mobygames.com/game/windows/matrix-online/credits
Myself: CDH, Oracle using Hadoop Big Data
Myself: Partners for Services
Myself: Collaborations
SOFTZEN
Contents
 Myself
 Introduction To Big Data
 Big Data AI Predictive Analysis
 Summary
Data Issues
Large-Scale data
Terabyte (10^12), Petabyte (10^15)
– Because of web
– IoT (Streaming data, Sensor Data) in SmartX
– Social Computing, smart phone, online game
– Bioinformatics, …
Legacy approach
 Can do
– Improve the speed of CPU
 Increase the storage size
 Only Problem
– Too expensive
Data Handling: Traditional Way
Becomes too Expensive
Data Issues
Large Scale Data
Too big
Non-/Semi-structured data
 3 Vs, 4 Vs,…
– Velocity, Volume, Variety
Traditional systems can handle them
– But again, too expensive
Cannot handle them affordably with the legacy approach
Need new systems
Non-expensive
Two Cores in Big Data
How to store Big Data
How to compute Big Data
Google
How to store Big Data
– GFS
– Distributed Systems on non-expensive commodity computers
How to compute Big Data
– MapReduce
– Parallel Computing with non-expensive computers
Own super computers
Published papers in 2003, 2004
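The compute half of the two cores above can be sketched in a few lines. This is a minimal single-process illustration of the map, shuffle, and reduce steps using word count as the example; it is not Google's or Hadoop's implementation, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical. On a real cluster the map and reduce calls would run in parallel on many commodity machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values of one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three steps in sequence (sequentially here, for illustration)."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Because each map call sees only one line and each reduce call sees only one key, the work can be split across servers without the pieces needing to communicate, which is what makes the model suitable for inexpensive machines.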
Data Handling: Another Way
Not Expensive
From 2017 Korean
Blockbuster Movie,
“The Fortress”
(남한산성)
Data Handling: Another Way
But Works Well with the crazy massive data set
Battle of Nagashino,
1575, Japan
Data Handling: Another Way
Not Expensive
http://blog.naver.com/PostView.nhn?blogId=dosims&logNo=221127053677
AD 1409 (Year 9 of King Tae-Jong, Chosun Dynasty, Korea) By Choi family:
Choi Hae-san (崔海山), whose father was Choi Mu-seon (崔茂宣)
[Ref] Joseon's Secret Weapon: Chongtonggi Hwacha (銃筒機火車), by Dosim
Big Data
Big Data (Hadoop, Spark, Distributed Deep Learning)
Cluster for Compute and Store
(Distributed File Systems: HDFS, GFS)
…
Super Computer vs Big Data vs Cloud
Traditional Super Computer
Cluster for Compute
Cluster for Store
(Parallel File Systems: Lustre, PVFS, GPFS)
Big Data (Hadoop, Spark, Distributed Deep Learning)
Cluster for Compute and Store
(Distributed File Systems: HDFS, GFS)
However, Cloud Computing adopts the separated compute and store architecture,
with High Speed N/W (> 10Gbps) and Object Storage
Definition: Big Data
A non-expensive platform of distributed parallel computing systems
that can store large-scale data and process it in parallel [1, 2]
 Apache Hadoop
– Non-expensive Super Computer
– More accessible than the traditional super computers
• Anyone can own a super computer, since it is open source
– In your university labs, small companies, research centers
Other solutions with storage and computing services
– Spark
• mostly integrated into Hadoop by the Hadoop community
– NoSQL DB (Cassandra, MongoDB, Redis, HBase, …)
– Elasticsearch
What is Hadoop?
 Apache Hadoop Project in
Jan, 2006 split from Nutch
 Hadoop Founder:
o Doug Cutting
 Apache Committer:
Lucene, Nutch, …
Big Data: Linearly Scalable
 Some people question whether a system that handles a 1 ~ 3 GB data set is Big Data
Well... a Big Data platform lets you add more servers as the data grows
– it is linearly scalable once built
– n times more computing power, ideally
Data Size: < 3 GB → add n servers → Data Size: > 200 TB
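The "n times more computing power, ideally" point is simple arithmetic, sketched below. Both helper names (`ideal_runtime`, `servers_needed`) are hypothetical, and the model assumes perfectly linear scaling, which real clusters only approximate because of network and coordination overhead.

```python
import math


def ideal_runtime(base_minutes: float, base_servers: int, servers: int) -> float:
    """Runtime under ideal linear scaling: n times the servers, 1/n the time."""
    if base_servers <= 0 or servers <= 0:
        raise ValueError("server counts must be positive")
    return base_minutes * base_servers / servers


def servers_needed(base_servers: int, base_gb: float, target_gb: float) -> int:
    """Servers needed to keep the runtime constant as the data grows (ideal case)."""
    if base_servers <= 0 or base_gb <= 0 or target_gb <= 0:
        raise ValueError("inputs must be positive")
    return math.ceil(base_servers * target_gb / base_gb)
```

For example, under this idealized model a job that takes 60 minutes on 4 servers would take 30 minutes on 8 servers; the figures are illustrative, not measurements from the talk.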
Big Data: Data Analysis & Visualization
Sentiment Map of AlphaGo
Positive
Negative
K-Election 2017
(April 29 – May 9)
Popular businesses within 5 miles of CalStateLA, USC, UCLA
Jams and other traffic incidents reported
by users in Dec 2017 – Jan 2018:
(Dalyapraz Dauletbak)
Contents
 Myself
 Introduction To Big Data
 Big Data AI Predictive Analysis
 Summary
Big Data Analysis and Prediction
Big Data Analysis
Hadoop, Spark, NoSQL DB, SAP HANA, Elasticsearch, ...
Big Data for Data Analysis
– How to store, compute, and analyze a massive dataset?
Big Data Science
How to predict future trends and patterns with the massive dataset? => Machine Learning
Spark
 Parallel Computing Engine
 Spark by UC Berkeley AMPLab
 Started by Matei Zaharia in 2009,
– and open sourced in 2010
In-Memory storage for intermediate data
 20 ~ 100 times faster than MapReduce
Good in Machine Learning => Big Data Science
– Iterative algorithms
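Why in-memory storage helps iterative algorithms can be shown with a toy model in plain Python. This is a sketch and not Spark's API: the hypothetical `Dataset` class only counts how often the data is re-read from its source, and `iterative_mean` is a small gradient-descent loop that makes one full pass over the data per iteration.

```python
from typing import Callable, List, Optional


class Dataset:
    """Toy stand-in for a distributed dataset; counts simulated storage reads."""

    def __init__(self, loader: Callable[[], List[float]], cache: bool = False) -> None:
        self._loader = loader
        self._cache_enabled = cache
        self._cached: Optional[List[float]] = None
        self.reads = 0  # how many times the loader (e.g. disk) was hit

    def collect(self) -> List[float]:
        # Serve from memory when caching is on and the data was already loaded.
        if self._cache_enabled and self._cached is not None:
            return self._cached
        self.reads += 1
        data = self._loader()
        if self._cache_enabled:
            self._cached = data
        return data


def iterative_mean(ds: Dataset, iterations: int, lr: float = 0.5) -> float:
    """Gradient descent on squared error; the estimate converges to the mean."""
    estimate = 0.0
    for _ in range(iterations):
        values = ds.collect()  # every iteration needs the full data set again
        gradient = sum(estimate - v for v in values) / len(values)
        estimate -= lr * gradient
    return estimate
```

Without the cache, every iteration triggers another read from storage, which is roughly the situation of a chain of MapReduce jobs; with the cache, the data is read once and reused across all iterations.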
Spark (Cont’d)
Spark ML
Supports Machine Learning libraries
Process massive data set to build prediction models
Deep Learning
 Machine Learning
 Has been popular since Google TensorFlow, released Nov 9 2015
 Multiple Cores in GPU
– Even with multiple GPUs and CPUs
 Parallel Computing in a chip
 GPU (Nvidia GTX 1660 Ti)
 1536 CUDA cores
 Deep Learning Libraries
 TensorFlow
 PyTorch
 Keras
 Caffe, Caffe2
 Microsoft Cognitive Toolkit (previously CNTK)
 Apache MXNet
 DeepLearning4j
 …
From Neural Networks to Deep Learning
Deep learning – Different types of architectures
Generative Adversarial Networks (GAN)
Convolutional Neural Networks (CNN)
Neural Networks (NN)
Recurrent Neural Networks (RNN) & Long Short-Term Memory (LSTM)
Ref: SAP Enterprise Deep Learning with TensorFlow
Deep Learning
CNN
Image Recognition
Video Analysis
 NLP for classification, Prediction
RNN
Time Series Prediction
Speech Recognition/Synthesis
Image/Video Captioning
Text Analysis
– Conversation Q&A
GAN
 Media Generation
– Photo Realistic Images
Human Image Synthesis: Fake faces
Data Scale Driving: Deep Learning Process
Deep Learning and Massive Data [3]
“Machine Learning Yearning” Andrew Ng 2016
Another Gap between Deep Learning and Big Data Communities [6]
Deep learning experts
The Chasm
Big Data Engineers, Scientists, Analysts, etc.
Leveraging Big Data Cluster
 Existing Big Data cluster with a massive data set, analyzed without using Big Data
Single server for Python and R: Traditional Machine Learning?
Single GPU server for Deep Learning?
Too slow in data migration, and the single server fails
Big Data Cluster
Deep Learning with Spark
What if we combine Deep Learning and Spark?
Leveraging Big Data Cluster
 Existing Big Data cluster
Big Data Engineering
Big Data Analysis
Big Data Science
Distributed Deep Learning
– Integrate Deep Learning to the cluster
No data migration needed; can leverage the parallel computing
and the existing large-scale data
Big Data Cluster
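One common way to integrate deep learning into a cluster is synchronous data-parallel training: each worker computes a gradient on its own partition of the data, and the gradients are combined before the shared model is updated. The sketch below shows that idea for a one-parameter linear model in plain Python. It is an assumption about the general technique, not the API or internals of any specific library, and the names `local_gradient` and `train` are hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y)


def local_gradient(w: float, partition: Sequence[Sample]) -> Tuple[float, int]:
    """Per-worker step: summed squared-error gradient for y ~ w * x on one partition."""
    grad = sum(2.0 * (w * x - y) * x for x, y in partition)
    return grad, len(partition)


def train(partitions: List[Sequence[Sample]], epochs: int = 200, lr: float = 0.01) -> float:
    """Driver loop: gather per-partition gradients, average them, update w."""
    total = sum(len(p) for p in partitions)
    if total == 0:
        raise ValueError("no training data")
    w = 0.0
    for _ in range(epochs):
        # On a cluster each call would run on the node that already stores the partition.
        results = [local_gradient(w, p) for p in partitions]
        grad = sum(g for g, _ in results) / total
        w -= lr * grad
    return w
```

Only the gradients move across the network, while the data stays on the nodes that already hold it, which is the "no data migration" benefit described above.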
Deep Learning with Spark
Deep Learning Pipelines for Apache Spark
– Databricks
TensorFlowOnSpark
– Yahoo! Inc
BigDL (Distributed Deep Learning Library for Apache Spark)
– Intel
DL4J (Deeplearning4j On Spark)
– Skymind
Distributed Deep Learning with Keras & Spark
– Elephas
Contents
 Myself
 Introduction To Big Data
 Big Data AI Predictive Analysis: Use Case
 Summary
COVID 19 Dashboard
https://www.calstatela.edu/centers/hipic/covid-19-us-ca-confirmed-prediction
Financial Data Set
Priyanka Purushu, Jongwook Woo, "Financial Fraud Detection
adopting Distributed Deep Learning in Big Data",
KSII The 15th Asia Pacific International Conference on Information Science
and Technology (APIC-IST) 2020, July 5 -7 2020, Seoul, Korea, pp271-273,
ISSN 2093-0542
No publicly available datasets on financial services
 due to the private nature of financial transactions
– especially in the mobile money transactions domain
 PaySim
URL: https://www.kaggle.com/ntnu-testimon/paysim1
Financial Data Set (Cont‘d)
Size: 470 MB
6,362,620 records
Not a large-scale data set compared to data sets larger than a GB
But the Big Data architecture is applicable to much bigger data sets
– As it still adopts the Spark computing engine in Big Data
Attributes: 11
Predictive Analysis
The target column to predict fraud :
– ‘isFraud’
Data Understanding
Numeric attributes:
amount, oldbalanceOrg, newbalanceOrig, oldbalanceDest, newbalanceDest
Categorical attributes:
step, type, isFraud, isFlaggedFraud
String attributes:
 nameOrig, nameDest
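The three attribute groups above suggest a typical preparation step before training: keep the numeric attributes, one-hot encode the categorical `type`, and leave out the string identifiers. The sketch below shows one plausible way to do this for a single record. It is not the authors' pipeline; the list of transaction types is an assumption based on the public PaySim data, only a subset of the numeric attributes is used for brevity, and `to_features` and `to_label` are hypothetical helpers.

```python
from typing import Dict, List, Sequence, Union

Row = Dict[str, Union[str, float, int]]

# Assumption: transaction types as found in the public PaySim data.
TYPE_LEVELS = ("CASH_IN", "CASH_OUT", "DEBIT", "PAYMENT", "TRANSFER")
# A subset of the numeric attributes named on the slide, for brevity.
NUMERIC_COLS = ("amount", "oldbalanceOrg", "oldbalanceDest", "newbalanceDest")


def to_features(
    row: Row,
    numeric_cols: Sequence[str] = NUMERIC_COLS,
    type_levels: Sequence[str] = TYPE_LEVELS,
) -> List[float]:
    """Numeric values followed by a one-hot encoding of `type`.

    String identifiers (nameOrig, nameDest) are ignored on purpose.
    """
    numeric = [float(row[col]) for col in numeric_cols]
    one_hot = [1.0 if row["type"] == level else 0.0 for level in type_levels]
    return numeric + one_hot


def to_label(row: Row) -> int:
    """Target column for the prediction: 1 for fraud, 0 otherwise."""
    return int(row["isFraud"])
```

An unknown `type` value simply yields an all-zero one-hot block here; a production pipeline would more likely reject or log such rows.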
Comparing Spark ML and DDL for fraud detection
Spark ML algorithms
DT (Decision Tree)
RF (Random Forest)
– Performance
• 53 minutes
• Best in Precision: 0.959
LR (Logistic Regression): Fastest, 24 minutes
DDL: Distributed Deep Learning in Spark
 Feed Forward (FF)
– a neural network system
– Performance
• 51 minutes
• Best in Recall: 0.938
Summary: Performance
Summary: Accuracy and Performance
Model  Precision  Recall  Computing Time (mins)
DT     0.946      0.889   29
RF     0.959      0.909   53
LR     0.902      0.655   24
FF     0.880      0.938   51
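Since no single model wins on both precision and recall, one way to compare the rows above is to combine the two into the F1 score, F1 = 2PR / (P + R). F1 is not reported on the slides, so the sketch below is an added illustration: the `RESULTS` values are copied from the table, while `f1_score` and `rank_by_f1` are hypothetical helpers.

```python
from typing import Dict, List, Tuple

# (precision, recall, computing time in minutes), copied from the table above.
RESULTS: Dict[str, Tuple[float, float, int]] = {
    "DT": (0.946, 0.889, 29),
    "RF": (0.959, 0.909, 53),
    "LR": (0.902, 0.655, 24),
    "FF": (0.880, 0.938, 51),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def rank_by_f1(results: Dict[str, Tuple[float, float, int]]) -> List[str]:
    """Model names ordered from highest to lowest F1."""
    return sorted(
        results,
        key=lambda name: f1_score(results[name][0], results[name][1]),
        reverse=True,
    )
```

Which metric matters most depends on the cost of a missed fraud versus a false alarm, so a ranking by F1 is only one possible reading of the table.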
AWS Review Dataset
 Monika Mishra, Mingoo Kang, Jongwook Woo, “Rating Prediction using Deep
Learning and Spark”,
 The 11th International Conference on Internet (ICONI 2019), pp307-310, Dec 15-18 2019,
Hanoi, Vietnam
 Predictive Analysis
 Prediction of Users’ ratings
– important measures for purchase and selling
 Spark ML: ALS (Alternating Least Squares) algorithm
 DDL (Distributed Deep Learning): Neural Collaborative Filtering(NCF)
 Dataset: https://s3.amazonaws.com/amazon-reviews-pds/tsv/index.txt
 Products reviewed between 2005 and 2015 are analyzed
 Total product reviews : 9.57 million
 File Size : 5.26 GB
Summary: Performance
Summary: Mean Absolute Error
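For rating prediction, Mean Absolute Error is the average distance between the predicted and the actual rating: MAE = (1/n) Σ |actual_i − predicted_i|. A minimal sketch of the metric follows; the function name is hypothetical, and the slide's own MAE figures are not reproduced here.

```python
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between actual and predicted ratings."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one rating is required")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

A lower MAE means the predicted ratings sit closer to what users actually gave; on a 1 to 5 star scale an MAE of 1.0 means predictions are off by one star on average.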
What To Do?
Predictive Analysis
Big Data Analyst & Scientist
– Learn the domain of Marketing?
Marketing Experts
– Learn the cutting edge tech: machine learning, AI and Big Data technology?
Need Collaboration instead
Big Data AI
Domain Expert in Marketing
Have coffee and talk occasionally
Contents
 Myself
 Introduction To Big Data
 Big Data AI Predictive Analysis
 Summary
Summary
Introduction to Big Data
Spark ML for Big Data Science
Distributed Deep Learning with Spark
DDL provides higher accuracy with similar performance by
leveraging the Big Data cluster
Collaboration and Coffee time Needed
Questions?
Precision vs Recall
True Positive (TP): Fraud? Yes it is
False Negative (FN): No fraud? but it is
False Positive (FP): Fraud? but it is not
 Precision
 TP / (TP + FP)
 Recall
 TP / (TP + FN)
 Ref: https://en.wikipedia.org/wiki/Precision_and_recall
Positive: Event occurs (Fraud)
Negative: Event does not occur (non Fraud)
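The two formulas above translate directly into code. This is a minimal sketch with hypothetical function names; returning 0.0 for an empty denominator is a convention chosen here, not something stated on the slide.

```python
def precision(tp: int, fp: int) -> float:
    """TP / (TP + FP): of everything flagged as fraud, how much really was fraud."""
    flagged = tp + fp
    return tp / flagged if flagged else 0.0


def recall(tp: int, fn: int) -> float:
    """TP / (TP + FN): of all real fraud cases, how many were caught."""
    actual_positives = tp + fn
    return tp / actual_positives if actual_positives else 0.0
```

With illustrative counts of 90 true positives, 10 false positives, and 30 false negatives, precision is 90 / 100 and recall is 90 / 120, which shows how a model can look precise while still missing a quarter of the fraud.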
References
1. Priyanka Purushu, Niklas Melcher, Bhagyashree Bhagwat, Jongwook Woo, "Predictive Analysis of Financial Fraud Detection using Azure and Spark ML", Asia Pacific Journal of Information Systems (APJIS), Vol. 28, No. 4, December 2018, pp308-319
2. Jongwook Woo, DMKD-00150, "Market Basket Analysis Algorithms with MapReduce", Wiley Interdisciplinary Reviews Data Mining and Knowledge Discovery, Oct 28 2013, Volume 3, Issue 6, pp445-452, ISSN 1942-4795
3. Jongwook Woo, "Big Data Trend and Open Data", UKC 2016, Dallas, TX, Aug 12 2016
4. How to choose algorithms for Microsoft Azure Machine Learning, https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-algorithm-choice
5. Manik Katyal, Parag Chhadva, Shubhra Wahi, Jongwook Woo, "Big Data Analysis using Spark for Collision Rate Near CalStateLA", https://globaljournals.org/GJCST_Volume16/1-Big-Data-Analysis-using-Spark.pdf
6. Spark Programming Guide, http://spark.apache.org/docs/latest/programming-guide.html
7. TensorFrames: Google Tensorflow on Apache Spark, https://www.slideshare.net/databricks/tensorframes-google-tensorflow-on-apache-spark
8. Deep learning and Apache Spark, https://www.slideshare.net/QuantUniversity/deep-learning-and-apache-spark
References
9. Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark, https://www.slideshare.net/SparkSummit/which-is-deeper-comparison-of-deep-learning-frameworks-on-spark
10. Accelerating Machine Learning and Deep Learning At Scale with Apache Spark, https://www.slideshare.net/SparkSummit/accelerating-machine-learning-and-deep-learning-at-scalewith-apache-spark-keynote-by-ziya-ma
11. Deep Learning with Apache Spark and TensorFlow, https://databricks.com/blog/2016/01/25/deep-learning-with-apache-spark-and-tensorflow.html
12. Tensor Flow Deep Learning Open SAP
13. Overview of Smart Factory, https://www.slideshare.net/BrendanSheppard1/overview-of-smart-factory-solutions-68137094/6
14. https://dzone.com/articles/sqoop-import-data-from-mysql-tohive
15. https://www.kaggle.com/c/talkingdata-adtracking-fraud-detection/data
16. https://blogs.msdn.microsoft.com/andreasderuiter/2015/02/09/performance-measures-in-azure-ml-accuracy-precision-recall-and-f1-score/

Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 

Introduction to Big Data and AI for Business Analytics and Prediction

  • 1. Jongwook Woo HiPIC CalStateLA Marketing Analytics Research Society (M.A.R.S.) Oct 7 2020 Jongwook Woo, PhD, jwoo5@calstatela.edu Big Data AI Center (BigDAI) California State University Los Angeles Introduction to Big Data and AI for Business Analytics and Prediction
  • 2. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Contents  Myself  Introduction To Big Data  Big Data AI Predictive Analysis  Summary
  • 3. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Myself Experience: Since 2002, Professor at California State University Los Angeles – PhD in 2001: Computer Science and Engineering at USC
  • 4. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Myself: S/W Development Lead http://www.mobygames.com/game/windows/matrix-online/credits
  • 5. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Myself: CDH, Oracle using Hadoop Big Data
  • 6. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Myself: Partners for Services
  • 7. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Myself: Collaborations SOFTZEN
  • 8. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Contents  Myself  Introduction To Big Data  Big Data AI Predictive Analysis  Summary
  • 9. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Data Issues: Large-scale data: Tera-byte (10^12), Peta-byte (10^15) – Because of the web – IoT (streaming data, sensor data) in SmartX – Social computing, smartphones, online games – Bioinformatics, … Legacy approach: Can do – Improve the speed of the CPU – Increase the storage size. Only problem – Too expensive
  • 10. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Data Handling: Traditional Way
  • 11. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Data Handling: Traditional Way Becomes too Expensive
  • 12. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Data Issues: Large-scale data: too big; non-/semi-structured data; 3 Vs, 4 Vs, … – Velocity, Volume, Variety. Traditional systems can handle them – but again, too expensive, so the legacy approach is not practical. Need new, inexpensive systems
  • 13. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Two Cores in Big Data: How to store Big Data; How to compute Big Data. Google: How to store Big Data – GFS – Distributed systems on inexpensive commodity computers. How to compute Big Data – MapReduce – Parallel computing with inexpensive computers. Built its own supercomputers; published papers in 2003 (GFS) and 2004 (MapReduce)
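The MapReduce idea named on this slide can be sketched in a few lines. This is a single-process illustration only, not Google's or Hadoop's implementation: the word-count task, the function names, and the sample input are assumptions chosen for the example. In a real cluster the map and reduce calls would run in parallel on many inexpensive machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework would do."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce sequentially over all input splits."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical input splits; each string stands in for one block of a file.
splits = ["big data needs big clusters", "data beats opinions"]
counts = word_count(splits)
```

Because each map call sees only its own split and each reduce call sees only one key, the work can be spread over as many servers as are available.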
  • 14. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Data Handling: Another Way Not Expensive From 2017 Korean Blockbuster Movie, “The Fortress” (남한산성)
  • 15. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Data Handling: Another Way But Works Well with the crazy massive data set Battle of Nagashino, 1575, Japan
  • 16. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Data Handling: Another Way Not Expensive http://blog.naver.com/PostView.nhn?blogId=dosims&logNo=221127053677 AD 1409 (Year 9 of King Tae-Jong, Chosun Dynasty, Korea) By Choi family: 최해산(崔海山), 아버지 최무선(崔茂宣) [Ref] 조선의 비밀 병기 : 총통기 화차(銃筒機火車)|작성자 도심
  • 17. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Big Data Big Data (Hadoop, Spark, Distributed Deep Learning) Cluster for Compute and Store (Distributed File Systems: HDFS, GFS) …
  • 18. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Super Computer vs Big Data vs Cloud Traditional Super Computer (Parallel File Systems: Lustre, PVFS, GPFS) Cluster for Store Big Data (Hadoop, Spark, Distributed Deep Learning) Cluster for Compute and Store (Distributed File Systems: HDFS, GFS) However, Cloud Computing adopts this separated architecture: with High Speed N/W (> 10Gbps) and Object Storage Cluster for Compute
  • 19. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Definition: Big Data: An inexpensive platform of distributed parallel computing systems that can store large-scale data and process it in parallel [1, 2]. Apache Hadoop – Inexpensive supercomputer – More accessible than traditional supercomputers • Anyone can own a supercomputer, since it is open source – In university labs, small companies, research centers. Other solutions with storage and computing services – Spark • mostly integrated into Hadoop by the Hadoop community – NoSQL DB (Cassandra, MongoDB, Redis, HBase, …) – Elasticsearch
  • 20. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA What is Hadoop? Apache Hadoop Project in Jan 2006, split from Nutch. Hadoop Founder: Doug Cutting; Apache Committer: Lucene, Nutch, …
  • 21. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Big Data: Linearly Scalable: Some people argue that a system handling only 1 ~ 3 GB of data is not Big Data. Well… on a Big Data platform, add more servers as the data grows – it is linearly scalable once built – ideally n times more computing power with n servers. Data Size: < 3 GB Data Size: 200 TB > Add n servers
  • 22. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Big Data Data Analysis & Visualization Sentiment Map of Alphago Positive Negative
  • 23. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA K-Election 2017 (April 29 – May 9)
  • 24. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Businesses popular in 5 miles of CalStateLA, USC , UCLA
  • 25. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Jams and other traffic incidents reported by users in Dec 2017 – Jan 2018: (Dalyapraz Dauletbak)
  • 26. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Contents  Myself  Introduction To Big Data  Big Data AI Predictive Analysis  Summary
  • 27. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Big Data Analysis and Prediction Big Data Analysis Hadoop, Spark, NoSQL DB, SAP HANA, ElasticSearch,.. Big Data for Data Analysis – How to store, compute, analyze massive dataset? Big Data Science How to predict the future trend and pattern with the massive dataset? => Machine Learning
  • 28. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Spark: Parallel Computing Engine; Spark by UC Berkeley AMPLab; Started by Matei Zaharia in 2009 – and open sourced in 2010. In-memory storage for intermediate data: 20 ~ 100 times faster than MapReduce. Good in Machine Learning => Big Data Science – Iterative algorithms
  • 29. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Spark (Cont’d) Spark ML Supports Machine Learning libraries Process massive data set to build prediction models
  • 30. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Deep Learning: Machine Learning; has been popular since Google TensorFlow, Nov 9 2015. Multiple cores in GPU – even with multiple GPUs and CPUs; parallel computing in a chip; GPU (Nvidia GTX 1660 Ti): 1280 CUDA cores. Other Deep Learning Libraries: TensorFlow; PyTorch; Keras; Caffe, Caffe2; Microsoft Cognitive Toolkit (previously CNTK); Apache MXNet; DeepLearning4j; …
  • 31. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA From Neural Networks to Deep Learning Deep learning – Different types of architectures Generative Adversarial Networks (GAN) Convolutional Neural Networks (CNN) Neural Networks (NN) 7 © 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC Recurrent Neural Networks (RNN) & Long-Short Term Memory (LSTM) Ref: SAP Enterprise Deep Learning with TensorFlow
  • 32. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Deep Learning CNN Image Recognition Video Analysis  NLP for classification, Prediction RNN Time Series Prediction Speech Recognition/Synthesis Image/Video Captioning Text Analysis – Conversation Q&A GAN  Media Generation – Photo Realistic Images Human Image Synthesis: Fake faces
  • 33. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Data Scale Driving: Deep Learning Process Deep Learning and Massive Data [3] “Machine Learning Yearning” Andrew Ng 2016
  • 34. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Deep learning experts The Chasm Big Data Engineers, Scientists, Analysts, etc. Another Gap between Deep Learning and Big Data Communities [6]
  • 35. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Leveraging Big Data Cluster: An existing Big Data cluster holds a massive data set, but the analysis is done without using the cluster: Single GPU server for Deep Learning? Single server for Python and R traditional Machine Learning? Too slow in data migration, and the single server fails. Big Data Cluster
  • 36. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Deep Learning with Spark What if we combine Deep Learning and Spark?
  • 37. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Leveraging Big Data Cluster: Existing Big Data cluster: Big Data Engineering, Big Data Analysis, Big Data Science. Distributed Deep Learning – Integrate Deep Learning into the cluster: no data migration is needed, and it can leverage the parallel computing and the existing large-scale data. Big Data Cluster
  • 38. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Deep Learning with Spark Deep Learning Pipelines for Apache Spark Databricks TensorFlowOnSpark Yahoo! Inc BigDL (Distributed Deep Learning Library for Apache Spark) Intel DL4J (Deeplearning4j On Spark) Skymind Distributed Deep Learning with Keras & Spark Elephas
  • 39. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Contents  Myself  Introduction To Big Data  Big Data AI Predictive Analysis: Use Case  Summary
  • 40. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA COVID 19 Dashboard https://www.calstatela.edu/centers/hipic/covid-19-us-ca-confirmed-prediction
  • 41. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Financial Data Set Priyanka Purushu, Jongwook Woo, "Financial Fraud Detection adopting Distributed Deep Learning in Big Data", KSII The 15th Asia Pacific International Conference on Information Science and Technology (APIC-IST) 2020, July 5 -7 2020, Seoul, Korea, pp271-273, ISSN 2093-0542 No public available datasets on financial services  private nature of financial transactions – specially in the mobile money transactions domain  PaySim URL: https://www.kaggle.com/ntnu-testimon/paysim1
  • 42. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Financial Data Set (Cont'd): Size: 470 MB, 6,362,620 records. Not large-scale data compared to data sets of more than a GB. But the Big Data architecture is applicable to much bigger data sets – as it still adopts the Spark computing engine in Big Data. Attributes: 11. Predictive Analysis: the target column to predict fraud: – 'isFraud'
  • 43. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Data Understanding Numeric attributes: amount, oldbalanceOrg, newbalanceOrg, oldbalanceDest, newbalanceDest Categorical attributes: step, type, isFraud, isFlaggedFraud String attributes:  nameOrig, nameDest
  • 44. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Comparing Spark ML and DDL for fraud detection Spark ML algorithms DT (Decision Tree) RF (Random Forest) – Performance • 53 minutes • Best in Precision: 0.959 LR (Linear Regression): Fastest 24 minutes DDL: Distributed Deep Learning in Spark  Forward Feed (FF) – a neural network system – Performance • 51 minutes • Best in Recall: 0.938
  • 45. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Summary: Performance
  • 46. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Summary: Accuracy and Performance
      Model  Precision  Recall  Computing Time (mins)
      DT     0.946      0.889   29
      RF     0.959      0.909   53
      LR     0.902      0.655   24
      FF     0.880      0.938   51
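Since no single model wins on both precision and recall, one way to compare the rows is a combined score. The sketch below computes the F1 score (the harmonic mean of precision and recall) from the figures on this slide. F1 is not reported in the slides; it is added here as an assumption about how one might rank the models, and the variable names are illustrative.

```python
# Precision and recall per model, copied from the summary slide.
results = {
    "DT": (0.946, 0.889),
    "RF": (0.959, 0.909),
    "LR": (0.902, 0.655),
    "FF": (0.880, 0.938),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# F1 per model, then the model with the highest combined score.
f1 = {model: f1_score(p, r) for model, (p, r) in results.items()}
best = max(f1, key=f1.get)

for model, score in sorted(f1.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: F1 = {score:.3f}")
```

Which metric matters most depends on the use case: in fraud detection a missed fraud (low recall) is often costlier than a false alarm, which is the argument for the FF model despite its lower precision.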
  • 47. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA AWS Review Dataset  Monika Mishra, Mingoo Kang, Jongwook Woo, “Rating Prediction using Deep Learning and Spark”,  The 11th International Conference on Internet (ICONI 2019), pp307-310, Dec 15-18 2019, Hanoi, Vietnam  Predictive Analysis  Prediction of Users’ ratings – important measures for purchase and selling  Spark ML: ALS (Alternating Least Squares) algorithm  DDL (Distributed Deep Learning): Neural Collaborative Filtering(NCF)  Dataset : - https://s3.amazonaws.com/amazon-reviews-pds/tsv/index.txt  Products reviewed between 2005 and 2015 are analyzed  Total product reviews : 9.57 million  File Size : 5.26 GB
  • 48. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Summary: Performance
  • 49. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Summary: Mean Absolute Error
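For reference, Mean Absolute Error is the average absolute difference between predicted and actual ratings. A minimal sketch follows; the ratings below are made-up values for illustration, not results from the study.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over paired ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical star ratings and model predictions.
actual = [5, 3, 4, 1]
predicted = [4.5, 3.0, 2.0, 2.0]
mae = mean_absolute_error(actual, predicted)
```

A lower MAE means the predicted ratings sit closer to what users actually gave.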
  • 50. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA What To Do? Predictive Analysis Big Data Analyst & Scientist – Learn the domain of Marketing? Marketing Experts – Learn the cutting edge tech: machine learning, AI and Big Data technology? Need Collaboration instead Big Data AI Domain Expert in Marketing Have coffee and talk occasionally
  • 51. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Contents  Myself  Introduction To Big Data  Big Data AI Predictive Analysis  Summary
  • 52. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Summary: Introduction to Big Data; Spark ML for Big Data Science; Distributed Deep Learning with Spark; DDL provides better accuracy with similar performance by leveraging the Big Data cluster; Collaboration and coffee time needed
  • 53. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Questions?
  • 54. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA Precision vs Recall: Positive: event occurs (fraud). Negative: event does not occur (non-fraud). True Positive (TP): Fraud? Yes, it is. False Negative (FN): No fraud? But it is. False Positive (FP): Fraud? But it is not. Precision = TP / (TP + FP). Recall = TP / (TP + FN). Ref: https://en.wikipedia.org/wiki/Precision_and_recall
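The two formulas on this slide translate directly into code. The counts below are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the fraud study.

```python
def precision(tp, fp):
    """Share of predicted frauds that really are fraud: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of real frauds that were caught: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical confusion counts for a fraud classifier.
tp, fp, fn = 90, 10, 30

p = precision(tp, fp)  # 90 flagged correctly out of 100 flagged
r = recall(tp, fn)     # 90 caught out of 120 actual frauds
```

Raising the decision threshold usually trades recall for precision, so the two numbers should be read together.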
  • 55. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA References
      1. Priyanka Purushu, Niklas Melcher, Bhagyashree Bhagwat, Jongwook Woo, "Predictive Analysis of Financial Fraud Detection using Azure and Spark ML", Asia Pacific Journal of Information Systems (APJIS), Vol. 28, No. 4, December 2018, pp308-319
      2. Jongwook Woo, DMKD-00150, "Market Basket Analysis Algorithms with MapReduce", Wiley Interdisciplinary Reviews Data Mining and Knowledge Discovery, Oct 28 2013, Volume 3, Issue 6, pp445-452, ISSN 1942-4795
      3. Jongwook Woo, "Big Data Trend and Open Data", UKC 2016, Dallas, TX, Aug 12 2016
      4. How to choose algorithms for Microsoft Azure Machine Learning, https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-algorithm-choice
      5. Manik Katyal, Parag Chhadva, Shubhra Wahi, Jongwook Woo, "Big Data Analysis using Spark for Collision Rate Near CalStateLA", https://globaljournals.org/GJCST_Volume16/1-Big-Data-Analysis-using-Spark.pdf
      6. Spark Programming Guide, http://spark.apache.org/docs/latest/programming-guide.html
      7. TensorFrames: Google Tensorflow on Apache Spark, https://www.slideshare.net/databricks/tensorframes-google-tensorflow-on-apache-spark
      8. Deep learning and Apache Spark, https://www.slideshare.net/QuantUniversity/deep-learning-and-apache-spark
  • 56. Big Data Artificial Intelligence Center (BigDAI) Jongwook Woo CalStateLA References
      9. Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark, https://www.slideshare.net/SparkSummit/which-is-deeper-comparison-of-deep-learning-frameworks-on-spark
      10. Accelerating Machine Learning and Deep Learning At Scale with Apache Spark, https://www.slideshare.net/SparkSummit/accelerating-machine-learning-and-deep-learning-at-scalewith-apache-spark-keynote-by-ziya-ma
      11. Deep Learning with Apache Spark and TensorFlow, https://databricks.com/blog/2016/01/25/deep-learning-with-apache-spark-and-tensorflow.html
      12. Tensor Flow Deep Learning Open SAP
      13. Overview of Smart Factory, https://www.slideshare.net/BrendanSheppard1/overview-of-smart-factory-solutions-68137094/6
      14. https://dzone.com/articles/sqoop-import-data-from-mysql-tohive
      15. https://www.kaggle.com/c/talkingdata-adtracking-fraud-detection/data
      16. https://blogs.msdn.microsoft.com/andreasderuiter/2015/02/09/performance-measures-in-azure-ml-accuracy-precision-recall-and-f1-score/