Jongwook Woo
HiPIC
CalStateLA
Keimyung University
Dec 20 2019
Jongwook Woo, PhD, jwoo5@calstatela.edu
Big Data AI Center (BigDAI)
California State University Los Angeles
History and Trend of
Big Data and Deep Learning
Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Contents
 Myself
 Introduction To Big Data
 Deep Learning and Big Data
 Big Data Predictive Analysis
 Summary
Myself
Experience:
Since 2002, Professor at California State University Los Angeles
– PhD in 2001: Computer Science and Engineering at USC
Universities in Los Angeles
West
North
Universities in Los Angeles
California State University
Los Angeles
Myself: S/W Development Lead
http://www.mobygames.com/game/windows/matrix-online/credits
Collaboration with HDP, CDH, Oracle, Amazon
using Hadoop Big Data
https://www.cloudera.com/more/customers/csula.html
Myself: Partners for Services
Myself: Collaborations
Contents
 Myself
 Introduction To Big Data
 Deep Learning and Big Data
 Big Data Predictive Analysis
 Summary
New Technology: Big Data
What is Big Data? Data or Systems?
Large Scale Data?
–Many people only see the data point of view
–3 Vs, 5Vs
Systems?
– YES
Data Handling Systems: Traditional Way
Data Handling: Traditional Way
Becomes too Expensive
Data Handling: Another Way
Not Expensive
From 2017 Korean
Blockbuster Movie,
“The Fortress”
(남한산성)
Data Handling: Another Way
Not Expensive
http://blog.naver.com/PostView.nhn?blogId=dosims&logNo=221127053677
1409 (9th year of King Taejong): Choe Hae-san (崔海山), whose father was Choe Mu-seon (崔茂宣)
[Source] Joseon's Secret Weapon: the Chongtonggi Hwacha (銃筒機火車) | Author: Dosim
Data Issues
The legacy approach cannot handle it affordably
Too big
Non-/Semi-structured data
3 Vs, 4 Vs, …
– Velocity, Volume, Variety
Traditional systems can handle them
– But again, too expensive
Need new systems
Inexpensive
Two Cores in Big Data
How to store Big Data
How to compute Big Data
Google
How to store Big Data
– GFS
– Distributed Systems on non-expensive commodity computers
How to compute Big Data
– MapReduce
– Parallel Computing with non-expensive computers
Own super computers
Published papers in 2003, 2004
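The MapReduce idea above can be sketched on a single machine. This is a minimal illustration of the map, shuffle, and reduce steps using word count; the function names and sample input are hypothetical, and a real cluster would run the map and reduce calls in parallel on many commodity servers.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each document stands in for one input split stored on the cluster.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical input splits, for illustration only.
splits = ["big data is big", "data is stored and computed in parallel"]
counts = word_count(splits)
print(counts["big"], counts["data"], counts["parallel"])
```

Because each map call sees only its own split and each reduce call sees only one key, both phases can be spread over inexpensive machines without shared state.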
Super Computer vs Big Data vs Cloud
Traditional Super Computer
Cluster for Compute
Cluster for Store (Parallel File Systems: Lustre, PVFS, GPFS)
Big Data (Hadoop, Spark, Distributed Deep Learning)
Cluster for Compute and Store (Distributed File Systems: HDFS, GFS)
However, Cloud Computing adopts the separated architecture of the super computer, with a high-speed network and Object Storage
Big Data: Hadoop
Apache Hadoop Project split from Nutch in Jan 2006
Hadoop Founder:
o Doug Cutting
Apache Committer:
Lucene, Nutch, …
Definition: Big Data [W13]
An inexpensive platform of distributed parallel systems that can
store large-scale data and process it in parallel
Hadoop
– Inexpensive Super Computer
– More accessible than traditional super computers
• You can store your data and run your applications
– In your university labs, small companies, research centers
Others with storage and computing services
– Spark
• normally integrated into Hadoop by the Hadoop community
– NoSQL DB (Cassandra, MongoDB, Redis, HBase, …)
– Elasticsearch
Big Data: Linearly Scalable
Some people object that a system handling only 1 ~ 3 GB of data is not Big Data
Well… on a Big Data platform, add more servers as the data grows in the future
– it is linearly scalable once built
– ideally, n times more computing power with n times more servers
Data Size: < 3 GB => add n servers => Data Size: > 200 TB
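A small sketch of what "linearly scalable" means in numbers. It assumes perfectly linear speedup, which real clusters only approximate; the function names and the sample runtimes are hypothetical and are not measurements from this talk.

```python
def ideal_runtime(measured_minutes, measured_nodes, target_nodes):
    """Runtime expected on target_nodes if speedup were perfectly linear."""
    if measured_nodes <= 0 or target_nodes <= 0:
        raise ValueError("node counts must be positive")
    return measured_minutes * measured_nodes / target_nodes


def scaling_efficiency(t_small, n_small, t_large, n_large):
    """Ideal runtime divided by observed runtime on the larger cluster (1.0 = linear)."""
    return ideal_runtime(t_small, n_small, n_large) / t_large


# Hypothetical example: a job takes 60 minutes on 3 nodes.
ideal = ideal_runtime(60, 3, 6)             # doubling the nodes halves the time
efficiency = scaling_efficiency(60, 3, 40, 6)  # observed 40 minutes on 6 nodes
print(ideal, efficiency)
```

An efficiency below 1.0 shows how far a measured run falls short of the ideal; comparing it across cluster sizes is a quick way to check whether adding servers still pays off.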
Big Data Cluster
Are you ready for research now?
Large Scale Data Set with computing engine: ML, DS
Massive Data Set with
Computing Engines (Hadoop,
Spark)
Experimental Results in AWS [PMBW18]
Execution times
Big Data Science
3 nodes:
– 40 to 70 min
11 nodes:
– 10 to 20 min
Big Data is great for Any Small Business
Your data is the value and Big Data
 Customer data
 Operational data
You have your own specific data
Big companies do not have the specific data that you have
Potentials
 Your customer data
– Smart marketing and Sales
– Advertisement
 Your operational data
– Efficient operation, For Example, Smart*:
• Smart Factory, Smart City
Big Data: Data Analysis & Visualization
Sentiment Map of AlphaGo
Positive
Negative
K-Election 2017
(April 29 – May 9)
IoT of Smart Factory
IoT of Smart Factory (Cont’d)
Popular businesses within 5 miles of CalStateLA, USC, and UCLA
Jams and other traffic incidents reported
by users in Dec 2017 – Jan 2018: [DW19a]
Contents
 Myself
 Introduction To Big Data
 Deep Learning and Big Data
 Big Data Predictive Analysis
Summary
Big Data Analysis and Prediction
Big Data Analysis
Hadoop, Spark, NoSQL DB, SAP HANA, Elasticsearch, …
Big Data for Data Analysis
– How to store, compute, and analyze a massive dataset?
Big Data Science
How to predict future trends and patterns from a massive dataset? => Machine Learning
Spark
Limitations of MapReduce
Hard to program in Java
Batch processing
– Not interactive
Disk storage for intermediate data
– Performance issue
Spark by UC Berkeley AMPLab
Started by Matei Zaharia in 2009,
– and open sourced in 2010
In-memory storage for intermediate data
Reported to be 20 ~ 100 times faster than MapReduce
Good for Machine Learning => Big Data Science
– Iterative algorithms
Spark (Cont’d)
Spark ML
Supports Machine Learning libraries
Process massive data set to build prediction models
Big Data Analysis and Prediction Flow
Data Collection
Batch API: Yelp, Google
Streaming: Twitter, Apache NiFi, Kafka, StreamSets, Storm
Open Data: Government
Data Storage
HDFS, S3, Object Storage,
NoSQL DB (Couchbase)…
Data Filtering
Hive, Pig
Data Analysis and Science
Hive, Pig, Spark, Deep Learning,
BI Tools (Qlik, Tableau, …)
Data Visualization
Qlik, Excel PowerMap,
Tableau, Looker, …
- Engineering:
- Big Data Engineering
- Big Data Analysis
- Data Visualization
- Research
- Big Data Science Deep Learning
Traditional Data Science
The Gap
Big Data Engineers, Scientists, Analysts, etc.
Gap between Traditional Data Science and Big Data
Communities
Leveraging Big Data Cluster [MCSPBW19, DW19a]
Existing Big Data cluster with massive data set, used with traditional ML
Traditional Machine Learning: single server for Python and R
Issues and Solutions: too slow in large-scale data migration, and the single server fails
Big Data Cluster
Deep Learning
Machine Learning
Has been popular since Google TensorFlow
Multiple cores in GPU
– Even with multiple GPUs and CPUs
Parallel Computing
GPU (Nvidia GTX 1660 Ti)
1280 CUDA cores
Deep Learning Libraries
TensorFlow
PyTorch
Keras
Caffe, Caffe2
Microsoft Cognitive Toolkit (previously CNTK)
Apache MXNet
DeepLearning4j
…
From Neural Networks to Deep Learning
Deep learning – Different types of architectures
Generative Adversarial Networks (GAN)
Convolutional Neural Networks (CNN)
Neural Networks (NN)
Recurrent Neural Networks (RNN) & Long Short-Term Memory (LSTM)
Ref: SAP Enterprise Deep Learning with TensorFlow
Deep Learning
CNN
Image Recognition
Video Analysis
 NLP for classification, Prediction
RNN
Time Series Prediction
Speech Recognition/Synthesis
Image/Video Captioning
Text Analysis
– Conversation Q&A
GAN
 Media Generation
– Photo Realistic Images
Human Image Synthesis: Fake faces
Data Scale Driving: Deep Learning Process
Deep Learning and Massive Data [3]
“Machine Learning Yearning” Andrew Ng 2016
Deep learning experts
The Chasm
Big Data Engineers, Scientists, Analysts, etc.
Another Gap between Deep Learning and Big Data
Communities [6]
Leveraging Big Data Cluster
 Existing Big Data cluster with massive data set without using
Big Data
Single GPU server for Deep Learning?
Traditional Machine Learning? Single server for Python and R
Too slow in data migration, and the single server fails
Big Data Cluster
Deep Learning with Spark
What if we combine Deep Learning and Spark?
Leveraging Big Data Cluster
 Existing Big Data cluster
Big Data Engineering
Big Data Analysis
Big Data Science
Distributed Deep Learning
– Integrate Deep Learning to the cluster
Needs no data migration and can leverage parallel computing
and the existing large-scale data
Big Data Cluster
Deep Learning with Spark
Deep Learning Pipelines for Apache Spark: Databricks
TensorFlowOnSpark: Yahoo! Inc
BigDL (Distributed Deep Learning Library for Apache Spark): Intel
DL4J (Deeplearning4j on Spark): Skymind
Elephas: Distributed Deep Learning with Keras & Spark
Big Data Prediction with DDL
DDL: Distributed Deep Learning
TensorFlow
Distributed Training and Inference in Spark cluster
DDL
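One common way to distribute training is synchronous data parallelism: every worker computes a gradient on its own data shard, the gradients are averaged, and all workers apply the same update. The slides do not say which strategy the DDL library uses, so the sketch below is an assumption for illustration only. It uses a one-parameter linear model in plain Python with made-up data; the loop over shards stands in for work that a cluster would do in parallel.

```python
def local_gradient(w, shard):
    """Gradient of mean squared error for y ≈ w * x on one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def train(shards, w=0.0, lr=0.05, steps=200):
    """Synchronous data-parallel gradient descent over all shards."""
    for _ in range(steps):
        # On a real cluster each shard's gradient is computed on a different node.
        grads = [local_gradient(w, shard) for shard in shards]
        # Averaging the gradients plays the role of an all-reduce step.
        w -= lr * sum(grads) / len(grads)
    return w


# Hypothetical data generated from y = 2 * x, split across two workers.
shards = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0), (4.0, 8.0)],
]
w = train(shards)
print(round(w, 6))
```

Since the data never leaves its shard, this pattern fits a cluster where the large data set is already stored across the nodes.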
Spark ML and DDL [MKW19]
Deep Learning in Spark cluster
Distributed Deep Learning (DDL)
DDL lib
Contents
 Myself
 Introduction To Big Data
 Deep Learning and Big Data
 Big Data Predictive Analysis
Summary
Azure ML Studio and Spark ML Result Comparison:
Ad Click Fraud Prediction, 7GB data [GLBW19]
Model (platform)                                          AUC    Precision  Recall  TP       FP      TN         FN       Run Time
Two-Class Decision Jungle (AzureML)                       0.905  1.0        0.001   35       0       52,306     406,605  2 hrs
Two-Class Decision Forest (AzureML)                       0.997  0.992      0.902   47,199   377     406,228    5,142    2-3 hrs
Decision Tree Classifier (Databricks)                     0.815  0.822      0.633   86,683   18,727  7,112,961  50,074   22 mins
Random Forest Classifier (Databricks)                     0.746  0.878      0.495   67,726   9,408   7,122,280  69,031   50 mins
Decision Tree Classifier (Balanced Sample Data, Oracle)   0.896  0.935      0.807   111,187  7,712   545,302    26,604   24 sec
Random Forest Classifier (Balanced Sample Data, Oracle)   0.893  0.934      0.800   110,220  7,791   545,223    27,571   2 mins
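Precision and recall in the comparison follow directly from the confusion-matrix counts. A minimal sketch, using the TP, FP, and FN values reported for the Two-Class Decision Forest (AzureML); the helper names are illustrative.

```python
def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of actual positives that the model found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Counts for the Two-Class Decision Forest (AzureML) in the comparison.
tp, fp, fn = 47_199, 377, 5_142
p = precision(tp, fp)
r = recall(tp, fn)
print(round(p, 3), round(r, 3))
```

With heavily imbalanced fraud data, precision and recall say more than accuracy alone, because a model can reach high accuracy by predicting the majority class.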
Big Data Science: Transaction Data Fraud Detection
[PMBW18]
Model, Area under ROC, Precision, Recall
DecisionTreeClassifier
RandomForestClassifier 0.909573
LogisticRegression
Size: 470 MB (=> 718 MB)
6,362,620 records
Not very large compared with data sets of many GB
https://www.kaggle.com/ntnu-testimon/paysim1
3 models in Spark cluster with different combinations of the parameters
Time taken: 1 hour with 3 Spark clusters
In theory, with linear scalability: 2 minutes with 30 Spark clusters
Experimental Results in AWS [PMBW18]
Execution times
3 nodes:
– 40 to 70 min
11 nodes:
– 10 to 20 min
Shows Scalability
Big Data Science in Smart * [DW19a]
 Traffic Data Analysis and Prediction using Big Data
Smart*
– Smart Things
• Data collected from Cellphone apps
– Traffic Data from the driver and the cell phone
Data source:
Navigation app traffic data set from LA City Department*
– Alerts: information reported by users
– Jams: information captured by the user’s device
*Limited authorization to access the full original datasets (100+ GB);
– Limited the dataset to 9 days (Dec 31 – Jan 8, 2018)
– ~2 GB
Introduction
Provide real-time directions and up-to-date information
Traffic
Accidents
Road closure
Weather hazards
Lurking police vehicles, etc.
We are going to find out:
Areas with high volume of traffic (geography)
Peak-hours
Density of Alerts and Incidents
Traffic volume by road types
Prediction of traffic jam
Experiment Environment:
Traditional Systems and Big Data
H/W Specification
 Hadoop Spark Cluster
Number of nodes 6
OCPUs 12
CPU speed 2.2GHz
Memory 180 GB
Storage 682 GB
Implementation Flow
Big Data Science and AI ML
Local Computer
Raw data
files (JSON)
Geo-Spatial
Visualization (3D map)
Dashboard for Analytics
Big Data Analysis:
Hadoop/Hive
Upload dataset to
HDFS
Parse JSON files using
Pandas
Create tables’ schema
Clean data
Create sample/summary
dataset for prediction and
visualization
Traditional Data
Science: Microsoft
Azure ML Studio
Upload sample dataset
Apply data
transformation
Split dataset for
training and scoring
Train model(s)
Evaluate model(s)
Traffic Dashboard: Big Data Analysis
Peak
Peak
Traffic Dashboard: Big Data Analysis (Cont’d)
Major areas of traffic are:
Downtown Los Angeles
Santa Monica
Hollywood
Freeway (highways)
Video-Simulation of Traffic in LA (captured from users' devices)
Video-Simulation of Traffic in LA (reported by app users)
Features/columns in a dataset
location x, location y   X and Y coordinates of the location
date_pst                 Pacific Time at which the traffic report was published
                         *date splits into month, day, hour, min, sec, weekday
speed                    driver’s captured speed in mph
length                   length of the traffic ahead on the user’s route, in meters
level                    jam level, 1 to 5, where 1 is almost no jam and 5 is a standstill jam
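The date split listed above can be sketched with the standard library. The timestamp format and the weekday numbering (1 = Monday through 7 = Sunday) are assumptions; the slides do not state how date_pst is stored, and the sample value is made up.

```python
from datetime import datetime


def split_date(date_pst, fmt="%Y-%m-%d %H:%M:%S"):
    """Split one timestamp string into the separate date features."""
    ts = datetime.strptime(date_pst, fmt)
    return {
        "month": ts.month,
        "day": ts.day,
        "hour": ts.hour,
        "min": ts.minute,
        "sec": ts.second,
        "weekday": ts.isoweekday(),  # assumed encoding: 1 = Monday ... 7 = Sunday
    }


# Hypothetical report timestamp, for illustration only.
features = split_date("2018-01-05 17:42:09")
print(features)
```

Separate hour and weekday columns let a tree-based model pick up peak-hour and weekday patterns that a raw timestamp would hide.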
MODEL Evaluation: Traditional Data Science
with Azure ML Studio
Model   Accuracy   Precision   Recall   AUC ROC
LR      0.662      0.662       1.0      0.571
BDT     0.805      0.832       0.884    0.868
DF      0.832      0.868       0.880    0.885
Summary of Traffic Prediction with
Machine Learning
Model is based on a sampled dataset of ~1M rows (100 MB),
sampled using Spark because the full data set is 2 GB
Best model - Decision Forest
Accuracy – 0.832
Precision - 0.868
Recall - 0.880
Area under the Curve – 0.885
Confusion Matrix
Distributed Deep Learning in Big Data Cluster
[MKW19]
Predictive Analysis
Prediction of rating
– an important measure for buying and selling
Spark ML: ALS (Alternating Least Squares) algorithm
DDL (Distributed Deep Learning): Neural Collaborative Filtering (NCF)
Dataset: https://s3.amazonaws.com/amazon-reviews-pds/tsv/index.txt
Products reviewed between 2005 and 2015 are analyzed
Total product reviews : 9.57 million
File Size : 5.26 GB
Summary: Performance
Summary: Mean Absolute Error
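Mean absolute error, the metric named in this slide's title, averages the absolute gap between predicted and actual ratings. A minimal sketch with made-up ratings; these numbers are not results from [MKW19].

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all rated items."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical star ratings and model predictions, for illustration only.
actual = [5, 3, 4, 1]
predicted = [4.5, 3.5, 4.0, 2.0]
mae = mean_absolute_error(actual, predicted)
print(mae)
```

A lower value means the predicted ratings sit closer to the real ones, so the metric can compare two recommenders on the same held-out reviews.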
Training and Education
New technology emerges every moment
IT companies, not universities, lead the industry
How to catch up?
– Training and Education
Company with new technology
Always deliver training
– Big Data
• Cloudera, Hortonworks
– AI Deep Learning
• Traditional Concept
– Stanford, UC Berkeley, edx, IBM, H2O
Training (Cont’d)
Training by Company
3 - 4 days/week
– $2,500 - $3,000
– Practical
• with theory + hands-on exercises
• Instructors are paid well
• Employers send their engineers to learn the new technology in a few weeks
Education in University
Needs an instructor who knows the new technology
– Not easy
• IT companies, not universities, lead the industry
Trained but No Experience with bad management in Korea
Sang-Ryung Battle:
From 2017 Korean
Blockbuster Movie,
“The Fortress”
(남한산성)
Trained Well With Experience and Good management in Japan
Battle of Nagashino,
1575, Japan
Trained but No Experience with bad management in Korea (Cont’d)
Sang-Ryung Battle:
From 2017 Korean
Blockbuster Movie,
“The Fortress”
(남한산성)
Contents
 Myself
 Introduction To Big Data
 Deep Learning and Big Data
 Big Data Predictive Analysis
 Summary
Summary
Introduction to Big Data
Definition in terms of platforms
Data and Predictive Analysis in Massive Data Set
Introduction to Deep Learning in Big Data
Distributed Deep Learning
Big Data Predictive Analysis
Big Data Science
Distributed Deep Learning
Education is important
Questions?
References
1. [W13] Jongwook Woo, DMKD-00150, “Market Basket Analysis Algorithms with MapReduce”, Wiley Interdisciplinary Reviews Data Mining and Knowledge Discovery, Oct 28 2013, Volume 3, Issue 6, pp445-452, ISSN 1942-4795
2. [KCWW16] Manik Katyal, Parag Chhadva, Shubhra Wahi, Jongwook Woo, “Big Data Analysis using Spark for Collision Rate Near CalStateLA”, https://globaljournals.org/GJCST_Volume16/1-Big-Data-Analysis-using-Spark.pdf
3. [PMBW18] Priyanka Purushu, Niklas Melcher, Bhagyashree Bhagwat, Jongwook Woo, "Predictive Analysis of Financial Fraud Detection using Azure and Spark ML", Asia Pacific Journal of Information Systems (APJIS), Vol.28, No.4, December 2018, pp308-319
4. [MCSPBW19] Monika Mishra, Jaydeep Chopde, Maitri Shah, Pankti Parikh, Rakshith Chandan Babu, Jongwook Woo, "Big Data Predictive Analysis of Amazon Product Review", KSII The 14th Asia Pacific International Conference on Information Science and Technology (APIC-IST) 2019, pp141-147, ISSN 2093-0542
5. [GLBW19] Neha Gupta, Hai Anh Le, Maria Boldina, Jongwook Woo, "Predicting fraud of AD click using Traditional and Spark ML", KSII The 14th Asia Pacific International Conference on Information Science and Technology (APIC-IST) 2019, pp24-28, ISSN 2093-0542
6. [DW19a] Dalyapraz Dauletbak, Jongwook Woo, "Traffic Data Analysis and Prediction using Big Data", KSII The 14th Asia Pacific International Conference on Information Science and Technology (APIC-IST) 2019, pp127-133, ISSN 2093-0542
7. [SW19] Ruchi Singh, Jongwook Woo, "Applications of Machine Learning Models on Yelp Data", Asia Pacific Journal of Information Systems (APJIS), Vol.29, No.1, 2019, pp35-49, ISSN 2288-5404
8. [MKW19] Monika Mishra, Mingoo Kang, Jongwook Woo, “Rating Prediction using Deep Learning and Spark”, The 11th International Conference on Internet (ICONI 2019), Dec 15-18 2019, Hanoi, Vietnam
9. [DW19b] (Will be Published Dec 2019) Dalyapraz Dauletbak, Jongwook Woo, “Big Data Analysis and Prediction of Traffic in Los Angeles”, in Transactions on Internet & Information Systems (TIIS)
10. Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark, https://www.slideshare.net/SparkSummit/which-is-deeper-comparison-of-deep-learning-frameworks-on-spark
11. Accelerating Machine Learning and Deep Learning At Scale with Apache Spark, https://www.slideshare.net/SparkSummit/accelerating-machine-learning-and-deep-learning-at-scalewith-apache-spark-keynote-by-ziya-ma
12. Deep Learning with Apache Spark and TensorFlow, https://databricks.com/blog/2016/01/25/deep-learning-with-apache-spark-and-tensorflow.html
13. Overview of Smart Factory, https://www.slideshare.net/BrendanSheppard1/overview-of-smart-factory-solutions-68137094/6
14. TensorFrames: Google Tensorflow on Apache Spark, https://www.slideshare.net/databricks/tensorframes-google-tensorflow-on-apache-spark
15. Deep learning and Apache Spark, https://www.slideshare.net/QuantUniversity/deep-learning-and-apache-spark

Contenu connexe

Tendances

Tendances (20)

The Importance of Open Innovation in AI era
The Importance of Open Innovation in AI eraThe Importance of Open Innovation in AI era
The Importance of Open Innovation in AI era
 
Predictive Analysis for Airbnb Listing Rating using Scalable Big Data Platform
Predictive Analysis for Airbnb Listing Rating using Scalable Big Data PlatformPredictive Analysis for Airbnb Listing Rating using Scalable Big Data Platform
Predictive Analysis for Airbnb Listing Rating using Scalable Big Data Platform
 
Traffic Data Analysis and Prediction using Big Data
Traffic Data Analysis and Prediction using Big DataTraffic Data Analysis and Prediction using Big Data
Traffic Data Analysis and Prediction using Big Data
 
Analytics and Data Mining Industry Overview
Analytics and Data Mining Industry OverviewAnalytics and Data Mining Industry Overview
Analytics and Data Mining Industry Overview
 
The Evolution of Data Science
The Evolution of Data ScienceThe Evolution of Data Science
The Evolution of Data Science
 
Big Data Science: Intro and Benefits
Big Data Science: Intro and BenefitsBig Data Science: Intro and Benefits
Big Data Science: Intro and Benefits
 
Predictive Analytics - Big Data & Artificial Intelligence
Predictive Analytics - Big Data & Artificial IntelligencePredictive Analytics - Big Data & Artificial Intelligence
Predictive Analytics - Big Data & Artificial Intelligence
 
Data mining
Data miningData mining
Data mining
 
Intro to Data Science by DatalentTeam at Data Science Clinic#11
Intro to Data Science by DatalentTeam at Data Science Clinic#11Intro to Data Science by DatalentTeam at Data Science Clinic#11
Intro to Data Science by DatalentTeam at Data Science Clinic#11
 
Are you ready for BIG DATA?
Are you ready for BIG DATA?Are you ready for BIG DATA?
Are you ready for BIG DATA?
 
Intro to Data Science Big Data
Intro to Data Science Big DataIntro to Data Science Big Data
Intro to Data Science Big Data
 
Big Data and Data Science: The Technologies Shaping Our Lives
Big Data and Data Science: The Technologies Shaping Our LivesBig Data and Data Science: The Technologies Shaping Our Lives
Big Data and Data Science: The Technologies Shaping Our Lives
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Data science
Data scienceData science
Data science
 
Full-Stack Data Science: How to be a One-person Data Team
Full-Stack Data Science: How to be a One-person Data TeamFull-Stack Data Science: How to be a One-person Data Team
Full-Stack Data Science: How to be a One-person Data Team
 
Public Data and Data Mining Competitions - What are Lessons?
Public Data and Data Mining Competitions - What are Lessons?Public Data and Data Mining Competitions - What are Lessons?
Public Data and Data Mining Competitions - What are Lessons?
 
Big Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and SolrBig Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and Solr
 
Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013
 
Data Science: Past, Present, and Future
Data Science: Past, Present, and FutureData Science: Past, Present, and Future
Data Science: Past, Present, and Future
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 

Similaire à History and Trend of Big Data and Deep Learning

INF2190_W1_2016_public
INF2190_W1_2016_publicINF2190_W1_2016_public
INF2190_W1_2016_public
Attila Barta
 
A New Paradigm on Analytic-Driven Information and Automation V2.pdf
A New Paradigm on Analytic-Driven Information and Automation V2.pdfA New Paradigm on Analytic-Driven Information and Automation V2.pdf
A New Paradigm on Analytic-Driven Information and Automation V2.pdf
ArmyTrilidiaDevegaSK
 

Similaire à History and Trend of Big Data and Deep Learning (20)

Comparing Scalable Predictive Analysis using Spark XGBoost Platforms
Comparing Scalable Predictive Analysis using Spark XGBoost PlatformsComparing Scalable Predictive Analysis using Spark XGBoost Platforms
Comparing Scalable Predictive Analysis using Spark XGBoost Platforms
 
Big Data Platform adopting Spark and Use Cases with Open Data
Big Data  Platform adopting Spark and Use Cases with Open DataBig Data  Platform adopting Spark and Use Cases with Open Data
Big Data Platform adopting Spark and Use Cases with Open Data
 
Big Data and Data Intensive Computing on Networks
Big Data and Data Intensive Computing on NetworksBig Data and Data Intensive Computing on Networks
Big Data and Data Intensive Computing on Networks
 
INF2190_W1_2016_public
INF2190_W1_2016_publicINF2190_W1_2016_public
INF2190_W1_2016_public
 
Big Data Trend and Open Data
Big Data Trend and Open DataBig Data Trend and Open Data
Big Data Trend and Open Data
 
Big Data Trend with Open Platform
Big Data Trend with Open PlatformBig Data Trend with Open Platform
Big Data Trend with Open Platform
 
BUILDING BETTER PREDICTIVE MODELS WITH COGNITIVE ASSISTANCE IN A DATA SCIENCE...
BUILDING BETTER PREDICTIVE MODELS WITH COGNITIVE ASSISTANCE IN A DATA SCIENCE...BUILDING BETTER PREDICTIVE MODELS WITH COGNITIVE ASSISTANCE IN A DATA SCIENCE...
BUILDING BETTER PREDICTIVE MODELS WITH COGNITIVE ASSISTANCE IN A DATA SCIENCE...
 
On Big Data
On Big DataOn Big Data
On Big Data
 
President Election of Korea in 2017
President Election of Korea in 2017President Election of Korea in 2017
President Election of Korea in 2017
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Architecting for Big Data: Trends, Tips, and Deployment Options
Architecting for Big Data: Trends, Tips, and Deployment OptionsArchitecting for Big Data: Trends, Tips, and Deployment Options
Architecting for Big Data: Trends, Tips, and Deployment Options
 
Big Data and Data Intensive Computing: Education and Training
Big Data and Data Intensive Computing: Education and TrainingBig Data and Data Intensive Computing: Education and Training
Big Data and Data Intensive Computing: Education and Training
 
FDS Module I 20.1.2022.ppt
FDS Module I 20.1.2022.pptFDS Module I 20.1.2022.ppt
FDS Module I 20.1.2022.ppt
 
Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusers
 
Big Data Analytics - Best of the Worst : Anti-patterns & Antidotes
Big Data Analytics - Best of the Worst : Anti-patterns & AntidotesBig Data Analytics - Best of the Worst : Anti-patterns & Antidotes
Big Data Analytics - Best of the Worst : Anti-patterns & Antidotes
 
Big Data and Advanced Data Intensive Computing
Big Data and Advanced Data Intensive ComputingBig Data and Advanced Data Intensive Computing
Big Data and Advanced Data Intensive Computing
 
Big Data By Vijay Bhaskar Semwal
Big Data By Vijay Bhaskar SemwalBig Data By Vijay Bhaskar Semwal
Big Data By Vijay Bhaskar Semwal
 
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactData Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
 
A New Paradigm on Analytic-Driven Information and Automation V2.pdf
A New Paradigm on Analytic-Driven Information and Automation V2.pdfA New Paradigm on Analytic-Driven Information and Automation V2.pdf
A New Paradigm on Analytic-Driven Information and Automation V2.pdf
 
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
 


History and Trend of Big Data and Deep Learning

  • 1. History and Trend of Big Data and Deep Learning. Jongwook Woo, PhD (jwoo5@calstatela.edu), Big Data AI Center (BigDAI), HiPIC, California State University Los Angeles (CalStateLA). Keimyung University, Dec 20 2019
  • 2. Contents: Myself; Introduction to Big Data; Deep Learning and Big Data; Big Data Predictive Analysis; Summary
  • 3. Myself. Experience: Professor at California State University Los Angeles since 2002; PhD in Computer Science and Engineering at USC, 2001
  • 4. Universities in Los Angeles: West, North
  • 5. Universities in Los Angeles
  • 6. California State University Los Angeles
  • 7. Myself: S/W Development Lead. http://www.mobygames.com/game/windows/matrix-online/credits
  • 8. Collaboration with HDP, CDH, Oracle, Amazon using Hadoop Big Data. https://www.cloudera.com/more/customers/csula.html
  • 9. Myself: Partners for Services
  • 10. Myself: Collaborations
  • 11. Contents: Myself; Introduction to Big Data; Deep Learning and Big Data; Big Data Predictive Analysis; Summary
  • 12. New Technology: Big Data. What is Big Data: data or systems? Large-scale data? Many people see only the data point of view (3 Vs, 5 Vs). Systems? Yes.
  • 13. Data Handling Systems: Traditional Way
  • 14. Data Handling: Traditional Way Becomes Too Expensive
  • 15. Data Handling: Another Way, Not Expensive. From the 2017 Korean blockbuster movie “The Fortress” (남한산성)
  • 16. Data Handling: Another Way, Not Expensive. http://blog.naver.com/PostView.nhn?blogId=dosims&logNo=221127053677 1409 (Taejong 9): Choe Hae-san (崔海山), whose father was Choe Mu-seon (崔茂宣). [Source] Joseon's Secret Weapon: Chongtonggi Hwacha (銃筒機火車), written by 도심
  • 17. Data Issues. The legacy approach cannot cope with data that is too big or non-/semi-structured (3 Vs, 4 Vs, …: Velocity, Volume, Variety). Traditional systems can handle it, but again, they are too expensive. New, inexpensive systems are needed.
  • 18. Two Cores in Big Data: how to store Big Data, and how to compute Big Data. Google published papers on both in 2003 and 2004:
      – How to store Big Data: GFS, a distributed system on inexpensive commodity computers
      – How to compute Big Data: MapReduce, parallel computing with inexpensive computers
      – Their own supercomputers
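The MapReduce idea on slide 18 can be illustrated with a minimal single-process sketch. This is not Google's or Hadoop's implementation: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the word-count task are assumptions chosen for illustration, and a real cluster would run the map and reduce calls in parallel on many inexpensive machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse the list of values for each key into one sum."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run map, shuffle, and reduce in sequence (hypothetical driver)."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    # Each string stands in for one input split stored on a different node.
    splits = ["big data big systems", "data systems store data"]
    print(word_count(splits))
```

Because each call to `map_phase` touches only its own split and each key is reduced independently, both phases can be spread across machines, which is what makes the model suit commodity clusters.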
  • 19. Super Computer vs Big Data vs Cloud
      – Traditional supercomputer: a cluster for compute and a separate cluster for storage (parallel file systems: Lustre, PVFS, GPFS)
      – Big Data (Hadoop, Spark, Distributed Deep Learning): one cluster for both compute and storage (distributed file systems: HDFS, GFS)
      – Cloud computing, however, adopts the separated architecture, with a high-speed network and object storage
  • 20. Big Data: Hadoop. The Apache Hadoop project split from Nutch in Jan 2006. Hadoop founder: Doug Cutting, Apache committer (Lucene, Nutch, …)
  • 21. Definition: Big Data [W13]. An inexpensive platform of distributed parallel systems that can store large-scale data and process it in parallel
      – Hadoop: an inexpensive supercomputer, more accessible to the public than traditional supercomputers; you can store and process your applications in university labs, small companies, and research centers
      – Others with storage and computing services: Spark (normally integrated into Hadoop by the Hadoop community); NoSQL DB (Cassandra, MongoDB, Redis, HBase, …); ElasticSearch
  • 22. Big Data: Linearly Scalable. Some people object that a system handling a 1 ~ 3 GB data set is not Big Data. Well, in a Big Data platform you add more servers as the data grows: once built, it is linearly scalable, ideally with n times more computing power. Data size < 3 GB; add n servers; data size > 200 TB
  • 23. Big Data Cluster. Are you ready for research now? Large-scale data set with computing engines: ML, DS. Massive data set with computing engines (Hadoop, Spark)
  • 24. Experimental Results in AWS [PMBW18]. Execution times, Big Data Science: 3 nodes, 40 – 70 mins; 11 nodes, 10 – 20 mins
  • 25. Big Data Is Great for Any Small Business. Your data is the value, and it is Big Data: customer data and operational data. You have your own specific data, which a big company does not have. Potential:
      – Your customer data: smart marketing and sales; advertisement
      – Your operational data: efficient operation, for example Smart*: Smart Factory, Smart City
  • 26. Big Data Analysis & Visualization. Sentiment Map of AlphaGo: positive, negative
  • 27. K-Election 2017 (April 29 – May 9)
  • 28. IoT of Smart Factory
  • 29. IoT of Smart Factory (Cont’d)
  • 30. Businesses popular within 5 miles of CalStateLA, USC, UCLA
  • 31. Jams and other traffic incidents reported by users, Dec 2017 – Jan 2018 [DW19a]
  • 32. Contents: Myself; Introduction to Big Data; Deep Learning and Big Data; Big Data Predictive Analysis; Summary
  • 33. Big Data Analysis and Prediction
      – Big Data Analysis (Hadoop, Spark, NoSQL DB, SAP HANA, ElasticSearch, …): how to store, compute, and analyze a massive dataset?
      – Big Data Science: how to predict future trends and patterns with the massive dataset? => Machine Learning
  • 34. Spark
      – Limitations of MapReduce: hard to program in Java; batch processing, not interactive; disk storage for intermediate data, a performance issue
      – Spark, by the UC Berkeley AMPLab: started by Matei Zaharia in 2009 and open sourced in 2010; in-memory storage for intermediate data; 20 ~ 100 times faster than MapReduce; good for Machine Learning (iterative algorithms) => Big Data Science
  • 35. Spark (Cont’d). Spark ML supports Machine Learning libraries and processes massive data sets to build prediction models
  • 36. Big Data Analysis and Prediction Flow
      – Data Collection: batch API (Yelp, Google); streaming (Twitter, Apache NiFi, Kafka, StreamSets, Storm); open data (government)
      – Data Storage: HDFS, S3, Object Storage, NoSQL DB (Couchbase), …
      – Data Filtering: Hive, Pig
      – Data Analysis and Science: Hive, Pig, Spark, Deep Learning, BI tools (Qlik, Tableau, …)
      – Data Visualization: Qlik, Excel PowerMap, Tableau, Looker, …
      – Engineering: Big Data Engineering, Big Data Analysis, Data Visualization
      – Research: Big Data Science, Deep Learning
  • 37. The Gap between Traditional Data Science and Big Data Communities (Big Data engineers, scientists, analysts, etc.)
  • 38. Leveraging Big Data Cluster [MCSPBW19, DW19a]. Existing Big Data cluster with a massive data set, used with traditional Machine Learning on a single server for Python and R. Issues and solutions: large-scale data migration is too slow, and the single server fails
  • 39. Deep Learning
      – Machine Learning; has been popular since Google TensorFlow
      – Multiple cores in a GPU, even with multiple GPUs and CPUs: parallel computing
      – GPU (Nvidia GTX 1660 Ti): 1280 CUDA cores
      – Deep Learning libraries: TensorFlow, PyTorch, Keras, Caffe, Caffe2, Microsoft Cognitive Toolkit (previously CNTK), Apache MXNet, DeepLearning4j, …
  • 40. From Neural Networks to Deep Learning. Deep learning has different types of architectures: Neural Networks (NN); Convolutional Neural Networks (CNN); Recurrent Neural Networks (RNN) & Long Short-Term Memory (LSTM); Generative Adversarial Networks (GAN). Ref: SAP Enterprise Deep Learning with TensorFlow (© 2017 SAP SE or an SAP affiliate company)
  • 41. Deep Learning
      – CNN: image recognition; video analysis; NLP for classification and prediction
      – RNN: time series prediction; speech recognition/synthesis; image/video captioning; text analysis (conversation, Q&A)
      – GAN: media generation (photo-realistic images); human image synthesis (fake faces)
  • 42. Data Scale Driving: Deep Learning Process. Deep Learning and massive data [3]: “Machine Learning Yearning”, Andrew Ng, 2016
  • 43. The Chasm: another gap, between Deep Learning experts and Big Data communities (Big Data engineers, scientists, analysts, etc.) [6]
  • 44. Leveraging Big Data Cluster. Existing Big Data cluster with a massive data set, analyzed without using Big Data: a single GPU server for Deep Learning? A single server for Python and R, for traditional Machine Learning? Data migration is too slow, and the single server fails
  • 45. Deep Learning with Spark. What if we combine Deep Learning and Spark?
  • 46. Leveraging Big Data Cluster. The existing Big Data cluster serves Big Data Engineering, Big Data Analysis, and Big Data Science. Distributed Deep Learning: integrate Deep Learning into the cluster; this needs no data migration and can leverage the parallel computing and the existing large-scale data
  • 47. Deep Learning with Spark
      – Deep Learning Pipelines for Apache Spark: Databricks
      – TensorFlowOnSpark: Yahoo! Inc
      – BigDL (Distributed Deep Learning Library for Apache Spark): Intel
      – DL4J (Deeplearning4j on Spark): Skymind
      – Distributed Deep Learning with Keras & Spark: Elephas
  • 48. Big Data Prediction with DDL (Distributed Deep Learning). TensorFlow distributed training and inference in a Spark cluster
  • 49. Spark ML and DDL [MKW19]. Deep Learning in a Spark cluster: Distributed Deep Learning (DDL) with DDL libraries
  • 50. Contents: Myself; Introduction to Big Data; Deep Learning and Big Data; Big Data Predictive Analysis; Summary
  • 51. Azure ML Studio and Spark ML Result Comparison: Ad Click Fraud Prediction, 7 GB data [GLBW19]
      – Two-Class Decision Jungle (AzureML): AUC 0.905; precision 1.0; recall 0.001; TP 35; FP 0; TN 52,306; FN 406,605; run time 2 hrs
      – Two-Class Decision Forest (AzureML): AUC 0.997; precision 0.992; recall 0.902; TP 47,199; FP 377; TN 406,228; FN 5,142; run time 2-3 hrs
      – Decision Tree Classifier (Databricks): AUC 0.815; precision 0.822; recall 0.633; TP 86,683; FP 18,727; TN 7,112,961; FN 50,074; run time 22 mins
      – Random Forest Classifier (Databricks): AUC 0.746; precision 0.878; recall 0.495; TP 67,726; FP 9,408; TN 7,122,280; FN 69,031; run time 50 mins
      – Decision Tree Classifier (balanced sample data, Oracle): AUC 0.896; precision 0.935; recall 0.807; TP 111,187; FP 7,712; TN 545,302; FN 26,604; run time 24 sec
      – Random Forest Classifier (balanced sample data, Oracle): AUC 0.893; precision 0.934; recall 0.800; TP 110,220; FP 7,791; TN 545,223; FN 27,571; run time 2 mins
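As a quick check on how the precision and recall figures on slide 51 relate to the confusion-matrix counts, the following sketch recomputes them for the Two-Class Decision Forest (AzureML) entry. The helper name `precision_recall` is an assumption for illustration; the formulas are the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN).

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Counts copied from the Two-Class Decision Forest (AzureML) figures on slide 51.
    p, r = precision_recall(tp=47_199, fp=377, fn=5_142)
    print(f"precision={p:.3f} recall={r:.3f}")  # precision=0.992 recall=0.902
```

The recomputed values agree with the reported 0.992 and 0.902, so the TP, FP, and FN counts and the ratio metrics for that model are mutually consistent.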
  • 52. Big Data Science: Transaction Data Fraud Detection [PMBW18]
      – Models (area under ROC, precision, recall): DecisionTreeClassifier; RandomForestClassifier 0.909573; LogisticRegression
      – Data set: https://www.kaggle.com/ntnu-testimon/paysim1; size 470 MB (=> 718 MB); 6,362,620 records; not that large compared to data sets > GB
      – 3 models in a Spark cluster with different combinations of the parameters
      – Time taken: 1 hour with 3 Spark clusters; in theory, with linear scalability: 2 minutes with 30 Spark clusters
  • 53. Experimental Results in AWS [PMBW18]. Execution times: 3 nodes, 40 – 70 mins; 11 nodes, 10 – 20 mins. Shows scalability
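The scalability claim on slide 53 can be compared against the ideal linear case with a few lines of arithmetic. This is a sketch under the assumption of perfect linear scaling (run time inversely proportional to node count); `ideal_time` is a hypothetical helper, not something from [PMBW18].

```python
def ideal_time(base_minutes, base_nodes, target_nodes):
    """Ideal run time under linear scalability: time shrinks in proportion to node count."""
    return base_minutes * base_nodes / target_nodes


if __name__ == "__main__":
    # Measured range on 3 nodes, taken from slide 53.
    for measured in (40, 70):
        projected = ideal_time(measured, base_nodes=3, target_nodes=11)
        print(f"{measured} min on 3 nodes -> {projected:.1f} min on 11 nodes (ideal)")
```

The ideal projection of roughly 10.9 to 19.1 minutes lines up with the 10 – 20 minutes reported for 11 nodes.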
  • 54. Big Data Science in Smart* [DW19a]. Traffic Data Analysis and Prediction using Big Data
      – Smart* (Smart Things): data collected from cellphone apps; traffic data from the driver and the cell phone
      – Data source: navigation app traffic data set from LA City Department*: information reported by users; alert information captured by the user's device; jams
      – *Limited authorization to access the full datasets (100 GB+ original); a limited dataset of 9 days (Dec 31 – Jan 8, 2018), ~2 GB, was adopted
  • 55. Introduction. The navigation app provides real-time directions and up-to-date information: traffic accidents, road closures, weather hazards, lurking police vehicles, etc. We are going to find out: areas with a high volume of traffic (geography); peak hours; density of alerts and incidents; traffic volume by road type; prediction of traffic jams
  • 56. Experiment Environment: Traditional Systems and Big Data
  • 57. H/W Specification, Hadoop Spark cluster: number of nodes 6; OCPUs 12; CPU speed 2.2 GHz; memory 180 GB; storage 682 GB
  • 58. Implementation Flow (Big Data Science and AI ML)
      – Local computer: raw data files (JSON); geo-spatial visualization (3D map); dashboard for analytics
      – Big Data Analysis (Hadoop/Hive): upload the dataset to HDFS; parse JSON files using Pandas; create the tables’ schema; clean the data; create a sample/summary dataset for prediction and visualization
      – Traditional Data Science (Microsoft Azure ML Studio): upload the sample dataset; apply data transformation; split the dataset for training and scoring; train model(s); evaluate model(s)
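The "split dataset for training and scoring" step of the flow on slide 58 is done inside Azure ML Studio in the talk. As a rough stand-in, it could look like the standard-library sketch below; the 70/30 ratio, the fixed seed, and the function name `split_dataset` are all assumptions, not values from the slides.

```python
import random


def split_dataset(rows, train_fraction=0.7, seed=42):
    """Shuffle rows reproducibly and split them into (training, scoring) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # Ten placeholder row ids stand in for the sampled traffic records.
    train, score = split_dataset(range(10))
    print(len(train), "training rows;", len(score), "scoring rows")
```

Shuffling before the cut avoids training only on the earliest records, which matters for time-ordered data such as traffic reports.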
  • 59. Traffic Dashboard: Big Data Analysis (chart labels: Peak, Peak)
  • 60. Traffic Dashboard: Big Data Analysis (Cont’d). Major areas of traffic: Downtown Los Angeles; Santa Monica; Hollywood; freeways (highways)
  • 61. Video-Simulation of Traffic in LA (captured from users' devices)
  • 62. Video-Simulation of Traffic in LA (reported by app users)
  • 63. Features/columns in the dataset
      – location x, location y: X- and Y-coordinates of the location
      – date_pst: Pacific Time of the publication of the traffic report (*date is split into month, day, hour, min, sec, weekday)
      – speed: driver’s captured speed in mph
      – length: length of the traffic ahead on the user's route, in meters
      – level: jam level, 1 – 5 (1: almost no jam; 5: standstill jam)
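Slide 63 notes that `date_pst` is split into month, day, hour, min, sec, and weekday. A minimal sketch of that feature split is shown here; the timestamp format string and the helper name `split_date` are assumptions (the slides do not show the raw format), and the weekday follows Python's convention of Monday = 0.

```python
from datetime import datetime


def split_date(date_pst, fmt="%Y-%m-%d %H:%M:%S"):
    """Split a timestamp string into the date-part features listed on the slide."""
    ts = datetime.strptime(date_pst, fmt)  # fmt is an assumed layout
    return {
        "month": ts.month,
        "day": ts.day,
        "hour": ts.hour,
        "min": ts.minute,
        "sec": ts.second,
        "weekday": ts.weekday(),  # Monday = 0 ... Sunday = 6
    }


if __name__ == "__main__":
    # A made-up timestamp inside the Dec 31 – Jan 8, 2018 window of the dataset.
    print(split_date("2018-01-01 08:30:15"))
```

Separate hour and weekday columns let a model pick up peak-hour and weekday patterns that a single raw timestamp would hide.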
  • 64. Model Evaluation: Traditional Data Science with Azure ML Studio
      – LR: accuracy 0.662; precision 0.662; recall 1.0; AUC ROC 0.571
      – BDT: accuracy 0.805; precision 0.832; recall 0.884; AUC ROC 0.868
      – DF: accuracy 0.832; precision 0.868; recall 0.880; AUC ROC 0.885
  • 65. Summary of Traffic Prediction with Machine Learning. The model is based on a sampled dataset of ~1M rows (100 MB), sampled using Spark because the data set is 2 GB. Best model: Decision Forest (accuracy 0.832; precision 0.868; recall 0.880; area under the curve 0.885). Confusion matrix
  • 66. Distributed Deep Learning in Big Data Cluster [MKW19]
      – Predictive analysis: prediction of rating, an important measure for purchase and selling
      – Spark ML: ALS (Alternating Least Squares) algorithm
      – DDL (Distributed Deep Learning): Neural Collaborative Filtering (NCF)
      – Dataset: https://s3.amazonaws.com/amazon-reviews-pds/tsv/index.txt; products reviewed between 2005 and 2015 are analyzed; total product reviews: 9.57 million; file size: 5.26 GB
  • 67. Summary: Performance
  • 68. Summary: Mean Absolute Error
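Slide 68 compares the rating predictors by mean absolute error (MAE), the average of |actual − predicted| over all ratings. A minimal sketch of the metric follows; the function name and the sample ratings are made up for illustration and are not results from [MKW19].

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted ratings."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Hypothetical star ratings, not data from the talk.
    actual = [5, 3, 4]
    predicted = [4, 3, 2]
    print(mean_absolute_error(actual, predicted))  # 1.0
```

A lower MAE means predicted ratings sit closer to the actual ratings on average, and the value is in the same units as the ratings themselves.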
  • 69. Training and Education. New technology emerges every moment, and IT companies, not universities, lead the industry. How to catch up? Training and education. Companies with new technology always deliver training:
      – Big Data: Cloudera, Hortonworks
      – AI Deep Learning (traditional concept): Stanford, UC Berkeley, edX, IBM, H2O
  • 70. Training (Cont’d)
      – Training by company: 3 - 4 days/week; $2,500 - $3,000; practical, with theory plus hands-on exercises; instructors are paid well; employers send their engineers to learn the new technology in a few weeks
      – Education in university: needs an instructor who knows the new technology, which is not easy because IT companies, not universities, lead the industry
  • 71. Trained but No Experience, with Bad Management, in Korea. Sang-Ryung Battle, from the 2017 Korean blockbuster movie “The Fortress” (남한산성)
  • 72. Trained Well, with Experience and Good Management, in Japan. Battle of Nagashino, 1575, Japan
  • 73. Trained but No Experience, with Bad Management, in Korea (Cont’d). Sang-Ryung Battle, from the 2017 Korean blockbuster movie “The Fortress” (남한산성)
  • 74. Contents: Myself; Introduction to Big Data; Deep Learning and Big Data; Big Data Predictive Analysis; Summary
  • 75. Summary
      – Introduction to Big Data: definition in terms of platforms; data and predictive analysis on massive data sets
      – Introduction to Deep Learning in Big Data: Distributed Deep Learning
      – Big Data Predictive Analysis: Big Data Science; Distributed Deep Learning
      – Education is important
  • 76. Questions?
  • 77. References
      1. [W13] Jongwook Woo, DMKD-00150, “Market Basket Analysis Algorithms with MapReduce”, Wiley Interdisciplinary Reviews Data Mining and Knowledge Discovery, Oct 28 2013, Volume 3, Issue 6, pp445-452, ISSN 1942-4795
      2. [KCWW16] Manik Katyal, Parag Chhadva, Shubhra Wahi & Jongwook Woo, “Big Data Analysis using Spark for Collision Rate Near CalStateLA”, https://globaljournals.org/GJCST_Volume16/1-Big-Data-Analysis-using-Spark.pdf
      3. [PMBW18] Priyanka Purushu, Niklas Melcher, Bhagyashree Bhagwat, Jongwook Woo, "Predictive Analysis of Financial Fraud Detection using Azure and Spark ML", Asia Pacific Journal of Information Systems (APJIS), Vol.28, No.4, December 2018, pp308-319
      4. [MCSPBW19] Monika Mishra, Jaydeep Chopde, Maitri Shah, Pankti Parikh, Rakshith Chandan Babu, Jongwook Woo, "Big Data Predictive Analysis of Amazon Product Review", KSII The 14th Asia Pacific International Conference on Information Science and Technology (APIC-IST) 2019, pp141-147, ISSN 2093-0542
      5. [GLBW19] Neha Gupta, Hai Anh Le, Maria Boldina, Jongwook Woo, "Predicting fraud of AD click using Traditional and Spark ML", KSII The 14th Asia Pacific International Conference on Information Science and Technology (APIC-IST) 2019, pp24-28, ISSN 2093-0542
      6. [DW19a] Dalyapraz Dauletbak, Jongwook Woo, "Traffic Data Analysis and Prediction using Big Data", KSII The 14th Asia Pacific International Conference on Information Science and Technology (APIC-IST) 2019, pp127-133, ISSN 2093-0542
      7. [SW19] Ruchi Singh and Jongwook Woo, "Applications of Machine Learning Models on Yelp Data", Asia Pacific Journal of Information Systems (APJIS), Vol.29, No.1, 2019, pp35-49, ISSN 2288-5404
  • 78. References
      8. [MKW19] Monika Mishra, Mingoo Kang, Jongwook Woo, “Rating Prediction using Deep Learning and Spark”, The 11th International Conference on Internet (ICONI 2019), Dec 15-18 2019, Hanoi, Vietnam
      9. [DW19b] (Will be published Dec 2019) Dalyapraz Dauletbak, Jongwook Woo, “Big Data Analysis and Prediction of Traffic in Los Angeles”, in Transactions on Internet & Information Systems (TIIS)
      10. Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark, https://www.slideshare.net/SparkSummit/which-is-deeper-comparison-of-deep-learning-frameworks-on-spark
      11. Accelerating Machine Learning and Deep Learning At Scale with Apache Spark, https://www.slideshare.net/SparkSummit/accelerating-machine-learning-and-deep-learning-at-scalewith-apache-spark-keynote-by-ziya-ma
      12. Deep Learning with Apache Spark and TensorFlow, https://databricks.com/blog/2016/01/25/deep-learning-with-apache-spark-and-tensorflow.html
      13. Overview of Smart Factory, https://www.slideshare.net/BrendanSheppard1/overview-of-smart-factory-solutions-68137094/6
      14. TensorFrames: Google Tensorflow on Apache Spark, https://www.slideshare.net/databricks/tensorframes-google-tensorflow-on-apache-spark
      15. Deep learning and Apache Spark, https://www.slideshare.net/QuantUniversity/deep-learning-and-apache-spark