This document discusses Jongwook Woo's work with Big Data AI at CalStateLA. It introduces Woo and his background, provides an overview of big data and how distributed systems enable scalable analysis of massive datasets. It also describes predictive analytics using machine learning and deep learning on big data, and how integrating GPUs into big data clusters can improve parallel processing for tasks like traffic analysis.
Scalable Predictive Analysis and The Trend with Big Data & AI
1. Jongwook Woo
BigDAI
CalStateLA
School of Business
Yonsei University
Oct 5 2021, Korea
Jongwook Woo, PhD, jwoo5@calstatela.edu
Big Data AI Center (BigDAI)
California State University Los Angeles
Scalable Predictive Analysis
and The Trend with Big Data & AI
2. Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Contents
Myself
Introduction To Big Data
Scalable Business Intelligence
Predictive Analysis with Big Data AI
Summary
3. Big Data Artificial Intelligence Center (BigDAI)
Myself
Experience:
Since 2002, Professor at Dept. of IS, California State University Los Angeles
– PhD in 2001: Computer Science and Engineering at USC
4. Big Data Artificial Intelligence Center (BigDAI)
Myself: S/W Development Lead
http://www.mobygames.com/game/windows/matrix-online/credits
5. Big Data Artificial Intelligence Center (BigDAI)
Myself: Partners for Services
6. Big Data Artificial Intelligence Center (BigDAI)
Myself: Collaborations
SOFTZEN
7. Big Data Artificial Intelligence Center (BigDAI)
Collaboration with NVidia, Databricks, Oracle,
Amazon, CDH using Big Data AI
https://www.cloudera.com/more/customers/csula.html
8. Big Data Artificial Intelligence Center (BigDAI)
Contents
Myself
Introduction To Big Data
Scalable Business Intelligence
Predictive Analysis with Big Data AI
Summary
9. Big Data Artificial Intelligence Center (BigDAI)
Data Issues
Large-Scale data
Terabyte (10^12 bytes), Petabyte (10^15 bytes)
– Because of web
– IoT (Streaming data, Sensor Data) in SmartX
– Social Computing, smart phone, online game
– Bioinformatics, …
Legacy approach
Can handle the massive data set
– Increase the storage size
– Improve the speed of CPU
Only Problem
– Too expensive
10. Big Data Artificial Intelligence Center (BigDAI)
Data Handling: Traditional Way
11. Big Data Artificial Intelligence Center (BigDAI)
Data Handling: Traditional Way
Becomes too Expensive
12. Big Data Artificial Intelligence Center (BigDAI)
Big Data Definition
What is Big Data? Data or Systems?
Data view: large-scale data?
– 3 Vs, 5 Vs
• Velocity, Volume, Variety
– Many people see only the data point of view
• Nothing new
Systems view:
– Yes, new systems for large-scale data
• Inexpensive
13. Big Data Artificial Intelligence Center (BigDAI)
Two Cores in Big Data
How to store Big Data
How to compute Big Data
Google
How to store Big Data
– GFS (Google File System)
– Distributed systems on inexpensive commodity computers
How to compute Big Data
– MapReduce
– Parallel computing with inexpensive computers
In effect, Google's own supercomputers
Published the GFS paper in 2003 and the MapReduce paper in 2004
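The MapReduce idea named on this slide can be sketched on a single machine as a word count. This is a minimal illustration, not Google's or Hadoop's implementation: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`) and the sample input are hypothetical, and a real cluster would run the map and reduce steps on different machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(splits):
    mapped = []
    for split in splits:  # on a cluster, each split is mapped on a different node
        mapped.extend(map_phase(split))
    return reduce_phase(shuffle_phase(mapped))


# Hypothetical input: two splits of a tiny text file.
splits = ["big data big cluster", "data cluster data"]
counts = word_count(splits)
print(counts)
```

Because each split is mapped independently and each key is reduced independently, both phases can be spread over many inexpensive machines, which is the point the slide makes.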
14. Big Data Artificial Intelligence Center (BigDAI)
Data Handling: Another Way
But Works Well with the crazy massive data set
Battle of Nagashino,
1575, Japan
15. Big Data Artificial Intelligence Center (BigDAI)
Data Handling: Another Way
Not Expensive
From 2017 Korean
Blockbuster Movie,
“The Fortress”
(남한산성)
16. Big Data Artificial Intelligence Center (BigDAI)
Data Handling: Another Way
Not Expensive
http://blog.naver.com/PostView.nhn?blogId=dosims&logNo=221127053677
AD 1409 (Year 9 of King Tae-Jong, Chosun Dynasty, Korea) By Choi family:
Choi Hae-san (최해산, 崔海山), whose father was Choi Mu-seon (최무선, 崔茂宣)
[Ref] "Joseon's Secret Weapon: the Chongtonggi Hwacha (銃筒機火車)", written by Dosim (도심)
17. Big Data Artificial Intelligence Center (BigDAI)
Big Data Solution: Large Scale Data
Big Data:
An inexpensive platform of distributed parallel computing systems
that can store large-scale data and process it in parallel
Apache Hadoop (since 2006) and, later, Apache Spark
– Inexpensive supercomputer
– Any small company or university lab can own one
18. Big Data Artificial Intelligence Center (BigDAI)
Big Data
Big Data (Hadoop, Spark, Distributed Deep Learning)
Cluster for Compute and Store
(Distributed File Systems: HDFS, GFS)
…
19. Big Data Artificial Intelligence Center (BigDAI)
Big Data: Linearly Scalable
Some people question whether a system that handles only 1 ~ 3 GB of
data counts as Big Data
Well, on a Big Data platform you add more servers as the data grows
– it is linearly scalable once built
– ideally, n times more computing power with n times more servers
(Figure: Data Size: < 3 GB, add n servers, Data Size: 200 TB >)
20. Big Data Artificial Intelligence Center (BigDAI)
Contents
Myself
Introduction To Big Data
Scalable Business Intelligence
Predictive Analysis with Big Data AI
Summary
21. Big Data Artificial Intelligence Center (BigDAI)
Big Data is great for everyone: University labs and
Small Business, etc
Big Data Analysis
Hadoop, Spark, NoSQL DB, SAP HANA, ElasticSearch,..
Big Data for Data Analysis
– How to store, compute, analyze massive dataset?
You have your own specific data
Big companies do not have the specific data that you have
Your business data is the value
– Customer data
– Operational data
22. Big Data Artificial Intelligence Center (BigDAI)
Big Data Analysis and Prediction Flow
Data Collection
– Batch API: Yelp, Google
– Streaming: Twitter, Apache NiFi, Kafka, Storm
– Open Data: Government
Data Storage
– HDFS, S3, Object Storage, NoSQL DB (Couchbase), …
Data Filtering
– Hive, Pig
Data Analysis and Science
– Hive, Pig, Spark, BI Tools (Datameer, Qlik, …)
Data Visualization
– Qlik, Datameer, Excel PowerView
Layers shown in the figure: Big Data Engineering, Big Data Analysis, Big Data Science
23. Big Data Artificial Intelligence Center (BigDAI)
Big Data Analysis & Visualization
Sentiment Map of Alphago
Positive
Negative
24. Big Data Artificial Intelligence Center (BigDAI)
K-Election 2017
(April 29 – May 9)
25. Big Data Artificial Intelligence Center (BigDAI)
Popular businesses within 5 miles of CalStateLA,
USC, and UCLA
26. Big Data Artificial Intelligence Center (BigDAI)
Jams and other traffic incidents reported
by users in Dec 2017 – Jan 2018:
(Dalyapraz Dauletbak)
27. Big Data Artificial Intelligence Center (BigDAI)
COVID 19 Dashboard
https://www.calstatela.edu/centers/hipic/covid-19-us-ca-confirmed-prediction
28. Big Data Artificial Intelligence Center (BigDAI)
Contents
Myself
Introduction To Big Data
Scalable Business Intelligence
Predictive Analysis with Big Data AI
Summary
29. Big Data Artificial Intelligence Center (BigDAI)
Big Data Prediction
Big Data Science
How to predict future trends and patterns from a massive dataset?
=> Machine Learning
(Figure: Deep Learning shown within Machine Learning, within AI)
30. Big Data Artificial Intelligence Center (BigDAI)
Spark
Limitation in MapReduce computing
Hard to program in Java
Batch Processing
– Not interactive
Disk storage for intermediate data
– Performance issue for Machine Learning
31. Big Data Artificial Intelligence Center (BigDAI)
Spark (Cont’d)
Spark by UC Berkeley AMPLab
Started by Matei Zaharia in 2009,
– and open sourced in 2010
In-memory storage for intermediate data
– 20 ~ 100 times faster than MapReduce
Good in Machine Learning => Big Data Science
– Iterative algorithms
Spark ML
Supports Machine Learning libraries
Process massive data set to build prediction models
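Why in-memory intermediate data helps iterative algorithms can be shown with a toy sketch. It is plain Python, not Spark code: `CountingSource` is a hypothetical stand-in for a data set on disk, and the "model" is just a mean estimated by gradient descent. The only point is how often the data has to be re-read.

```python
class CountingSource:
    """Hypothetical stand-in for a data set on disk; counts how often it is read."""

    def __init__(self, rows):
        self._rows = rows
        self.reads = 0

    def load(self):
        self.reads += 1
        return list(self._rows)


def fit_mean_disk_style(source, iterations, lr=0.5):
    """Re-read the data on every iteration, as a chain of batch jobs would."""
    estimate = 0.0
    for _ in range(iterations):
        rows = source.load()
        gradient = sum(estimate - x for x in rows) / len(rows)
        estimate -= lr * gradient
    return estimate


def fit_mean_in_memory(source, iterations, lr=0.5):
    """Read once, keep the rows in memory, and iterate over the cached copy."""
    rows = source.load()
    estimate = 0.0
    for _ in range(iterations):
        gradient = sum(estimate - x for x in rows) / len(rows)
        estimate -= lr * gradient
    return estimate


disk_source = CountingSource([1.0, 2.0, 3.0, 4.0])
memory_source = CountingSource([1.0, 2.0, 3.0, 4.0])
disk_result = fit_mean_disk_style(disk_source, iterations=20)
memory_result = fit_mean_in_memory(memory_source, iterations=20)
print(disk_source.reads, memory_source.reads)
```

Both variants reach the same estimate, but the first reads the data once per iteration and the second reads it once in total. With a massive data set and many iterations, that difference dominates the running time.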
32. Big Data Artificial Intelligence Center (BigDAI)
Deep Learning
Machine Learning
Has been popular since Google released TensorFlow on Nov 9, 2015
Multiple Cores in GPU
– Even with multiple GPUs and CPUs
Parallel Computing in a chip
GPU (Nvidia GTX 1660 Ti)
1280 CUDA cores
Other Deep Learning Libraries
TensorFlow with Keras
PyTorch by Facebook
Apache MXNet
Caffe, Caffe2
Microsoft Cognitive Toolkit (previously CNTK)
DeepLearning4j
…
34. Big Data Artificial Intelligence Center (BigDAI)
Deep Learning
CNN
Image Recognition
Video Analysis
NLP for classification, Prediction
RNN
Time Series Prediction
Text Analysis
– Conversation Q&A
Image/Video Captioning
Speech Recognition/Synthesis
GAN
Media Generation
– Photo Realistic Images
Human Image Synthesis: Fake faces
35. Big Data Artificial Intelligence Center (BigDAI)
Data Scale Driving: Deep Learning Process
Deep Learning and Massive Data [3]
“Machine Learning Yearning” Andrew Ng 2016
36. Big Data Artificial Intelligence Center (BigDAI)
Another Gap between Deep Learning and Big Data Communities [6]
(Figure: "The Chasm" between deep learning experts and Big Data engineers, scientists, analysts, etc.)
37. Big Data Artificial Intelligence Center (BigDAI)
Leveraging Big Data Cluster
Existing Big Data cluster with a massive data set, analyzed without using
Big Data?
– Single GPU server for Deep Learning?
– Single server for traditional Machine Learning in Python and R?
Too slow in data migration, and a single server can fail
(Figure: Big Data Cluster)
38. Big Data Artificial Intelligence Center (BigDAI)
Deep Learning with Spark
What if we combine Deep Learning and Spark?
39. Big Data Artificial Intelligence Center (BigDAI)
Leveraging Big Data Cluster with Deep Learning
Existing Big Data cluster
Big Data Engineering
Big Data Analysis
Big Data Science
Distributed Deep Learning
– Integrate Deep Learning to the cluster
No data migration is needed, and the cluster's parallel
computing and existing large-scale data can be leveraged
Big Data Cluster
40. Big Data Artificial Intelligence Center (BigDAI)
Deep Learning with Spark
Deep Learning Pipelines for Apache Spark
– Databricks
BigDL (Distributed Deep Learning Library for Apache Spark)
– Intel
TensorFlowOnSpark
– Yahoo! Inc
DL4J (Deeplearning4j On Spark)
– Skymind
Distributed Deep Learning with Keras & Spark
– Elephas
41. Big Data Artificial Intelligence Center (BigDAI)
Spark ML and DDL [2-5]
Deep Learning in Spark cluster
– Distributed Deep Learning (DDL)
(Figure: Deep Learning in Spark, with DDL libraries in the cluster)
42. Big Data Artificial Intelligence Center (BigDAI)
Contents
Myself
Introduction To Big Data
Scalable Business Intelligence
Predictive Analysis with Big Data AI
Big Data using GPU
Summary
43. Big Data Artificial Intelligence Center (BigDAI)
Leveraging Big Data Cluster with GPU
What if we use GPU for Big Data Cluster?
44. Big Data Artificial Intelligence Center (BigDAI)
Again: Big Data Cluster with GPU
Existing Big Data cluster with a massive data set
– Single GPU server for Machine (Deep) Learning?
Too slow in data migration, and a single server can fail
(Figure: Big Data Cluster)
45. Big Data Artificial Intelligence Center (BigDAI)
Leveraging Big Data Cluster with GPU
Big Data Cluster: Unified Analytics Platform
Already built on site
– and matured for Data Engineering, Data Analysis, Data Science
Can we use the existing Big Data cluster with GPU?
– Can we integrate GPU into this Big Data Cluster?
NVIDIA
– RAPIDS and Spark
46. Big Data Artificial Intelligence Center (BigDAI)
Distributed Parallel Computing using RAPIDS
RAPIDS:
Parallel Machine Learning (ML) on GPU
RAPIDS + Spark:
Distributed Parallel ML in Big Data
– XGBoost:
(+) machine learning not deep learning
(+) Leveraging Big Data
No bottleneck for large scale data
47. Big Data Artificial Intelligence Center (BigDAI)
Parallel Computing with GPU
Apache Spark 3.0 in GPU
48. Big Data Artificial Intelligence Center (BigDAI)
Leveraging Big Data Cluster with GPU
Existing Big Data cluster
Big Data Engineering
Big Data Analysis
Big Data Science
– Integrate GPU chips to the cluster
– Big Data x GPU
• Improved Parallelism
– Distributed Parallel x Parallel Chip Computing
No data migration is needed, and the cluster's parallel computing
and existing large-scale data can be leveraged with GPU
Big Data Cluster with GPU
49. Big Data Artificial Intelligence Center (BigDAI)
Case I: Traffic Data Analysis
Dalyapraz Dauletbak, Junghoon Heo, Sooyoung Kim, Yeon Pyo Kim, and
Jongwook Woo, "Scalable Traffic Predictive Analysis for Smart City using
GPU in Big Data", KSII The 16th APIC-IST, June 20-22 2021, pp. 144-148, ISSN
2093-0542
Columns to consider :
Location/Time
– X and Y coordinates (Longitude & Latitude)
Level of traffic intensity (1 - 5)
Counts of jams/alerts
Traffic Jam Analysis with Classification:
Found the times of traffic jams
– Morning rush hours from 7 am to 9 am produce a lot of traffic
– The heaviest traffic starts at 3 pm and eases after 6 pm
50. Big Data Artificial Intelligence Center (BigDAI)
Features/columns in a dataset
Label to Predict:
Level of traffic (0, 1)
Features:
– location x, location y: X and Y coordinates of the location
– date_pst (*): Pacific Time of the publication of the traffic report
– level: jam level (label); 1: almost no jam, 5: standstill jam
– speed: driver's captured speed in mph
– length: length of the traffic ahead on the user's route, in meters
(*) date_pst is split into month, day, hour, min, sec, and weekday
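The split of date_pst into month, day, hour, min, sec, and weekday could look like the sketch below. The timestamp format and the helper name `split_date_pst` are assumptions, since the slides do not show the raw values; the sample timestamp is made up.

```python
from datetime import datetime


def split_date_pst(date_pst, fmt="%Y-%m-%d %H:%M:%S"):
    """Split a timestamp string into the date/time features listed on the slide.

    The string format is an assumption; adjust `fmt` to the real data.
    """
    ts = datetime.strptime(date_pst, fmt)
    return {
        "month": ts.month,
        "day": ts.day,
        "hour": ts.hour,
        "min": ts.minute,
        "sec": ts.second,
        "weekday": ts.weekday(),  # Monday = 0 ... Sunday = 6
    }


# Hypothetical report time during the afternoon rush.
features = split_date_pst("2017-12-15 17:42:08")
print(features)
```

Separate hour and weekday columns let a tree-based classifier pick up rush-hour patterns directly instead of having to learn them from one raw timestamp.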
51. Big Data Artificial Intelligence Center (BigDAI)
Experiment: H/W Specification
Dataproc Cluster of GCP: Hadoop Spark
Spark 3.1.1 on Hadoop 3.2.2

Spark Cluster   2 worker nodes (CPU)   2 GPUs
Type            n1-highmem-32          nvidia-tesla-t4
Cores           32                     48
Memory          208 GB                 32 GB
52. Big Data Artificial Intelligence Center (BigDAI)
Accuracy of Models
3 Algorithms
XGBoost, Gradient Boosted Tree (GBT), Random Forest (RF)
XGBoost has 100% Recall, Precision, and AUC
High Recall: few false negatives (FN)

                 RF (CPU)            GBT (CPU)            XGBoost (GPU)
AUC              86.3%               89.6%                100%
Precision        0.890               0.922                1.0
Recall           0.956               0.947                1.0
Computing Time   1 hr 8 min 53 sec   3 hr 55 min 23 sec   21 sec
53. Big Data Artificial Intelligence Center (BigDAI)
Computing Time to train Models
                           RF                  GBT                  XGBoost
Computing Time             1 hr 8 min 53 sec   3 hr 55 min 23 sec   21 sec
Computing Time: Log(Sec)   3.62                4.15                 1.32
(Figure: bar chart of Computing Time: Log(Sec) for RF, GBT, and XGBoost)
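The Log(Sec) values follow from converting each training time to seconds and taking the base-10 logarithm. A short check, with a hypothetical helper `to_seconds`:

```python
import math


def to_seconds(hours=0, minutes=0, seconds=0):
    """Convert an h/min/sec duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


# Training times as reported on the slide.
training_time_sec = {
    "RF": to_seconds(1, 8, 53),
    "GBT": to_seconds(3, 55, 23),
    "XGBoost": to_seconds(seconds=21),
}

# Base-10 logarithm of the time in seconds, rounded as on the slide.
log_time = {model: round(math.log10(sec), 2) for model, sec in training_time_sec.items()}
print(training_time_sec)
print(log_time)
```

The log scale is what makes the three bars comparable on one chart: the raw times span 21 seconds to more than 14,000 seconds.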
54. Big Data Artificial Intelligence Center (BigDAI)
GCP Cluster Price
Price of a Cluster
Number of nodes to be computationally equivalent
– Assuming the cluster is linearly scalable
                         RF        GBT        XGBoost
Equivalent No of Nodes   199       673        2
Total Prices             $753.35   $2547.77   $5.99

GCP Price/hour
n1-highmem-32 (CPU)      $1.892848
nvidia-tesla-t4 (GPU)    $1.1
Total                    $2.99
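The totals can be reproduced from the hourly prices under one reading of the table. That reading is an assumption: the CPU totals appear to be the equivalent-node figure multiplied by the 2 worker nodes and the CPU node price, and the GPU total appears to be 2 worker nodes each paired with one T4. The figures 199 and 673 are taken from the slide as given.

```python
CPU_NODE_PRICE = 1.892848  # n1-highmem-32, USD per hour (from the slide)
GPU_PRICE = 1.1            # nvidia-tesla-t4, USD per hour (from the slide)
WORKER_NODES = 2           # worker nodes in the experimental cluster


def cpu_cluster_price(equivalent_nodes):
    """Assumed reading: `equivalent_nodes` copies of the 2-worker CPU cluster."""
    return equivalent_nodes * WORKER_NODES * CPU_NODE_PRICE


def gpu_cluster_price():
    """Assumed reading: 2 worker nodes, each with one T4 GPU attached."""
    return WORKER_NODES * (CPU_NODE_PRICE + GPU_PRICE)


prices = {
    "RF": round(cpu_cluster_price(199), 2),
    "GBT": round(cpu_cluster_price(673), 2),
    "XGBoost": round(gpu_cluster_price(), 2),
}
print(prices)
```

Under these assumptions, the calculation matches the slide's totals to the cent, which suggests the prices are per hour of cluster time.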
55. Big Data Artificial Intelligence Center (BigDAI)
Case II: Fraud Detection in Financial Data
Priyanka Purushu, Jongwook Woo, "Financial Fraud
Detection adopting Distributed Deep Learning in Big Data",
KSII The 15th APIC-IST 2020, July 5 -7 2020, Seoul, Korea,
pp271-273, ISSN 2093-0542
Distributed Deep Learning without GPU
No publicly available datasets on financial services
– because of the private nature of financial transactions
– especially in the mobile money transactions domain
PaySim
– URL: https://www.kaggle.com/ntnu-testimon/paysim1
56. Big Data Artificial Intelligence Center (BigDAI)
Data Understanding
Numeric attributes:
amount, oldbalanceOrg, newbalanceOrig, oldbalanceDest, newbalanceDest
Categorical attributes:
step, type, isFraud, isFlaggedFraud
String attributes:
nameOrig, nameDest
57. Big Data Artificial Intelligence Center (BigDAI)
Label: isFraud
Data is imbalanced, like other data sets of this kind
isFraud has only a few positives
– not helpful in detecting a fraudulent transaction
Traditional Approach
Need to generate synthetic samples to balance the data
before building a model
– SMOTE (Synthetic Minority Over-sampling
Technique) algorithm adopted
• Minority data: raised from 0.2% to 11%
Large-scale data does not need synthetic samples,
as the data set is already large enough
Just sample and balance the data to build a model
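"Just sample and balance" can be read as random undersampling of the majority class. The slides do not show the actual sampling code, so the sketch below is only an illustration: the helper `undersample`, the toy rows, and the 1:1 target ratio are assumptions.

```python
import random


def undersample(rows, label_key="isFraud", seed=42):
    """Keep every minority row plus an equally sized random sample of majority rows."""
    positives = [r for r in rows if r[label_key] == 1]
    negatives = [r for r in rows if r[label_key] == 0]
    minority, majority = sorted((positives, negatives), key=len)
    rng = random.Random(seed)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data with a 0.2% fraud rate, as quoted on the slide.
rows = [{"amount": i, "isFraud": 1 if i % 500 == 0 else 0} for i in range(10_000)]
balanced = undersample(rows)
print(len(rows), len(balanced), sum(r["isFraud"] for r in balanced))
```

Unlike SMOTE, this invents no synthetic records. It only works when the data set is large enough that the minority class alone still has enough examples to train on, which is the argument the slide makes for large-scale data.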
58. Big Data Artificial Intelligence Center (BigDAI)
Experimental System Specification
Cluster in Google Cloud Platform
Hadoop Spark of Dataproc cluster
– Python 2.7.14, Spark 2.3.4
– Intel BigDL
Google Cloud Platform (GCP):
Instances: n1-standard-64 (64 vCPUs, 240 GB memory, 257 TB storage)
Number of Nodes: 6
– Memory size:
• 1.44 TB = 1440 GB (= 240 GB x 6)
– CPU:
• 384 vCPU (= 64 vCPUs x 6), 2.0 GHz
– Storage:
• 1.542 PB = 1,542 TB (= 257TB x 6)
59. Big Data Artificial Intelligence Center (BigDAI)
Financial Data Set (Cont‘d)
Size: 470 MB
6,362,620 records
Not that large compared to data sets of many GB
But the Big Data architecture is applicable to much bigger data sets
– As it still adopts the Spark computing engine in Big Data
Attributes: 11
Predictive Analysis
The target column to predict fraud :
– ‘isFraud’
60. Big Data Artificial Intelligence Center (BigDAI)
Comparing Spark ML and DDL for fraud detection
Spark ML algorithms
DT (Decision Tree)
RF (Random Forest)
LR (Logistic Regression)
DDL: Distributed Deep Learning in Spark
MFF (Multilayer Perceptron FF)
– Feed Forward (FF)
• a neural network system
– Cross Validation (CV)
– Train Split Validation (TSV)
BigDL FF (BFF)
Achieve High Recall: low FN
61. Big Data Artificial Intelligence Center (BigDAI)
Summary: Accuracy and Performance
Model     Precision   Recall   AUC     Time (mins)
DT        0.976       0.975    0.976   3
RF        0.977       0.980    0.979   13
LR        0.946       0.860    0.905   3
MFF TSV   0.694       1        0.782   2
MFF CV    0.695       1        0.783   4
BFF       0.593       0.516    1       4
62. Big Data Artificial Intelligence Center (BigDAI)
Summary: Confusion matrix of RF
RF should be the optimal model
– High Recall (0.980) and high Precision (0.977)
– Good AUC: 0.979
MFF: Recall 1
– AUC is low: 0.782
BFF: AUC 1
– Recall is low: 0.516

RF                   Actual Negative   Actual Positive
Predicted Negative   124,034           2,847
Predicted Positive   2,534             122,936
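Precision and recall can be recomputed from the four confusion-matrix counts. The sketch below does this for two readings of the matrix, because the row and column labels matter. The helper `precision_recall` is hypothetical; the counts are the ones on the slide.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return round(precision, 3), round(recall, 3)


# Reading 1: rows are predicted classes, columns are actual classes (as labeled).
as_labeled = precision_recall(tp=122_936, fp=2_534, fn=2_847)

# Reading 2: the transposed layout (rows actual, columns predicted).
transposed = precision_recall(tp=122_936, fp=2_847, fn=2_534)

print("as labeled (precision, recall):", as_labeled)
print("transposed (precision, recall):", transposed)
```

With the labels as printed, the counts give precision 0.980 and recall 0.977. The transposed reading gives precision 0.977 and recall 0.980, which is the pairing quoted on the slide. Either way the two values are within 0.003 of each other, so the conclusion about RF does not depend on the reading.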
63. Big Data Artificial Intelligence Center (BigDAI)
Summary: Performance
MFF TSV has the fastest computing time, about 2 minutes
Others:
– 3 to 4 mins
RF: 13 mins
(Figure: bar chart of Computing Time (min): DT 3, RF 13, LR 3, MFF TSV 2, MFF CV 4, BFF 4)
64. Big Data Artificial Intelligence Center (BigDAI)
Successful Enterprise: Business + Engineering
(Figure: quadrant chart with axes Engineering / Technology (Cost?) and Business)
– Low Tech (Cost?) but High Biz
– High {Biz + Tech (Cost?)}
– Low {Biz + Tech}
– High Tech but Low Biz
65. Big Data Artificial Intelligence Center (BigDAI)
Collaboration
How do you adopt technology for your business
High Tech
– Not to focus on technology itself
• Good enough technology
Business
– Good Business model
• Good enough or Latest technology?
Needs Convergence and Collaboration
Communication between biz and eng needed
Find the proper solution
– Leveraging the optimal Tech
– Gain the highest Business Profit
66. Big Data Artificial Intelligence Center (BigDAI)
Contents
Myself
Introduction To Big Data
Scalable Business Intelligence
Predictive Analysis with Big Data AI
Summary
67. Big Data Artificial Intelligence Center (BigDAI)
Summary
Big Data platform for Large Scale Data
High Performance solution for massive data set
– Data Storage, Analysis, Prediction
Unified Analytics Platform
Big Data and AI
Big Data
– without GPU but with Deep Learning
GPU
– Leveraging Big Data with GPU
Big Data Predictive Analysis Performance with GPU
Faster
More Accurate
Much cheaper
68. Big Data Artificial Intelligence Center (BigDAI)
Questions?
69. Big Data Artificial Intelligence Center (BigDAI)
References
1. J. Barbaresso, G. Cordahi, D. Garcia et al., “USDOT’s Intelligent Transportation Systems (ITS) ITS Strategic Plan 2015-2019,” 2014.
2. “Integrated Corridor Management,” Intelligent Transportation Systems - Integrated Corridor Management, www.its.dot.gov/research_archives/icms/. Accessed April 14, 2019.
3. J. Kestelyn, “Real-Time Data Visualization and Machine Learning for London Traffic Analysis,” Google Cloud, 2016, cloud.google.com/blog/products/gcp/real-time-data-visualization-and-machine-learning-for-london-traffic-analysis. Accessed April 14, 2019.
4. “Connected Citizens by Waze,” Waze, www.waze.com/ccp. Accessed April 14, 2019.
5. M. Schnuerle, “Louisville and Waze: Applying Mobility Data in Cities,” Harvard Civic Analytics Network Summit on Data-Smart Government, 2017.
6. Louisville Metro, “Thunder Jams, 2017 Traffic Delays,” CARTO, louisvillemetro-ms.carto.com/builder/d98732d0-1f6a-4db2-9f8a-e58026bf0d39/embed. Accessed April 14, 2019.
7. Louisville Metro, “Pothole Animation,” CARTO, cdolabs-admin.carto.com/builder/a80f62bf-98e1-4591-8354-acfa8e51a8de/embed. Accessed April 14, 2019.
8. E. Necula, “Analyzing Traffic Patterns on Street Segments Based on GPS Data Using R,” Transportation Research Procedia, Vol. 10, pp. 276–285, 2015.
70. Big Data Artificial Intelligence Center (BigDAI)
References
9. J. Woo and Y. Xu, “Market Basket Analysis Algorithm with Map/Reduce of Cloud Computing,” in Proc. of International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA), Las Vegas, 2011.
10. “Pandas.io.json.json_normalize,” Pandas 0.24.2 Documentation, pandas.pydata.org/pandas-docs/stable/reference/api/pandas.io.json.json_normalize.html. Accessed April 14, 2019.
11. United States, Chief Executive Office County of Los Angeles, “Cities within the County of Los Angeles,” lacounty.gov. Accessed April 14, 2019.
12. Garyericson, “What Is - Azure Machine Learning Studio,” Microsoft Docs, docs.microsoft.com/en-us/azure/machine-learning/studio/what-is-ml-studio. Accessed April 14, 2019.
13. A. Tharwat, “Classification Assessment Methods,” Applied Computing and Informatics, 2018.
14. M. Sokolova and G. Lapalme, “A Systematic Analysis of Performance Measures for Classification Tasks,” Information Processing & Management, Vol. 45, No. 4, pp. 427–437, 2009.
15. Performance of Dataframe in Spark and PySpark, https://databricks.com/blog/2015/02/17/introducing-dataframes-in-spark-for-large-scale-data-science.html
16. https://cities-today.com/smart-traffic-management-could-save-cities-us277-billion-by-2025/
17. https://www.greenbiz.com/article/advanced-traffic-management-next-big-thing-smart-cities