
Scalable Predictive Analysis and The Trend with Big Data & AI


The history and latest trends of Big Data and scalable predictive analysis for large-scale data sets, using distributed machine learning and deep learning with GPUs in Spark and RAPIDS. Invited talk at the IS department of Yonsei University, Korea.


  1. Scalable Predictive Analysis and The Trend with Big Data & AI. Jongwook Woo, PhD, jwoo5@calstatela.edu, Big Data AI Center (BigDAI), California State University Los Angeles. School of Business, Yonsei University, Korea, Oct 5 2021.
  2. Big Data Artificial Intelligence Center (BigDAI), Jongwook Woo, CalStateLA. Contents: Myself; Introduction To Big Data; Scalable Business Intelligence; Predictive Analysis with Big Data AI; Summary
  3. Myself. Experience: Professor at the Dept. of IS, California State University Los Angeles, since 2002; PhD in Computer Science and Engineering at USC, 2001
  4. Myself: S/W Development Lead. http://www.mobygames.com/game/windows/matrix-online/credits
  5. Myself: Partners for Services
  6. Myself: Collaborations. SOFTZEN
  7. Collaboration with NVidia, Databricks, Oracle, Amazon, CDH using Big Data AI. https://www.cloudera.com/more/customers/csula.html
  8. Contents: Myself; Introduction To Big Data; Scalable Business Intelligence; Predictive Analysis with Big Data AI; Summary
  9. Data Issues
      - Large-scale data: terabytes (10^12 bytes), petabytes (10^15 bytes)
        – Because of the web
        – IoT (streaming data, sensor data) in SmartX
        – Social computing, smartphones, online games
        – Bioinformatics, …
      - Legacy approach can handle the massive data set
        – Increase the storage size
        – Improve the speed of the CPU
      - Only problem: too expensive
  10. Data Handling: Traditional Way
  11. Data Handling: Traditional Way Becomes too Expensive
  12. Big Data Definition: What is Big Data? Data or systems?
      - Data view: large-scale data?
        – 3 Vs, 5 Vs
          • Velocity, Volume, Variety
        – Many people only see the data point of view
          • Nothing new
      - Systems view:
        – Yes, new systems for large-scale data
          • Inexpensive
  13. Two Cores in Big Data: how to store Big Data and how to compute Big Data
      - Google
        – How to store Big Data: GFS, a distributed system on inexpensive commodity computers
        – How to compute Big Data: MapReduce, parallel computing with inexpensive computers
        – Its own supercomputers
        – Published papers in 2003 and 2004
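The MapReduce programming model named on slide 13 can be illustrated with a single-process sketch. This is only an illustration of the map, shuffle, and reduce steps on a word count; the function names and the sample input are assumptions, and a real framework would run the map calls on different nodes of the cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # On a cluster each map_phase call could run on a different node;
    # here they run one after another in a single process.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical input splits, one string per split.
splits = ["big data big cluster", "data on commodity cluster"]
counts = word_count(splits)
print(counts)
```

Because each map call only sees its own split and each reduce call only sees one key, both steps can be spread over many inexpensive machines, which is the point the slide makes.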
  14. Data Handling: Another Way. But it works well with the crazy massive data set. Battle of Nagashino, 1575, Japan
  15. Data Handling: Another Way, Not Expensive. From the 2017 Korean blockbuster movie "The Fortress" (남한산성)
  16. Data Handling: Another Way, Not Expensive. AD 1409 (Year 9 of King Tae-Jong, Chosun Dynasty, Korea), by the Choi family: 최해산 (崔海山), whose father was 최무선 (崔茂宣). [Ref] "Joseon's Secret Weapon: Chongtonggi Hwacha" (조선의 비밀 병기 : 총통기 화차, 銃筒機火車), written by 도심, http://blog.naver.com/PostView.nhn?blogId=dosims&logNo=221127053677
  17. Big Data Solution: Large Scale Data
      - Big Data: an inexpensive platform of distributed parallel computing systems that can store large-scale data and process it in parallel
      - Apache Hadoop and Spark, since 2006
        – Inexpensive supercomputer
        – Any small company or university lab can own one
  18. Big Data: Big Data (Hadoop, Spark, Distributed Deep Learning); cluster for compute and store (distributed file systems: HDFS, GFS) …
  19. Big Data: Linearly Scalable
      - Some people object that a system handling a 1 to 3 GB data set is not Big Data
      - Well, a Big Data platform lets you add more servers as the data grows in the future
        – It is linearly scalable once built
        – Ideally, n times more computing power
      - Figure labels: Data Size: < 3 GB; Add n servers; Data Size: 200 TB >
  20. Contents: Myself; Introduction To Big Data; Scalable Business Intelligence; Predictive Analysis with Big Data AI; Summary
  21. Big Data is great for everyone: university labs, small businesses, etc.
      - Big Data analysis: Hadoop, Spark, NoSQL DB, SAP HANA, ElasticSearch, ...
      - Big Data for data analysis
        – How to store, compute, and analyze a massive dataset?
      - You have your own specific data
        – Big companies do not have the specific data that you have
      - Your business data is the value
        – Customer data
        – Operational data
  22. Big Data Analysis and Prediction Flow
      - Data Collection: batch API (Yelp, Google); streaming (Twitter, Apache NiFi, Kafka, Storm); open data (government)
      - Data Storage: HDFS, S3, Object Storage, NoSQL DB (Couchbase), …
      - Data Filtering: Hive, Pig
      - Data Analysis and Science: Hive, Pig, Spark, BI tools (Datameer, Qlik, …)
      - Data Visualization: Qlik, Datameer, Excel PowerView
      - Stages: Big Data Engineering, Big Data Analysis, Big Data Science
  23. Big Data: Data Analysis & Visualization. Sentiment Map of Alphago (Positive, Negative)
  24. K-Election 2017 (April 29 – May 9)
  25. Businesses popular within 5 miles of CalStateLA, USC, UCLA
  26. Jams and other traffic incidents reported by users, Dec 2017 – Jan 2018 (Dalyapraz Dauletbak)
  27. COVID 19 Dashboard. https://www.calstatela.edu/centers/hipic/covid-19-us-ca-confirmed-prediction
  28. Contents: Myself; Introduction To Big Data; Scalable Business Intelligence; Predictive Analysis with Big Data AI; Summary
  29. Big Data Prediction: Big Data Science
      - How to predict future trends and patterns with a massive dataset? => Machine Learning
      - Diagram labels: Deep Learning, Machine Learning, AI
  30. Spark
      - Limitations of MapReduce computing
        – Hard to program in Java
        – Batch processing: not interactive
        – Disk storage for intermediate data: performance issue for Machine Learning
  31. Spark (Cont’d)
      - Spark by UC Berkeley AMPLab
        – Started by Matei Zaharia in 2009 and open sourced in 2010
      - In-memory storage for intermediate data
        – 20 to 100 times faster than MapReduce
      - Good for Machine Learning => Big Data Science
        – Iterative algorithms
      - Spark ML
        – Supports Machine Learning libraries
        – Processes massive data sets to build prediction models
  32. Deep Learning
      - Machine Learning
        – Has been popular since Google TensorFlow, Nov 9 2015
      - Multiple cores in a GPU
        – Even with multiple GPUs and CPUs
      - Parallel computing in a chip
        – GPU (Nvidia GTX 1660 Ti): 1280 CUDA cores
      - Other Deep Learning libraries
        – TensorFlow with Keras
        – PyTorch by Facebook
        – Apache MXNet
        – Caffe, Caffe2
        – Microsoft Cognitive Toolkit (previously CNTK)
        – DeepLearning4j
        – …
  33. From Neural Networks to Deep Learning
      - Deep learning: different types of architectures
        – Neural Networks (NN)
        – Convolutional Neural Networks (CNN)
        – Recurrent Neural Networks (RNN) & Long Short-Term Memory (LSTM)
        – Generative Adversarial Networks (GAN)
      - Ref: SAP Enterprise Deep Learning with TensorFlow (© 2017 SAP SE or an SAP affiliate company)
  34. Deep Learning
      - CNN: image recognition; video analysis; NLP for classification and prediction
      - RNN: time series prediction; text analysis (conversation Q&A); image/video captioning; speech recognition/synthesis
      - GAN: media generation (photorealistic images); human image synthesis (fake faces)
  35. Data Scale Driving: Deep Learning Process. Deep Learning and Massive Data [3] "Machine Learning Yearning", Andrew Ng, 2016
  36. Another Gap between Deep Learning and Big Data Communities [6]: deep learning experts on one side of the chasm; Big Data engineers, scientists, analysts, etc. on the other
  37. Leveraging Big Data Cluster
      - An existing Big Data cluster holds the massive data set
      - Without using Big Data:
        – Single GPU server for Deep Learning?
        – Single server for Python and R traditional Machine Learning?
        – Too slow in data migration, and the single server fails
  38. Deep Learning with Spark: What if we combine Deep Learning and Spark?
  39. Leveraging Big Data Cluster with Deep Learning
      - Existing Big Data cluster: Big Data Engineering, Big Data Analysis, Big Data Science
      - Distributed Deep Learning
        – Integrate Deep Learning into the cluster
      - No data migration is needed, and it can leverage the parallel computing and the existing large-scale data
  40. Deep Learning with Spark
      - Deep Learning Pipelines for Apache Spark: Databricks
      - BigDL (Distributed Deep Learning Library for Apache Spark): Intel
      - TensorFlowOnSpark: Yahoo! Inc
      - DL4J (Deeplearning4j On Spark): Skymind
      - Distributed Deep Learning with Keras & Spark: Elephas
  41. Spark ML and DDL [2-5]: Deep Learning in Spark cluster. Diagram labels: Distributed Deep Learning (DDL); DDL lib; Deep Learning in Spark
  42. Contents: Myself; Introduction To Big Data; Scalable Business Intelligence; Predictive Analysis with Big Data AI; Big Data using GPU; Summary
  43. Leveraging Big Data Cluster with GPU: What if we use GPU for Big Data Cluster?
  44. Again: Big Data Cluster with GPU
      - Existing Big Data cluster with massive data set
      - Single GPU server for Machine (Deep) Learning?
        – Too slow in data migration, and the single server fails
  45. Leveraging Big Data Cluster with GPU
      - Big Data cluster: Unified Analytics Platform
        – Already built on site and matured for Data Engineering, Data Analysis, Data Science
      - Can we use the existing Big Data cluster with GPU?
        – Can we integrate GPUs into this Big Data cluster?
      - NVidia RAPIDS and Spark
  46. Distributed Parallel Computing using RAPIDS
      - RAPIDS: parallel Machine Learning (ML) on GPU
      - RAPIDS + Spark: distributed parallel ML in Big Data
        – XGBoost: (+) machine learning, not deep learning
      - (+) Leveraging Big Data: no bottleneck for large-scale data
  47. Parallel Computing with GPU: Apache Spark 3.0 in GPU
  48. Leveraging Big Data Cluster with GPU
      - Existing Big Data cluster: Big Data Engineering, Big Data Analysis, Big Data Science
        – Integrate GPU chips into the cluster
        – Big Data x GPU
          • Improved parallelism
        – Distributed parallel x parallel chip computing
      - No data migration is needed, and it can leverage the parallel computing and the existing large-scale data with GPU
  49. Case I: Traffic Data Analysis
      - Dalyapraz Dauletbak, Junghoon Heo, Sooyoung Kim, Yeon Pyo Kim, and Jongwook Woo, "Scalable Traffic Predictive Analysis for Smart City using GPU in Big Data", KSII The 16th APIC-IST, June 20-22 2021, pp. 144-148, ISSN 2093-0542
      - Columns to consider:
        – Location/time: X and Y coordinates (longitude & latitude)
        – Level of traffic intensity (1 - 5)
        – Counts of jams/alerts
      - Traffic jam analysis with classification: found the times of traffic jams
        – Rush hours from 7 am to 9 am produce a lot of traffic
        – The heaviest traffic starts at 3 pm and eases after 6 pm
  50. Features/columns in a dataset
      - Label to predict: level of traffic (0, 1)
      - Features:
        Column                   Description
        location x, location y   X and Y coordinates of the location
        date_pst*                Pacific Time of the publication of the traffic report
        level                    Label: jam level (1: almost no jam, 5: standstill jam)
        speed                    Driver's captured speed in mph
        length                   Length of the traffic ahead on the user's route, in meters
      - *date_pst splits into month, day, hour, min, sec, weekday
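The feature preparation on slide 50 can be sketched in plain Python. This is a minimal sketch under stated assumptions: the timestamp format, the function names, and the threshold that turns the 1 to 5 jam level into a 0/1 label are all hypothetical, since the slides do not give them.

```python
from datetime import datetime


def split_timestamp(date_pst):
    """Split a timestamp string into the calendar features named on the slide.

    The 'YYYY-MM-DD HH:MM:SS' format is an assumption, not taken from the data.
    """
    ts = datetime.strptime(date_pst, "%Y-%m-%d %H:%M:%S")
    return {
        "month": ts.month,
        "day": ts.day,
        "hour": ts.hour,
        "min": ts.minute,
        "sec": ts.second,
        "weekday": ts.weekday(),  # Monday = 0 ... Sunday = 6
    }


def binarize_level(level, threshold=3):
    """Map a jam level (1 to 5) to a 0/1 label.

    The threshold of 3 is a hypothetical choice for illustration only.
    """
    return 1 if level >= threshold else 0


# Hypothetical record shaped like the columns listed on the slide.
record = {"date_pst": "2017-12-15 08:30:05", "level": 4, "speed": 12.0, "length": 850}
features = split_timestamp(record["date_pst"])
label = binarize_level(record["level"])
print(features, label)
```

Splitting the timestamp this way is what lets a classifier pick up hour-of-day and weekday effects such as the rush-hour pattern reported on slide 49.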
  51. Experiment: H/W Specification
      - Dataproc cluster of GCP (Hadoop Spark): Spark 3.1.1 on Hadoop 3.2.2
        Spark Cluster   2 worker nodes (CPU)   2 GPUs
        Type            n1-highmem-32          nvidia-tesla-t4
        Cores           32                     48
        Memory          208 GB                 32 GB
  52. Accuracy of Models
      - 3 algorithms: XGBoost, Gradient Boosted Tree (GBT), Random Forest (RF)
      - XGBoost has 100% Recall, Precision, and AUC
      - High Recall: low FN
        Metric           RF (CPU)            GBT (CPU)            XGBoost (GPU)
        AUC              86.3%               89.6%                100%
        Precision        0.890               0.922                1.0
        Recall           0.956               0.947                1.0
        Computing Time   1 hr 8 min 53 sec   3 hr 55 min 23 sec   21 sec
  53. Computing Time to train Models
                                   RF                  GBT                  XGBoost
        Computing Time             1 hr 8 min 53 sec   3 hr 55 min 23 sec   21 sec
        Computing Time: Log(Sec)   3.62                4.15                 1.32
      - Figure: bar chart of Computing Time: Log(Sec) for RF, GBT, XGBoost
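The Log(Sec) row on slide 53 can be reproduced from the reported training times. The sketch below assumes a base-10 logarithm, which matches the slide's figures; the helper name and the speedup ratio are additions for illustration.

```python
import math


def to_seconds(hours=0, minutes=0, seconds=0):
    """Convert an h/min/sec duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


# Training times as reported on the slide.
train_times = {
    "RF": to_seconds(1, 8, 53),
    "GBT": to_seconds(3, 55, 23),
    "XGBoost": to_seconds(seconds=21),
}

# Base-10 log of the time in seconds, rounded as on the slide.
log_times = {model: round(math.log10(sec), 2) for model, sec in train_times.items()}

# How many times longer each model took than XGBoost on GPU.
speedup = {model: sec / train_times["XGBoost"] for model, sec in train_times.items()}

print(log_times)
print({model: round(ratio, 1) for model, ratio in speedup.items()})
```

On a log scale the two CPU runs look close to the GPU run, but the raw ratios show that RF took roughly 197 times and GBT roughly 673 times as long as XGBoost.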
  54. GCP Cluster Price
      - Price of a cluster: number of nodes needed to be computationally equivalent
        – Assuming the cluster is linearly scalable
                                  RF        GBT        XGBoost
        Equivalent No. of Nodes   199       673        2
        Total Price               $753.35   $2547.77   $5.99
      - GCP price per hour
        n1-highmem-32 (CPU)       $1.892848
        nvidia-tesla-t4 (GPU)     $1.1
        Total                     $2.99
  55. Case II: Fraud Detection in Financial Data
      - Priyanka Purushu, Jongwook Woo, "Financial Fraud Detection adopting Distributed Deep Learning in Big Data", KSII The 15th APIC-IST 2020, July 5-7 2020, Seoul, Korea, pp. 271-273, ISSN 2093-0542
      - Distributed Deep Learning without GPU
      - No publicly available datasets on financial services
        – Because of the private nature of financial transactions
        – Especially in the mobile money transactions domain
      - PaySim: https://www.kaggle.com/ntnu-testimon/paysim1
  56. Data Understanding
      - Numeric attributes: amount, oldbalanceOrg, newbalanceOrg, oldbalanceDest, newbalanceDest
      - Categorical attributes: step, type, isFraud, isFlaggedFraud
      - String attributes: nameOrig, nameDest
  57. Label: isFraud
      - The data is biased (imbalanced), like others
        – isFraud has only a few positives
        – Not helpful in detecting a fraudulent transaction
      - Traditional approach
        – Needs to generate sample data to balance the data before building a model
        – SMOTE (Synthetic Minority Over-sampling Technique) algorithm adopted
          • Minority data: 11%, up from 0.2%
      - Large-scale data does not need generated samples, as the data set is already large enough
        – Just sample and balance the data to build a model
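Slide 57 says the large data set is balanced by sampling rather than by generating synthetic rows. The slides do not say which sampling scheme was used, so the sketch below assumes one common choice, random undersampling of the majority class; the function name, the seed, and the toy rows are hypothetical.

```python
import random


def undersample(rows, label_key="isFraud", seed=42):
    """Balance a binary-labelled data set by randomly dropping majority rows.

    This is one possible sampling scheme, assumed here for illustration.
    """
    rng = random.Random(seed)
    positives = [row for row in rows if row[label_key] == 1]
    negatives = [row for row in rows if row[label_key] == 0]
    minority, majority = sorted([positives, negatives], key=len)
    # Keep every minority row and an equally sized random subset of the majority.
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


# Toy data: 1,000 rows of which 2 are fraud (0.2% positives).
rows = [{"amount": float(i), "isFraud": 1 if i % 500 == 0 else 0} for i in range(1000)]
balanced = undersample(rows)
positive_share = sum(row["isFraud"] for row in balanced) / len(balanced)
print(len(balanced), positive_share)
```

Undersampling discards most of the majority rows, which is only acceptable when, as the slide argues, enough data remains after sampling.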
  58. Experimental System Specification
      - Cluster in Google Cloud Platform (GCP): Hadoop Spark of a Dataproc cluster
        – Python 2.7.14, Spark 2.3.4
        – Intel BigDL
      - Instances: n1-standard-64 (64 vCPUs, 240 GB memory, 257 TB storage)
      - Number of nodes: 6
        – Memory size: 1.44 TB = 1,440 GB (= 240 GB x 6)
        – CPU: 384 vCPUs (= 64 vCPUs x 6), 2.0 GHz
        – Storage: 1.542 PB = 1,542 TB (= 257 TB x 6)
  59. Financial Data Set (Cont’d)
      - Size: 470 MB, 6,362,620 records
        – Not that large compared with data sets of more than a GB
        – But the Big Data architecture is applicable to much bigger data sets, as it still adopts the Spark computing engine
      - Attributes: 11
      - Predictive analysis
        – The target column to predict fraud: ‘isFraud’
  60. Comparing Spark ML and DDL for fraud detection
      - Spark ML algorithms
        – DT (Decision Tree)
        – RF (Random Forest)
        – LR (Linear Regression)
      - DDL: Distributed Deep Learning in Spark
        – MFF (Multilayer Perceptron FF)
          • Feed Forward (FF): a neural network system
          • Cross Validation (CV)
          • Train Split Validation (TSV)
        – BigDL FF (BFF)
      - Goal: achieve high Recall (low FN)
  61. Summary: Accuracy and Performance
        Model     Precision   Recall   AUC     Time (mins)
        DT        0.976       0.975    0.976   3
        RF        0.977       0.980    0.979   13
        LR        0.946       0.860    0.905   3
        MFF TSV   0.694       1        0.782   2
        MFF CV    0.695       1        0.783   4
        BFF       0.593       0.516    1       4
  62. Summary: Confusion matrix of RF
      - RF should be the optimal model, as it has high Recall (0.980) and Precision (0.977), and a good AUC (0.979)
      - MFF: Recall is 1, but AUC is low (0.782)
      - BFF: AUC is 1, but Recall is low (0.516)
        RF                   Actual Negative   Actual Positive
        Predicted Negative   124,034           2,847
        Predicted Positive   2,534             122,936
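Precision and recall can be recomputed from the RF confusion matrix on slide 62. The sketch below uses the standard definitions and evaluates them under two readings of the table, because the layout of the original slide is not certain from the transcript; the variable names are assumptions.

```python
def precision_recall(tp, fp, fn):
    """Standard definitions: precision = TP/(TP+FP), recall = TP/(TP+FN)."""
    return tp / (tp + fp), tp / (tp + fn)


# Reading 1: rows are predicted classes and columns are actual classes,
# as the table is labelled in the transcript.
tn, fn = 124_034, 2_847   # "Predicted Negative" row
fp, tp = 2_534, 122_936   # "Predicted Positive" row
precision, recall = precision_recall(tp, fp, fn)

# Reading 2: rows and columns transposed, so the off-diagonal counts swap roles.
precision_t, recall_t = precision_recall(tp, fp=fn, fn=fp)

print(round(precision, 3), round(recall, 3))
print(round(precision_t, 3), round(recall_t, 3))
```

With the table read as labelled, precision comes out near 0.980 and recall near 0.977, the reverse of the values stated on the slide; the transposed reading reproduces the slide's Recall 0.980 and Precision 0.977. Either way the two off-diagonal counts are small and similar, so the conclusion about RF is unaffected.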
  63. Summary: Performance
      - MFF TSV has the fastest computing time, about 2 minutes
      - Others: 3 to 4 mins
      - RF: 13 mins
      - Figure: bar chart of Computing Time (min): DT, 3; RF, 13; LR, 3; MFF TSV, 2; MFF CV, 4; BFF, 4
  64. Successful Enterprise: Business + Engineering
      - Quadrant chart with axes Engineering / Technology (Cost?) and Business
        – Low Tech (Cost?) but High Biz
        – High {Biz + Tech (Cost?)}
        – Low {Biz + Tech}
        – High Tech but Low Biz
  65. Collaboration
      - How do you adopt technology for your business?
      - High Tech
        – Do not focus on the technology itself
          • Good enough technology
      - Business
        – Good business model
          • Good enough or latest technology?
      - Needs convergence and collaboration
        – Communication between biz and eng needed
      - Find the proper solution
        – Leverage the optimal tech
        – Gain the highest business profit
  66. Contents: Myself; Introduction To Big Data; Scalable Business Intelligence; Predictive Analysis with Big Data AI; Summary
  67. Summary
      - Big Data platform for large-scale data
      - High-performance solution for massive data sets
        – Data storage, analysis, prediction
      - Unified Analytics Platform
      - Big Data and AI
        – Big Data: without GPU but with Deep Learning
        – GPU: leveraging Big Data with GPU
      - Big Data predictive analysis performance with GPU
        – Faster
        – More accurate
        – Much cheaper
  68. Questions?
  69. References
      1. J. Barbaresso, G. Cordahi, D. Garcia et al., “USDOT’s Intelligent Transportation Systems (ITS) ITS Strategic Plan 2015-2019,” 2014.
      2. “Integrated Corridor Management,” Intelligent Transportation Systems - Integrated Corridor Management, www.its.dot.gov/research_archives/icms/. Accessed April 14, 2019.
      3. J. Kestelyn, “Real-Time Data Visualization and Machine Learning for London Traffic Analysis,” Google Cloud, 2016, cloud.google.com/blog/products/gcp/real-time-data-visualization-and-machine-learning-for-london-traffic-analysis. Accessed April 14, 2019.
      4. “Connected Citizens by Waze,” Waze, www.waze.com/ccp. Accessed April 14, 2019.
      5. M. Schnuerle, “Louisville and Waze: Applying Mobility Data in Cities,” Harvard Civic Analytics Network Summit on Data-Smart Government, 2017.
      6. Louisville Metro, “Thunder Jams, 2017 Traffic Delays,” CARTO, louisvillemetro-ms.carto.com/builder/d98732d0-1f6a-4db2-9f8a-e58026bf0d39/embed. Accessed April 14, 2019.
      7. Louisville Metro, “Pothole Animation,” CARTO, cdolabs-admin.carto.com/builder/a80f62bf-98e1-4591-8354-acfa8e51a8de/embed. Accessed April 14, 2019.
      8. E. Necula, “Analyzing Traffic Patterns on Street Segments Based on GPS Data Using R,” Transportation Research Procedia, Vol. 10, pp. 276–285, 2015.
  70. References
      9. J. Woo and Y. Xu, “Market Basket Analysis Algorithm with Map/Reduce of Cloud Computing,” in Proc. of International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA), Las Vegas, 2011.
      10. “Pandas.io.json.json_normalize,” Pandas 0.24.2 Documentation, pandas.pydata.org/pandas-docs/stable/reference/api/pandas.io.json.json_normalize.html. Accessed April 14, 2019.
      11. United States, Chief Executive Office County of Los Angeles, “Cities within the County of Los Angeles,” lacounty.gov. Accessed April 14, 2019.
      12. Garyericson, “What Is - Azure Machine Learning Studio,” Microsoft Docs, docs.microsoft.com/en-us/azure/machine-learning/studio/what-is-ml-studio. Accessed April 14, 2019.
      13. A. Tharwat, “Classification Assessment Methods,” Applied Computing and Informatics, 2018.
      14. M. Sokolova and G. Lapalme, “A Systematic Analysis of Performance Measures for Classification Tasks,” Information Processing & Management, Vol. 45, No. 4, pp. 427–437, 2009.
      15. Performance of Dataframe in Spark and PySpark, https://databricks.com/blog/2015/02/17/introducing-dataframes-in-spark-for-large-scale-data-science.html
      16. https://cities-today.com/smart-traffic-management-could-save-cities-us277-billion-by-2025/
      17. https://www.greenbiz.com/article/advanced-traffic-management-next-big-thing-smart-cities
