39. Presto Hive Spark
Hive MetaStore
Data in S3
(RCFile, JSON; logs since 2012)
SparkSQL/MLlib
SmartNews UDF, UDAF (user-defined functions)
Chartio / Shib
ad-hoc analysis
reporting
pre-process
model creation
realtime analysis
Streaming Data
news engine
Hive
Azkaban
Spark, Hive, Spark
40. Spark MLlib
● Collaborative filtering
○ ALS (Alternating Least Squares)
Pre-process on Hive
SparkSQL
Data in S3
RDD
ALS MLlib on Spark
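To make the "alternating" idea behind ALS concrete, here is a minimal sketch in plain Python. It is not the MLlib implementation and not SmartNews code: it assumes a rank-1 model (one latent factor per user and per item) so each update has a closed form. The function name `als_rank1`, the parameter names, and the toy ratings are all hypothetical, chosen only for illustration.

```python
def als_rank1(ratings, n_users, n_items, reg=0.1, iters=20):
    """Toy rank-1 ALS (illustrative only, not Spark MLlib).

    ratings: list of (user, item, rating) triples for the observed entries.
    Returns (u, v) so that u[user] * v[item] approximates the rating.
    """
    u = [1.0] * n_users
    v = [1.0] * n_items
    for _ in range(iters):
        # Step 1: hold item factors fixed; each user factor is the solution
        # of a one-dimensional regularized least-squares problem.
        for i in range(n_users):
            num = sum(r * v[j] for (a, j, r) in ratings if a == i)
            den = sum(v[j] ** 2 for (a, j, r) in ratings if a == i) + reg
            u[i] = num / den
        # Step 2: hold user factors fixed; update each item factor the same way.
        for j in range(n_items):
            num = sum(r * u[i] for (i, b, r) in ratings if b == j)
            den = sum(u[i] ** 2 for (i, b, r) in ratings if b == j) + reg
            v[j] = num / den
    return u, v


# Hypothetical toy data: 3 users x 2 items, with the (user 2, item 1) rating unobserved.
toy_ratings = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 2.0), (1, 1, 4.0), (2, 0, 3.0)]
u, v = als_rank1(toy_ratings, n_users=3, n_items=2, reg=0.01, iters=100)
predicted = u[2] * v[1]  # estimate for the unobserved entry
```

In the pipeline on this slide the same step would be handed to MLlib rather than written by hand. In PySpark the entry point is `ALS.train(ratings, rank, iterations, lambda_)` from `pyspark.mllib.recommendation`, applied to an RDD of ratings. How the RDD is produced from the Hive / SparkSQL pre-processing is not shown in the slide.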
Apache Spark on EMR
http://www.slideshare.net/smartnews/aws-meetupapache-spark-on-emr