AWS Customer Presentation - eHarmony
1. Matchmaking in the Cloud: A study on Amazon EC2, Elastic MapReduce and Apache Hadoop at eHarmony
Ben Hardy, Sr. Software Engineer
7. Architecture Overview
[Diagram: Data Warehouse, Local Store, S3, and Hadoop on EC2. Transfer operations: unload, s3put, s3get. Cluster control: start, verify, get job status, shutdown. Labels: User and Match data; Score data store.]
Here are some facts and figures about us: 2% of US marriages.
Database joins and similar operations, along with our models, are CPU- and I/O-intensive and need to be tested. In an offline system we can take advantage of aggregate data without constraining our online system.
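Moving joins into the offline Hadoop system usually means expressing them as MapReduce jobs. Below is a minimal sketch of a reduce-side join, simulated in plain Python so it runs without a cluster. The record layout (`"user"` and `"match"` tags keyed by a user id) and all names are hypothetical illustrations, not eHarmony's schema or code.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical records: ("user", user_id, profile) or ("match", user_id, candidate_id).

def map_phase(record):
    """Emit (join key, tagged value) so both record types meet at the same reducer."""
    kind, user_id, payload = record
    yield user_id, (kind, payload)

def shuffle(pairs):
    """Group mapper output by key, standing in for Hadoop's shuffle and sort."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(user_id, values):
    """Inner join: pair each user's profile with each of that user's candidates."""
    profiles = [p for kind, p in values if kind == "user"]
    candidates = [p for kind, p in values if kind == "match"]
    for profile in profiles:
        for candidate in candidates:
            yield user_id, profile, candidate

def reduce_side_join(records):
    pairs = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(pairs)
    return [row for key, values in grouped.items() for row in reduce_phase(key, values)]

records = [
    ("user", 1, {"age": 30}),
    ("match", 1, 7),
    ("match", 1, 9),
    ("user", 2, {"age": 41}),  # no matches, so no output rows
    ("match", 3, 4),           # no profile for user 3, so the inner join drops it
]
print(reduce_side_join(records))  # [(1, {'age': 30}, 7), (1, {'age': 30}, 9)]
```

Tagging each value with its source is what lets a single reducer tell the two inputs apart; on a real cluster the grouping step is done by the framework across machines rather than by a dictionary.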
Getting our data to and from EC2 is definitely non-trivial. Steps 1, 2, 6, and 7 are outside the cloud.
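Each step that crosses between the local environment and the cloud is a place where a transient failure can stop the whole pipeline. A minimal sketch of wrapping such a transfer step (for example an upload to or download from S3) in retries with exponential backoff follows. `with_retries` is a hypothetical helper, and `flaky_upload` is a stand-in for a real S3 call so the example runs offline.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_retries(step: Callable[[], T], attempts: int = 4, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run a transfer step, retrying on any exception with exponential backoff.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts and
    re-raises the last exception once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return step()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")

# Hypothetical stand-in for an upload that fails twice before succeeding.
calls = []
def flaky_upload():
    calls.append(1)
    if len(calls) < 3:
        raise OSError("transient network error")
    return "uploaded"

delays = []
print(with_retries(flaky_upload, sleep=delays.append))  # uploaded
print(delays)                                           # [1.0, 2.0]
```

Injecting `sleep` keeps the helper testable; in production the default `time.sleep` applies. Retrying is only safe for steps that are idempotent, such as overwriting the same S3 object.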
EMR simplifies the process and our scripting by consolidating cluster allocation, Hadoop configuration, and process control of the jobs.
Before EMR there were lots of steps, which was no fun and meant lots of possible points of failure. With EMR there is no need to copy the job to the master node, or even to touch the master in any way. The workflow uses Amazon’s elastic-mapreduce.rb utility script.
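The talk drove EMR through the elastic-mapreduce.rb script. As a swapped-in illustration, here is a minimal sketch of the equivalent request for the RunJobFlow API, as accepted by boto3's `run_job_flow`. The function name, bucket paths, instance type, and release label are assumptions for illustration only. The job JAR is referenced by its S3 URI, which is why nothing has to be copied to the master.

```python
def build_job_flow_request(name, jar_s3_uri, jar_args, log_uri,
                           instance_type="m5.xlarge", instance_count=5,
                           release_label="emr-6.15.0"):  # assumed values; choose your own
    """Build keyword arguments for boto3's EMR run_job_flow call.

    EMR fetches the JAR from S3 itself, and the cluster shuts down after the
    last step because KeepJobFlowAliveWhenNoSteps is False.
    """
    return {
        "Name": name,
        "LogUri": log_uri,
        "ReleaseLabel": release_label,
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": name + "-step",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {"Jar": jar_s3_uri, "Args": list(jar_args)},
        }],
        # Default role names created by the EMR console or CLI.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }

# Hypothetical bucket and JAR names.
request = build_job_flow_request(
    "score-models",
    "s3://example-bucket/jobs/scoring.jar",
    ["--input", "s3://example-bucket/input/", "--output", "s3://example-bucket/output/"],
    "s3://example-bucket/logs/",
)
print(request["Steps"][0]["HadoopJarStep"]["Jar"])  # s3://example-bucket/jobs/scoring.jar
```

With AWS credentials configured, the dict would be submitted as `boto3.client("emr").run_job_flow(**request)`, which returns the new cluster's `JobFlowId`. Building the request as plain data first makes it easy to review or test without starting a cluster.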
Status of the job flow, and status of the steps within each job flow.
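Checking step status amounts to polling until every step reaches a state that will not change. Below is a minimal sketch, assuming the EMR step states COMPLETED, CANCELLED, FAILED, and INTERRUPTED are the terminal ones. `wait_for_steps` and `states_from_list_steps` are hypothetical helpers, and the step names are made up.

```python
import time
from typing import Callable, Dict

# EMR step states after which a step will not change further.
TERMINAL_STEP_STATES = {"COMPLETED", "CANCELLED", "FAILED", "INTERRUPTED"}

def states_from_list_steps(response: dict) -> Dict[str, str]:
    """Map step name to state for one page of a boto3 EMR list_steps response.

    Simplification: assumes step names are unique and ignores pagination (Marker).
    """
    return {step["Name"]: step["Status"]["State"] for step in response["Steps"]}

def wait_for_steps(get_step_states: Callable[[], Dict[str, str]],
                   poll_seconds: float = 30.0,
                   sleep: Callable[[float], None] = time.sleep) -> Dict[str, str]:
    """Poll until every step is in a terminal state, then return the final states."""
    while True:
        states = get_step_states()
        if states and all(s in TERMINAL_STEP_STATES for s in states.values()):
            return states
        sleep(poll_seconds)

# Offline stand-in: successive snapshots of two hypothetical steps.
snapshots = iter([
    {"load": "RUNNING", "score": "PENDING"},
    {"load": "COMPLETED", "score": "RUNNING"},
    {"load": "COMPLETED", "score": "COMPLETED"},
])
final = wait_for_steps(lambda: next(snapshots), sleep=lambda seconds: None)
print(final)  # {'load': 'COMPLETED', 'score': 'COMPLETED'}
```

Against a real cluster, the callable could be `lambda: states_from_list_steps(emr.list_steps(ClusterId=cluster_id))`, where `emr` is a boto3 EMR client and `cluster_id` is the `JobFlowId` returned at launch. Any step ending in FAILED would still need to be checked by the caller.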