Presentation by Mike Brown, CTO of comScore, from the Big Data Warehouse Meetup sponsored by Syncsort (September 2013, NYC), covering how comScore processes over 1.7 trillion interactions using Hadoop.
Key Message
comScore is a global internet technology company providing customers with Analytics for a Digital World.
Supporting Talking Points
Founded in 1999, comScore is best known as the gold standard for measuring digital activity, including website visitation, search, video, social, and digital advertising.
comScore's data and technologies are well-established, crucial components in measuring and analyzing the rapidly evolving digital world, and are widely deployed at a broad range of publishers, advertising agencies, advertisers, retailers and telecom operators, both in the US and internationally.
comScore uses DMExpress from Syncsort across hundreds of our servers to process our data efficiently.
A generic design pattern for us is to sort the input data on the column whose unique values we will be counting. Counting uniques is one of the more costly measures to calculate in a system. When the data is sorted in advance, you only need to check whether the current value differs from the prior value, and increment a counter when it does.
This approach has let us implement aggregation systems that can process over 50 GB of data with 357 million rows in less than an hour on a Dell R710 2U server.
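The sorted-input pattern for counting uniques can be sketched in a few lines of Python. This is only an illustration of the idea, not DMExpress or comScore's actual code; the function name count_uniques_sorted and the sample visitor IDs are made up for the example, and the input is assumed to be already sorted on the column being counted.

```python
from typing import Hashable, Iterable

# Sentinel meaning "no prior value seen yet" (hypothetical helper name).
_UNSET = object()


def count_uniques_sorted(values: Iterable[Hashable]) -> int:
    """Count distinct values in an iterable that is ALREADY sorted.

    Because equal values are adjacent in sorted input, only the prior
    value has to be remembered; no set or hash table is built.
    """
    count = 0
    previous = _UNSET
    for value in values:
        # A change from the prior value marks the start of a new unique.
        if previous is _UNSET or value != previous:
            count += 1
            previous = value
    return count


if __name__ == "__main__":
    # Illustrative data only: visitor IDs in arrival order.
    visitor_ids = ["u7", "u2", "u7", "u9", "u2", "u2"]
    # Sort first (the costly step), then count in one streaming pass.
    print(count_uniques_sorted(sorted(visitor_ids)))
```

The trade-off is that the sort cost is paid up front, after which the counting pass needs only constant memory regardless of how many distinct values there are.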