Dhruv Bansal, Infochimps CSO and Co-founder, discusses how Big Data tools are helping agencies build out repeatable revenue platforms, and what factors are important when setting down the path of Big Data.
Data comes from Twitter, Facebook, Omniture, Google, CRM systems, data warehouses, and much more. Tap into that data to enhance your analytics products and services.
5. #rtanalytics
more data than ever before
CRM
POS
ERP
BI & data warehouse
system & network logs
web logs
google analytics
many terabytes of data, sometimes many petabytes ?!
facebook
twitter
tumblr
foursquare
klout
mobile devices
product reviews
google search results
+ more
6. #rtanalytics
enhance the agency product line
• CRM/customer insights
• ad/content optimization and targeting
• social media analytics
• sentiment/influencers
• topics/memes
• web traffic analysis
• mobile/cross-channel reporting
• data-driven ad campaigns
16. #rtanalytics
real-time analytics is the future
• high volume and high velocity
• customer demand
• competitive differentiation
• cost & speed benefit
17. #rtanalytics
why isn’t everyone
and their mother
doing this?
18. #rtanalytics
big data expertise
scalability and flexibility
agnostic integration
19. #rtanalytics
platform
data input → insights
• data delivery service™
• database management
• elastic hadoop
• analytics tools
25. #rtanalytics
real-time analytics
powered by the data delivery service™
26. #rtanalytics
enhance the agency product line
• CRM/customer insights
• ad/content optimization and targeting
• social media analytics
• sentiment/influencers
• topics/memes
• web traffic analysis
• mobile/cross-channel reporting
• data-driven ad campaigns
27. #rtanalytics
the new winning reality
• your clients are asking questions that require big data to answer
• your agency has the opportunity for a bigger seat at the CMO's table
28. #rtanalytics
benefits we provide
• generate insights clients need while:
• saving valuable resources
• delivering solutions extremely quickly
• responding to unique client needs
• we are:
• a flexible big data foundation to build upon
• your outsourced big data partner
29. #rtanalytics
Let’s discuss your big data projects and ideas.
infochimps.com/free-big-data-consultation
info@infochimps.com
follow us: @infochimps
Editor's notes
Amanda introduces Dhruv, Infochimps, and sets the context for the webinar.
Hi, everyone, my name is Dhruv Bansal and I’m one of the founders and the CSO at Infochimps. We make big data infrastructure simple. We’re proud to knit together solutions for our customers from best of breed big data technologies that we research and become experts at using and deploying – so our customers don’t have to.
Before we get started I’d like to set an agenda for what we’ll be talking about today. I’m going to start by telling you some stories about big opportunities in big data that we’re hearing from our customers in the agency space. I hope you’ll all agree that you see these same opportunities and challenges in front of you.
Next I’m going to talk about why these challenges and opportunities all ultimately stem from or depend upon big data technology. I’ll next talk about Hadoop, a tool I’m sure many of you have heard of, and talk about some other big data tools that you might be less familiar with.
That’ll leave us in a good position to show you a real, live demo that highlights some of what I’ve been saying and gives you a taste for how the Infochimps platform works and how it can help you overcome some of the challenges you’ll face with big data and capitalize on some of the opportunities.
Finally, we’ll wrap up, and I’ll take some questions.
But first – a poll! Just to get a sense of where you all are, let me ask a quick, almost diagnostic question: How much data do you have under management?
• 0-200 GB
• 201 GB – 1 TB
• 1-10 TB
• Over 10 TB
• Absolutely no idea how to measure this
So let’s dig into it. Big data is a pretty easy idea to explain: we produce data, all the time, constantly, and we produce a lot of it. Data centers now take up 1.3% of global energy usage – as much as the entire continent of Australia. So we have some similarly big challenges and even bigger opportunities.
On the left on this slide I’ve listed just a few of the kinds of data sources that might be available to an agency, should they choose to ingest them. Everything from their own clients’ customer databases, to streams of tweets from Twitter, to Google search results and even forum posts, can be ingested in the pursuit of building something that generates insights for their clients.
But what do clients actually want? Anything. Everything! Some want to take advantage of listening platforms to report on everything that happens with their brand in social media. Say your client is an airline and an upset customer’s tweets complaining about bad service go viral (#fail). Imagine being able to spot this trend within seconds because you detected an anomalous number of tweets with negative sentiment about the airline localized within a city. Reactive PR at the speed of light can help nullify a crisis or amplify positive stories.
Other clients will want their customer databases connected to what those same customers are talking about online to better segment them and increase sales and revenues. Fire off an email campaign with a diaper discount to Twitter users who exclaim that they’ve just had a baby!
Some brands know that there exist prominent users of social networks, influencers who can organically and honestly spread the gospel about their brand to the rest of the Web – if only they can be identified and reached.
Our customers don’t lack for ideas about what *could* be done. The opportunities here are endless, and the ground so poorly trod, that it’s easy to think of ways of combining social data feeds with customer data to make wonderful things happen.
But there’s one big challenge.
You guessed it! Ingesting all the data sources available these days, combining them in smart ways to produce insight, and delivering that insight to your customers in the way they want it, in real time so they can act on it, is hard. It’s big data hard.
Gartner: big data chiefly means three things: volume (large amounts of data), velocity (a large throughput of data per second or minute), and variety (many different types of data to handle).
Variety: the prior slide shows just a small subset of the data sources our clients are excited about.
What do you need in order to be able to solve these problems?
Our clients tell us that they don’t have the expertise or support staff to solve these problems right now. And they don’t want to spend the time or resources required to hire these very rare persons. It’s just not their core competency and they don’t want it to be.
Well, that’s not strictly fair. Some of our clients have already started to work with something called Hadoop. Hadoop is a wonderful open source tool designed to process big data at tremendous scale, reliably, and without using top-of-the-line hardware.
But first – a poll! To help me understand the level of technical detail I can assume from you all, I’d like to know what you know about big data. Have you evaluated Hadoop?
• Yes
• No
Despite how well-known Hadoop is, even in the agency ecosystem, there’s still often confusion about what it actually does and what problems it solves. Let me show you an illustrative – if a little silly – example that might help you to understand exactly when and why you would use Hadoop.
Welcome to the Batch Sub Shop! We make sandwiches, lots of sandwiches. If we get a big order for 1,000 subs, we execute that order all at once. Hadoop has two phases in each calculation or job that executes, the map phase and the reduce phase. In the map phase, input data is modified, transformed, parsed, or otherwise altered or prepared. In our Batch Sub Shop, the map phase is when we slice our bread and our veggies and prepare our meat.
In the reduce phase, transformed data from the map phase is assembled into the final output we want. In our Batch Sub Shop, the reduce phase is when we assemble all the sandwich orders from the sliced bread, veggies, and meats we prepared in the map phase. In a few hours, we’ll deliver a huge batch of sandwiches, fresh and delicious.
For those of you who have started to evaluate Hadoop, I hope you’re getting the joke here. Hadoop is great in the same way that a caterer is great: if you have a big order and you don’t need it right away, it’s the perfect choice. Similarly, if you have a large amount of data that you need analyzed and you don’t need the result right away, you should use Hadoop. This is one of the reasons that Hadoop is so popular for analyzing historical data in a batch processing paradigm.
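To make the two phases concrete, here is a minimal sketch in plain Python. It does not use Hadoop or its API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_batch_job`) and the sample orders are illustrative assumptions, and the grouping step stands in for the work a framework would do between map and reduce.

```python
from collections import defaultdict

def map_phase(records):
    """Map: prepare the raw input by turning each record into (key, value) pairs."""
    for text in records:
        for word in text.lower().split():
            yield word, 1

def shuffle(pairs):
    """Group values by key (a stand-in for what a framework does between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: assemble the grouped values into the final output."""
    return {key: sum(values) for key, values in groups.items()}

def run_batch_job(records):
    """Run the whole batch; nothing is returned until every record is processed."""
    return reduce_phase(shuffle(map_phase(records)))

# Hypothetical sample input: three sandwich orders.
orders = ["turkey sub", "veggie sub", "turkey sub"]
counts = run_batch_job(orders)
print(counts)
```

Note that `run_batch_job` only returns once the entire input has been mapped, grouped, and reduced, which is the batch behavior the sub shop analogy describes.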
But say you were really hungry, and you just really want to eat your sandwich now. Our batch sub shop will make you wait 3 hours! Sure, you’ll get 1000 sandwiches at the 3-hour mark, but that’s not very helpful if you just wanted one right away.
Similarly, Hadoop is not the right big data tool to use when you want results right away, in real-time, because not only do you have to assemble all your data in one place, as we assembled all our ingredients in one place in the batch sub shop, but you also have to wait for the full computation to finish before you get any results. This can often take hours. This makes Hadoop appropriate for batch or offline calculations that can run, say, overnight, and whose results we won’t need to see till morning.
But what if we need results right away?
Enter the Streaming Sub Shop. This sub shop works like a conveyor belt. Ingredients enter on the left and as they move through the shop, we slice ‘em, dice ‘em, assemble those sandwiches, and get them toasted and served. The first sandwich will come out in just a few minutes, and sandwiches will keep coming out for as long as ingredients are fed in.
Similarly, there are technologies complementary to Hadoop which enable this kind of stream processing of big data.
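The conveyor belt idea can be sketched with Python generators. This is only an illustration of record-at-a-time processing, not any particular stream processing product; `slice_ingredients`, `assemble`, `streaming_shop`, and the sample orders are hypothetical names chosen for this example.

```python
def slice_ingredients(orders):
    """First station: prepare each order as soon as it arrives."""
    for order in orders:
        yield order.lower().split()

def assemble(prepared):
    """Second station: assemble each prepared order into a finished sandwich."""
    for parts in prepared:
        yield "-".join(parts)

def streaming_shop(orders):
    """Chain the stations; each order flows through one at a time."""
    return assemble(slice_ingredients(orders))

# Hypothetical incoming orders; in practice this could be an unbounded feed.
incoming = iter(["Turkey Swiss", "Veggie Provolone"])
belt = streaming_shop(incoming)

first = next(belt)  # produced before the second order has been read
print(first)
for sandwich in belt:
    print(sandwich)
```

Because each stage pulls one item at a time, the first result is available without waiting for the rest of the input, which is the contrast with the batch approach.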
While batch processing is better known and still useful, real-time analytics, like the streaming sub shop, is actually the right tool for a lot of the data feeds that are relevant to the agency. Social media is not only high volume data, but high velocity as well: it’s continuously being produced.
But now let’s return to the central challenge of big data: why aren’t you doing it right now? Why aren’t your competitors? It’s because it’s hard, you lack the expertise, and you haven’t hired or can’t hire the necessary people, all of whom are rare and expensive.
Infochimps are experts in the full stack of big data technologies. We want to inform you of several factors you should be evaluating as you start moving into the big data space.
You need the expertise to evaluate and choose the right tools and architecture to solve your big data problems. These choices need to provide a scalable infrastructure that can handle the volume, velocity, and variety of big data while remaining flexible and adaptable as your needs change. You’ll likely also want to integrate with whatever technologies you’re already running, whether they are traditional resources like SQL servers or Tableau or your first forays into big data.
When we approach these problems, we think about four major tools that need to work together. We’ve designed and built these tools to work together in what we call the Infochimps Platform. It consists of four main pieces:
The data delivery service. This is our streaming sub shop. We can ingest data at the terabyte scale in real time, transport it to wherever else in the infrastructure it needs to go, and transform it on the fly.
Next is our database management service. The world of big data has a lot of choices for database technologies, everything from Cassandra, to HBase, to MongoDB, to Elasticsearch, to even traditional SQL services. Each of these databases has strengths and weaknesses. Don’t lock yourself into only being able to use a single database and don’t work with vendors who push this choice onto you. You want to be able to grow your stack as your needs change; a federated database management system is the right approach.
Of course there’s also Hadoop. Our approach to Hadoop is a little bit different than the usual. Instead of having a Hadoop cluster around at all times, possibly being underutilized and therefore wasting resources, we allow you to spin up Hadoop clusters on demand, run your jobs, and spin them back down. You can always leave one of them up forever, if that’s your use case, but if it isn’t, you can unlock the power of Hadoop while also seeing tremendous cost savings due to our dynamic approach.
Finally, you’ll need an analytics and monitoring layer. Whether it’s seeing what your servers are up to, watching data stream by, or writing simple scripts that can be automatically turned into Hadoop jobs, you need analytics tools that make it easy for even non-big data experts to be productive in this new world.
For those of you that want to reach into the lower layers of these systems, you’ll want to have direct access to the Hadoop Java APIs and tools like Pig or Hive.
The advantage of working with the Infochimps Platform is that we’ve already identified these needs because of our long experience in this space and we’ve built a framework with compatible tools that fit together nicely into a solution.
Before we get into talking about how the Infochimps platform can help you, we’d like to know a little bit more about your plans to get into big data.
How many of you have hired big data talent?
• Team of 1-3
• Team of 4+ people
• Not hired yet
• Not planning on hiring
I ask because investing in people to manage a big data infrastructure isn’t always the best strategy for an agency.
Different parts of a modern digital agency scale differently and have different cost structures and requirements. People are the most expensive and least scalable part of the operation but they also provide the most value – your people are your core competency, your creatives that differentiate your agency, what your clients pay you for.
But these people are increasingly supported by reports and visualizations, the kinds of resources required by analysts to provide quantitative insight into the scope or effectiveness of campaigns. These reports themselves are supported by listening and content technologies that drive their value upward through the stack.
It’s a mistake to pull people from the top of this pyramid, where they produce the most value, to the lower tiers which really should be supported entirely by technology.
And that’s exactly what Infochimps does. The tools that comprise our platform provide a grammar for building solutions that power your creative teams without bogging down your organization with the responsibility of having to manage these new systems.
To help you understand how our platform could be useful to you, I want to tell you a simple story about a made-up agency, Agency X.
Agency X is excited about making their first inroads into leveraging big data to create insights for their clients and to create repeatable new revenue streams for themselves.
This system represents a fully-automated big data platform that provides a foundation for a repeatable service Agency X can sell to client after client.
Just a quick overview of what we just saw. We collected data, moved it through a transformation, and lodged it into a database where it could be queried either in an ad hoc way, behind the scenes, or through a custom frontend application.
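The collect, transform, store, and query steps summarized above can be sketched in a few lines of Python. This is not the Infochimps Platform or the demo itself: the sample mentions are made up, the keyword lexicon is a toy stand-in for real sentiment analysis, and an in-memory SQLite database stands in for whichever database a real deployment would use.

```python
import sqlite3

# Toy lexicon (an assumption for illustration, not a real sentiment model).
NEGATIVE_WORDS = {"late", "lost", "rude"}

def transform(mention):
    """Transform: tag a raw mention with a crude sentiment label."""
    words = set(mention["text"].lower().split())
    sentiment = "negative" if words & NEGATIVE_WORDS else "other"
    return mention["city"], mention["text"], sentiment

def load(mentions):
    """Store: write the transformed mentions into an in-memory database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE mentions (city TEXT, text TEXT, sentiment TEXT)")
    db.executemany(
        "INSERT INTO mentions VALUES (?, ?, ?)",
        (transform(m) for m in mentions),
    )
    db.commit()
    return db

def negative_by_city(db):
    """Query: an ad hoc count of negative mentions per city."""
    rows = db.execute(
        "SELECT city, COUNT(*) FROM mentions "
        "WHERE sentiment = 'negative' GROUP BY city ORDER BY city"
    )
    return dict(rows.fetchall())

# Collect: hypothetical social media mentions of an airline.
sample = [
    {"city": "Austin", "text": "Flight was late again"},
    {"city": "Austin", "text": "They lost my bag"},
    {"city": "Boston", "text": "Great crew today"},
]
db = load(sample)
print(negative_by_city(db))
```

A custom frontend would issue the same kind of query on behalf of the client instead of an analyst running it by hand.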
I’d also like to take a moment to bring it back to the use cases we outlined at the beginning of the webinar. The Agency X example and the demo I gave illustrate just one of these possible agency products: a social media listening service.
Because the Infochimps Platform is a set of tools designed to work together to solve a big data problem, it can be used for any of these use cases.
Solutions like these are becoming more important because the world has changed. Big data is here and your clients want to know about what their customers are doing. We can help you help *your* clients gain those insights, giving your agency a bigger seat at the CMO’s table.
Agencies have been dabbling in big data 1.0 for a while. Black box services that provide limited abilities to do cross-channel marketing, or meager (but passable) sentiment analyses, are common throughout the industry.
The next wave is about mixing and matching different data sources, public and private, and connecting them to provide a *context* for your customers. This context turns pretty dashboards into useful tools.
Our platform helps you build this context in an entirely new way.
We aren't just a data supplier or an infrastructure hosting provider or a software company - we're a strategic resource. The agencies who will win are making big data analysis one of their new and essential superpowers. And we’re helping get them there.
This concludes our webinar. Thank you all for coming. Please visit our website and schedule a free big data consultation. I’d love to hear about the use cases that you’re engaged with and help you understand the most effective way to realize success.