From Volume to Value - A Guide to Data Engineering

In this guide you will uncover the steps your organization needs to take to maintain a modern data stack and support growth.

  1. ASTRONOMER.IO. From Volume to Value: A Guide to Data Engineering
  2. Created by Astronomer, Inc. 2017. ASTRONOMER.IO
     Table of Contents
     • Introduction
     • Information Overload
     • Talent Gap
     • A New Role: Data Engineering
     • Data Maturity Goals
     • Starting to Climb
     • Next Steps
     • Connect and Route Your Data with Astronomer
     • Conclusion (TL;DR)
     • About Astronomer
     • Sources
  3. Introduction
     In today’s digital age, getting ahead depends on leveraging data better than competitors. Take Amazon’s acquisition of Whole Foods, which caused competitors’ stock to drop significantly. Why? Because shareholders understand that when Amazon adds this plethora of storefront data to its abundance of virtual-buyer data, it will discover exclusive insights to drive business. [1]
     Reaching the peak of success and retaining the lead in the race to the summit look different depending on industry, geography and other factors, but some commonalities hold true. At Astronomer, we’ve mapped out the journey to becoming more mature with data: in other words, the path to gaining a competitive advantage.
  4. No matter where organizations are on their journey, the next steps will involve more data sets to deal with and more preparation to ready that data for analytics. Before moving toward the summit, it’s important to consider some key questions:
     • What metrics are most important to measure in my business?
     • What data sets are needed to measure them?
     • How can those data sets be accessed?
     • Who is responsible for cleaning, reformatting, organizing, transforming and otherwise preparing the data for analysis?
     Answering these questions is certainly challenging, which perhaps explains why only 4% of companies actively use their data. The remaining 96% includes thousands of companies that collect data but haven’t quite figured out how to derive maximum value from it. [2] Those who have, however, will quickly gain a competitive advantage and see their early efforts pay off in the long run.
     In this guide, we’ll discuss three things to get you there:
     1. Core challenges to extracting value from data
     2. Practical ways to overcome those challenges and get to value
     3. Actionable next steps for your organization
     Only 4 percent of companies are actively using their data. Are you? (Bain and Company)
  5. Information Overload
     According to a McKinsey Global Institute (MGI) report, “data have swept into every industry and business function and are now an important factor of production, alongside labor and capital.” MGI estimated that retailers using big data to its fullest potential could increase operating margins by more than 60 percent, and that both businesses and consumers would benefit from leveraging the exponentially increasing data sets. [3] And that was back in 2011.
     In 2016, a Gartner analysis further defined the need for data: organizations that provide agile, curated internal and external data sets for a variety of content authors will realize twice the business benefits of those that don’t. [4]
     So why isn’t everybody curating these data sets and enabling individual analysts not only to access information but also to contribute back to models? Because the many data sets available to companies across legacy systems, cloud-based tools, CRMs, databases, websites and other data-generating sources create a mass of structured, unstructured and siloed data sets that don’t “talk” to each other. Consolidating data is a critical first step, but it costs companies countless hours of cleaning, enriching and formatting. Simply put, data is a mess.
     Do you have data in a ...
     • legacy system?
     • cloud-based tool?
     • CRM?
     • database?
     • data lake?
     • website?
     • app?
     • more than one of any of the above?
     It’s likely you have a LOT of data. In various forms. Accumulating quickly.
  6. Talent Gap
     Of course, any mess can be cleaned up. But the state of the mess, commonly described by the “three v’s of data” (volume, velocity and variety), isn’t the only obstacle. There’s another problem: the deep technical skills required to build, deploy and maintain a modern data infrastructure that can handle big data, and fast, are rare. In fact, the MGI analysis predicted that by 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills and a shortage of 1.5 million managers and analysts who understand how to make effective decisions based on data.
     To contend with this, many companies have created a new role: the data scientist. Data scientists, according to the Harvard Business Review, are a “hybrid of data hacker, analyst, communicator and trusted adviser” with skills like programming, multivariable calculus and linear algebra and an understanding of machine learning. They can find patterns and extract insights from a giant body of data and write algorithms to run over these data sets. [5] Becoming mature with data is impossible without these capabilities.
     There’s just one problem: data scientists aren’t spending their time creating algorithms, mining data for patterns or interpreting insights.
     Do you have a data scientist on staff? Ask them how much time they spend ...
     • Building training sets
     • Cleaning and organizing data
     • Collecting data sets
     • Mining data for patterns
     • Refining algorithms
     • Articulating analysis
     If you don’t have a data scientist on staff, who does these tasks? And how much of their time is devoted to each one?
  7. Eighty percent of a data scientist’s time is spent collecting data sets and cleaning and organizing them. [6] That work takes a high level of skill, but it’s not data science. So having a data science team isn’t enough. Every company must take a step back and clean, enrich, reformat and otherwise prepare data for the data scientists and analysts. All these activities fall into the category of data engineering.
     To maximize insights from data and get to value faster, forward-thinking organizations are creating a new role: the data engineer.
     Data engineering [dat-uh en-juh-neer-ing]: noun. The act of accessing, processing, enriching, cleaning and/or otherwise orchestrating data analysis.
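     To make "clean, enrich, reformat" concrete, here is a minimal Python sketch of consolidating two siloed exports into one record per customer. It is only an illustration: the sources, field names and sample rows are invented for this example and do not come from the guide or from any Astronomer product.

```python
from datetime import date

# Hypothetical raw exports from two silos; field names are invented for this sketch.
crm_rows = [
    {"Email": " Ada@Example.com ", "signup": "2017-03-01"},
    {"Email": "bob@example.com", "signup": ""},
]
app_rows = [
    {"email": "ada@example.com", "plan": "pro"},
    {"email": "carol@example.com", "plan": "free"},
]


def clean_email(raw):
    """Normalize an email so the same person matches across sources."""
    return raw.strip().lower()


def parse_date(raw):
    """Return a date for an ISO string, or None for a blank value."""
    return date.fromisoformat(raw) if raw else None


def consolidate(crm, app):
    """Clean both sources and merge them into one record per email."""
    merged = {}
    for row in crm:
        email = clean_email(row["Email"])
        merged.setdefault(email, {"email": email})["signup"] = parse_date(row["signup"])
    for row in app:
        email = clean_email(row["email"])
        merged.setdefault(email, {"email": email})["plan"] = row["plan"]
    # Sort by email so the output order is stable.
    return [merged[key] for key in sorted(merged)]


customers = consolidate(crm_rows, app_rows)
```

     Even in this toy case, most of the code deals with inconsistent casing, stray whitespace and blank values rather than analysis, which is the point the slide makes about where the time goes.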
  8. A New Role: Data Engineering
     So what is data engineering, exactly? And why is it so important?
     Data engineering is the act of accessing, processing, enriching, cleaning and/or otherwise orchestrating data analysis.
     Data engineers build tools, infrastructure, frameworks, and services. In smaller companies, where no data infrastructure team has yet been formalized, the data engineering role may also cover the workload around setting up and operating the organization’s data infrastructure. (Maxime Beauchemin, Airbnb, “The Rise of the Data Engineer”)
     Maxime joined Facebook as a business intelligence engineer in 2011 and left as a data engineer two years later. The need for more complex, code-based ETL and changing data modeling drove the demand for data engineering. [7]
     Even though data engineering alone doesn’t reveal insights, it readies your data to be analyzed reliably. Without it, there’s no possibility for meaningful analysis or data science.
  9. In simple terms, data engineers and data scientists work together like this:
     Data Engineers
     • Prepare data for analysis
     • Process raw data
     • Function behind the scenes
     • Build infrastructure to consolidate and enrich numerous data sets
     • Handle large-scale data processing
     • Monitor and maintain systems
     Data Scientists
     • Probe for insights
     • Deliver results to business users
     • Apply machine learning, algorithms and other analytics approaches
     • Uncover meaning in large amounts of data
     • Articulate analysis, often visually
     • Interpret results of analysis
     When both data engineering and data science are priorities for an organization, getting more mature with data is inevitable.
  10. Data Maturity Goals
     In considering how to become more mature with data, it can be helpful to look at practical examples of companies that have done it well.
     Airbnb is near the summit of the data maturity mountain. It has reached heights most companies can’t yet fathom: $3.5 billion in projected earnings in 2020, which exceeds the bottom lines of 85% of Fortune 500 companies. [8] For Airbnb, data engineering isn’t a black box; it’s cultural. [9] Access to data and the ability to contribute to business logic have been democratized.
     As the company’s size and reach (and number of employees) increased, so did its available data sets. Making the right data available across the organization required strategic data engineering. First, Airbnb established what it called “Core Data,” a single source of truth for everyone. To do this, it created Airflow, a workflow management system for programmatically authoring, scheduling and monitoring dependency-based data pipelines without running tasks unnecessarily. This technology allows Airbnb to schedule all its data to flow to a single data-space. [10]
     Airbnb also built a data portal for employees, a “search and discovery tool” through which they can pull the numbers they need on their own. It puts the power of real-time data analytics into the hands of everyone working to make the company successful. Now everyday decision-makers have access to information on the spot, while a data engineering team maintains quality control by managing data warehousing, enhancing the performance of core data infrastructure, integrating data flow between systems and tools, and looking for new ways to automate their tasks. [11]
     Airbnb is near the summit of the data maturity mountain. With $3.5 billion in projected earnings, what do they do differently? Democratize data. How? A single source of truth that is searchable for everyone and a “Data University” to make sure everyone knows how to use it.
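     The idea of a dependency-based pipeline can be sketched in a few lines of Python. This is not Airflow's API; it is a standard-library illustration (using `graphlib`, available in Python 3.9 and later) of the underlying idea, and the task names are hypothetical. Each task lists the tasks it depends on, the runner works out a valid order, and tasks already completed are skipped rather than run again.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each key depends on the tasks listed in its set.
dependencies = {
    "extract_bookings": set(),
    "extract_listings": set(),
    "clean_bookings": {"extract_bookings"},
    "build_core_table": {"clean_bookings", "extract_listings"},
    "refresh_dashboard": {"build_core_table"},
}


def run_pipeline(deps, already_done=frozenset()):
    """Return the tasks to run, in dependency order, skipping completed ones."""
    executed = []
    for task in TopologicalSorter(deps).static_order():
        if task in already_done:
            continue  # nothing to redo for this task
        executed.append(task)  # a real task would do its work here
    return executed


order = run_pipeline(dependencies)
rerun = run_pipeline(dependencies, already_done={"extract_bookings"})
```

     A production workflow system adds scheduling, retries and monitoring on top of this ordering step; the sketch covers only the dependency resolution.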
  11. Of course, even the most reliable data portal is only as good as it is useful, so the Airbnb data science team went a step further and tracked the weekly active users (WAUs) logging into the portal, then created a “Data University” with courses to teach those employees how to use the portal and mine the data it holds. [12] This has allowed the company to operate under a philosophy of data democratization, giving every employee access to up-to-date data and the power to make decisions based on that data. And all of that happens without an Airbnb data scientist in every department, because each employee is empowered to find and use data and, thanks to the Data University, understands exactly how to do that. Now, 45% of Airbnb employees are WAUs, and that particular economy of scale has eliminated an information bottleneck and freed up the data science team to focus on the most pressing problems.
     Airbnb is far from the only company to understand the appeal of data democratization. Other tech giants like Facebook have pioneered the trend, but many others are jumping on board: companies like Finish Line [13], Chobani [14] and even the government [15].
     TL;DR
     *Some practical steps Airbnb took to get to the summit
     • Hired a data engineer
     • Consolidated all data in one place
     • Made data fully accessible
     • Taught their employees to query
     • Allowed multiple content authors
     • Took action based on data
     • Watched revenue grow
     *Though this guide doesn’t get technical, if you’re wondering how data flows, Airbnb uses Apache Airflow, a workflow management system.
  12. Starting to Climb
     Implementing a world-class culture of data engineering within your company requires scaling the data maturity mountain. If that seems daunting, take heart: remember that 96% of companies are not maximizing their data’s value. There are many points in between the base camp and the summit, and organizations can pick up and move to the next campsite anytime. The first step is determining where you stand now:
     • 0.0 Camp Flying Blind: Data initiatives are most likely not a priority for you, which means you’re probably not reading this.
     • 1.0 Camp Frustrated: You collect data, but probably aren’t sure how to extract actionable business intelligence from it.
     • 2.0 Camp In Control: Here, you’re using some tools to aggregate data and likely understand how to access the information you need for your role. But you’re not totally sure it’s reliable and have no idea what other teams are doing.
     • 3.0 Camp Activated: With connected data, you’re looking for new and relevant data sets that you can plug in for even greater insights. You’ve got basic algorithms in place and are starting to explore data science. But you’re spending more time preparing data for analytics than analyzing it.
     [Figure: the data maturity mountain, with camps 0.0 Flying Blind, 1.0 Frustrated, 2.0 In Control and 3.0 Activated rising toward Competitive Advantage]
  13. • 4.0 Camp Intelligent: At this stage, you offer data visualization in several forms across your organization and rely on predictive analytics, and maybe machine learning and artificial intelligence (AI) technology. You’re probably enabling better data science through intentional, improved data engineering.
     • 5.0 Camp Insane (Summit): Your organization is devoted to data engineering or data science, and insights drive and define every decision you make for your business. To enable that, there is a single source of truth that is accessible to everyone. Anyone from marketers to data scientists can contribute back to business logic.
     If you’re not exactly sure which camp you’re in, take the 60-second self-assessment: astronomer.io/data-assessment
     No matter where you’ve mapped yourself, remember: very few businesses have reached the summit of “Insane,” and few are still stuck in the doldrums “flying blind” at the zero spot, so it’s fair to assume that your business’s data strategy, and that of your biggest competitors, is somewhere in between these two extremes. And that’s a good thing; it means you can scale up whenever you like.
     [Figure: the data maturity mountain, continued, with camps 4.0 Intelligent and 5.0 Insane Mode!]
  14. Next Steps
     As Stephen Covey says, begin with the end in mind. If Airbnb’s culture of data engineering represents the summit, here’s a checklist of steps to getting there:
     ✔ Read this guide
     ☐ Commit to getting value from your data
     ☐ Consider hiring a data scientist
     ☐ Create a data engineering capability in your organization (This is where Astronomer can help!)
     ☐ Consolidate all data in one place
     ☐ Route data to give decision-makers full access
     ☐ Teach them to query (if necessary)
     ☐ Empower business users to contribute to core tables
     ☐ Once you trust and understand the data, probe for insights
     ☐ Take action
     ☐ Grow your revenue!
     How does Astronomer fit in? The rapid, agile, secure data routing and prep required for this to-do list rely on specialized tools. For Airbnb, that’s Apache Airflow. Astronomer’s data engineering platform incorporates all the strength of Apache Airflow with all the power of Astronomer to empower teams to construct the data infrastructure they need for cross-organizational data democratization.
     Astronomer’s data engineering platform streamlines and amplifies your data engineering capabilities.
  15. Connect and Route Your Data with Astronomer
     Astronomer is a data engineering platform that connects data from legacy systems, BI tools, databases and other sources, and routes it where it can be analyzed. Astronomer offers complete customizability through its use of open-source software, including Airbnb’s Apache Airflow, and offers both a library of standard data pipelines and full access to developers to write custom pipelines, defined as code.
     A business user can set up a standard pipe, like sending Facebook Ads to Redshift, in minutes. Or a data scientist, analyst or data engineer can author, schedule and monitor their own dependency-based data pipelines to centralize and route data from analytics tools, legacy systems, apps and more.
     Whatever camp you’re currently in, Astronomer meets you where you are and helps you get ahead.
  16. Conclusion (TL;DR)
     • Digital Darwinism threatens every organization.
     • For most companies, data is a mess.
     • There is a shortage of people with the skills to deal with data.
     • Companies that get ahead now have a serious advantage.
     • Getting ahead looks like:
       1. making data engineering a priority.
       2. consolidating data into a single source of truth.
       3. democratizing data for the entire organization.
  17. About Astronomer
     Since our beginning in 2015, we have said we are with the machines. We believe the future of work looks like machines + humans operating in their respective strengths and accomplishing more, together. By assembling a world-class team of data engineers to program machines to connect, process and route large amounts of data, we free humans up to do what they do best: analyze data to discover insights and make essential decisions.
     Learn more at astronomer.io or connect with us at humans@astronomer.io.
  18. Sources
     1. “Big Prize in Amazon-Whole Foods Deal: Data” by Laura Stevens and Heather Haddon, Wall Street Journal, 2017, astrnmr.co/2uTXNdc
     2. “The Value of Big Data: How analytics differentiates winners” by Rasmus Wegener and Velu Sinha, Bain & Company, 2013, astrnmr.co/2uTRE0y
     3. “Big data: The Next Frontier for Innovation, Competition and Productivity” by James Manyika, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh and Angela Hung Byers, McKinsey and Company, 2011, astrnmr.co/2sPDMrK
     4. “Market Guide for Self-Service Data Preparation” by Rita L. Sallam et al, Gartner, 2016, astrnmr.co/2tzriSo
     5. “Data Scientist: The Sexiest Job of the 21st Century” by Thomas H. Davenport and D.J. Patil, Harvard Business Review, 2012, astrnmr.co/2syVbAW
     6. “Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says” by Gil Press, Forbes, 2016, astrnmr.co/2uzVgWx
     7. “The Rise of the Data Engineer” by Maxime Beauchemin, 2017, astrnmr.co/2uTRiqV
     8. “Airbnb’s Profits to Top $3 Billion by 2020” by Leigh Gallagher, Fortune, 2017, astrnmr.co/2syKtKR
     9. “Democratizing Data at Airbnb” by Chris Williams, Eli Brumbaugh, Jeff Feng, John Bodley, and Michelle Thomas, Airbnb, 2017, astrnmr.co/2uzEt5V
     10. “Airflow: A Workflow Management Platform” by Maxime Beauchemin, Airbnb, 2015, astrnmr.co/2uA286c
     11. “How Airbnb Democratized Data” by Olivia Timson, Innovation Enterprise, 2016, astrnmr.co/2sPjEpI
     12. “How Airbnb Democratizes Data with Data University” by Jeff Feng, Erin Coffman and Elena Grewal, Airbnb, 2017, https://astrnmr.co/2v2hY8F
     13. “The Value of Democratizing Data” by Samuel Greengard, Baseline, 2015, astrnmr.co/2vBVVcn
     14. “How Data Democratization Can Deliver a Healthy Breakfast” by Errol Apostolopoulos, DataInformed, 2016, astrnmr.co/2vBsffB
     15. “Democratizing Big Data to Bring Government Ahead of the Curve” by Quinton Alsbury, Wired, astrnmr.co/2vB4Uuf
