Big Data
What it is and why it matters
Uploaded by Tamojit Das

Big data is a term that describes the large volume of data, both structured and unstructured, that inundates a business on a day-to-day basis. But it's not the amount of data that's important. It's what organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves.

Big Data History and Current Considerations

While the term "big data" is relatively new, the act of gathering and storing large amounts of information for eventual analysis is ages old. The concept gained momentum in the early 2000s, when industry analyst Doug Laney articulated the now-mainstream definition of big data as the three Vs:

Volume. Organizations collect data from a variety of sources, including business transactions, social media, and sensor or machine-to-machine data. In the past, storing it would have been a problem, but new technologies (such as Hadoop) have eased the burden.

Velocity. Data streams in at an unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors and smart metering are driving the need to deal with torrents of data in near real time.

Variety. Data comes in all types of formats, from structured, numeric data in traditional databases to unstructured text documents, email, video, audio, stock ticker data and financial transactions.
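The Velocity point above, dealing with sensor and smart-meter data in near real time, usually means computing results incrementally as values arrive instead of storing everything first. A minimal Python sketch, assuming a rolling average over the last few readings is the computation of interest (the meter readings and window size are hypothetical):

```python
from collections import deque
from typing import Deque, Iterable, Iterator


def rolling_mean(readings: Iterable[float], window: int) -> Iterator[float]:
    """Yield the mean of the most recent `window` readings as each one arrives.

    Only `window` values are held in memory, so the stream never has to be
    stored in full before it is processed.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    buffer: Deque[float] = deque(maxlen=window)
    total = 0.0
    for value in readings:
        if len(buffer) == window:
            total -= buffer[0]  # this oldest value is evicted by the append below
        buffer.append(value)
        total += value
        yield total / len(buffer)


if __name__ == "__main__":
    # Hypothetical smart-meter readings, one per interval.
    meter = [1.0, 2.0, 3.0, 10.0, 2.0]
    print(list(rolling_mean(meter, window=3)))  # [1.0, 1.5, 2.0, 5.0, 5.0]
```

Keeping a running total means each new reading costs the same small amount of work regardless of the window size.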
At SAS, we consider two additional dimensions when it comes to big data:

Variability. In addition to the increasing velocities and varieties of data, data flows can be highly inconsistent, with periodic peaks. Is something trending in social media? Daily, seasonal and event-triggered peak data loads can be challenging to manage, even more so with unstructured data.

Complexity. Today's data comes from multiple sources, which makes it difficult to link, match, cleanse and transform data across systems. However, it's necessary to connect and correlate relationships, hierarchies and multiple data linkages, or your data can quickly spiral out of control.

Big data's big potential

The amount of data that's being created and stored on a global level is almost inconceivable, and it just keeps growing.
That means there's even more potential to glean key insights from business information, yet only a small percentage of data is actually analyzed. What does that mean for businesses? How can they make better use of the raw information that flows into their organizations every day?

Why Is Big Data Important?

The importance of big data doesn't revolve around how much data you have, but what you do with it. You can take data from any source and analyze it to find answers that enable 1) cost reductions, 2) time reductions, 3) new product development and optimized offerings, and 4) smart decision making. When you combine big data with high-powered analytics, you can accomplish business-related tasks such as:

- Determining root causes of failures, issues and defects in near real time.
- Generating coupons at the point of sale based on the customer's buying habits.
- Recalculating entire risk portfolios in minutes.
- Detecting fraudulent behavior before it affects your organization.

Why is Big Data so Important in Today's World?

There is no doubt that industries are being set ablaze by the huge eruption of data. No sector has remained untouched by this drastic change over the past decade. Technology has crept into every business arena and has become an essential part of every processing unit. In the IT industry specifically, software and automation are bare essentials, used in every phase of the process cycle.
Businesses are focusing more on agility and innovation than on stability, and adopting big data technologies helps companies achieve that quickly. Big data analytics has not only allowed firms to keep up with changing dynamics but has also let them predict future trends, giving them a competitive edge. What is driving the widespread adoption of big data across industries? Let's look at the reasons behind all the hype.

Firms witnessing surprising growth

Needless to say, big data is taking the world by storm through its countless benefits. It is allowing leading firms such as IBM and Amazon to develop cutting-edge technologies that provide high-end services to their customers.

"Orchestrating Big Data, Cloud and Mobility strategies leads to 53% greater growth than peers not adopting these technologies." - Forbes

A survey conducted by Dell revealed, surprisingly, that companies using big data, cloud and mobility achieved 53% greater growth than late adopters and non-adopters. Though big data is still in its nascent stage, it is giving 50% more revenue to the companies that have integrated it into their processes. This clearly highlights that big data is acting as the oil or coal that powers the staggering performance of today's businesses.

Getting a hold on the dark data

The renowned consulting firm Gartner Inc. describes dark data as the "information assets that organizations collect, process and store in the course of their regular business activity, but generally fail to use for other purposes."
However, the advent of big data systems has allowed companies to put this untouched data to use and extract meaningful insights from it. Much to everyone's amazement, data that was left unrecognized or considered useless in the past has suddenly become a goldmine for companies. Thanks to big data analytics, companies can accelerate their processes and ultimately reduce their operating costs.

Software is eating the world

We are currently in a data-driven economy where no organization can survive without analyzing current and future trends. Whether it is a manufacturing firm or a retail chain, wrangling data has become a crucial job to be done before taking a single step further. Because customers are central to any organization, it is imperative to address their needs on time. This is possible only if business strategies are backed by strong software that supports and accelerates business operations. This ultimately fuels the need for powerful big data technologies that can benefit organizations in numerous ways.

Smarter and faster decision-making

In this era of fierce competition, everyone wants to stand out from the crowd. But the question is: how? How will companies be perceived as unique despite having the same operations as others in the industry? The answer lies in the practices the firms adopt. To perform better than competitors, the ability to make good, intelligent decisions plays a pivotal role at every step. Decisions should not only be good, but should also be made smartly and as quickly as possible, so that companies remain proactive instead of reactive. Implementing big data analytics in the process sheds light on unstructured data in a way that helps managers analyze their decisions systematically and take an alternative approach when needed.

Customer-centricity is the new policy

Now that customers can shop anywhere and anytime, companies face the challenge of making every interaction better than the previous one with the help of relevant information. But how will companies do this on a continuous basis? The answer is big data. Customer dynamics are ever-changing, and marketers' strategies should change accordingly. Companies can become more responsive by incorporating past as well as real-time data to assess the tastes and preferences of their customers. For example, Amazon has grown from a product-based company into a big market player with 152 million customers by leveraging powerful big data engines. Amazon aims to delight customers by tracking their buying trends and instantly providing marketers with all the related information they need. Moreover, Amazon fulfills the needs of its customers by monitoring 1.5 billion products across the world in real time.

Value generation through leveraging data silos

As companies grow bigger, their different processes generate varied data, and much important information sits inaccessible in data silos. However, companies have been able to dig into this mountain with big data analytics, which lets analysts and engineers drill deep and come out with fresh, informative insights.

After this discussion, one thing is certain: this is just the beginning of a highly digitized, technology-driven era revolving around powerful real-time big data analytics.

How Midsized Businesses Can Take Advantage of Big Data

More and more midsized businesses are taking a serious look at data visualization because it makes it easier to identify insights in the huge amounts of data that can now be tapped. With tight budgets, limited IT resources, and (for the most part) no highly trained data analysts on staff, many midsized companies aren't sure where to begin. This white paper offers some practical tips on how to get started with data visualization and how best to succeed. It also covers some specific business functions where visualizing and analyzing data can deliver results.
A Non-Geek's Big Data Playbook

This paper examines how a non-geek yet technically savvy business professional can understand how to use Hadoop, and how it will affect enterprise data environments for years to come.

Big data and data mining

Data mining expert Jared Dean wrote the book on data mining. He explains how to maximize your analytics program using high-performance computing and advanced analytics.

Adding Hadoop to your Big Data Mix?

The Big Data Hadoop Architect Master's Program is a structured learning path recommended by leading industry experts and designed to give you an in-depth education on Big Data technologies such as Hadoop, Spark, Scala, MongoDB/Cassandra, Kafka, Impala, and Storm.
The program begins with the Big Data Hadoop and Spark Developer course to provide a solid foundation in the Big Data Hadoop framework, then moves on to Apache Spark and Scala to give you an in-depth understanding of real-time processing. Finally, you will get an introduction to the concepts of NoSQL database technology, where you will choose between Apache Cassandra and MongoDB. We have included Apache Storm, Kafka, and Impala as electives to help you gain a further edge. You will get access to 100+ live instructor-led online classrooms, 90+ hours of self-paced video content, 7+ industry-based projects to be practiced on CloudLabs/virtual machines, 10+ simulation exams, a community moderated by experts, and other resources that ensure you follow the optimal path to your dream role as a Big Data Hadoop architect.
Who uses big data?

Big data affects organizations across practically every industry. See how each industry can benefit from this onslaught of information.

Banking

With large amounts of information streaming in from countless sources, banks are faced with finding new and innovative ways to manage big data. While it's important to understand customers and boost their satisfaction, it's equally important to minimize risk and fraud while maintaining regulatory compliance. Big data brings big insights, but it also requires financial institutions to stay one step ahead of the game with advanced analytics.
Education

Educators armed with data-driven insight can make a significant impact on school systems, students and curriculums. By analyzing big data, they can identify at-risk students, make sure students are making adequate progress, and implement a better system for evaluating and supporting teachers and principals.

Government

When government agencies are able to harness and apply analytics to their big data, they gain significant ground when it comes to managing utilities, running agencies, dealing with traffic congestion or preventing crime. But while there are many advantages to big data, governments must also address issues of transparency and privacy.
Health Care

Patient records. Treatment plans. Prescription information. When it comes to health care, everything needs to be done quickly, accurately and, in some cases, with enough transparency to satisfy stringent industry regulations. When big data is managed effectively, health care providers can uncover hidden insights that improve patient care.

Manufacturing

Armed with insight that big data can provide, manufacturers can boost quality and output while minimizing waste, processes that are key in today's highly competitive market. More and more manufacturers are working in an analytics-based culture, which means they can solve problems faster and make more agile business decisions.
Retail

Customer relationship building is critical to the retail industry, and the best way to manage that is to manage big data. Retailers need to know the best way to market to customers, the most effective way to handle transactions, and the most strategic way to bring back lapsed business. Big data remains at the heart of all those things.

Big data in action: UPS

As a company with many pieces and parts constantly in motion, UPS stores a large amount of data, much of which comes from sensors in its vehicles. That data not only monitors daily performance but also triggered a major redesign of UPS drivers' route structures. The initiative was called ORION (On-Road Integration Optimization and Navigation), and it was arguably the world's largest operations research project.
It relied heavily on online map data to reconfigure a driver's pickups and drop-offs in real time. The project led to savings of more than 8.4 million gallons of fuel by cutting 85 million miles off daily routes. UPS estimates that saving only one daily mile per driver saves the company $30 million, so the overall dollar savings are substantial.

It's important to remember that the primary value from big data comes not from the data in its raw form, but from the processing and analysis of it and the insights, products, and services that emerge from analysis. The sweeping changes in big data technologies and management approaches need to be accompanied by similarly dramatic shifts in how data supports decisions and product/service innovation.
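ORION's actual optimization method is not described in this text, but the basic idea of reordering a driver's stops to cut miles can be illustrated with a much simpler technique. The Python sketch below uses a greedy nearest-neighbor heuristic (always drive to the closest unvisited stop) on hypothetical straight-line coordinates. It is a stand-in for illustration, not the ORION algorithm; real routing would use road-network distances from map data.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def route_length(path: List[Point]) -> float:
    """Total straight-line distance when the points are visited in order."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def nearest_neighbor_route(depot: Point, stops: Dict[str, Point]) -> List[str]:
    """Greedy ordering: from the current position, always go to the closest unvisited stop."""
    remaining = dict(stops)
    current = depot
    order: List[str] = []
    while remaining:
        name = min(remaining, key=lambda stop: math.dist(current, remaining[stop]))
        order.append(name)
        current = remaining.pop(name)
    return order


if __name__ == "__main__":
    depot = (0.0, 0.0)
    # Hypothetical stops on a straight road, listed in a poor planned order.
    stops = {"A": (5.0, 0.0), "B": (1.0, 0.0), "C": (2.0, 0.0)}
    planned = ["A", "B", "C"]
    greedy = nearest_neighbor_route(depot, stops)
    for label, order in (("planned", planned), ("greedy", greedy)):
        path = [depot] + [stops[name] for name in order]
        print(label, order, route_length(path))
```

With these inputs the planned order covers 10.0 distance units and the greedy order ['B', 'C', 'A'] covers 5.0. Nearest-neighbor is fast but can be far from optimal on larger problems, which is why large-scale routing projects use more sophisticated operations research methods.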
Data exploration

Data exploration is the first step in data analysis and typically involves summarizing the main characteristics of a dataset. It is commonly conducted using visual analytics tools, but can also be done in more advanced statistical software, such as R. Before a formal data analysis can be conducted, the analyst must know how many cases are in the dataset, what variables are included, how many missing observations there are and what general hypotheses the data is likely to support. An initial exploration of the dataset helps answer these questions by familiarizing analysts with the data they are working with.

Analysts commonly use data visualization software for data exploration because it allows users to quickly and simply view most of the relevant features of their dataset. From this step, users can identify variables that are likely to have interesting observations. By displaying data graphically (for example, through scatter plots or bar charts), users can see whether two or more variables correlate and determine whether they are good candidates for further in-depth analysis.

Data mining project: Exploring data

Part of any data mining project is learning about and understanding the nature of your data. By leveraging controls from Office Web Components (OWC), the Data Source View (DSV) Designer provides the functionality to explore your data in four different views. By right-clicking a DSV table and selecting Explore Data, you can view your data as a table, pivot table, simple charts, and a pivot chart. By default, the Explore Data component samples 5,000 points of your data. The option buttons in the upper left of the Explore Data window allow you to change this setting, up to a maximum of 20,000 points because of a limitation of the OWC controls.
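The basic exploration questions listed above (how many cases, which variables, how many missing observations, whether two variables correlate) can also be answered programmatically. The sketch below is plain Python rather than OWC or SQL Server. The row format (a list of dictionaries, with None marking a missing observation) is an assumption, and the 5,000-row cap in the hypothetical `sample_rows` helper simply mirrors the Explore Data default mentioned above.

```python
import math
import random
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]  # hypothetical format: None marks a missing observation


def sample_rows(rows: List[Row], max_points: int = 5000, seed: int = 0) -> List[Row]:
    """Return at most `max_points` rows chosen at random without replacement."""
    if len(rows) <= max_points:
        return list(rows)
    return random.Random(seed).sample(rows, max_points)


def profile(rows: List[Row]) -> Dict[str, object]:
    """How many cases, which variables, and how many missing observations per variable."""
    variables = sorted({name for row in rows for name in row})
    missing = {v: sum(1 for row in rows if row.get(v) is None) for v in variables}
    return {"cases": len(rows), "variables": variables, "missing": missing}


def pearson(rows: List[Row], x: str, y: str) -> float:
    """Pearson correlation of x and y over the rows where both values are present."""
    pairs = [(r[x], r[y]) for r in rows if r.get(x) is not None and r.get(y) is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(a for a, _ in pairs) / len(pairs)
    mean_y = sum(b for _, b in pairs) / len(pairs)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    rows: List[Row] = [
        {"age": 25.0, "income": 30.0, "bedrooms": 1.0},
        {"age": 35.0, "income": 50.0, "bedrooms": None},
        {"age": 45.0, "income": 70.0, "bedrooms": 3.0},
        {"age": None, "income": 40.0, "bedrooms": 2.0},
    ]
    sample = sample_rows(rows)
    print(profile(sample))
    print(pearson(sample, "age", "income"))  # 1.0: income rises linearly with age here
```

A correlation close to 1 or -1 flags a pair of variables worth a closer look, which is the same judgment a scatter plot supports visually.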
The tabular views allow you to do a simple exploration of your data. Clever use of the pivot table will give you a better understanding of the data by arranging, slicing, and aggregating it in different ways. For example, by exploring a pivot chart on the Customers table, you can find the average Age and its standard deviation broken down by the Bedrooms column we created previously (see Figure 3.8). This is possible because we are exploring the DSV table rather than the underlying source table. We can explore Named Queries in the DSV in precisely the same manner.

The graphical exploration offers a page of simple column, pie, and bar charts plus a pivot chart view. Using the simple charts, you can see histograms and pies of various attributes side by side. If your data is continuous, the chart divides the continuous range into 10 buckets. The pivot chart, in contrast, provides a wealth of graphing controls to analyze your data, from standard line, bar, scatter, column, and pie charts to more exotic types such as doughnut and radar charts, as shown in Figure 3.9.

The pivot table and chart have many configuration options to help you analyze your data in different ways. Many of these are available through the context-sensitive Command and Options dialog box, from the Context menu, or from embedded toolbars. Virtually every aspect of the tables and charts can be modified, either by graphically selecting the object or by using the selection box on the General tab of the dialog. Describing the full feature set of the OWC could easily fill another book, and mastering the OWC controls will take some practice, but with experience you will be able to manipulate the controls to find exactly the right view for you. Additionally, the pivot table and chart are linked, so you can switch back and forth, make edits, and see how a change affects the other view.
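Two behaviors described above can be approximated outside OWC: the simple charts split a continuous attribute into 10 buckets, and the pivot example computes the average Age and its standard deviation for each Bedrooms value. The Python sketch below assumes equal-width buckets and the sample standard deviation; the text does not say which binning rule or standard-deviation formula OWC actually uses, and the sample customer rows are hypothetical.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Tuple


def equal_width_buckets(values: List[float], buckets: int = 10) -> List[int]:
    """Count how many values fall into each of `buckets` equal-width ranges over [min, max]."""
    if not values:
        return [0] * buckets
    lo, hi = min(values), max(values)
    counts = [0] * buckets
    if hi == lo:
        counts[0] = len(values)  # all values identical: put them in one bucket
        return counts
    width = (hi - lo) / buckets
    for v in values:
        # The maximum would compute to index `buckets`, so clamp it into the last bucket.
        counts[min(int((v - lo) / width), buckets - 1)] += 1
    return counts


def mean_and_stdev_by(
    rows: List[Dict[str, float]], group: str, value: str
) -> Dict[float, Tuple[float, float]]:
    """Average and sample standard deviation of `value` for each distinct `group` value."""
    groups: Dict[float, List[float]] = defaultdict(list)
    for row in rows:
        groups[row[group]].append(row[value])
    return {
        key: (statistics.mean(vals), statistics.stdev(vals) if len(vals) > 1 else 0.0)
        for key, vals in sorted(groups.items())
    }


if __name__ == "__main__":
    ages = [float(a) for a in range(0, 101)]
    print(equal_width_buckets(ages))  # [10, 10, 10, 10, 10, 10, 10, 10, 10, 11]

    customers = [
        {"Bedrooms": 1, "Age": 30},
        {"Bedrooms": 1, "Age": 40},
        {"Bedrooms": 3, "Age": 50},
    ]
    print(mean_and_stdev_by(customers, "Bedrooms", "Age"))
```

Equal-width buckets are the simplest choice; they make skewed data easy to spot but can leave some buckets nearly empty.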
One additional feature of the pivot chart that is important for data exploration is graphical named query generation. By clicking the Named Query button on the toolbar, you can use elements of the chart to define a named query. For instance, you could select only those homeowners with one bedroom and renters with four or more on the chart and add them to the query. This named query becomes like any other and can be used as a source for exploring data.

Note: Although the Explore Data window looks like other document windows, it is, in fact, a tool window like the Solution Explorer and Properties windows. By right-clicking the window tab, you can change the Explore Data window into a floating or dockable window. You can also open many Explore Data windows on different DSV tables to display charts and tables side by side.
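A named query in a DSV is defined by a SQL SELECT statement, so the chart selection in the homeowner/renter example above corresponds roughly to a WHERE clause. The sketch below runs such a query against an in-memory SQLite table from Python rather than against SQL Server. The column names (CustomerID, HomeOwner, Bedrooms) and the 'Y'/'N' encoding are hypothetical, and the SQL that the pivot chart actually generates may look different.

```python
import sqlite3
from contextlib import closing
from typing import List, Tuple

Customer = Tuple[int, str, int]  # (CustomerID, HomeOwner, Bedrooms): hypothetical columns

# Homeowners with one bedroom, plus renters with four or more bedrooms.
NAMED_QUERY = """
SELECT CustomerID, HomeOwner, Bedrooms
FROM Customers
WHERE (HomeOwner = 'Y' AND Bedrooms = 1)
   OR (HomeOwner = 'N' AND Bedrooms >= 4)
ORDER BY CustomerID
"""


def run_named_query(customers: List[Customer]) -> List[Customer]:
    """Load rows into a throwaway in-memory table and apply the named query's filter."""
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute(
            "CREATE TABLE Customers "
            "(CustomerID INTEGER PRIMARY KEY, HomeOwner TEXT, Bedrooms INTEGER)"
        )
        conn.executemany("INSERT INTO Customers VALUES (?, ?, ?)", customers)
        return conn.execute(NAMED_QUERY).fetchall()


if __name__ == "__main__":
    data = [(1, "Y", 1), (2, "Y", 3), (3, "N", 4), (4, "N", 2), (5, "N", 5)]
    print(run_named_query(data))  # [(1, 'Y', 1), (3, 'N', 4), (5, 'N', 5)]
```

The parentheses around each AND pair keep the two groups separate; without them, the OR would combine the conditions in an unintended way.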
The preceding excerpt on exploring data is from Chapter 3, 'Using SQL Server 2005 data mining,' of the book Data Mining with SQL Server 2005.

How It Works

Before discovering how big data can work for your business, you should first understand where it comes from. The sources for big data generally fall into one of three categories:

Streaming data. This category includes data that reaches your IT systems from a web of connected devices. You can analyze this data as it arrives and make decisions on what data to keep, what not to keep and what requires further analysis.

Social media data. The data on social interactions is an increasingly attractive set of information, particularly for marketing, sales and support functions. It's often in unstructured or semistructured forms, so it poses a unique challenge when it comes to consumption and analysis.

Publicly available sources. Massive amounts of data are available through open data sources like the US government's data.gov, the CIA World Factbook or the European Union Open Data Portal.

After identifying all the potential sources for data, consider the decisions you'll need to make once you begin harnessing information. These include:

How to store and manage it. Whereas storage would have been a problem several years ago, there are now low-cost options for storing data if that's the best strategy for your business.
How much of it to analyze. Some organizations don't exclude any data from their analyses, which is possible with today's high-performance technologies such as grid computing or in-memory analytics. Another approach is to determine upfront which data is relevant before analyzing it.

How to use any insights you uncover. The more knowledge you have, the more confident you'll be in making business decisions. It's smart to have a strategy in place once you have an abundance of information at hand.

The final step in making big data work for your business is to research the technologies that help you make the most of big data and big data analytics. Consider:

- Cheap, abundant storage.
- Faster processors.
- Affordable open source, distributed big data platforms, such as Hadoop.
- Parallel processing, clustering, MPP, virtualization, large grid environments, high connectivity and high throughputs.
- Cloud computing and other flexible resource allocation arrangements.
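The streaming-data category described earlier, where data is analyzed as it arrives and sorted into what to keep, what not to keep and what requires further analysis, can be sketched as a Python generator that labels each event without storing the whole stream. The triage rules, the `Reading` record and the thresholds below are hypothetical illustrations and are not taken from any particular product.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Reading:
    device_id: str
    value: float


def triage(stream: Iterable[Reading], low: float, high: float) -> Iterator[Tuple[str, Reading]]:
    """Label each reading as it arrives: 'review', 'discard' or 'keep'.

    Hypothetical rules for illustration:
      * a value outside [low, high] needs further analysis ("review");
      * a value identical to the same device's previous reading adds nothing ("discard");
      * everything else is stored ("keep").
    """
    last_seen: Dict[str, float] = {}
    for reading in stream:
        if not low <= reading.value <= high:
            label = "review"
        elif last_seen.get(reading.device_id) == reading.value:
            label = "discard"
        else:
            label = "keep"
        last_seen[reading.device_id] = reading.value
        yield label, reading


if __name__ == "__main__":
    events = [
        Reading("sensor-1", 20.0),
        Reading("sensor-1", 20.0),
        Reading("sensor-2", 99.0),
        Reading("sensor-1", 21.5),
    ]
    for label, reading in triage(events, low=0.0, high=50.0):
        print(label, reading.device_id, reading.value)
```

Because `triage` is a generator, it only holds the last value per device in memory, which is what lets the same logic run over an unbounded stream.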