Big Data

  1. BIG DATA
  2. What is Big Data?
  3. What is Big Data?
     - Big data is a term that describes the large volume of data, both structured and unstructured, that overburdens a business on a day-to-day basis.
     - But it’s not the amount of data that’s important. It’s what organizations do with the data that matters.
     - Big data can be analyzed for insights that lead to better decisions and strategic business moves.
  4. Big Data History and Current Considerations
     - While the term “big data” is relatively new, the act of gathering and storing large amounts of information for eventual analysis is age-old. The concept gained momentum in the early 2000s when industry analyst Doug Laney articulated the now-mainstream definition of big data as the three Vs:
     - Volume. Organizations collect data from a variety of sources, including business transactions, social media, and sensor or machine-to-machine data. In the past, storing it would have been a problem, but new technologies (such as Hadoop) have eased the burden.
     - Velocity. Data streams in at an extraordinary speed and must be dealt with in a timely manner. Sensors and smart metering are driving the need to deal with flows of data in near-real time.
  5. Variety. Data comes in all types of formats, from structured to unstructured.
     - Structured data follows a data structure, which is a specialized format for organizing and storing data. General data structure types include the array, the file, the record, the table, the tree, and so on. Any data structure is designed to organize data to suit a specific purpose so that it can be accessed and worked with in appropriate ways.
     - Unstructured data comes from information that is not organized or easily interpreted by traditional databases or data models, and it is typically text-heavy. Twitter tweets and other social media posts are good examples of unstructured data.
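To make the structured/unstructured contrast concrete, here is a minimal Python sketch. The `Transaction` record, its fields, and the `extract_hashtags` helper are hypothetical names chosen for illustration; they do not come from the slides.

```python
import re
from dataclasses import dataclass


@dataclass
class Transaction:
    """Structured data: every record has the same named, typed fields."""
    customer_id: int
    amount: float
    currency: str


def extract_hashtags(text):
    """Unstructured data: free text has no schema, so it must be parsed."""
    return re.findall(r"#(\w+)", text)


if __name__ == "__main__":
    # A structured record can be queried directly by field name.
    payment = Transaction(customer_id=42, amount=19.99, currency="USD")
    print(payment.amount)

    # A social media post needs text processing before it yields facts.
    print(extract_hashtags("Loving #BigData and #Hadoop!"))
```

The structured record can be filtered or aggregated by field right away, while the post has to be parsed before any analysis is possible.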
  6. Big Data Challenges
     - The major challenges associated with big data are as follows:
     - Capturing data
     - Storage
     - Searching
     - Sharing
     - Transfer
     - Analysis
     - Presentation
     - To meet these challenges, organizations typically rely on enterprise servers.
  7. Traditional Approach
     - In this approach, an enterprise has a single computer to store and process big data. The data is stored in an RDBMS such as Oracle Database, MS SQL Server or DB2, and sophisticated software can be written to interact with the database, process the required data and present it to users for analysis.
  8. Limitation
     - This approach works well when the volume of data is small enough to be accommodated by standard database servers, or stays within the limit of the processor that is processing the data.
     - But when it comes to huge amounts of data, processing it through a traditional database server becomes a tedious task.
  9. Google’s Solution
     - Google solved this problem using an algorithm called MapReduce. This algorithm divides the task into small parts, assigns those parts to many computers connected over the network, and collects the results to form the final result dataset.
     - The diagram on this slide shows various commodity hardware, which could be single-CPU machines or servers with higher capacity.
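The divide, assign, and collect steps described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: threads on one machine stand in for networked computers, the task is a simple sum, and the names `split_into_chunks`, `process_chunk`, and `distributed_sum` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Divide the task: cut the input into roughly equal parts."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def process_chunk(chunk):
    """Work done independently by one worker: here, a partial sum."""
    return sum(chunk)


def distributed_sum(items, n_workers=4):
    chunks = split_into_chunks(items, n_workers)
    # Assumption: threads stand in for the networked machines.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_results = list(pool.map(process_chunk, chunks))
    # Collect the partial results to form the final result.
    return sum(partial_results)


if __name__ == "__main__":
    print(distributed_sum(list(range(1, 101))))
```

Each chunk is processed without knowledge of the others, which is what allows the work to be spread over many machines in a real cluster.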
  10. Hadoop
      - Doug Cutting, Mike Cafarella and team took the solution provided by Google and started an open source project called Hadoop in 2005. Doug named it after his son's toy elephant. Apache Hadoop is now a registered trademark of the Apache Software Foundation.
      - Hadoop runs applications using the MapReduce algorithm, where the data is processed in parallel on different CPU nodes. In short, the Hadoop framework makes it possible to develop applications that run on clusters of computers and perform complete statistical analysis on huge amounts of data.
  11. MapReduce
      - Hadoop MapReduce is a software framework for easily writing applications that process large amounts of data in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.
      - The term MapReduce refers to the following two different tasks that Hadoop programs perform:
      - The Map Task: This is the first task. It takes input data and converts it into a set of data in which individual elements are broken down into tuples (key/value pairs).
      - The Reduce Task: This task takes the output from a map task as input and combines those data tuples into a smaller set of tuples. The reduce task is always performed after the map task.
      - Typically both the input and the output are stored in a file system. The framework takes care of scheduling tasks, monitoring them and re-executing failed tasks.
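The two tasks can be illustrated with the classic word-count example. The sketch below is plain single-process Python, not the Hadoop API; `map_task`, `shuffle`, `reduce_task`, and `word_count` are hypothetical names, and the grouping step between map and reduce is written out by hand here although a real framework handles it.

```python
from collections import defaultdict


def map_task(line):
    """Map: break one input line into (key, value) tuples."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key; a framework does this between the two tasks."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_task(key, values):
    """Reduce: combine all values for one key into a single tuple."""
    return key, sum(values)


def word_count(lines):
    mapped = [pair for line in lines for pair in map_task(line)]
    grouped = shuffle(mapped)
    return dict(reduce_task(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big clusters"]))
```

The map output holds one tuple per word occurrence, while the reduce output holds one tuple per distinct word, which is the "smaller set of tuples" the slide refers to.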
  12. Why Is Big Data Important?
      - The importance of big data doesn’t revolve around how much data you have, but what you do with it. You can take data from any source and analyze it to find answers that enable 1) cost reductions, 2) time reductions, 3) new product development and optimized offerings, and 4) smart decision making. When you combine big data with high-powered analytics, you can accomplish business-related tasks such as:
      - Determining root causes of failures, issues and defects in near-real time.
      - Generating coupons at the point of sale based on the customer’s buying habits.
      - Recalculating entire risk portfolios in minutes.
      - Detecting fraudulent behavior before it affects your organization.
  13. Who uses big data?
      - Banking. With large amounts of information streaming in from countless sources, banks are faced with finding new and innovative ways to manage big data. While it’s important to understand customers and boost their satisfaction, it’s equally important to minimize risk and fraud while maintaining regulatory compliance. Big data brings big insights, but it also requires financial institutions to stay one step ahead of the game with advanced analytics.
      - Government. When government agencies are able to harness and apply analytics to their big data, they gain significant ground when it comes to managing utilities, running agencies, dealing with traffic congestion or preventing crime. But while there are many advantages to big data, governments must also address issues of transparency and privacy.
  14. Education. Educators armed with data-driven insight can make a significant impact on school systems, students and curriculums. By analyzing big data, they can identify at-risk students, make sure students are making adequate progress, and implement a better system for evaluating and supporting teachers and principals.
  15. References
      - http://www.tutorialspoint.com/hadoop/
      - http://www.sas.com/en_th/insights/big-data/what-is-big-data.html
      - http://www.forbes.com/sites/lisaarthur/2013/08/15/what-is-big-data/#371496003487
  16. Thank you
