Big data trends_problems_v2

Big Data is going to explode: from 5 exabytes in 2010-11 to 50 zettabytes in 2020. What will enable this growth? Which data sources will contribute to it? What problems do we need to solve to make it possible?
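
To put the projected growth in perspective, the sketch below works out the yearly growth factor it implies. It is a rough calculation under stated assumptions: decimal units (1 ZB = 1,000 EB) and a nine-year span (2011 to 2020). Neither is given in the deck, and the function name is hypothetical.

```python
# Implied yearly growth from 5 EB (2010-11) to 50 ZB (2020).
# Assumptions (not stated in the deck): decimal units and a 9-year span.

EB_PER_ZB = 1_000  # decimal (SI) units: 1 zettabyte = 1,000 exabytes


def implied_annual_growth(start_eb: float, end_eb: float, years: int) -> float:
    """Return the constant yearly growth factor that turns start into end."""
    return (end_eb / start_eb) ** (1 / years)


start_eb = 5
end_eb = 50 * EB_PER_ZB

total_factor = end_eb / start_eb
yearly_factor = implied_annual_growth(start_eb, end_eb, years=9)

print(f"total growth: {total_factor:,.0f}x")
print(f"implied yearly growth factor: {yearly_factor:.2f}x")
```

Under these assumptions the total volume grows 10,000-fold, which means it would have to nearly triple every year for nine years in a row.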

Technology

[Figure: size of all Internet data, 2011 vs. 2020; label: 70% packaged goods media]

User Generated Content (UGC)

Source - DOMO

what will enable
growth of user
generated content?

• network bandwidth
• cheap storage
• cheap compute power
• user-friendly devices

• software-defined networking
• Google Fiber
• innovations related to:
• switches
• routers
• packet size
• compression

• cheap storage - a forcing function
• storage companies provide free storage
• in return, they have access to user data
• raw data is turned into boutique data
• sold at a premium to interested companies and advertisers

• innovations in rack space
• cheap, bare-metal hardware
• lowers TCO of servers
• operational tasks become easier
• allows companies to offer cloud services

• button-free
• WYSIWYS(tore)
• connectivity – most important and a “given”
• tendency to track family

• 10 TB of data per engine every 30 minutes
• 6-hour flight from NY to LA on a twin-engine 737 = 240 TB of data per flight
• 28,537 airliners in US skies per day
• 6.5 exabytes (6,688 petabytes) per day
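
The arithmetic behind these figures can be reproduced step by step. The sketch below is only a back-of-the-envelope check. It adopts the slide's own simplification that every airliner is a twin-engine jet on a 6-hour flight, and it assumes binary units (1 PB = 1,024 TB), which is what makes the totals match.

```python
# Reproduce the slide's sensor-data estimate step by step.
# Assumptions: every airliner is treated as a twin-engine jet on a 6-hour
# flight (the slide's simplification), and units are binary (1 PB = 1,024 TB).

TB_PER_ENGINE_PER_HALF_HOUR = 10
ENGINES = 2
FLIGHT_HOURS = 6
AIRLINERS_PER_DAY = 28_537

half_hours = FLIGHT_HOURS * 2
tb_per_flight = TB_PER_ENGINE_PER_HALF_HOUR * ENGINES * half_hours
tb_per_day = tb_per_flight * AIRLINERS_PER_DAY
pb_per_day = tb_per_day / 1024
eb_per_day = pb_per_day / 1024

print(f"per flight: {tb_per_flight} TB")
print(f"per day:    {pb_per_day:,.0f} PB = {eb_per_day:.1f} EB")
```

With decimal units (1 PB = 1,000 TB) the same inputs give about 6,849 PB, or roughly 6.8 EB per day, so the slide's 6,688 PB figure implies binary units.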

“… within the next five years, sensor data will hit the crossover point with unstructured data generated by social media. From there, the sensor data will dominate by factors 10-to-20 times that of social media …”
- Stephen Brobst, CTO, Teradata

Pic – coolarcade.org

• ~225 million seventh-generation game consoles sold worldwide by early 2012
• ~700 million Wii games
• 425 million PlayStation 3 games
• 600 million Xbox 360 games

GPS data

Innovations in Transportation Applications
Multiple sources:
• computers embedded in vehicles
• in-vehicle navigation systems
• drivers’ cell phones
• communication networks
• third-party data such as weather
• traffic
Pic – www.bmwusa.com

• roads with sensors
• determine traffic patterns
• sustainable ways to route
traffic
• generate data for:-
• law enforcement
• transportation
• insurance companies
• medical agencies

* INTRO – INTelligent ROads – a project of the European Commission

• curated content
• mashed content (pinterest like)
• blogs
• videos (own shows, personal videos, etc.)
• pics
• collaboration – emails/IMs/ “Likes” etc
• microblogs (twitter like)

3 V’s*
• volume

• velocity

• variety
* coined by Doug Laney of Gartner Inc

3 I’s
• immediate – do something now!!

• intimidating – what if you don’t?

• ill-defined – what is it, anyway
- Vance Loiselle, CEO, Sumo Logic

• near real time
• new data sources
• mobile
• immediately actionable
• big
• agile
• core of business

• data scientists lead the “Data Orchestra”
• developers/product managers/DBAs/Ops will merge
• Data Techs will emerge
• “behavior”, “intent” and “thought” targeting
• hourly trends will be considered “Jurassic” old

• store Exabytes (Petabytes)
• huge compression ratio (80% compression)
• cheap storage (~ 10 cents/GB/month)
• MTTF rate (high failure rate: 8% per year)
• distributed storage
• storage over software defined networking
• read compressed data
• ETL
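
Two of the figures in this list can be checked with quick arithmetic. The sketch below is illustrative only. It assumes that "80% compression" means the stored size shrinks by 80%, that units are decimal (1 EB = 10^9 GB), and that the MTTF is the 1,000,000 hours quoted for drive manufacturers in the editor's notes. The function names are hypothetical.

```python
# Back-of-the-envelope checks for the storage figures on this slide.
# Assumptions: "80% compression" = 80% size reduction; decimal units;
# MTTF of 1,000,000 hours (the manufacturer figure from the editor's notes).

HOURS_PER_YEAR = 24 * 365  # 8,760
GB_PER_EB = 10**9


def annual_failure_rate(mttf_hours: float) -> float:
    """Approximate yearly failure fraction implied by an MTTF.

    Linear approximation, reasonable when the MTTF is much longer than a year.
    """
    return HOURS_PER_YEAR / mttf_hours


def monthly_storage_cost(raw_eb: float, size_reduction: float,
                         usd_per_gb_month: float) -> float:
    """Monthly cost in USD of storing raw_eb exabytes after compression."""
    stored_gb = raw_eb * GB_PER_EB * (1 - size_reduction)
    return stored_gb * usd_per_gb_month


implied_afr = annual_failure_rate(1_000_000)
observed_afr = 0.08  # the 8% annual failure rate cited on the slide
cost = monthly_storage_cost(raw_eb=1, size_reduction=0.80, usd_per_gb_month=0.10)

print(f"implied AFR:  {implied_afr:.2%}")
print(f"observed AFR is {observed_afr / implied_afr:.1f}x the implied rate")
print(f"1 EB raw at 10 cents/GB/month: ${cost:,.0f} per month")
```

Under these assumptions, one raw exabyte still costs about $20 million per month to store, and the observed drive failure rate is roughly nine times what the manufacturers' MTTF implies. Both results support listing cost and failure rate as open problems.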

• servers and storage merge?
• special CPUs to handle compression?
• encryption?
• better cpu
• bus speed

• understand data
• analytical skills
• discover new ways of looking at data
• new containers for data warehouses, including data warehouses in the cloud
• backup and recovery (should not be an issue)

1. Nov 05, 2012

4. a future view…… 90% UGC/Sensor

7. some enabling technologies…….

25. Pic source – bigdatabytes.com

33. problems….

Editor's notes

User generated content: this term was coined sometime in 2005. It is also called conversational media, as opposed to packaged goods media, and it also goes by the name of performance media. This is the kind of media that has been labeled, somewhat hastily and often derisively, as “User Generated Content,” “Social Media,” or “Consumer Content.” UGC has its fair share of legal and copyright issues but UGC
Google Glasses type devices
http://fiber.google.com/about/
Non-connected will be unheard of
A jet generates 10 TB of data per engine every 30 minutes. A 6-hour flight from NY to LA on a twin-engine 737 generates 240 TB of data. With 28,537 airliners in US skies per day, that comes to 6.5 exabytes (6,688 petabytes) per day.
Brobst says that within the next five years, sensor data will hit the crossover point with unstructured data generated by social media. From there, the sensor data will dominate by factors 10-to-20 times that of social media. However, using this data will be difficult for the time being, as there are no standards to ensure the data’s readability beyond those possessing the right software or algorithm. There’s also a question of who owns the data.
Meanwhile, approximately 225 million seventh-generation game consoles (referring to recent units on the market like the Sony PlayStation 3) had been sold worldwide by early 2012, along with about 700 million Wii games, 425 million PlayStation 3 games and 600 million Xbox 360 games. In fact, the global games industry, including hardware and software, had reached the $63 billion per year range. While the global recession of 2008-09 was hard on the games industry, new games and enhanced console technology have put life back into the business.

Apps, including those for magazines, information services such as health site WebMD, games, newspapers, catalogs and ebook readers, to name but a few of the tens of thousands of uses, didn’t really exist before the introduction of the iconic iPhone smartphone a few years ago. For 2011, Gartner estimated global app store revenues at $15.1 billion. That was only an early stage in this soaring business sector. For example, the Apple iTunes App Store launched in July 2008 with only about 500 apps available. By early 2012, Apple had more than 500,000 apps for sale in the iTunes App Store. Analysts at Gartner estimated that 4.5 billion apps were downloaded in 2010 and 17.7 billion in 2011. Gartner predicted volume to grow to 185 billion downloads by 2014 that will produce $58 billion in revenue.

By mid 2011, figures for apps for Apple products alone indicated there were at least 85,000 app creators worldwide. By one estimate, 37% of all apps are free downloads, while the average price for paid apps is $3.64.

Meanwhile, more than 450,000 apps are also available for the Android mobile phone operating system (the world’s leading smartphone platform), as well as thousands more for the Blackberry and other devices. Android is the mobile operating system developed by Google.

On all platforms, the most popular apps include games, such as Angry Birds; tools such as Google Maps; and entertainment and media related apps, such as those for Pandora Internet-based radio and for leading newspapers. At the same time, apps provide tools for business people, travelers, students, hobbyists, wine drinkers, people who like to cook, job seekers, children, sports fans, shoppers, car enthusiasts and myriad other special interest niches.

http://www.plunkettresearch.com/games-apps-social-market-research/industry-and-business-data
Description of work

Three technical strands of research will be conducted:
• Surface safety monitoring: integration and testing of real-time warning systems at network level to achieve a significant decrease in the number of accidents due to ‘surprise effects’ from sudden local changes in weather resulting in low friction and hence skidding. Increasing drivers’ attention to low road friction by only a few percent may result in significantly higher reduction of accident rates due to its non-linear relationship. Europe’s most advanced driving simulator will be used to optimise driver responses to new types of information.
• Traffic and safety monitoring: combination of different sensor data will enable the estimations of entirely new real-time safety parameters and performance indicators to be used in traffic monitoring and early warning systems.
• Intelligent pavement and intelligent vehicles: innovative use and a combination of new and existing sensor technologies in pavements, bridges and vehicles in order to prevent accidents, enhance traffic flows and significantly extend the lifetimes of existing infrastructure. A prolonged lifetime of high capacity roads could thus be obtained using novel methods for early warning detection of deterioration and damage to road surfaces.

Results

Deliverables:
• Consolidated state of the art focused on the scope of INTRO and focused needs across Europe
• Report on scenarios, structure and potential short-term trends
• Report on implementation strategies
• Model for estimating expectable stopping distances
• Report on the simulator study, including evaluation of impact on safety and drivers
• Data model for road safety-related data
• Report on technical implementation and users’ feedback
• Demonstration of methods for the measurement of condition using probe vehicles
• Report on the assessment of methods to identify pavement conditions using current and novel in situ sensors
• Report on the use of combined probe vehicle and in situ measurements. Proposals for best practice implementation
• Traffic indicator needs: single source and data fusion estimation models
• Integration of weather effects for traffic indicators forecasting
• Safety indicators needs: simulation-based and field-based models
• Creation of a website
• Report on the launch workshop held in June 2005
• Report: A Vision of Intelligent Roads Final summary report
• Project quality assurance plan
• Project mid-term report
• Project final report

Exploitable product(s) or measure(s):
• guidelines and recommendations for ITS deployment use in future standards
• implemented data model combining static and dynamic skid warnings
• new use of in situ sensors and probe cars
• new methods for data fusion and travel time estimations

Sectors:
• road authorities
• ITS service providers
• traffic management
Car windscreens, train and bus windows, Google glasses; see the Pranav Mistry and Pattie Maes TED talk demo: http://www.ted.com/talks/pattie_maes_demos_the_sixth_sense.html
Volume: Data Volume is the primary attribute of “Big Data.” Volume is often quantified in terms of terabytes of data. Anything between 3 – 10 terabytes of data falls within the realm of “Big Data”. In addition, data volume can also be quantified by counting records, transactions, tables, and files. A large number of records, transactions, tables, or files can be categorized as “Big Data.” Volume of data is one of the defining characteristics of “Big Data;” however, data velocity and data variety (highlighted below) constitute the other key characteristics/ingredients of “Big Data.”

Velocity: Speed or Velocity of data is another defining characteristic of “Big Data.” Data Velocity encompasses the frequency of data generation and the frequency of data delivery. In today’s hyper-connected and networked society, there is a continuous stream of information coming from a range of devices ranging from sensors and robotics manufacturing machines, to video cameras and mobile gadgets. This ever-increasing amount of data relentlessly flying from devices in real-time is causing data volumes to grow and do so in a hurry.

Variety: One thing that makes “Big Data” really big is that it’s coming from a greater variety of sources than ever before. Data from Web sources (i.e., Web logs, clickstreams) and social media is remarkably diverse. RFID data from supply chain applications, text data from call center applications, semi-structured data from various business-to-business processes, and geospatial data in logistics make up an eclectic mix of data types that makes variety/diversity an important attribute characterizing “Big Data.”
The 3 I's Of Big Data

Big Data is:
• Immediate – in the sense that you need to do something about it now
• Intimidating – what if you don’t?
• Ill-defined – what is it, anyway?

This is what Vance Loiselle, CEO of analytics company Sumo Logic recently told me. With a nod to the well-known 3 V’s of Big Data (volume, velocity, and variability), I have coined these the 3 I’s of Big Data.

The definition of Big Data may still be up for debate. But with overall corporate data nearly doubling year over year, the number of Facebook users exceeding 900M, and Twitter tweets blowing through 400M per day, two things about Big Data are certain. As Loiselle put it, “Big Data is not going away and it’s only going to get bigger.”

So let’s explore the 3 I’s of Big Data. As always, I welcome your comments here and at dave@vcdave.com.

1. Ill-defined: What is Big Data?

Gartner analyst Doug Laney has characterized Big Data as “data that’s an order of magnitude greater than data you’re accustomed to.”

Ed Dumbill, program chair for the O’Reilly Strata Conference, describes Big Data as, “data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.”

Another way to view Big Data is that it’s a transformative set of technological advances that have made analyzing data vastly more efficient.

Consumer facing companies like Google and Facebook have driven many of the recent advances in Big Data efficiency. Facebook has some 900 million users and is still growing, while some estimates put the number of search queries Google handles at 3 billion per day. Twitter handles some 400 million tweets per day.

In an ironic twist, highlighted by cloud cost management vendor Cloudyn, increased efficiency doesn’t drive down usage. It increases it.

Known as Jevons Paradox, it’s named for the economist who made the observation about the Industrial Revolution. Similarly, as technological advances make storing and analyzing data more efficient, companies are doing a lot more analysis, not less. This, in a nutshell, is Big Data.

2. Intimidating: How do you make Big Data approachable?

There are lots of challenges in leveraging Big Data, from managing the data to having the right tools to get you the insights that matter.

Fortunately, Big Data Apps are springing up all over the place to make Big Data a lot easier to take advantage of.

Companies like Splunk and Sumo Logic are Big Data Apps for machine data. Marketing relevance company BloomReach is another such example. The company processes more than 100 million web pages, generating 94% average annual incremental traffic as a result.

3. Immediate: What’s actionable about big data?

Technological improvements that increased the efficiency of coal use led to increased consumption of coal in a wide range of industries, fueling the Industrial Revolution. In much the same way, technological advances that are increasing the efficiency of analyzing and storing data are driving a Big Data Revolution:
• A lot more data is being generated. While humans generate a seemingly large amount of data in the form of photos and emails, that data production is limited by the number of people. That amount of data is dwarfed by “sensor” data generated by machines: data from computers and network devices, from airplanes, from cell phones, and from connected GPS devices, for example. And high bandwidth wireless networks are now in place to transport that data back to data centers for storage and analysis.
• Technologies created by companies serving an unprecedented number of consumers have driven efficiencies in how data is stored and analyzed. You now have the ability to store and analyze vastly more data than you could in the past.
• You can set up your own computer resources to store and analyze data, but the availability of scalable cloud computing resources like Amazon Web Services means you can access the resources necessary to do large scale data analysis quickly and easily.

The next step in making big data actionable is to make Big Data truly immediate by reducing the time between when data is collected and when you get insights from that data. As J. Andrew Rogers, founder and CTO of Space Curve put it, “the analytic value of data decays rapidly.” That means being able to analyze your data as fast as possible is critical to gaining competitive advantage.

• Educate. This phase focuses on knowledge gathering and market observations.
• Explore. After completing the education phase, companies will develop a strategy and roadmap based on business needs and challenges.
• Engage. During the third phase, a business will pilot big data initiatives to validate value and requirements.
• Execute. Companies in the fourth phase have deployed two or more big data initiatives and are continuing to apply advanced analytics.
• Store exabytes (petabytes)
• Huge compression ratio (80% compression)
• Cheap storage (~10 cents/GB/month)
• MTTF rate (high failure: 8% observed vs. the 0.88% implied by manufacturers)
• Distributed storage
• Storage over software-defined networking

Recent independent studies from Google and Carnegie Mellon University have concluded that disk drive failure rates are considerably higher than the rates reported by disk drive manufacturers. But, it turns out, many users may not care.

At a Usenix conference in San Jose, CA, this past February, Google released its study, which found an 8% annual failure rate for drives in service for two years. That's one out of every 12 drives.

Manufacturers claim the mean time to failure (MTTF) of Fibre Channel (FC) and SATA drives ranges between 1,000,000 and 1,500,000 hours, suggesting a normal annual failure rate of 0.88%.

"Typically, this problem does not hit home for me because vendor support contracts offset the cost associated with the drive replacements," says Earl Hartsell, senior IT analyst at Solvay Pharmaceuticals, Marietta, GA. "It would take a relatively large increase in support costs for this problem to become a pain point."

Similarly, Mark Holt, information technology specialist at Media General in Richmond, VA, says failure rates help manufacturers control support costs, but don't mean much to users. "We have very little interest in that magic number," says Holt. "The complexity of systems means a failure generally isn't worth chasing down; we only want to know if the vendor or supplier is going to be there quickly when we do lose a drive, for whatever reason."

Carnegie Mellon's study of approximately 100,000 consumer and enterprise drives