There is a flood of information online from tweets, feeds, status updates, photos, government, private, and other
sources. Just how big is “big data”? This presentation will share examples of big and open data in the cloud: where it
comes from, how it’s stored, and what you can do with it. Learn to incorporate real-world data online for your
students to analyze using Excel, create data visualizations and infographics, and understand the impact of Data
as a Service as a model for cloud computing.
16. 3 V's
• Volume - the amount of data is larger than
conventional relational database
infrastructures can handle
• Velocity - the rate at which data is
generated, processed and analyzed in
(real) time
• Variety – data formats are unstructured
and inconsistent
19. Walmart
• Walmart collects more than 2.5
petabytes of data every hour from its
customer transactions.
• A petabyte is one quadrillion bytes, or the
equivalent of about 20 million filing
cabinets’ worth of text.
http://hbr.org/2012/10/big-data-the-management-revolution/ar
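To get a feel for the scale on this slide, the figures can be worked through as simple arithmetic. This is only a back-of-the-envelope sketch: the constants are the numbers quoted above, and the variable names are made up for illustration.

```python
# Figures quoted on the slide (from the HBR article).
PETABYTE = 10**15               # one quadrillion bytes
CABINETS_PER_PB = 20_000_000    # filing cabinets' worth of text per petabyte
PB_PER_HOUR = 2.5               # Walmart's reported collection rate

# Derived quantities.
bytes_per_hour = PB_PER_HOUR * PETABYTE
pb_per_day = PB_PER_HOUR * 24
cabinets_per_hour = PB_PER_HOUR * CABINETS_PER_PB

print(f"{bytes_per_hour:.2e} bytes per hour")
print(f"{pb_per_day:.0f} petabytes per day")
print(f"{cabinets_per_hour:,.0f} filing cabinets of text per hour")
```

Students can reproduce the same calculation in Excel with three cells and two formulas.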
20. Velocity: Drinking from the Firehose
• Scrutinize 5 million trade events created
each day to identify potential fraud
• Analyze 500 million daily call detail
records in real time to predict customer
churn faster
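The daily totals on this slide can be turned into a per-second rate to show what "velocity" means in practice. The sketch below assumes, purely for illustration, that events arrive evenly over a 24-hour day; real traffic is bursty, so peak rates would be higher. The function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def events_per_second(events_per_day):
    """Average arrival rate, assuming events are spread evenly over the day."""
    return events_per_day / SECONDS_PER_DAY

trade_rate = events_per_second(5_000_000)     # trade events
call_rate = events_per_second(500_000_000)    # call detail records

print(f"Trade events: about {trade_rate:.0f} per second")
print(f"Call records: about {call_rate:.0f} per second")
```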
22. McKinsey & Company Report (2011)
• Data is part of every
industry and business
function.
• Data creates value.
• Big data becomes a basis
of competition and growth.
• Some sectors will achieve
greater gains.
• Shortage of people with
analytical skills.
• Need policies related to
privacy, security,
ownership.
27. Big Data Technologies
• Hadoop: scalable
storage, parallel
computation
• NoSQL: distributed
querying
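The "parallel computation" idea is often explained with the map/shuffle/reduce pattern. The sketch below is plain Python running on one machine, not the Hadoop API; it only illustrates the three phases, and every function name in it is invented for this example. On a real cluster, the map and reduce steps would run on many servers at once.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}

def word_count(documents):
    pairs = []
    for doc in documents:  # each document could be mapped on a different server
        pairs.extend(map_phase(doc))
    return reduce_phase(shuffle(pairs))

counts = word_count(["big data in the cloud", "open data in the cloud"])
print(counts)
```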
28. What this Means
• Change your web page and Google finds it
in minutes.
• Ten years ago, you would have to submit a
request to Yahoo! to reindex your site.
• All you need is a lot of servers.
• Google has a million of them.
• No problem.
48. Mark Frydenberg
mfrydenberg@bentley.edu
cis.bentley.edu/mfrydenberg
CourseMate
Enhanced
Edition
Invite me to your school!
Editor's notes
6 Degrees of Kevin Bacon: the name is dumb luck. 6 Degrees of Separation: the idea is that, within networks of people or things, any two nodes are at most about 6 links apart. That’s the Bacon Index. Bob is 1, Ann is 2, Joe is 3. The index can only get so big because of interconnections. If Kim is connected to Bob, Kim is 2, not 4.
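The Bacon Index is the length of the shortest path between two nodes, which a breadth-first search finds directly. This is a minimal sketch: the graph is a made-up example built from the names in the note, with Kim linked to both Joe and Bob so that the shorter route wins, and the function name is hypothetical.

```python
from collections import deque

def bacon_index(graph, start, target):
    """Return the fewest links between start and target, or None if unconnected."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None

# Hypothetical network: Bacon - Bob - Ann - Joe - Kim, plus a direct Bob - Kim link.
graph = {
    "Kevin Bacon": ["Bob"],
    "Bob": ["Kevin Bacon", "Ann", "Kim"],
    "Ann": ["Bob", "Joe"],
    "Joe": ["Ann", "Kim"],
    "Kim": ["Joe", "Bob"],
}

for person in ("Bob", "Ann", "Joe", "Kim"):
    print(person, bacon_index(graph, "Kevin Bacon", person))
```

Kim comes out as 2 because the search reaches her through Bob before it ever gets to Joe.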
Twitter can’t be structured. Twitter is a bunch of words that humans are the best at parsing. And so again we’re back to the 3 V’s: Volume, Velocity, and Variety. Not only is Twitter’s data disorganized, it handles over 3000 new tweets per second. Twitter is using this data to recommend things to you, and it does it all lightning fast through an engine called Storm.
If Amazon can see that lots of people buy forks and knives together, or that people buy curtains and curtain rods together, how does it avoid recommending a wrench set and a copy of Black Beauty to everyone who has bought one of them, just because someone else once bought both? This is where things get complicated.
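One simple way to separate real patterns from coincidences is to count how often each pair of items appears in the same order and keep only pairs seen at least some minimum number of times. The sketch below illustrates that idea only; it is not Amazon's actual method, and the orders, the threshold, and the function name are all invented for the example.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(orders, min_support):
    """Count item pairs bought together; keep those seen at least min_support times."""
    counts = Counter()
    for order in orders:
        # Sort so that (a, b) and (b, a) are counted as the same pair.
        for pair in combinations(sorted(set(order)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Hypothetical order history.
orders = [
    ["fork", "knife"],
    ["fork", "knife", "spoon"],
    ["curtain", "curtain rod"],
    ["curtain", "curtain rod"],
    ["wrench set", "Black Beauty"],  # a one-off coincidence
]

pairs = frequent_pairs(orders, min_support=2)
print(pairs)
```

The wrench set and Black Beauty pair appears only once, so it falls below the threshold and is never recommended. With millions of products the number of possible pairs explodes, which is where the problem becomes a big data problem.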
Twitter isn’t the only place where unstructured, real-time data is being processed. Facial recognition is a massive big data problem. Your iPhone does facial recognition. Facebook does facial recognition. Aperture learns about faces from hundreds of data points and can help you find who is in what photos. Amazing. How do we do this so quickly?
Should it be opt-in only?
- Here is a blood pressure monitor from iHealth that stores your blood pressure data in the cloud.
Here’s an app that monitors your heart rate from your phone’s camera, amazing stuff. So all this wellness data is now being collected ubiquitously. How can it be used securely and effectively to make all of us healthier? This is the big data problem in health care.