1. Advanced Data Analytics: Introduction
Jeffrey Stanton
School of Information Studies
Syracuse University
2.
3. Kilo, Mega, Giga, Tera, Peta, Exa
Zetta = 10^21 bytes
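The prefix ladder above can be sketched as powers of ten. This is a minimal illustration that assumes the decimal (SI) meaning of each prefix; the names `SI_PREFIXES`, `bytes_for`, and `humanize` are hypothetical helpers, not part of the original slides.

```python
# Decimal (SI) prefixes from the slide, mapped to their power of ten.
# Assumption: decimal prefixes (10^3 steps), not binary ones (2^10 steps).
SI_PREFIXES = {
    "kilo": 3,
    "mega": 6,
    "giga": 9,
    "tera": 12,
    "peta": 15,
    "exa": 18,
    "zetta": 21,
}


def bytes_for(prefix: str) -> int:
    """Return the number of bytes in one <prefix>byte."""
    return 10 ** SI_PREFIXES[prefix]


def humanize(n_bytes: int) -> str:
    """Express n_bytes with the largest prefix that keeps the value >= 1."""
    best = None
    for name, exponent in SI_PREFIXES.items():  # dict keeps ascending order
        if n_bytes >= 10 ** exponent:
            best = (name, exponent)
    if best is None:
        return f"{n_bytes} bytes"
    name, exponent = best
    return f"{n_bytes / 10 ** exponent:g} {name}bytes"


if __name__ == "__main__":
    print(bytes_for("zetta"))                       # one zettabyte in bytes
    print(bytes_for("zetta") // bytes_for("tera"))  # terabytes per zettabyte
    print(humanize(10 * 10 ** 12))                  # a 10-terabyte database
```

Each step up the ladder multiplies by 1,000, so a zettabyte is a billion terabytes.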
"…An organization employing 1,000 knowledge workers loses $5.7 million annually just in time wasted having to reformat information as they move among applications. Not finding information costs that same organization an additional $5.3m a year."
Source: IDC
"Over 95% of the digital universe is "unstructured data" – meaning its content can't be truly represented by its field in a record, such as name, address, or date of last transaction. In organizations, unstructured data accounts for more than 80% of all information."
Source: IDC
4. Major sources of data
• Health-related services, e.g. benefits, medical analyses
• Business:
– Walmart: 20 million transactions/day, 10-terabyte database
• Science:
– NASA: 0.5+ terabytes per day per satellite
• Society and everyone: news, digital cameras, YouTube
• DOD and intelligence
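To put the Walmart and NASA figures on a common footing, here is a rough back-of-the-envelope sketch. It assumes transactions are spread evenly over a 24-hour day and treats NASA's "0.5+" as exactly 0.5 terabytes, so the results are lower bounds; the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_second(count_per_day: float) -> float:
    """Average rate per second, assuming an even spread over the day."""
    return count_per_day / SECONDS_PER_DAY


def terabytes_per_year(tb_per_day: float, days: int = 365) -> float:
    """Yearly volume for a steady daily volume."""
    return tb_per_day * days


def days_to_accumulate(target_tb: float, tb_per_day: float) -> float:
    """Days needed for a steady daily stream to reach target_tb."""
    return target_tb / tb_per_day


if __name__ == "__main__":
    # Walmart: 20 million transactions per day (figure from the slide).
    print(round(per_second(20_000_000)))   # about 231 transactions per second
    # NASA: at least 0.5 terabytes per day per satellite (figure from the slide).
    print(terabytes_per_year(0.5))         # 182.5 terabytes per year
    # Days for one satellite to produce as much as Walmart's 10 TB database.
    print(days_to_accumulate(10, 0.5))     # 20.0 days
```

On these assumptions, a single satellite generates the equivalent of the entire Walmart database in under three weeks.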
6. Analytics: Multiple Skills
• Curiosity – Interest and intrinsic motivation to figure things
out, ask why, and pursue solutions
• Skepticism – Seek simplicity and distrust it, go below the
surface explanation of things, question all assumptions
• Writing – Communicate results, tell stories, convince others
of the merits of your case
• Visual Reasoning – Develop and present visualizations that
support your conclusions
• Statistics – Draw inferences from and summarize data to
develop a case and a story
• Programming – Manipulate software tools to create a chain
of provenance for data and analysis
7. Knowledge Development for Industry, Education, Government, Research
• Domain Experts
– Expertise in specific subject areas
– Limited opportunity to master technology skills
– Proliferation of big data & new technology
– Need for knowledge and information managers
• Data Scientists
– Information Organization & Visualization
– Information Analysis
– Solution Integration
– Digital Curation
• Infrastructure Professionals
– Rapid pace of IT development
– Limited expertise in domain areas
– Specialized knowledge of HW, FW, MW, SW
– Communication challenges
Transforming Data Into Decisions
8. Analytics: Key Steps
• Learn the application domain
• Locate or develop a data source or data set
• Clean and preprocess data: May take 60% of effort!
• Data reduction and transformation
– Find useful pieces, squeeze out redundancies
• Choose analytical approaches
– summarize, visualize, organize, describe, explore, find
patterns, predict, test, infer
• Communicate the results and implications to data users
• Deploy discovered knowledge in a system
• Monitor and evaluate the effectiveness of the system
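The middle steps above (clean, reduce, analyze, communicate) can be sketched as a small chain of functions. This is only an illustration using the Python standard library: the `RAW` records are invented for the example, and the function names are hypothetical rather than anything prescribed by the slides.

```python
from statistics import mean, median

# Hypothetical raw records, invented for illustration only.
RAW = [
    {"store": " North ", "amount": "12.50"},
    {"store": "north", "amount": "7.50"},
    {"store": "South", "amount": ""},       # missing value
    {"store": "South", "amount": "20.00"},
]


def clean(records):
    """Clean and preprocess: normalize names, parse numbers, drop bad rows."""
    rows = []
    for rec in records:
        try:
            amount = float(rec["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        rows.append({"store": rec["store"].strip().lower(), "amount": amount})
    return rows


def reduce_by_store(rows):
    """Data reduction: keep only the amounts, grouped by store."""
    groups = {}
    for row in rows:
        groups.setdefault(row["store"], []).append(row["amount"])
    return groups


def summarize(groups):
    """Analytical approach: simple descriptive statistics per store."""
    return {
        store: {"n": len(vals), "mean": mean(vals), "median": median(vals)}
        for store, vals in groups.items()
    }


def report(summary):
    """Communicate: one readable line per store."""
    return [
        f"{store}: n={s['n']}, mean={s['mean']:.2f}, median={s['median']:.2f}"
        for store, s in sorted(summary.items())
    ]


if __name__ == "__main__":
    for line in report(summarize(reduce_by_store(clean(RAW)))):
        print(line)
```

Keeping each step as its own function leaves a traceable path from raw data to reported result, which also makes it easier to rerun the chain when the data source changes.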
Editor's notes
Facebook friend connections worldwide, a network diagram of the Enron email set, a comparison of similar gene sequences between humans, chimps, and macaques