Data fluency for the 21st century
1. Data Fluency for the 21st Century
Martin Frigaard & Peter Spangler
access these slides: http://bit.ly/data-fluency-slides
icons by https://www.freepik.com/
comment on these slides: http://bit.ly/data-fluency-slides
2. Objectives
● Why are you here?
● Operational definitions
● Basic skills
● Data analysis toolkit
● Communicating with data
● Questions
3. Why are you here?
Data skills are in high demand!
'Data scientist' has been called the sexiest job for over 5 years. Fortunately, many of the problems
businesses and organizations face do not require someone with a PhD in machine learning,
or a fancy software solution. Many of these problems can be solved by people with domain
knowledge, data analysis skills, curiosity and the ability to communicate.
4. Why are you here?
Government agencies, nonprofits, and non-governmental
organizations are also recognizing the need for data
analysis skills
- Data analysis has become an essential tool for all policy makers, agencies, and community
action organizations to demonstrate the evidence for their ideas.
- Data for Democracy: "We work together to make the world a better place. At the heart of our
collective efforts is how data and technology can be used for good. We work to help shape a
better future and make positive changes in communities around the globe."
https://www.datafordemocracy.org/about-us
- The Civic Analytics Network: "The network will collaborate on shared projects that advance
the use of data visualization and predictive analytics in solving important urban problems
related to economic opportunity, poverty reduction, and addressing the root causes of social
problems of equity and opportunity."
https://datasmart.ash.harvard.edu/news/article/about-the-civic-analytics-network-826
- Our World in Data: "We cannot know what is happening in the world from the daily news
alone. The news media focuses on single events, too often missing the long-lasting, forceful
changes that reshape the world we live in." https://ourworldindata.org/about
5. Why should more people be here?
Today, everyone needs to understand how data and statistics are shaping the
world we live in
Data are used to represent nearly every aspect of life...
- Redistricting has a huge effect on U.S. politics but is greatly misunderstood. This project
uncovers what’s really broken, what's not and whether gerrymandering can (or should) be
killed. Depending on the desired outcome, each of the different maps could represent the
“right” way to draw congressional district boundaries - FiveThirtyEight's gerrymandering project
6. Operational definitions
What is data science vs. machine learning?
Data science: "...integrates a set of problem definitions, algorithms, and processes
that can be used to analyze data so as to extract actionable insight...deals with both
structured and unstructured (big) data and encompasses principles from a range of
fields, including machine learning, statistics, data ethics and regulation, and
high-performance computing."
Machine learning: "The field of computer science research that focuses on
developing and evaluating algorithms that can extract useful patterns from data
sets."
- Both of these definitions involve a ton of school, training, and experience to understand.
However, as you can see, data science includes fields like statistics and machine learning.
- These are both far above what is required to work with data
- More on this here: https://arxiv.org/abs/1903.07639
7. Operational definitions
The good news!
Data science and machine learning (as defined on the previous slide):
USUALLY NOT NECESSARY!
- These are both far above what is required to work with data, create visualizations, and
gain useful insights!
- listen to this podcast:
https://soundcloud.com/dataframed/1-data-science-past-present-and-future
8. Operational definitions
Our concern is data fluency
Information literacy: "...the ability to know when there is a need for information,
to be able to identify, locate, evaluate, and effectively use that information for the
issue or problem at hand."
Data literacy: "...the ability to read, understand, create and communicate data as
information."
Statistical literacy: "...the ability to understand and reason with statistics and
data."
These are great--but why are they separated?
Why would you have one without the other?
9. Data Fluency
Data fluency combines 1) the situational assessment skills from
information literacy, 2) the storage, retrieval, manipulation, and
management abilities from data literacy, and 3) the problem
solving, reasoning, and critical thinking from statistical literacy.
10. Operational definitions
skills that 'move across'
[Data] Transliteracy: "Transliteracy captures the idea of our capacity to
interact with information in whatever form it takes...[it] concerns the ability
to apply and transfer a range of skills and contextual insights to a variety of
settings. Rather than focusing on any one skill set or technology, transliteracy is
about fluidity of movement across a range of contexts." - Transliteracy: The
Art and Craft of ‘Moving Across’
11. Basic Skills
What's required for analytic literacy?
1. Domain expertise: you need to know your stuff
2. Understanding data structures: know what gets measured, how it's stored, and
what it looks like
3. Programming: interact with data programmatically so you can express your
intentions clearly (and document your work)
4. Exploratory Data Analysis: be able to summarize and communicate the
characteristics and patterns of a data set, using tables, graphs, and visualizations
An analyst needs characteristics like curiosity, tenacity, and stick-with-it-ness.
12. Domain expertise
Providing the context and purpose
An analytic approach to solving problems typically starts with some version of the
following questions:
1. What happened?
2. Why did it happen?
3. What will happen if it continues?
4. What can we do about it (or what will happen to y if we do x)?
The people closest to a problem will often have the necessary information to solve it, so
training them to think analytically is a better long term solution than hiring an expensive
'data scientist' who doesn't know your business.
13. Data structures:
What kind of information is being collected?
What are data?
- Tweets
- Sales
- Addresses
How can we access them?
- API
- Relational databases
- Google sheets
Where are they stored?
- Tables (SQL, Google Sheets, etc.)
- Web structures (JSON)
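A minimal sketch of the two storage shapes named above: the same records as a table and as a web structure (JSON). It uses only Python's standard library, which is a choice made for this illustration rather than the tool used in these slides, and the sales records are hypothetical values invented for the example.

```python
import csv
import io
import json

# Hypothetical sales records, invented purely for illustration.
csv_text = """order_id,city,amount
1,Oakland,19.99
2,Berkeley,5.50
"""

# Tabular form: each row is one record, each column is one variable.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# The same records as a web structure (JSON), similar to what an API might return.
json_text = json.dumps(rows, indent=2)

# Reading the JSON back recovers the same list of records.
records = json.loads(json_text)

# csv reads every value as text, so convert before doing arithmetic.
total = sum(float(record["amount"]) for record in records)

print(json_text)
print("Total sales:", round(total, 2))
```

Either shape holds the same information. Knowing which one you have tells you how to get at a single value: by row and column in a table, by key in JSON.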
14. Programming
Code is a necessary means of communication
"Instead of imagining that our main task is to instruct a computer what to do, let
us concentrate rather on explaining to human beings what we want a computer
to do." - Donald Knuth, "Literate Programming" (1984)
Should everyone learn to code?
- Knowing how to program "will vastly increase your potential in becoming a
valuable asset at any organization"
- "Having coding know-how equips you to better understand how the pieces of the
puzzle fit together in a business"
- "Coding doesn’t restrict you to a career in tech: it enhances the career, skills, or
interests you already have."
https://www.forbes.com/sites/laurencebradford/2016/06/20/why-every-millennial-should-learn-some-code/#5ebd0b1870f2
15. Exploratory Data Analysis
The goal of the analysis is exploration (not models and algorithms)
- In order to know if you'll be able to use your data to predict anything, you'll
need to understand its characteristics
- We do this through summaries, graphics, and visualizations
- "It is important to understand what you CAN DO before you learn to measure
how WELL you seem to have DONE it" - John Tukey
https://simplystatistics.org/2019/04/17/tukey-design-thinking-and-better-questions/
- ...goal of data analysis is to explore the data. In other words, data analysis is exploratory
data analysis...maybe this shouldn’t be so surprising given that Tukey wrote the book on
exploratory data analysis.
- In this paper, at least, he essentially dismisses other goals as overly optimistic or not really
meaningful.
- For the most part I agree with that sentiment, in the sense that looking for “the answer” in
a single set of data is going to result in disappointment. At best, you will accumulate
evidence that will point you in a new and promising direction. Then you can iterate,
perhaps by collecting new data, or by asking different questions.
- At worst, you will conclude that you’ve “figured it out” and then be shocked when
someone else, looking at another dataset, concludes something completely different.
In light of this, discussions about p-values and statistical significance are very much
beside the point.
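The summaries mentioned on the Exploratory Data Analysis slide can be sketched in a few lines. This is only an illustration under stated assumptions: the daily sales numbers are hypothetical, `five_number_summary` is a helper name made up for this example, and Python's standard library stands in for whatever tool you prefer. The outlier check uses Tukey's fences (1.5 times the interquartile range beyond the quartiles).

```python
import statistics

# Hypothetical daily sales counts, invented for illustration only.
daily_sales = [12, 15, 11, 14, 90, 13, 12, 16]


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


low, q1, median, q3, high = five_number_summary(daily_sales)
mean = statistics.mean(daily_sales)

# Tukey's fences: values far outside the middle half of the data deserve a look.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in daily_sales if v < lower_fence or v > upper_fence]

print("min, Q1, median, Q3, max:", (low, q1, median, q3, high))
print("mean:", mean)
print("possible outliers:", outliers)
```

Comparing the mean with the median is often the first hint that something unusual is in the data. Here one very large day pulls the mean well above the median, which is a prompt to ask a better question, not an answer in itself.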
16. The Data Analysis Toolkit
The necessary steps for an analytic data project are on the left.
As you can see, staying inside the RStudio IDE minimizes the number of additional tools
you'll have to work with.
- Problem statement or question
- Data collection and wrangling
- Data visualization and modeling
- Data communication
RStudio IDE
The RStudio IDE is a complementary cognitive artifact.
....Expert users of the abacus are not users of the physical abacus—they use a
mental model in their brain. And expert users of slide rules can cast the ruler aside
having internalized its mechanics. Cartographers memorize maps, and Edwin
Hutchins has shown us how expert navigators form near symbiotic relationships
with their analog instruments.
So our upper Paleolithic lineage has always possessed artificial intelligence to the
extent our ancestors have been aided in this way. In modern life, mobile devices and
their apps—to-do apps, calendar apps, journaling apps, astronomy apps, game
apps, social apps, and on near infinitum—just recapitulate the three essential
elements of the astrolabe: memory, search, and calculation.
Compare these complementary cognitive artifacts to competitive cognitive artifacts
like the mechanical calculator, the global positioning systems in our cars and
phones, and machine learning systems powering our App ecosystem. In each of
these examples our effective intelligence is amplified, but not in the way of
complementary artifacts. In the case of competitive artifacts, when we are deprived
of their use, we are no better than when we started. They are not coaches and
teachers—they are serfs. We have created an artificial serf economy where
incremental and competitive artificial intelligence both amplifies our productivity
and threatens to diminish organic and complementary artificial intelligence, and
the ethics of this sort of mechanical labor are only now engaging the attention of
practitioners and policy makers.
http://nautil.us/blog/will-ai-harm-us-better-to-ask-how-well-reckon-with-our-hybrid-nature
18. Case Study
Follow this link:
https://rstudio.cloud/project/322459
Collecting Google data
21. This is all stuff I've learned from other people!
1. Hadley Wickham
2. Hilary Mason
3. Greg Wilson
4. David Krakauer
5. David Robinson
6. Jenny Bryan
7. Charlotte Wickham
8. Bradley Boehmke
9. Benjamin S. Baumer
10. Mara Averick
11. Andrew Gelman
12. Lucy D'Agostino McGowan
I didn't come up with any of this stuff on my own--I learned it from these
great folks (and many others!)