THE ROLE OF DATA SCIENCE
IN ASKING AND ANSWERING
‘GOOD QUESTIONS’
Eric Kostello
Big Data Camp--Data Science Track
Los Angeles, CA, 14 June 2014
© 2014 Eric Kostello
DATA ANALYSIS WITHOUT
CONTEXT
Prepare
data
Analysis
Report /
Summary
Get data
THE CONTEXT OF DATA
ANALYSIS
Problem/Issue
Outcome
Form Objectives & Take actions
THE CONTEXT OF DATA
ANALYSIS
Problem/Issue
Outcome
Prepare
data
Analysis
Report /
Summary
Get data
• Which objectives are the right objectives?
• What is the right way to achieve them?
• Increased data set size and increased computational power
only address a small piece of the puzzle
EVEN BIGGER CONTEXT
• Question everything in the social world, and try to “get to the root of
the matter”
√
• How do social scientists explain social facts?
• With other social facts!
• Challenge: Everything in the social world is related to everything else
• The whole world is the original “big data” to sociologists
SOCIOLOGICAL VIEW
VOCABULARY TEST
a be dawned finger Michael on Symonds
adjudged beckoned day Hussey Mike overnight the
an before depended inevitable much Oz this
and being dreaded inside mustered perished Thus
Andrew But duo latter near Ponting to/too/two
as capacity edge leg new pyrotechnics unfortunate
Australia cheered either little ninth run wicket
bar Clarke every lunch of shown with
batsmen crowd final lustily off side wizard
(Any objection to any of these words?)
VOCABULARY TEST
dawned finger
adjudged beckoned day overnight
before depended inevitable much
being dreaded inside mustered perished
duo latter near
capacity edge leg new pyrotechnics unfortunate
cheered either little ninth run wicket
bar every lunch shown
batsmen crowd final lustily side wizard
Stop words and proper nouns removed...
Ready for a reading test?
READING TEST
Thus, as the final day dawned and a near capacity
crowd lustily cheered every run Australia mustered,
much depended on Ponting and the new wizard of
Oz, Mike Hussey, the two overnight batsmen. But this
duo perished either side of lunch--the latter a little
unfortunate to be adjudged leg-before--and with
Andrew Symonds, too, being shown the dreaded
finger off an inside edge, the inevitable beckoned, bar
the pyrotechnics of Michael Clarke and the ninth
wicket.
What Your Kindergartner Needs to Know
E.D. Hirsch, Jr. and John Holdren, eds. (2013)
WHAT IS A SKILL?
• Substantive knowledge required for reading
• Reading is not a skill
• Data analysis is not a skill
• Both require substantive understanding
• If there is such a thing as “data science,” its practitioners must
combine skills and subject matter knowledge
• Hard Science
• Theories + Evidence = Laws
• Experiments
• Repeatability!
• Is anything we do remotely like that?
SCIENCE
• Essence of scientific approach is trying to find valid
generalizations
• “Theories” or “laws” relate causal conditions to outcomes
• Making predictions about things that are actually observable
is much more likely to result in valid generalizations.
• Placing p-values next to regressions does not make an
analysis scientific.
SCIENCE???
• Universal?
• Plenty of laws are not universal. (e.g. F = ma)
• Precise?
• The more accurately we measure, the more we discover
discrepancies.
• No exceptions?
• “Exceptions are just the least frequent alternative in a collection of
facts.” --M. Bunge
REQUIREMENTS FOR A
SCIENTIFIC LAW?
• No experiment or empirical result provides absolute
consistency (between inputs/outputs or causes and effects)
• The finer the measurement, the more inconsistencies you find
IN/CONSISTENCY
[Figure: a consistency axis running from High to Low, with the labels “More general laws, more widely applicable” and “Laws with more limited scope”]
• (But nobody cares about “laws” that have tiny, tiny scope)
• Lawfulness is not identical to universal applicability + 100%
consistency
• There can be all kinds of lawfulness, discoverable by
proceeding scientifically.
• Science advances in a community
• Discovery of lawfulness is not only the primary result of scientific research, it is a fundamental presupposition of scientific
endeavors.
SCIENCE: YES
• Counterfactual thinking
• Things could have turned out differently from [i.e., counter to] the actual results [i.e., fact]
• Find the conditions that make the difference in outcomes
• A pattern in data is not evidence if you search only for that pattern
without letting other possibilities into the picture
• Observational data complicates things
• Have to find a way to meet “all else equal” condition
• Reminder: predictions about actually observable phenomena >> p-values
PROCEEDING SCIENTIFICALLY
READ A MUCH BETTER VERSION OF THIS ARGUMENT HERE
• Matching the level of precision to the scope of the generalizations
needed
• It helps (a lot) to develop agreement on
• What you need to know about to meet objectives
• Balance required
• Too limited scope doesn’t do much for the next project
• Too ambitious is too costly, takes too long, and still might not answer
your questions
ORGANIZATIONAL
IMPLICATIONS
“SHOW YOUR WORK”
• Good answers have credibility when the process that
generates them is clear
• Building valid generalizations is often incremental, not starting
over from scratch each time (in an interactive environment)
• Obstacles to modifying your analysis create intolerable friction
(mental and organizational)
• Spreadsheet jockeys and interactive statistics package users:
you are on notice
REPRODUCIBLE COMPUTING
ENHANCES COLLABORATION
• Thanks to developments like knitr, anybody can reproduce your analysis.
• Motivation for the programming steps becomes much clearer
• Combining this with distributed version control enables collaborative,
reproducible research.
• People with complementary skills can collaborate
• Most importantly, we are following Knuth’s dictum that we should
concentrate on explaining to other humans what we want the
computer to do
GOOD QUESTIONS
• Started with “Problem/Issue” ... formulate that in form of question
• “What is the relationship between this and that?”
• Treats the relationship itself as a hypothesis
• Good questions are posed at a useful level of abstraction
• “So what?” Good questions provide answers for the inevitable “so
what?”
• Derived from and add to/extend/challenge current thinking
APPROPRIATE DATA SET SIZE
• Adding/collecting lots of data just because you can is not a strategy
• You might find something surprising/cool
• You might waste your time looking at what you have instead of what you need
• The correct balance depends on the problem
• Too small to resolve issues is a waste of money
• Unnecessarily large is also a waste
• Lots of variables won’t save you from endogeneity problems (when cause and
effect are unclear)
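The point about lots of variables can be sketched with a small simulation. This is only an illustration of one common form of endogeneity, an unrecorded common cause. The variables z, x, y and the 20 unrelated columns are all made up, and subtracting z works here only because the data were built by hand.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(2014)  # fixed seed so the run is repeatable
n = 5_000

# Hypothetical setup: z is a common cause that nobody recorded.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]  # x depends on z only
y = [zi + rng.gauss(0, 1) for zi in z]  # y depends on z only, not on x

# Twenty extra recorded variables that have nothing to do with x or y.
noise_columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(20)]

# x and y look related even though neither causes the other.
corr_xy = pearson(x, y)

# Removing z's contribution (possible only because we built the data)
# makes the apparent x-y relationship go away.
corr_given_z = pearson([xi - zi for xi, zi in zip(x, z)],
                       [yi - zi for yi, zi in zip(y, z)])

# None of the unrelated columns can stand in for the missing cause.
best_noise_corr = max(abs(pearson(col, y)) for col in noise_columns)

print(f"corr(x, y)               = {corr_xy:.2f}")
print(f"corr(x, y) with z removed = {corr_given_z:.2f}")
print(f"best unrelated column     = {best_noise_corr:.2f}")
```

In this sketch, adding more columns does nothing to explain the x-y correlation. Only measuring the right thing (z) does.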
• Big data is
• Going to help me find a parking spot
• Going to help you offer me up just the right ad at just the
right time
BIG [DATA] CONCLUSION
• Computational complexity of dealing with big data is dropping
• Not going to change the logic of establishing lawfulness
SAMPLES VS. DATA SETS
• True samples are random and representative.
• Statistics from random samples have a clear relationship to the whole population
• The rest (i.e. “what you have”) are just ad hoc collections of data of varying quality
• Selection bias exists and is a problem when who (or what) gets
into the data set is related to what is being measured by the data
set.
• Example: election polls that call only landline telephones, because cellphone-only
voters tend to vote differently from landline users.
• Important to think very carefully about the data-generation mechanism.
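To make the landline example concrete, here is a minimal simulation sketch. Everything in it is assumed for illustration: the electorate, the 65% and 35% support rates, and the poll sizes are invented and do not come from any real poll.

```python
import random

def build_population(n_per_group=10_000):
    """Hypothetical electorate of (phone_type, supports_candidate) pairs."""
    population = []
    # Assumed support: 65 per 100 cell-only voters, 35 per 100 landline voters.
    for phone, support_per_100 in (("cell_only", 65), ("landline", 35)):
        n_support = n_per_group * support_per_100 // 100
        population += [(phone, True)] * n_support
        population += [(phone, False)] * (n_per_group - n_support)
    return population

def share(voters):
    """Fraction of the given voters who support the candidate."""
    return sum(1 for _, supports in voters if supports) / len(voters)

rng = random.Random(2014)  # fixed seed so the run is repeatable
population = build_population()
true_share = share(population)

# A true random sample: every voter has the same chance of being polled.
random_poll = rng.sample(population, 2_000)

# A landline-only poll: who gets into the data depends on phone type,
# and phone type is related to the thing being measured.
landline_voters = [v for v in population if v[0] == "landline"]
landline_poll = rng.sample(landline_voters, 2_000)

print(f"true support:       {true_share:.3f}")
print(f"random sample:      {share(random_poll):.3f}")
print(f"landline-only poll: {share(landline_poll):.3f}")
```

Both polls have the same number of respondents. Only the data-generation mechanism differs, and a larger landline-only poll would not close the gap.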
IS YOUR DATA SUBJECT TO
SELECTION BIAS?
• Yes
• But you have to size the impact
• (Paul Rosenbaum’s Observational Studies offers an accessible framework to quantify how
vulnerable results are to unmeasured biases)
• Thousands of variables won’t help you figure out what is going on if...
• you are missing substantial chunks of the population of interest
• you are not measuring the right thing(s)
• Temptation to assume that lots of data (variables or observations) means lots of coverage and
therefore implies representative data
• Very poor assumption
BLACK BOX PREDICTIVE
SYSTEMS TO THE RESCUE?
• Can you build a machine-learning system that compensates
for the missing segments of the population?
• Assumes the relationship between the represented and
unrepresented population is stable.
• But we can’t measure everything, so what do we do?
THE NON-ROYAL ROAD
• Measure the right things to discover valid laws about the relationships we
are interested in
• It could take anywhere from tens to trillions of measurements
• Be cognizant of the valid scope of generalizations possible
• Selection bias, unreliable humans, etc.
• Knowing the weaknesses of a data set or a statistical method is up to us
• Combine: Subject matter knowledge and statistical understanding and data
manipulation
CAGE MATCH!
• Measure the “right” things
• Use the “right” data
Vs
• But this is the only data I have. I know it’s flawed, but it’s this or nothing.
• Consider the limitations of who is in the data, the validity of claiming that a
measurement is really the thing you want it to be, etc.
• Consider if there is any other way to see how far off you are on critical issues
because of these limits.
APPLICATION
• During the talk I gave a too-hasty example that a lot of people didn’t get.
• It showed the strength of the relationship between different items. (This sort of visualization also makes
sense for trade-offs or flows.)
• I created the graph using the R package circlize, which implements Circos-style circular layouts in R.
• circlize
• Zuguang Gu (2014). circlize: Circular visualization in R. R package version 0.0.8.
• http://CRAN.R-project.org/package=circlize
• Circos
• http://circos.ca/
• Then I was trying to make a little joke about how I would show the “steps” to produce the plot, with
the idea that the listener might think they are about to see R code. What they actually saw was...
[Figure: circular plot made with circlize, showing the relationships among 13 items with labels such as f39869 and f39324]
STEPS TO PRODUCE THIS
VISUALIZATION...
• Join analyst group that uses data
• Try to figure things out “their” way
• (Take their problem seriously)
• Ask questions
• Of them
• Of the data
• See if there is a way to marshal the data to support better insights
• Iterate
FINAL THOUGHTS
• Situational lawfulness
• Defining the situation in which we are trying to find lawfulness of sufficient generality contributes to
organizational harmony and success
• “Defining” is not a solo activity
• Black boxes can be predictive, but understanding relationships matters
• The right data is better than lots of data
• The right data depends on the questions, which depend on substantive understanding
• Science progresses in a community: Listen and engage actively and broadly
• Communicate to contribute
• There is no simple formula for “good questions,” only general guidelines about how to head in the right
direction

Contenu connexe

Tendances

Managerial Decision-Making
Managerial Decision-MakingManagerial Decision-Making
Managerial Decision-MakingLee Schlenker
 
Statistics in the age of data science, issues you can not ignore
Statistics in the age of data science, issues you can not ignoreStatistics in the age of data science, issues you can not ignore
Statistics in the age of data science, issues you can not ignoreTuri, Inc.
 
Managerial Decision Making
Managerial Decision MakingManagerial Decision Making
Managerial Decision MakingLee Schlenker
 
Systems thinking - a new approach for decision making
Systems thinking - a new approach for decision makingSystems thinking - a new approach for decision making
Systems thinking - a new approach for decision makingJuhana Huotarinen
 
Susan Windsor - Critical Thinking for Testers - EuroSTAR 2010
Susan Windsor - Critical Thinking for Testers - EuroSTAR 2010Susan Windsor - Critical Thinking for Testers - EuroSTAR 2010
Susan Windsor - Critical Thinking for Testers - EuroSTAR 2010TEST Huddle
 
Computational Thinking - A Revolution in 4 Steps
Computational Thinking - A Revolution in 4 StepsComputational Thinking - A Revolution in 4 Steps
Computational Thinking - A Revolution in 4 StepsPaul Herring
 
Augmenting Healthcare by Supporting General Practitioners and Disclosing Hea...
 Augmenting Healthcare by Supporting General Practitioners and Disclosing Hea... Augmenting Healthcare by Supporting General Practitioners and Disclosing Hea...
Augmenting Healthcare by Supporting General Practitioners and Disclosing Hea...Robin De Croon
 
Patterson Consulting: What is Artificial Intelligence?
Patterson Consulting: What is Artificial Intelligence?Patterson Consulting: What is Artificial Intelligence?
Patterson Consulting: What is Artificial Intelligence?Josh Patterson
 
What is Artificial Intelligence
What is Artificial IntelligenceWhat is Artificial Intelligence
What is Artificial IntelligenceJosh Patterson
 
Crisis of confidence, p-hacking and the future of psychology
Crisis of confidence, p-hacking and the future of psychologyCrisis of confidence, p-hacking and the future of psychology
Crisis of confidence, p-hacking and the future of psychologyMatti Heino
 
Computational thinking-illustrated
Computational thinking-illustratedComputational thinking-illustrated
Computational thinking-illustratedCraig Evans
 
Academic Research: A Survival Guide
Academic Research: A Survival GuideAcademic Research: A Survival Guide
Academic Research: A Survival GuidePayamBarnaghi
 
Scientific and Academic Research: A Survival Guide 
Scientific and Academic Research:  A Survival Guide Scientific and Academic Research:  A Survival Guide 
Scientific and Academic Research: A Survival Guide PayamBarnaghi
 
Computational Thinking - a 4 step approach and a new pedagogy
Computational Thinking - a 4 step approach and a new pedagogyComputational Thinking - a 4 step approach and a new pedagogy
Computational Thinking - a 4 step approach and a new pedagogyPaul Herring
 
From computational Thinking to computational Action - Dr. Hal Abelson, MIT Ap...
From computational Thinking to computational Action - Dr. Hal Abelson, MIT Ap...From computational Thinking to computational Action - Dr. Hal Abelson, MIT Ap...
From computational Thinking to computational Action - Dr. Hal Abelson, MIT Ap...CAVEDU Education
 
Computational Thinking in the Workforce and Next Generation Science Standards...
Computational Thinking in the Workforce and Next Generation Science Standards...Computational Thinking in the Workforce and Next Generation Science Standards...
Computational Thinking in the Workforce and Next Generation Science Standards...Josh Sheldon
 
Wtf is data science?
Wtf is data science?Wtf is data science?
Wtf is data science?Dylan
 

Tendances (20)

Managerial Decision-Making
Managerial Decision-MakingManagerial Decision-Making
Managerial Decision-Making
 
Statistics in the age of data science, issues you can not ignore
Statistics in the age of data science, issues you can not ignoreStatistics in the age of data science, issues you can not ignore
Statistics in the age of data science, issues you can not ignore
 
Systems thinking class
Systems thinking classSystems thinking class
Systems thinking class
 
Getting your work funded
Getting your work fundedGetting your work funded
Getting your work funded
 
Managerial Decision Making
Managerial Decision MakingManagerial Decision Making
Managerial Decision Making
 
Articulo 50 palabras
Articulo 50 palabras Articulo 50 palabras
Articulo 50 palabras
 
Systems thinking - a new approach for decision making
Systems thinking - a new approach for decision makingSystems thinking - a new approach for decision making
Systems thinking - a new approach for decision making
 
Susan Windsor - Critical Thinking for Testers - EuroSTAR 2010
Susan Windsor - Critical Thinking for Testers - EuroSTAR 2010Susan Windsor - Critical Thinking for Testers - EuroSTAR 2010
Susan Windsor - Critical Thinking for Testers - EuroSTAR 2010
 
Computational Thinking - A Revolution in 4 Steps
Computational Thinking - A Revolution in 4 StepsComputational Thinking - A Revolution in 4 Steps
Computational Thinking - A Revolution in 4 Steps
 
Augmenting Healthcare by Supporting General Practitioners and Disclosing Hea...
 Augmenting Healthcare by Supporting General Practitioners and Disclosing Hea... Augmenting Healthcare by Supporting General Practitioners and Disclosing Hea...
Augmenting Healthcare by Supporting General Practitioners and Disclosing Hea...
 
Patterson Consulting: What is Artificial Intelligence?
Patterson Consulting: What is Artificial Intelligence?Patterson Consulting: What is Artificial Intelligence?
Patterson Consulting: What is Artificial Intelligence?
 
What is Artificial Intelligence
What is Artificial IntelligenceWhat is Artificial Intelligence
What is Artificial Intelligence
 
Crisis of confidence, p-hacking and the future of psychology
Crisis of confidence, p-hacking and the future of psychologyCrisis of confidence, p-hacking and the future of psychology
Crisis of confidence, p-hacking and the future of psychology
 
Computational thinking-illustrated
Computational thinking-illustratedComputational thinking-illustrated
Computational thinking-illustrated
 
Academic Research: A Survival Guide
Academic Research: A Survival GuideAcademic Research: A Survival Guide
Academic Research: A Survival Guide
 
Scientific and Academic Research: A Survival Guide 
Scientific and Academic Research:  A Survival Guide Scientific and Academic Research:  A Survival Guide 
Scientific and Academic Research: A Survival Guide 
 
Computational Thinking - a 4 step approach and a new pedagogy
Computational Thinking - a 4 step approach and a new pedagogyComputational Thinking - a 4 step approach and a new pedagogy
Computational Thinking - a 4 step approach and a new pedagogy
 
From computational Thinking to computational Action - Dr. Hal Abelson, MIT Ap...
From computational Thinking to computational Action - Dr. Hal Abelson, MIT Ap...From computational Thinking to computational Action - Dr. Hal Abelson, MIT Ap...
From computational Thinking to computational Action - Dr. Hal Abelson, MIT Ap...
 
Computational Thinking in the Workforce and Next Generation Science Standards...
Computational Thinking in the Workforce and Next Generation Science Standards...Computational Thinking in the Workforce and Next Generation Science Standards...
Computational Thinking in the Workforce and Next Generation Science Standards...
 
Wtf is data science?
Wtf is data science?Wtf is data science?
Wtf is data science?
 

En vedette

Asking Questions of Data
Asking Questions of DataAsking Questions of Data
Asking Questions of DataTony Hirst
 
The Five Data Questions
The Five Data QuestionsThe Five Data Questions
The Five Data Questionscrystalpullen
 
Asking the Right Questions of Your Data
Asking the Right Questions of Your DataAsking the Right Questions of Your Data
Asking the Right Questions of Your DataDataWorks Summit
 
Big Data Day LA 2015 - Data mining, forecasting, and BI at the RRCC by Benjam...
Big Data Day LA 2015 - Data mining, forecasting, and BI at the RRCC by Benjam...Big Data Day LA 2015 - Data mining, forecasting, and BI at the RRCC by Benjam...
Big Data Day LA 2015 - Data mining, forecasting, and BI at the RRCC by Benjam...Data Con LA
 
Tajolabigdatacamp2014 140618135810-phpapp01 hyunsik-choi
Tajolabigdatacamp2014 140618135810-phpapp01 hyunsik-choiTajolabigdatacamp2014 140618135810-phpapp01 hyunsik-choi
Tajolabigdatacamp2014 140618135810-phpapp01 hyunsik-choiData Con LA
 
Big Data Day LA 2015 - Tips for Building Self Service Data Science Platform b...
Big Data Day LA 2015 - Tips for Building Self Service Data Science Platform b...Big Data Day LA 2015 - Tips for Building Self Service Data Science Platform b...
Big Data Day LA 2015 - Tips for Building Self Service Data Science Platform b...Data Con LA
 
Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...
Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...
Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...Data Con LA
 
Getting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of DatastaxGetting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of DatastaxData Con LA
 
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Data Con LA
 
Big Data Day LA 2015 - The AWS Big Data Platform by Michael Limcaco of Amazon
Big Data Day LA 2015 - The AWS Big Data Platform by Michael Limcaco of AmazonBig Data Day LA 2015 - The AWS Big Data Platform by Michael Limcaco of Amazon
Big Data Day LA 2015 - The AWS Big Data Platform by Michael Limcaco of AmazonData Con LA
 
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­ticaA noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­ticaData Con LA
 
Asking better questions
Asking better questionsAsking better questions
Asking better questionsInnoTech
 
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder HortonworksThe Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder HortonworksData Con LA
 
Big Data Day LA 2015 - Data Science ≠ Big Data by Jim McGuire of ZestFinance
Big Data Day LA 2015 - Data Science ≠ Big Data by Jim McGuire of ZestFinanceBig Data Day LA 2015 - Data Science ≠ Big Data by Jim McGuire of ZestFinance
Big Data Day LA 2015 - Data Science ≠ Big Data by Jim McGuire of ZestFinanceData Con LA
 
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineSpark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineData Con LA
 
Big Data Day LA 2015 - Scalable and High-Performance Analytics with Distribut...
Big Data Day LA 2015 - Scalable and High-Performance Analytics with Distribut...Big Data Day LA 2015 - Scalable and High-Performance Analytics with Distribut...
Big Data Day LA 2015 - Scalable and High-Performance Analytics with Distribut...Data Con LA
 
Big Data Day LA 2016/ Data Science Track - Enabling Cross-Screen Advertising ...
Big Data Day LA 2016/ Data Science Track - Enabling Cross-Screen Advertising ...Big Data Day LA 2016/ Data Science Track - Enabling Cross-Screen Advertising ...
Big Data Day LA 2016/ Data Science Track - Enabling Cross-Screen Advertising ...Data Con LA
 
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...Data Con LA
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Deep Learning at Scale - A...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Deep Learning at Scale - A...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Deep Learning at Scale - A...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Deep Learning at Scale - A...Data Con LA
 

En vedette (20)

Probate Myths Debunked
Probate Myths DebunkedProbate Myths Debunked
Probate Myths Debunked
 
Asking Questions of Data
Asking Questions of DataAsking Questions of Data
Asking Questions of Data
 
The Five Data Questions
The Five Data QuestionsThe Five Data Questions
The Five Data Questions
 
Asking the Right Questions of Your Data
Asking the Right Questions of Your DataAsking the Right Questions of Your Data
Asking the Right Questions of Your Data
 
Big Data Day LA 2015 - Data mining, forecasting, and BI at the RRCC by Benjam...
Big Data Day LA 2015 - Data mining, forecasting, and BI at the RRCC by Benjam...Big Data Day LA 2015 - Data mining, forecasting, and BI at the RRCC by Benjam...
Big Data Day LA 2015 - Data mining, forecasting, and BI at the RRCC by Benjam...
 
Tajolabigdatacamp2014 140618135810-phpapp01 hyunsik-choi
Tajolabigdatacamp2014 140618135810-phpapp01 hyunsik-choiTajolabigdatacamp2014 140618135810-phpapp01 hyunsik-choi
Tajolabigdatacamp2014 140618135810-phpapp01 hyunsik-choi
 
Big Data Day LA 2015 - Tips for Building Self Service Data Science Platform b...
Big Data Day LA 2015 - Tips for Building Self Service Data Science Platform b...Big Data Day LA 2015 - Tips for Building Self Service Data Science Platform b...
Big Data Day LA 2015 - Tips for Building Self Service Data Science Platform b...
 
Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...
Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...
Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...
 
Getting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of DatastaxGetting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of Datastax
 
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
 
Big Data Day LA 2015 - The AWS Big Data Platform by Michael Limcaco of Amazon
Big Data Day LA 2015 - The AWS Big Data Platform by Michael Limcaco of AmazonBig Data Day LA 2015 - The AWS Big Data Platform by Michael Limcaco of Amazon
Big Data Day LA 2015 - The AWS Big Data Platform by Michael Limcaco of Amazon
 
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­ticaA noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
 
Asking better questions
Asking better questionsAsking better questions
Asking better questions
 
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder HortonworksThe Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
 
Big Data Day LA 2015 - Data Science ≠ Big Data by Jim McGuire of ZestFinance
Big Data Day LA 2015 - Data Science ≠ Big Data by Jim McGuire of ZestFinanceBig Data Day LA 2015 - Data Science ≠ Big Data by Jim McGuire of ZestFinance
Big Data Day LA 2015 - Data Science ≠ Big Data by Jim McGuire of ZestFinance
 
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineSpark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
 
Big Data Day LA 2015 - Scalable and High-Performance Analytics with Distribut...
Big Data Day LA 2015 - Scalable and High-Performance Analytics with Distribut...Big Data Day LA 2015 - Scalable and High-Performance Analytics with Distribut...
Big Data Day LA 2015 - Scalable and High-Performance Analytics with Distribut...
 
Big Data Day LA 2016/ Data Science Track - Enabling Cross-Screen Advertising ...
Big Data Day LA 2016/ Data Science Track - Enabling Cross-Screen Advertising ...Big Data Day LA 2016/ Data Science Track - Enabling Cross-Screen Advertising ...
Big Data Day LA 2016/ Data Science Track - Enabling Cross-Screen Advertising ...
 
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Deep Learning at Scale - A...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Deep Learning at Scale - A...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Deep Learning at Scale - A...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Deep Learning at Scale - A...
 

Similaire à Data science and good questions eric kostello

'Drinking from the fire hose? The pitfalls and potential of Big Data'.
'Drinking from the fire hose? The pitfalls and potential of Big Data'.'Drinking from the fire hose? The pitfalls and potential of Big Data'.
'Drinking from the fire hose? The pitfalls and potential of Big Data'.Josh Cowls
 
Fundamentals of Data science Introduction Unit 1
Fundamentals of Data science Introduction Unit 1Fundamentals of Data science Introduction Unit 1
Fundamentals of Data science Introduction Unit 1sasi
 
Presentation1a paul carpenter
Presentation1a paul carpenterPresentation1a paul carpenter
Presentation1a paul carpenterYinglingV
 
CODATA International Training Workshop in Big Data for Science for Researcher...
Data science and good questions eric kostello

  • 1. THE ROLE OF DATA SCIENCE IN ASKING AND ANSWERING ‘GOOD QUESTIONS’ Eric Kostello Big Data Camp--Data Science Track Los Angeles, CA, 14 June 2014 © 2014 Eric Kostello
  • 3. THE CONTEXT OF DATA ANALYSIS Problem/Issue Outcome Form Objectives & Take actions
  • 4. THE CONTEXT OF DATA ANALYSIS Problem/Issue Outcome Prepare data Analysis Report / Summary Get data
  • 5. EVEN BIGGER CONTEXT • Which objectives are the right objectives? • What is the right way to achieve them? • Increased data set size and increased computational power only address a small piece of the puzzle
  • 6. SOCIOLOGICAL VIEW • Question everything in the social world, and try to “get to the root of the matter” √ • How do social scientists explain social facts? • With other social facts! • Challenge: Everything in the social world is related to everything else • The whole world is the original “big data” to sociologists
  • 7. VOCABULARY TEST a be dawned finger Michael on Symonds adjudged beckoned day Hussey Mike overnight the an before depended inevitable much Oz this and being dreaded inside mustered perished Thus Andrew But duo latter near Ponting to/too/two as capacity edge leg new pyrotechnics unfortunate Australia cheered either little ninth run wicket bar Clarke every lunch of shown with batsmen crowd final lustily off side wizard (Any objection to any of these words?)
  • 8. VOCABULARY TEST dawned finger adjudged beckoned day overnight before depended inevitable much being dreaded inside mustered perished duo latter near capacity edge leg new pyrotechnics unfortunate cheered either little ninth run wicket bar every lunch shown batsmen crowd final lustily side wizard Stop words and proper nouns removed... Ready for a reading test?
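
The filtering step on this slide (dropping stop words and proper nouns) could be sketched roughly as follows. This is only an illustration: the stop-word and proper-noun sets are hypothetical, hand-picked from the word list above, and the talk does not say how the words were actually removed.

```python
# Hypothetical word sets, hand-picked for this illustration only.
STOP_WORDS = {"a", "an", "and", "as", "be", "but", "of", "off",
              "on", "the", "this", "thus", "to", "with"}
PROPER_NOUNS = {"Andrew", "Australia", "Clarke", "Hussey", "Michael",
                "Mike", "Oz", "Ponting", "Symonds"}


def content_words(words):
    """Keep words that are neither stop words nor proper nouns."""
    return [w for w in words
            if w.lower() not in STOP_WORDS and w not in PROPER_NOUNS]


sample = ["Thus", "as", "the", "final", "day", "dawned",
          "and", "Ponting", "mustered", "a", "run"]
filtered = content_words(sample)
print(filtered)
```

Knowing every word that survives the filter still does not guarantee that the reading test can be passed, which is the point the next slides make about substantive knowledge.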
  • 9. READING TEST Thus, as the final day dawned and a near capacity crowd lustily cheered every run Australia mustered, much depended on Ponting and the new wizard of Oz, Mike Hussey, the two overnight batsmen. But this duo perished either side of lunch--the latter a little unfortunate to be adjudged leg-before--and with Andrew Symonds, too, being shown the dreaded finger off an inside edge, the inevitable beckoned, bar the pyrotechnics of Michael Clarke and the ninth wicket. What Your Kindergartner Needs to Know E.D. Hirsch, Jr. and John Holdren, eds., (2013)
  • 10. WHAT IS A SKILL? • Substantive knowledge required for reading • Reading is not a skill • Data analysis is not a skill • Both require substantive understanding • If there is such a thing as “data science” its practitioners must combine skills and subject matter knowledge
  • 11. SCIENCE • Hard Science • Theories + Evidence = Laws • Experiments • Repeatability! • Is anything we do remotely like that?
  • 12. SCIENCE??? • Essence of scientific approach is trying to find valid generalizations • “Theories” or “laws” relate causal conditions to outcomes • Making predictions about things that are actually observable is much more likely to result in valid generalizations. • Placing p-values next to regressions does not make an analysis scientific.
  • 13. REQUIREMENTS FOR A SCIENTIFIC LAW? • Universal? • Plenty of laws are not universal. (e.g. F = ma) • Precise? • The more accurately we measure, the more we discover discrepancies. • No exceptions? • “Exceptions are just the least frequent alternative in a collection of facts.” --M. Bunge
  • 14. IN/CONSISTENCY • No experiment or empirical result provides absolute consistency (between inputs/outputs or causes and effects) • The finer the measurement, the more inconsistencies you find • (Figure: a consistency axis running from High to Low, with the labels “More general laws, more widely applicable” and “Laws with more limited scope”) • (But nobody cares about “laws” that have tiny, tiny scope)
  • 15. SCIENCE: YES • Lawfulness is not identical to universal applicability + 100% consistency • There can be all kinds of lawfulness, discoverable by proceeding scientifically. • Science advances in a community • Discovery of lawfulness is not only the primary result of scientific research, it is a fundamental presupposition of scientific endeavors.
  • 16. PROCEEDING SCIENTIFICALLY • Counterfactual thinking • Things could have been a different way [i.e. counter to] actual results [i.e. fact] • Find the conditions that make the difference in outcomes • A pattern in data is not evidence if you search only for that pattern without letting other possibilities into the picture • Observational data complicates things • Have to find a way to meet “all else equal” condition • Reminder: predictions about actually observable phenomena >> p-values
  • 18. ORGANIZATIONAL IMPLICATIONS • Matching the level of precision to the scope of the generalizations needed • It helps (a lot) to develop agreement on • What you need to know about to meet objectives • Balance required • Too limited scope doesn’t do much for the next project • Too ambitious is too costly, takes too long, and still might not answer your questions
  • 19. “SHOW YOUR WORK” • Good answers have credibility when the process that generates them is clear • Building valid generalizations is often incremental, not starting over from scratch each time (in an interactive environment) • Obstacles to modifying your analysis create intolerable friction (mental and organizational) • Spreadsheet jockeys and interactive statistics package users: you are on notice
  • 20. REPRODUCIBLE COMPUTING ENHANCES COLLABORATION • Thanks to developments like knitr, anybody can reproduce your analysis. • Motivation for the programming steps becomes much clearer • Combining this with distributed version control enables collaborative, reproducible research. • People with complementary skills can collaborate • Most importantly, we are following Knuth’s dictum that we should concentrate on explaining to other humans what we want the computer to do
  • 21. GOOD QUESTIONS • Started with “Problem/Issue” ... formulate that in form of question • “What is the relationship between this and that?” • Treats the relationship itself as a hypothesis • Good questions are posed at a useful level of abstraction • “So what?” Good questions provide answers for the inevitable “so what?” • Derived from and add to/extend/challenge current thinking
  • 22. APPROPRIATE DATA SET SIZE • Adding/collecting lots of data just because you can is not a strategy • You might find something surprising/cool • You might waste your time looking at what you have instead of what you need to • The correct balance depends on the problem • Too small to resolve issues is a waste of money • Unnecessarily large is also a waste • Lots of variables won’t save you from endogeneity problems (when cause and effect are unclear)
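
One way to see why an unnecessarily large data set can be a waste: for a simple random sample, the standard error of a mean shrinks only with the square root of the number of observations. The sketch below assumes independent observations and an illustrative standard deviation of 10; neither assumption comes from the talk, and the calculation says nothing about bias or endogeneity, which more rows do not fix.

```python
import math


def standard_error(sd, n):
    """Standard error of a sample mean from n independent observations."""
    return sd / math.sqrt(n)


sd = 10.0  # assumed population standard deviation (illustrative only)
for n in (100, 10_000, 1_000_000):
    # Each 100-fold increase in n cuts the standard error by a factor of 10.
    print(n, standard_error(sd, n))
```

Whether the extra precision is worth collecting 100 times more data depends on the problem, as the slide says.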
  • 23. BIG [DATA] CONCLUSION • Big data is • Going to help me find a parking spot • Going to help you offer me up just the right ad at just the right time • Computational complexity of dealing with big data is dropping • Not going to change the logic of establishing lawfulness
  • 24. SAMPLES VS. DATA SETS • True samples are random and representative. • Statistics from random samples have a clear relationship to the whole population • The rest (i.e. “what you have”) are just ad hoc collections of data of varying quality • Selection bias exists and is a problem when who (or what) gets into the data set is related to what is being measured by the data set. • Example: election polls that call only landline telephones (because cellphone-only voters tend to vote differently than landline users). • Important to think very carefully about the data-generation mechanism.
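
The landline-poll example can be illustrated with a small simulation. Everything numeric here is a made-up assumption for the sketch (60% of voters are cellphone-only, and support for a candidate is 60% among them versus 40% among landline users); the talk gives no figures.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable


def make_voter():
    """Return (cell_only, supports_candidate) under the assumed rates."""
    cell_only = random.random() < 0.6           # assumed share of cellphone-only voters
    p_support = 0.6 if cell_only else 0.4       # assumed support rates per group
    return cell_only, random.random() < p_support


def share(voters):
    """Fraction of the given voters who support the candidate."""
    return sum(supports for _, supports in voters) / len(voters)


population = [make_voter() for _ in range(100_000)]
true_share = share(population)

# A true random sample draws from everyone.
random_sample = random.sample(population, 2_000)

# A landline-only poll can reach only voters who are not cellphone-only.
landline_frame = [v for v in population if not v[0]]
landline_sample = random.sample(landline_frame, 2_000)

print("population:     ", round(true_share, 3))
print("random sample:  ", round(share(random_sample), 3))
print("landline sample:", round(share(landline_sample), 3))
```

Under these assumptions the random sample lands near the population value, while the landline-only sample is off by roughly the gap between the groups no matter how many landline voters are called.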
  • 25. ISYOUR DATA SUBJECTTO SELECTION BIAS? • Yes • But you have to size the impact • (Paul Rosenbaum’s Observational Studies offers an accessible framework to quantify how vulnerable results are to unmeasured biases) • Thousands of variables won’t help you figure out what is going on if... • you are missing substantial chunks of the population of interest • you are not measuring the right thing(s) • Temptation to assume that lots of data (variables or observations) means lots of coverage and therefore implies representative data • Very poor assumption
  • 26. BLACK BOX PREDICTIVE SYSTEMS TO THE RESCUE? • Can you build a machine-learning system that compensates for the missing segments of the population? • Assumes the relationship between the represented and unrepresented population is stable. • But we can’t measure everything, so what do we do?
  • 27. THE NON-ROYAL ROAD • Measure the right things to discover valid laws about the relationships we are interested in • It could take anywhere from tens to trillions of measurements • Be cognizant of the valid scope of generalizations possible • Selection bias, unreliable humans, etc. • Knowing the weaknesses of a data set or a statistical method is up to us • Combine: Subject matter knowledge and statistical understanding and data manipulation
  • 28. CAGE MATCH! • Measure the “right” things • Use the “right” data • Vs. • But this is the only data I have. I know it’s flawed, but it’s this or nothing. • Consider the limitations of who is in the data, the validity of claiming that a measurement is really the thing you want it to be, etc. • Consider if there is any other way to see how far off you are on critical issues because of these limits.
  • 29. APPLICATION • During the talk I gave a too-hasty example that a lot of people didn’t get. • It showed the strength of the relationship between different items. (This sort of visualization makes sense for trade-offs or flows as well.) • I created the graph using the R library circlize, which implements Circos-style circular layouts in R. • circlize • Zuguang Gu (2014). circlize: Circular visualization in R. R package version 0.0.8. • http://CRAN.R-project.org/package=circlize • circos • http://circos.ca/ • Then I was trying to make a little joke about how I would show the “steps” to produce the plot, with the idea that the listener might think they are about to see R code. What they actually saw was...
  • 30. STEPS TO PRODUCE THIS VISUALIZATION... • [Figure: circular plot of relationships between items] • Join analyst group that uses data • Try to figure things out “their” way • (Take their problem seriously) • Ask questions • Of them • Of the data • See if there is a way to marshal the data to support better insights • Iterate
  • 31. FINAL THOUGHTS • Situational lawfulness • Defining the situation in which we are trying to find lawfulness of sufficient generality contributes to organizational harmony and success • “Defining” is not a solo activity • Black boxes can be predictive, but understanding relationships matters • The right data is better than lots of data • The right data depends on the questions, which depend on substantive understanding • Science progresses in a community: Listen and engage actively and broadly • Communicate to contribute • There is no simple formula for “good questions,” only general guidelines about how to head in the right direction