What is a Data Scientist?
Authored By Russell Tibballs MACS CP MSR
Caveat
• This slideshow does not represent the views
of the company I work for. It represents my
evolving views at this point in time and is
mainly intended to provoke thought and
discussion.
Google searches for “data
scientist”
2011 saw the rise of “Big Data” and the term Data Scientist.
2011 saw the release of Moneyball, starring Brad Pitt as a geek.
In 2012 Nate Silver correctly predicted the winner in all 50 states and
the District of Columbia while the pundits were claiming Obama
had lost.
So how is a Data Scientist
portrayed - Super Human
‘Data Scientists perform data science. They use
technology and skills to increase awareness,
clarity and direction for those working with
data. The data scientist role is here to
accommodate the rapid changes that occur in
our modern day environment and are bestowed
the task of minimising the disruption that
technology and data is having on the way we
work, play and learn. Data Scientists don’t just
present data, data scientists present data with
an intelligence awareness of the consequences
of presenting that data.’
Super Human Continued
A large IT company – ‘What sets the data scientist
apart is strong business acumen, coupled with the
ability to communicate findings to both business
and IT leaders in a way that can influence how an
organization approaches a business challenge.
Good data scientists will not just address business
problems, they will pick the right problems that
have the most value to the organization.’
The Super Technologist
DATA SCIENCE
Mark Biernbaum suggests ‘Data
Science is going 99% too fast’. His
complaint is that the ‘science’ is not
peer-reviewed and the techniques are
often questionable. He believes Data
Scientists should slow down,
specialize and, above all, have their
methodologies peer-reviewed.
My Problem with Current
Definitions
I have been to a number of industry briefings
where the supplied definitions are often very ‘pie
in the sky’ and elitist. The definitions are designed
to indicate ‘you can’t possibly do this yourself
and there is no way any of your existing staff
will qualify for the role’. This may not be
intended; however, it is the result.
So What can we do about
that?
• Recognise that Data Science is a science that
applies broadly across all industry
sectors.
• Recognise that there are many specialty
areas.
• Recognise that it is not a technological
implementation.
• Recognise that there will be many levels of
expertise.
Recognise that there are many
specialty areas.
There is not one version of data science.
• There is data science applicable to the
research sectors of Maths, Physics,
Meteorology, and Medicine that will rarely be
applied elsewhere.
• There is data science that our friends in the NSA
and local equivalents will specialise in.
• There is data science that economists and the
financial sector will specialise in.
• Etc., etc. … ad nauseam.
Recognise that it is not a
technological implementation.
• Being able to query unstructured data in
HDFS does not make you a data scientist.
• Being able to analyse Splunk data does not
make you a data scientist.
• Being able to filter petabytes of data on an
MPP RDBMS does not make you a data
scientist.
So What is a Scientist?
The important aspects of any
definition of a Job Title.
• The most important thing to remember here
is that we are talking about a Job Title, and a
Job Title should be meaningful.
• Secondly, what should qualify someone for
that title?
So what is important about the
title ‘Data Scientist’?
Where did this title
originate?
‘On November 10, 1998, he (Jeff Wu) gave his
inaugural lecture entitled “Statistics = Data
Science?” in honor of his appointment to the H. C.
Carver Collegiate Professorship in Statistics at the
University of Michigan. In this lecture, he first
focused on the identity of statistics in science. He
then characterized statistical work as data
collection, data modeling and analysis, and
problem solving and decision making. In
conclusion, he proposed that statistics be renamed
to Data Science.’
So What is a Scientist?
From the Oxford Dictionary:
‘A person who is studying or has expert knowledge of one
or more of the natural or physical sciences: a research
scientist’.
Note: a scientist is not necessarily a research scientist; they
can be a practicing expert in a field.
However, all scientists share one feature: they are trained in
a science, they apply the scientific method to gain an
understanding of a focus of interest, and their methods and
conclusions are subject to peer review.
A comment from a recently
retired Scientist
My neighbor has recently retired after a long
career as a scientist and academic. We were
discussing the growing exclusivity
of the term scientist a few weekends ago. In
his words, ‘In the 1970s a scientist had a
degree, by the mid-80s they needed honors, in the
90s they needed a masters or PhD, now they
need several post-doctoral projects under
their belt to be considered a “real” scientist.’
However, he believes someone who is
qualified (has a science degree) and who is
practicing their studied discipline is a
scientist.
A Slight Detour.
What qualifies a professional?
• I see the Data Scientist as a specialty of the Computer Science
profession.
• We have lawyers who specialise in corporate, family, criminal,
and other aspects of the law.
• The accounting, architecture, engineering, teaching and medical
professions have several specialties and recognised levels of
expertise in each field.
• These professionals have academic training, and in many cases
acceptance by a professional body is what makes them
acceptable as professionals in the public eye. That is a model I
strongly believe the ICT industry needs to adopt or at least move
towards.
• I believe the academic achievement makes the qualification. The
acceptance by a professional body should give standing within the
profession and wider community.
The Australian Qualifications
Framework - AQF
• The AQF has 10 levels:
• Level 1 – Certificate I
• Level 2 – Certificate II
• Level 3 – Certificate III
• Level 4 – Certificate IV
• Level 5 – Diploma
• Level 6 – Advanced Diploma, Associate Degree
• Level 7 – Bachelor Degree
• Level 8 – Bachelor Honors Degree, Graduate Certificate,
Graduate Diploma
• Level 9 – Masters Degree
• Level 10 – Doctoral Degree
AT THE BOTTOM LEVEL OF THIS
SPECTRUM OF QUALIFICATIONS.
Summary: Graduates at this level will have knowledge and skills for
initial work, community involvement and/or further learning
Knowledge: Graduates at this level will have foundational knowledge
for everyday life, further learning and preparation for initial work
Skills: Graduates at this level will have foundational cognitive,
technical and communication skills to:
• undertake defined routine activities
• identify and report simple issues and problems
Application of knowledge and skills: Graduates at this level will apply
knowledge and skills to demonstrate autonomy in highly structured and
stable contexts and within narrow parameters
At the highest level of the
spectrum, AQF Level 10 – The
Doctorate
Summary: Graduates at this level will have systematic and critical
understanding of a complex field of learning and specialised research skills for
the advancement of learning and/or for professional practice
Knowledge: Graduates at this level will have systemic and critical
understanding of a substantial and complex body of knowledge at the frontier of
a discipline or area of professional practice
Skills: Graduates at this level will have expert, specialised cognitive, technical and
research skills in a discipline area to independently and systematically:
• engage in critical reflection, synthesis and evaluation
• develop, adapt and implement research methodologies to extend and redefine existing
knowledge or professional practice
• disseminate and promote new insights to peers and the community
• generate original knowledge and understanding to make a substantial contribution to a
discipline or area of professional practice
Application of knowledge and skills: Graduates at this level will apply knowledge
and skills to demonstrate autonomy, authoritative judgment, adaptability and
responsibility as an expert and leading practitioner or scholar
The Degree
Summary: Graduates at this level will have broad and coherent
knowledge and skills for professional work and/or further learning
Knowledge: Graduates at this level will have broad and coherent
theoretical and technical knowledge with depth in one or more
disciplines or areas of practice
Skills: Graduates at this level will have well-developed cognitive, technical
and communication skills to select and apply methods and technologies
to:
• analyse and evaluate information to complete a range of activities
• analyse, generate and transmit solutions to unpredictable and
sometimes complex problems
• transmit knowledge, skills and ideas to others
Application of knowledge and skills: Graduates at this level will apply
knowledge and skills to demonstrate autonomy, well-developed
judgement and responsibility:
• in contexts that require self-directed work and learning
• within broad parameters to provide specialist advice and functions
The Vendor’s Course
• The vendor’s course will usually be about how
to apply a tool to a problem.
• It is not generally designed to provide you
with knowledge that can be applied outside
the scope of their tool’s environment.
• It would generally not qualify within the AQF
guidelines.
So how does the AQF apply to
the question of Data Science?
If the person working in the field of applying
‘Data Science’ has a degree (AQF level 6 or
above) in a related subject, i.e. Maths, Statistics,
or Economics, or a higher qualification, including a Graduate
Certificate or Graduate Diploma, they can be expected to:
• apply knowledge and skills to demonstrate autonomy,
well-developed judgment and responsibility:
  • in contexts that require self-directed work and learning
  • within broad parameters to provide specialist advice
  and functions
Cui Bono? Who benefits from
this approach?
• The public – they will have greater confidence in
the profession.
• The employer – they get the assurance that
the employee has the skills at the right levels to do
the work.
• The employee – because they will know what is
expected of them and know they will be able to
deliver.
• The professional body and industry – through
greater faith and confidence by the public in the
profession in general.
But!!!
• There needs to be demand from within the
industry for this to happen.
• Some group like the IAPA needs to take on the
responsibility of working out the professional
specialisations and the required frameworks for
acceptance of professionals into those
specialisations.
What is a data scientist - a presentation I made to the Canberra IAPA

  • 1. What is a Data Scientist? Authored By Russell Tibballs MACS CP MSR
  • 2. Caveat  This slideshow does not represent the views of the company I work for. It represents my evolving views at this point in time and is mainly intended to provoke thought and discussion. Authored By Russell Tibballs MACS CP MSR
  • 3. Google searches for “data scientist” 2011 saw the rise of “Big Data” and the term Data Scientist. 2011 saw the release of Money Ball, starring Brad Pitt as a geek. 2012 Nate Silver correctly predicted the winner of all 50 states and the District of Columbia when the pundits were claimingObama had lost. Authored By Russell Tibballs MACS CP MSR
  • 4. So how is a Data Scientist portrayed - Super Human ‘Data Scientists perform data science.They use technology and skills to increase awareness, clarity and direction for those working with data.The data scientist role is here to accommodate the rapid changes that occur in our modern day environment and are bestowed the task of minimising the disruption that technology and data is having on the way we work, play and learn. Data Scientists don’t just present data, data scientists present data with an intelligence awareness of the consequences of presenting that data.’ Authored By Russell Tibballs MACS CP MSR
  • 5. Super Human Continued A large IT company – ‘What sets the data scientist apart is strong business acumen, coupled with the ability to communicate findings to both business and IT leaders in a way that can influence how an organization approaches a business challenge. Good data scientists will not just address business problems, they will pick the right problems that have the most value to the organization.’ Authored By Russell Tibballs MACS CP MSR
  • 6. The Super Technologist Authored By Russell Tibballs MACS CP MSR
  • 7. DATA SCIENCE Mark Biernbaum suggests ‘Data Science is going 99% too fast’. His complaint is that the “science’ is not peer-reviewed and the techniques are often questionable. He believes Data Scientists should slow down, specialize, and above all - have the methodologies peer-reviewed. Authored By Russell Tibballs MACS CP MSR
  • 8. My Problem with Current Definitions I have been to a number of industry briefings where supplied definitions are often very ‘pie in sky’ and elitist.The definitions are designed to indicate ‘you can’t possibly do this yourself and there is no way any of your existing staff will qualify for the role’.This may not be intended; however it is the result. Authored By Russell Tibballs MACS CP MSR
  • 9. So What can we do about that?  Recognise that Data Science is a science that has a broad brush stroke across all industry sectors.  Recognise that there are many specialty areas.  Recognise that it is not a technological implementation.  Recognise that there will be many levels of expertise. Authored By Russell Tibballs MACS CP MSR
  • 10. Recognise that there are many specialty areas. There is not one version of data science.  There is data science applicable to the research sectors of Maths, Physics, Meteorology, and Medicine that will rarely be applied elsewhere.  There is data science our friends in the NSA and local equivalents will specialise in.  There is the data science economist and financial sectors will specialise in.  Etc, etc … ad nauseam. Authored By Russell Tibballs MACS CP MSR
  • 11. Recognise that it is not a technological implementation.  Being able to query unstructured data in a HDFS does not make you a data scientist.  Being able to analyse Splunk data does not make you a data scientist.  Being able to filter petabytes of data on a MPP RDBMS does not make you a data scientist. Authored By Russell Tibballs MACS CP MSR
  • 12. So What is a Scientist?
  • 13. The important aspects of any definition of a Job Title.
      • The most important thing to remember here is that we are talking about a Job Title, and a Job Title should be meaningful.
      • Secondly, what should qualify someone for that title?
  • 14. So what is important about the title ‘Data Scientist’?
  • 15. Where did this title originate? ‘On November 10, 1998, he (Jeff Wu) gave his inaugural lecture entitled “Statistics = Data Science?” in honor of his appointment to the H. C. Carver Collegiate Professorship in Statistics at the University of Michigan. In this lecture, he first focused on the identity of statistics in science. He then characterized statistical work as data collection, data modeling and analysis, and problem solving and decision making. In conclusion, he proposed that statistics be renamed to Data Science.’
  • 16. So What is a Scientist? From the Oxford Dictionary: ‘A person who is studying or has expert knowledge of one or more of the natural or physical sciences: a research scientist’. Note: a scientist is not necessarily a research scientist; they can be a practicing expert in a field. However, all scientists share one feature: they are trained in a science, they apply scientific method to obtain understanding of a focus of interest, and their methods and conclusions are subject to peer review.
  • 17. A comment from a recently retired Scientist: My neighbor has recently retired after a long career as a scientist and academic. We were discussing the growing exclusivity of the term scientist a few weekends ago. In his words, ‘In the 1970s a scientist had a degree, by the mid 80s they needed honors, in the 90s they needed a masters or PhD, now they need several Post-Doctoral projects under their belt to be considered a “real” scientist.’ However, he believes someone who is qualified (has a science degree) and who is practicing their studied discipline is a scientist.
  • 18. A Slight Detour. What qualifies a professional?
      • I see the Data Scientist as a specialty of the Computer Science profession.
      • We have lawyers who specialise in corporate, family, criminal, and other aspects of the law.
      • The accounting, architecture, engineering, teaching and medical professions have several specialties and recognised levels of expertise in each field.
      • These professionals have academic training, and in many cases acceptance by a professional body is what makes them acceptable as professionals in the public eye. That is a model I strongly believe the ICT industry needs to adopt or at least move towards.
      • I believe the academic achievement makes the qualification. The acceptance by a professional body should give standing within the profession and wider community.
  • 19. The Australian Qualifications Framework - AQF. The AQF has 10 levels:
      • Level 1 – Certificate I
      • Level 2 – Certificate II
      • Level 3 – Certificate III
      • Level 4 – Certificate IV
      • Level 5 – Diploma
      • Level 6 – Advanced Diploma, Associate Degree
      • Level 7 – Bachelor Degree
      • Level 8 – Bachelor Honors Degree, Graduate Certificate, Graduate Diploma
      • Level 9 – Masters Degree
      • Level 10 – Doctoral Degree
  • 20. AT THE BOTTOM LEVEL OF THIS SPECTRUM OF QUALIFICATIONS.
      • Summary: Graduates at this level will have knowledge and skills for initial work, community involvement and/or further learning.
      • Knowledge: Graduates at this level will have foundational knowledge for everyday life, further learning and preparation for initial work.
      • Skills: Graduates at this level will have foundational cognitive, technical and communication skills to undertake defined routine activities, and to identify and report simple issues and problems.
      • Application of knowledge and skills: Graduates at this level will apply knowledge and skills to demonstrate autonomy in highly structured and stable contexts and within narrow parameters.
  • 21. At the highest level of the spectrum of the AQF: 10 – The Doctorate
      • Summary: Graduates at this level will have systematic and critical understanding of a complex field of learning and specialised research skills for the advancement of learning and/or for professional practice.
      • Knowledge: Graduates at this level will have systemic and critical understanding of a substantial and complex body of knowledge at the frontier of a discipline or area of professional practice.
      • Skills: Graduates at this level will have expert, specialised cognitive, technical and research skills in a discipline area to independently and systematically: engage in critical reflection, synthesis and evaluation; develop, adapt and implement research methodologies to extend and redefine existing knowledge or professional practice; disseminate and promote new insights to peers and the community; generate original knowledge and understanding to make a substantial contribution to a discipline or area of professional practice.
      • Application of knowledge and skills: Graduates at this level will apply knowledge and skills to demonstrate autonomy, authoritative judgment, adaptability and responsibility as an expert and leading practitioner or scholar.
  • 22. The Degree
      • Summary: Graduates at this level will have broad and coherent knowledge and skills for professional work and/or further learning.
      • Knowledge: Graduates at this level will have broad and coherent theoretical and technical knowledge with depth in one or more disciplines or areas of practice.
      • Skills: Graduates at this level will have well-developed cognitive, technical and communication skills to select and apply methods and technologies to: analyse and evaluate information to complete a range of activities; analyse, generate and transmit solutions to unpredictable and sometimes complex problems; transmit knowledge, skills and ideas to others.
      • Application of knowledge and skills: Graduates at this level will apply knowledge and skills to demonstrate autonomy, well-developed judgement and responsibility in contexts that require self-directed work and learning, and within broad parameters to provide specialist advice and functions.
  • 23. The Vendor’s Course
      • The vendor’s course will usually be about how to apply a tool to a problem.
      • It is not generally designed to provide you with knowledge that can be applied outside the scope of their tool’s environment.
      • It would generally not qualify within the AQF guidelines.
  • 24. So how does the AQF apply to the question of Data Science? If the person working in the field of applying ‘Data Science’ has a degree (AQF level 6 or above) in a related subject, i.e. Maths, Statistics, or Economics, or a higher degree including Graduate Certificates and Diplomas, they can be expected to apply knowledge and skills to demonstrate autonomy, well-developed judgment and responsibility:
      • in contexts that require self-directed work and learning
      • within broad parameters to provide specialist advice and functions
  • 25. Cui Bono. Who benefits from this approach?
      • The public – they will have greater confidence in the profession.
      • The employer – they get the assurance that the employee has the skills at the right levels to do the work.
      • The employee – because they will know what is expected of them and know they will be able to deliver.
      • The professional body and industry – through greater faith and confidence by the public in the profession in general.
  • 26. But!!!
      • There needs to be demand from within the industry for this to happen.
      • Some group like the IAPA needs to take on the responsibility of working out the professional specialisations and the required frameworks for acceptance of professionals into those specialisations.

Editor's notes

  1. The term "data scientist" started to rise rapidly around 2011 and almost caught up with "statistician". Searches for "data scientist" surpassed the searches for "data miner" in 2012. The chart below shows Google Trends for "Statistician", "Data Scientist", and "Data Miner" from Jan 2008 to Dec 2013.
  2. For the graph, go to http://www.google.com/trends/explore#q=Statistician%2C%20%22Data%20Scientist%22%2C%20%22Data%20Miner%22&date=1%2F2007%2084m&cmpt=q
  3. http://www.datascientists.net/what-is-data-science If this had stopped at the first couple of sentences I would have been happier.
  4. I think the picture can do without the ‘go away if’ arm. All these things are good; however, this expresses an ideal, not a reality. As for the quote: this has been written by a communications specialist who is telling someone ‘if you get the right person they will solve all your problems’. I believe in the Easter Bunny and Santa Claus too. On the communications side: a friend of the family works as a ‘Science Communicator’ for a large pharmaceutical company in London. Maybe if data science is that important, it will lead to ‘data communicators’; possibly Nate Silver already falls in that camp. However, if you go to almost any profession there are those who can do; those who understand what they can do and can do it; and those who can do and communicate what it is they are doing. The communicator does not always rise to the top of the heap, as technicians tend to respect technical ability above communications ability.
  5. It is impossible to have all these skills; however, some of them would be useful. Many of these tools will quickly become redundant as newer tools and methods evolve. The data access components will be merged into simpler interfaces and existing tools. http://nirvacana.com/thoughts/becoming-a-data-scientist/ To be fair, Swami indicates this is a roadmap to follow and is also getting people to think about what a data scientist is. He also indicates this is far from complete. I may be misinterpreting him; however, it appears to me that you need to be an expert at each stop, which seems a tall ask. By the time you have learnt many of these skills, a fair percentage of what you have learnt will be redundant as new tools and techniques replace them. That is one of the joys of working in this field: you will never have time to get bored, as you need to maintain continual learning to stay relevant.
  6. http://www.kdnuggets.com/2014/01/biernbaum-data-science-99-percent-too-fast.html From what I have seen, I tend to agree. To quote Stephen Brobst (Teradata CTO), 20140402 – Teradata Summit Series, ‘IT people love to chase shiny objects’.
  7. I am talking about vendor presentations and a few from Industry Special Interest Groups. I believe in most organisations there are staff who can be moulded to fill the required Data Science capability.
  8. http://www.abc.net.au/news/2011-12-21/albert-einstein-sticks-out-his-tongue-at-photographers/3742064 I picked this photo because everyone thinks of Einstein when they think of science. Bottom right is Ed Diener, who has been studying well-being for decades. These guys graduated as geologists from the University of Wisconsin. They are science graduates and recognised specialists.
  9. From Wikipedia. Note the problem solving and decision making component. Using that definition, anyone whose degree has a substantial statistics and research component (such as maths, economics, science, and social science graduates) and who works in an analytics capacity is a data scientist. Therefore, if they are qualified and practicing in an information analytics capacity, they are data scientists. The quoted passage ends: ‘… data science and statisticians data scientists. Later, he presented his lecture entitled “Statistics = Data Science?” as the first of his 1998 P.C. Mahalanobis Memorial Lectures.’ C.F. Jeff Wu is the Coca-Cola Chair in Engineering Statistics and Professor in the H. Milton Stewart School of Industrial and Systems Engineering at the Georgia Institute of Technology.
  10. Peer review does not make so much sense outside the academic realm. However, it does help get rid of silly mistakes and helps ensure that outcomes are repeatable. Good method will ensure that if you repeat a process you will get the same result, a surprisingly rare feat in the wilds of data analysis. Good analysis has purpose, context, and strong methodology. So outside academia it equates to the ‘active research cycle’ with a ‘business’ focus. Active Research Cycle from http://creativeeducator.tech4learning.com/v07/articles/Embracing_Action_Research
  11. Why am I bothering with this anecdote? It is designed to show the creep in requirements over time for a skill. PS: Thomas also noted that the so-called scientists no longer practise their craft; they spend their lives applying for grants and networking. The post-grad students do the work. He spent his last years pre-vetting submissions for publication by doctoral students.
  12. The most important part is specialties or streams, each with recognised levels of achievement and expertise. In terms of ‘Data Science’, not having a certification should not mean you cannot do certain work; it would mean that the professional body would not be endorsing your ability to do that work.
  13. In Australia we have the Australian Qualifications Framework. ‘The AQF is the national policy for regulated qualifications in Australian education and training. It incorporates the qualifications from each education and training sector into a single comprehensive national qualifications framework. The AQF was first introduced in 1995 to underpin the national system of qualifications in Australia encompassing higher education, vocational education and training and schools.’ Where there are existing frameworks that are working, make use of them. http://www.aqf.edu.au/aqf/in-detail/aqf-levels/
  14. This is where I see most vendor courses are sitting. They train to use a tool. In regard to ‘Data Science’, a course on Legal Privacy requirements at this level could and possibly should be compulsory.
  15. Obviously people at this level in the hard and soft sciences have a demonstrated capacity to apply qualitative analysis, quantitative analysis, or both, through the lens of the research cycle to provide significant insight. These people should be able to communicate exceptionally well. The argument put forward is often that their focus is narrow and that they should not be used outside that sphere. I once heard an Oxford professor state that ‘Oxford PhD graduates can learn any new subject and be an expert within 2 weeks’. Probably an exaggeration; however, it does highlight that this level of achievement is generally an attribute of the graduate, showing a general ability to learn and communicate ideas at a high level. When I have quizzed speakers after presentations that bemoaned the lack of ‘data science candidates’, I have asked about the tens of thousands of higher-degree and research graduates. They would then agree that the problem is not so much a lack of ‘Data Scientists’ as a lack of managers who can comprehend what data scientists are talking about.
  16. A graduate of the hard and soft sciences should be able to apply analytic tools to evaluate information and transmit solutions to complex problems. I believe this is the starting level for a Data Scientist.
  17. Accreditation to use a tool is just that. It is not really a recognisable qualification. Often it is really telling you how to use a tool and little more. There are some vendors whose courses are embedded in university curricula. However, that is not the norm.
  18. This is the end of the equation where people should qualify as a Professional. Below that we are really applying a tool. There are many other academic streams that would fit into this model: basically anything where you have to use data analysis to apply scientific method, e.g. psychology, engineering and others.