In this webinar for ICT Professionals Ghana, we explore the concepts of data science and its motivations as a recent specialization, creating the background for how Artificial Intelligence relates to Machine Learning and to Deep Learning. We further discuss the data science technology stack and the opportunities that exist in the space.
2. What shall we talk about today?
● Data Science | AI: what is it about, and key enablers!
● Why should we care?
● Opportunities?
● Careers & Learning Paths, Technology Stack
● Questions
Emmanuel Asimadi
easimadi
3. Data Science | AI, what is it about?
Let’s keep it simple
- The science of deriving value from data
Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician ~ Josh Wills, tweet, 2012
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from data in various forms, both structured and unstructured ~ Wikipedia
4. How did we get here?
Some of the transformations that enabled data science:
● Vertical Scaling: monolithic database systems, SQL, BI. Scale up by getting a bigger machine.
● Horizontal Scaling: Hadoop MapReduce. Scale horizontally by adding more commodity machines rather than a bigger server; separation of storage from compute.
● Web Services / REST API: enabling applications to share data via well-defined end-points. These have underachieved because the goal was discoverable and even autonomous web services.
● Cloud Computing & Big Data: pioneered by Amazon. Rent computing and storage resources and store Big Data cheaply. Big Data [Volume, Velocity, Variety].
● Solution: Data Science to the rescue. Apache Spark, Machine Learning, Deep Learning, etc.
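The horizontal-scaling idea behind Hadoop MapReduce can be sketched in a few lines of plain Python: each "machine" counts words in its own partition of the data (map), and the partial results are then merged (reduce). This is only an illustration on a single computer; the function names and the sample sentences are hypothetical, and it does not use Hadoop itself.

```python
from collections import Counter
from functools import reduce


def map_partition(lines):
    """Map step: count the words inside one partition (one 'machine')."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: merge two partial word counts into one."""
    return left + right


# Hypothetical data, split into partitions as if stored on separate machines.
partitions = [
    ["big data needs more machines", "data is an asset"],
    ["more machines not a bigger machine"],
]

# Each partition can be processed independently, so this step could run in parallel.
partial_counts = [map_partition(p) for p in partitions]

# Combine the partial results into the final answer.
word_counts = reduce(merge_counts, partial_counts, Counter())

print(word_counts["data"], word_counts["machines"])
```

Adding more data means adding more partitions (and machines) rather than buying a bigger server, which is the point of scaling horizontally.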
5. Artificial Intelligence
Source: NVIDIA
Additional Reading: Data Science Central (holds a different view)
Artificial Intelligence
General techniques to get machines to achieve human-level intelligence. Evaluated by tests such as:
- Turing Test
- Robot College Student Test
- Employment Test
Machine Learning
Primarily statistical and other techniques that help machines learn from “experience”.
Deep Learning
A subset of machine learning algorithms based on neural networks, mimicking how the brain works. Essentially building more complex functions.
6. You are already familiar with Machine Learning!!
Input X: 1, 2, 4, 5
Output Y: 3, 5, 9, ?
f(5) = 11
f(x) = 2x + 1 is the ML Model.
The goal of training is to find this function f(x).
In reality
Input X: Car Make, Car Age
Output Y: Car Price
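As a minimal sketch of what "training" means here, the snippet below recovers f(x) = 2x + 1 from the three known input/output pairs using ordinary least squares and then predicts the output for x = 5. The helper name `fit_line` is hypothetical, and least squares is just one of many ways to fit such a model.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Training data from the slide: the pairs with a known output.
xs = [1, 2, 4]
ys = [3, 5, 9]

slope, intercept = fit_line(xs, ys)

# Use the learned model to fill in the unknown output for x = 5.
prediction = slope * 5 + intercept
print(round(slope, 6), round(intercept, 6), round(prediction, 6))
```

With real data such as car make and age, the inputs are several columns rather than one number, and the points rarely fall exactly on a line, but the idea of finding the function from examples is the same.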
8. Your Organisation’s Data on STEROIDS
Deriving new* value and MONETISING data
Traditionally
- Data as a cost center
New Paradigm
● Data is an asset (can enable new Revenue streams)
● Data is a key differentiator
● Data creates a new barrier to entry for your competition
9. Your Organisation’s Data on STEROIDS
Deriving new* value and MONETISING data
Analytic Spectrum
Where does your business sit?
10. Why Should We Care?
It's pervasive and affects every industry: Medicine, Literature, Journalism, Agriculture, you name it.
13. Job Profiles
Business Problem → Production
Domain Experts, Business Analysts, Data Scientist, Data Engineer, Data Architect, DevOps, Machine Learning Engineer, BI Developer, Visualisation Developer, Software Developer
Sample Technical Skills:
Linux, Cloud computing, OS, Networking, SQL, Python, R, Scala, Qlik, Tableau, Looker, PowerBI, Data Visualisation, Apache Spark, Apache Hadoop, NoSQL, Statistics & Quantitative Analysis, Machine Learning, Data Mining, Creative Problem Solving, Business awareness
14. Technology Stack biased towards Amazon AWS :) but captures key concepts
Source: AWS Tutorial
Other cloud providers can meet similar requirements - Google, Microsoft Azure, etc.
None of the major cloud providers has a datacenter in Africa.
- It is an issue if you have data locality concerns.
- Most technologies are open source, so they can be implemented in local datacenters.
15. Example Learning Path Apache Spark - a big data framework
Foundation → Deep Dive
● What’s New in Spark 2.x: review of what’s new in Spark, data structures, key concepts and operations.
● Working With Spark ML Models: intuitively understand, featurize, build, evaluate and deploy Spark Machine Learning (ML) models.
● Structured Streaming: understand streaming and deploy your own ML-based structured streaming application.
● Natural Language Processing: build natural language applications working with Spark and other libraries.
● Deep Learning: understand and apply; Spark vs Deep Learning use-cases.
● SparkSQL & Graphs: working with SparkSQL, GraphX and GraphFrames.
16. Example Learning Path Python For Data Science
Foundation → Deep Dive
● Intro to Data Science & Python: key concepts, basics of programming, data structures (collections), operations (comprehensions) and navigating help.
● Data Science Libraries: pandas, NumPy, Matplotlib, seaborn, Bokeh, sklearn, SciPy
● Machine Learning: sklearn, SciPy, etc.
● Natural Language Processing: spaCy, NLTK
● Explore & Apply: deep learning etc.; explore and keep applying knowledge
Similar for R, Scala, etc.
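The foundation topics in this path (collections, comprehensions and navigating help) can be tried out with nothing but the Python standard library. The short sketch below uses made-up car records purely for illustration; the variable names and values are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical records: (car make, car age in years, price).
cars = [
    ("Toyota", 3, 9000),
    ("Toyota", 8, 4500),
    ("Kia", 2, 8000),
    ("Kia", 6, 5000),
]

# List comprehension: makes of the cars younger than 5 years.
young_makes = [make for make, age, _ in cars if age < 5]

# defaultdict (a collection): group prices by make without checking for missing keys.
prices_by_make = defaultdict(list)
for make, _, price in cars:
    prices_by_make[make].append(price)

# Dict comprehension: average price per make.
avg_price = {make: sum(prices) / len(prices) for make, prices in prices_by_make.items()}

# Counter (another collection): how many cars of each make.
make_counts = Counter(make for make, _, _ in cars)

print(young_makes)
print(avg_price)
print(make_counts)

# Navigating help: uncomment to read the built-in documentation.
# help(Counter)
```

Libraries such as pandas offer the same grouping and averaging in a single call, but knowing the plain-Python version makes those libraries easier to learn.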
17. Emml !!!! such (job) opportunities (almost) don’t exist in our world!
How can we find or create them?
Data Science Competitions
Kaggle
- Competitions $
- learn
Freelancer / Entrepreneur $$
- Upwork* (it works)
- BOAMI
- Guru, Gigster, etc.
Employment $$
- Demonstrate value and
get employed
18. Citizen Data Science Community
Using Data for Social Good & Learning
call-for-analysis
Analyse and build data applications
Output: inform, recommend, act for social good, learn, enable new business
call-for-data
Help us collect and publish interesting
Ghana Datasets for the community.
Output: published dataset
call-for-exploration
Explore and Curate the dataset for the
community
Output: cleaned dataset
#citizenDataScienceGh
#cdsgh
https://ds4good.github.io/ghana-datasets/
https://bit.ly/2BM74MK
https://www.kaggle.com/citizen-ds-ghana
https://public.tableau.com/profile/datanix.ds4good#!/
19. Recap
● What is Data Science | AI and its motivation!
● Opportunities
● Career & Learning Paths, Technology Stack
● What can I do to take advantage?
20. Cloud Notebook Environments
Don’t worry about laptop capability or installations*
● Kaggle
● Google Colaboratory
● Azure Notebooks
And many more...
Apache Spark
Don’t worry about laptop capability or installations*
● Databricks Community Edition (Notebooks) - from the creators of Spark
Resources
● Interesting books/videos - O'Reilly Media, Packt, etc.
● MOOCs - edX, Coursera, Udemy, Udacity...
● Social Media - not organised, can be distracting.
● Blogs: Towards Data Science, KDnuggets, Analytics Vidhya
● YouTube: Socratica :), Google AI Adventures, etc.
Wanting to learn everything is like wanting to learn the dictionary
Any Interesting Resources?
You don’t need much to start exploring