Could You Be a Data Scientist? 
Carlo Torniai, Ph.D. 
@carlotorniai
Goal 
• Quantify the features of data scientist profiles 
• Analyze aspiring data scientist profiles 
• Provide useful feedback 
Why is this relevant? 
• A quantitative characterization of data scientist 
profiles can help close the loop between job 
seekers and recruiters 
Image: http://www.getelastic.com/wp-content/uploads/puzzle1.jpg
Data Collection 
• LinkedIn API: 
– General Information 
– Past work history 
– Education 
• Web Scraping: 
– Skills 
• 1500 profiles 
– Data Scientists 
– Software Engineers 
– Business Analysts 
– Mathematicians 
– Statisticians
Data Analysis 
Feature Extraction 
[Figure legend: Software Engineers, Business Analysts, Data Scientists, Statisticians, Mathematicians]
Data Analysis 
Feature Extraction 
[Figure: Number of PhDs by topic and profile. Topics: Astronomy, Bioinformatics, Biology, Computer Science, Economics, Electronics, Engineering, Math, Neuroscience, Other, Physics, Psychology, Stats]
Model Testing 
For this project I trained the following models on skills and 
education features: 
Random Forest 
• Classify the profile 
Naïve Bayes 
• Multi-class probabilities to assess the background 
components of a profile 
K-means 
• Suggest similar and relevant profiles
Model Testing 
Model          Training set                          Purpose 
Random Forest  All 5 categories                      Classify the profile 
Naïve Bayes    4 classic categories: SE, BA, MT, ST  Assess profile background components with multi-class probabilities 
K-means        All 5 categories                      Identify similar profiles
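
To make the Naïve Bayes row more concrete, here is a minimal sketch of how multi-class probabilities over the four classic categories could describe the background of a profile. It is a from-scratch Bernoulli Naïve Bayes over sets of skills, not the scikit-learn model used in the project. The toy profiles, the skill names, the reading of SE, BA, MT, ST as the four non-data-scientist categories, and the function names train_bernoulli_nb and class_probabilities are all assumptions made for illustration.

```python
import math

def train_bernoulli_nb(profiles, labels, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model on profiles given as sets of skills."""
    vocab = sorted(set().union(*profiles))
    classes = sorted(set(labels))
    model = {"vocab": vocab, "classes": classes, "prior": {}, "p_skill": {}}
    for c in classes:
        docs = [p for p, y in zip(profiles, labels) if y == c]
        model["prior"][c] = len(docs) / len(profiles)
        # Laplace-smoothed probability that a profile of class c lists each skill
        model["p_skill"][c] = {
            s: (sum(s in d for d in docs) + alpha) / (len(docs) + 2 * alpha)
            for s in vocab
        }
    return model

def class_probabilities(model, skills):
    """Return P(class | skills) for every class, normalized to sum to 1."""
    log_scores = {}
    for c in model["classes"]:
        score = math.log(model["prior"][c])
        for s in model["vocab"]:
            p = model["p_skill"][c][s]
            score += math.log(p if s in skills else 1.0 - p)
        log_scores[c] = score
    # Subtract the maximum before exponentiating for numerical stability
    top = max(log_scores.values())
    unnormalized = {c: math.exp(v - top) for c, v in log_scores.items()}
    total = sum(unnormalized.values())
    return {c: v / total for c, v in unnormalized.items()}

# Hypothetical training profiles: two per classic category
profiles = [
    {"java", "python", "git"}, {"python", "git", "sql"},                    # SE
    {"excel", "sql", "reporting"}, {"excel", "reporting", "powerpoint"},    # BA
    {"proofs", "latex", "matlab"}, {"latex", "matlab", "optimization"},     # MT
    {"r", "statistics", "regression"}, {"r", "statistics", "sql"},          # ST
]
labels = ["SE", "SE", "BA", "BA", "MT", "MT", "ST", "ST"]

model = train_bernoulli_nb(profiles, labels)

# A hypothetical aspiring data scientist, scored against the four categories
probs = class_probabilities(model, {"python", "statistics", "sql", "r"})
for category, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{category}: {p:.3f}")
```

Training on the four classic categories only and then scoring a fifth kind of profile is what lets the probabilities be read as a background mix rather than as a hard label.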
Data Product 
bit.ly/cybads
Data Product 
[Figure labels: Random Forest classification, Naïve Bayes multi-class probabilities]
Data Product 
[Figure label: K-means clustering]
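
As a rough illustration of how K-means clustering can suggest similar profiles, the sketch below clusters binary skill vectors with plain Lloyd's algorithm and returns the profiles that share a cluster with a new visitor. It is written from scratch rather than with scikit-learn, and the profile names, skill vocabulary, choice of initial centroids, and helper names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda k: squared_distance(point, centroids[k]))

def kmeans(points, initial_centroids, iterations=10):
    """Plain Lloyd's algorithm; returns (assignments, centroids)."""
    centroids = [list(c) for c in initial_centroids]
    for _ in range(iterations):
        assignments = [nearest(p, centroids) for p in points]
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # keep the old centroid if a cluster empties
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment against the updated centroids
    return [nearest(p, centroids) for p in points], centroids

# Hypothetical vocabulary and profiles, encoded as 0/1 skill indicators
vocab = ["python", "sql", "statistics", "excel", "java"]
names = ["ana", "ben", "cho", "dee", "eli"]
skills = [
    {"python", "sql", "statistics"},
    {"python", "statistics"},
    {"excel", "sql"},
    {"excel"},
    {"java", "python"},
]
points = [[1 if s in profile else 0 for s in vocab] for profile in skills]

# Seeding with two existing profiles keeps this small example deterministic
assignments, centroids = kmeans(points, [points[0], points[2]])

# Suggest profiles that fall in the same cluster as a new visitor
visitor = [1 if s in {"python", "sql"} else 0 for s in vocab]
cluster = nearest(visitor, centroids)
similar = [n for n, a in zip(names, assignments) if a == cluster]
print(similar)
```

A real deployment would likely use many more skills and a randomized or k-means++ initialization; the fixed seeds here only serve to keep the example reproducible.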
Next Steps 
Data Collection 
• Get more data: 
– Other websites 
– Indeed 
– User input on Web app 
Data Analysis / Feature Extraction 
• Fine-grained parsing of education 
• Experiment with additional features (industry, years of experience) 
Model Testing 
• Extend feature set and test more models 
• Fuzzy C-means 
Data Product 
• Add interactive data collection 
• Personalized links for skills 
• Explanation about similarity results 
Close the loop by analyzing job offers and suggesting 
matching profiles
Thank you! 
Technologies 
Web App: 
Flask, jQuery, Vega, MongoDB 
NMF, HC, RF, DT, NB, K-means models: 
scikit-learn 
Visualizations: 
Vincent, Vega, NetworkX, Gephi 
Acknowledgements 
yatish27: Ruby LinkedIn public profile web scraper 
ozgut: LinkedIn API Python wrapper

Could You Be a Data Scientist? Quantifying data scientist profiles using machine learning and the LinkedIn API.

  • 1. Could You Be a Data Scientist? Carlo Torniai, Ph.D. @carlotorniai
  • 2. Goal • Quantify the features of data scientist profiles • Analyze the profiles of aspiring data scientists • Provide useful feedback
  • 3. Why is this relevant? • A quantitative characterization of data scientist profiles can help close the loop between job seekers and recruiters. Image: http://www.getelastic.com/wp-content/uploads/puzzle1.jpg
  • 4. Data Collection • LinkedIn API: – General information – Past work history – Education • Web scraping: – Skills • 1500 profiles: – Data Scientists – Software Engineers – Business Analysts – Mathematicians – Statisticians
  • 5. Data Analysis: Feature Extraction [Figure panels: Software Engineers, Business Analysts, Data Scientists, Statisticians, Mathematicians]
  • 6. Data Analysis: Feature Extraction [Figure: Number of PhDs by topic and profile. Topics: Astronomy, Bioinformatics, Biology, Computer Science, Economics, Electronics, Engineering, Math, Neuroscience, Other, Physics, Psychology, Stats]
  • 7. Model Testing For the purpose of this project I trained the following models on skills and education features: Random Forest • Classify the profile Naïve Bayes • Multi-class probabilities to assess the background components of a profile K-means • Suggest similar and relevant profiles
  • 8. Model Testing For the purpose of this project I trained the following models on skills and education features: Model | Training set | Purpose • Random Forest | All 5 categories | Classify the profile • Naïve Bayes | 4 classic categories (SE, BA, MT, ST) | Assess the background components of a profile with multi-class probabilities • K-means | All 5 categories | Identify similar profiles
  • 10. Data Product: Naïve Bayes multi-class probabilities; Random Forest
  • 11. Data Product K-means clustering
  • 12. Next Steps: Data Collection, Data Analysis, Feature Extraction, Model Testing, Data Product. Get more data: - Other websites - Indeed - User input on the web app - Fine-grained parsing of education - Experiment with additional features (industry, years of experience) • Extend the feature set and test more models • Fuzzy C-means • Add interactive data collection • Personalized links for skills • Explanation of similarity results. Close the loop by analyzing job offers and suggesting matching profiles
  • 13. Thank you! Technologies • Web app: Flask, jQuery, Vega, MongoDB • Models (NMF, HC, RF, DT, NB, K-means): scikit-learn • Visualizations: Vincent, Vega, NetworkX, Gephi Acknowledgements • yatish27: Ruby LinkedIn public profile web scraper • ozgut: LinkedIn API Python wrapper
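
The three-model setup from slides 7 and 8 can be sketched with scikit-learn, which slide 13 names as the modeling library. This is only an illustrative sketch, not the author's code: the toy profiles, the one-hot encoding of skills and degree field, the choice of `BernoulliNB` as the Naïve Bayes variant, the number of clusters, and the helper names `to_features` and `assess` are all assumptions made here.

```python
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import BernoulliNB

# Hypothetical toy data: (category, skills, degree field). Not from the slides.
PROFILES = [
    ("DS", ["python", "machine learning", "statistics"], "physics"),
    ("DS", ["r", "machine learning", "sql"], "computer science"),
    ("SE", ["java", "git", "python"], "computer science"),
    ("SE", ["c++", "git", "linux"], "engineering"),
    ("BA", ["excel", "sql", "reporting"], "economics"),
    ("BA", ["excel", "powerpoint", "reporting"], "economics"),
    ("MT", ["matlab", "latex", "numerical analysis"], "math"),
    ("MT", ["latex", "proofs", "matlab"], "math"),
    ("ST", ["r", "sas", "statistics"], "stats"),
    ("ST", ["sas", "statistics", "excel"], "stats"),
]


def to_features(skills, degree):
    """Encode one profile as binary skill and education indicators (assumed encoding)."""
    feats = {f"skill={s}": 1 for s in skills}
    feats[f"degree={degree}"] = 1
    return feats


labels = [label for label, _, _ in PROFILES]
vec = DictVectorizer(sparse=False)
X = vec.fit_transform([to_features(s, d) for _, s, d in PROFILES])

# Random Forest: trained on all 5 categories to classify a profile.
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, labels)

# Naive Bayes: trained only on the 4 "classic" categories (SE, BA, MT, ST),
# so its class probabilities describe a profile's background components.
classic = [i for i, label in enumerate(labels) if label != "DS"]
nb = BernoulliNB().fit(X[classic], [labels[i] for i in classic])

# K-means: clusters all 5 categories; cluster-mates serve as "similar profiles".
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)


def assess(skills, degree):
    """Return (predicted category, background probabilities, similar profile indices)."""
    x = vec.transform([to_features(skills, degree)])  # unseen skills are ignored
    predicted = rf.predict(x)[0]
    background = dict(zip(nb.classes_, nb.predict_proba(x)[0]))
    cluster = km.predict(x)[0]
    similar = [i for i, c in enumerate(km.labels_) if c == cluster]
    return predicted, background, similar


if __name__ == "__main__":
    predicted, background, similar = assess(["python", "statistics", "sql"], "math")
    print("Predicted category:", predicted)
    print("Background mix:", {k: round(float(v), 2) for k, v in background.items()})
    print("Similar profiles:", [PROFILES[i][0] for i in similar])
```

Leaving the data scientist class out of the Naïve Bayes training set follows the table on slide 8: the model can then only describe a profile as a mix of the four classic backgrounds, which is what the background-components view needs.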