A short presentation of my final project at Zipfian Academy: quantifying data scientist profiles using LinkedIn data.
The prototype web app is available at: bit.ly/cybads
Could You Be a Data Scientist? Quantifying Data Scientist Profiles Using Machine Learning and the LinkedIn API
1. Could You Be a Data Scientist?
Carlo Torniai, Ph.D.
@carlotorniai
2. Goal
• Quantify the features of data scientist profiles
• Analyze the profiles of aspiring data scientists
• Provide useful feedback
3. Why Is This Relevant?
• A quantitative characterization of data scientist profiles can help close the loop between job seekers and recruiters
Image: http://www.getelastic.com/wp-content/uploads/puzzle1.jpg
4. Data Collection
• LinkedIn API:
– General information
– Past work history
– Education
• Web scraping:
– Skills
• 1,500 profiles:
– Data Scientists
– Software Engineers
– Business Analysts
– Mathematicians
– Statisticians
5. Data Analysis
Feature Extraction
(Figure: feature extraction results for Software Engineers, Business Analysts, Data Scientists, Statisticians and Mathematicians)
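The slides do not say how the scraped skills were encoded as features. A common choice, assumed here, is a binary "bag of skills": build a vocabulary of normalized skill names, then mark each profile with 1 where it lists a skill and 0 otherwise. The toy profiles and the helper names (`normalize`, `build_vocabulary`, `to_binary_vector`) are hypothetical and only illustrate the idea.

```python
# Hypothetical toy profiles: profile id -> skills as scraped from a public profile.
profiles = {
    "alice": ["Python", "Machine Learning", "SQL"],
    "bob": ["Java", "SQL"],
    "carol": ["R", "Statistics", "machine learning"],
}

def normalize(skill):
    """Lower-case and trim a skill so 'Machine Learning' and 'machine learning ' match."""
    return skill.strip().lower()

def build_vocabulary(skill_lists):
    """Sorted list of unique normalized skills; fixes the column order of the features."""
    return sorted({normalize(s) for skills in skill_lists for s in skills})

def to_binary_vector(skills, vocabulary):
    """1 where the profile lists the vocabulary skill, 0 otherwise."""
    present = {normalize(s) for s in skills}
    return [1 if term in present else 0 for term in vocabulary]

vocabulary = build_vocabulary(profiles.values())
features = {pid: to_binary_vector(skills, vocabulary) for pid, skills in profiles.items()}
print(vocabulary)
print(features["alice"])
```

Sorting the vocabulary keeps the column order stable between runs; scikit-learn's `MultiLabelBinarizer` builds the same kind of indicator matrix if you prefer a library call. Education features (for example a PhD topic) could be appended as extra columns in the same way.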
6. Data Analysis
Feature Extraction
(Figure: Number of PhDs by topic and profile. Topics: Astronomy, Bioinformatics, Biology, Computer Science, Economics, Electronics, Engineering, Math, Neuroscience, Other, Physics, Psychology, Stats)
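The slide does not describe how free-text degree fields were mapped to these topics. A minimal sketch, assuming a simple keyword lookup, normalizes the degree name, maps the field of study to a coarse topic, and counts PhD holders per (profile, topic) pair. `TOPIC_RULES` covers only a few of the slide's topics and, like the records, is hypothetical; unmatched fields fall into "Other".

```python
from collections import Counter

# Hypothetical keyword rules mapping a free-text field of study to a topic.
# Rules are checked in order and the first match wins; anything else is "Other".
TOPIC_RULES = [
    ("Bioinformatics", ("bioinformatics",)),
    ("Computer Science", ("computer science",)),
    ("Stats", ("statistic",)),
    ("Math", ("mathemat",)),
    ("Physics", ("physics",)),
]

def phd_topic(field_of_study):
    """Map a field of study such as 'Theoretical Physics' to a coarse topic."""
    text = field_of_study.lower()
    for topic, keywords in TOPIC_RULES:
        if any(keyword in text for keyword in keywords):
            return topic
    return "Other"

def is_phd(degree):
    """Treat 'PhD', 'Ph.D.' and 'ph.d' as the same degree."""
    return degree.lower().replace(".", "").strip() == "phd"

def count_phds(education_records):
    """Count PhD holders per (profile category, topic) from (category, degree, field) tuples."""
    return Counter(
        (category, phd_topic(field))
        for category, degree, field in education_records
        if is_phd(degree)
    )

# Toy records, not project data.
records = [
    ("Data Scientists", "Ph.D.", "Theoretical Physics"),
    ("Data Scientists", "PhD", "Computer Science"),
    ("Data Scientists", "MS", "Statistics"),
    ("Statisticians", "PhD", "Biostatistics"),
    ("Mathematicians", "Ph.D.", "Applied Mathematics"),
    ("Software Engineers", "PhD", "Neuroscience"),
]
counts = count_phds(records)
for (category, topic), n in sorted(counts.items()):
    print(f"{category:20} {topic:18} {n}")
```

Rule order matters: "Bioinformatics" is checked before broader keywords so that a field mentioning both is not misfiled.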
7. Model Testing
For this project I trained the following models on skills and education features:
Random Forest
• Classify the profile
Naïve Bayes
• Use multi-class probabilities to assess the background components of a profile
K-means
• Suggest similar and relevant profiles
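A minimal scikit-learn sketch of the Random Forest step, assuming profiles are already encoded as binary skill/education vectors. The feature columns, toy rows, class codes (DS, SE, BA, ST, MT) and hyperparameters are hypothetical; the slides do not give the real ones.

```python
from sklearn.ensemble import RandomForestClassifier

# Hypothetical binary feature columns, in this order.
FEATURES = ["python", "machine learning", "java", "excel", "statistics", "mathematics"]

# Toy training rows, two per category (not project data).
X = [
    [1, 1, 0, 0, 1, 0], [1, 1, 0, 0, 0, 0],  # DS: data scientists
    [1, 0, 1, 0, 0, 0], [0, 0, 1, 0, 0, 0],  # SE: software engineers
    [0, 0, 0, 1, 0, 0], [0, 0, 0, 1, 1, 0],  # BA: business analysts
    [0, 0, 0, 0, 1, 0], [1, 0, 0, 0, 1, 0],  # ST: statisticians
    [0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 1, 1],  # MT: mathematicians
]
y = ["DS", "DS", "SE", "SE", "BA", "BA", "ST", "ST", "MT", "MT"]

# Fixed random_state so the bootstrap samples, and thus the forest, are reproducible.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X, y)

# Classify a new (hypothetical) profile and show the averaged tree votes per class.
new_profile = [1, 1, 0, 0, 1, 1]
predicted = str(clf.predict([new_profile])[0])
probabilities = {
    str(label): float(p)
    for label, p in zip(clf.classes_, clf.predict_proba([new_profile])[0])
}
print(predicted, probabilities)
```

`predict_proba` on a random forest averages the class probabilities of the individual trees, so besides the single predicted label it also shows how close a profile is to the other categories.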
8. Model Testing
Model | Training set | Purpose
Random Forest | All 5 categories | Classify the profile
Naïve Bayes | 4 classic categories: SE, BA, MT, ST | Assess profile background components with multi-class probabilities
K-means | All 5 categories | Identify similar profiles
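Training Naïve Bayes only on the four classic categories means a data scientist profile can be scored by a model that has never seen a data scientist, and the per-class probabilities read as "how much SE / BA / MT / ST" the profile contains. The sketch below assumes `BernoulliNB` (suited to binary present/absent features); the slides do not say which Naïve Bayes variant was used, and the feature columns, toy rows and `background_components` helper are hypothetical.

```python
from sklearn.naive_bayes import BernoulliNB

# Hypothetical binary feature columns, in this order.
FEATURES = ["python", "machine learning", "java", "excel", "statistics", "mathematics"]

# Toy training rows for the four classic categories only (no data scientists).
X_train = [
    [1, 0, 1, 0, 0, 0], [0, 0, 1, 0, 0, 0],  # SE: software engineers
    [0, 0, 0, 1, 0, 0], [0, 0, 0, 1, 1, 0],  # BA: business analysts
    [0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 1, 1],  # MT: mathematicians
    [0, 0, 0, 0, 1, 0], [1, 0, 0, 0, 1, 0],  # ST: statisticians
]
y_train = ["SE", "SE", "BA", "BA", "MT", "MT", "ST", "ST"]

# BernoulliNB models each feature as present/absent; alpha=1.0 is Laplace smoothing.
nb = BernoulliNB(alpha=1.0)
nb.fit(X_train, y_train)

def background_components(model, profile_vector):
    """Map each training category to the model's posterior probability for one profile."""
    probs = model.predict_proba([profile_vector])[0]
    return {str(label): float(p) for label, p in zip(model.classes_, probs)}

# A hypothetical data scientist profile: python, machine learning, statistics.
components = background_components(nb, [1, 1, 0, 0, 1, 0])
for label, p in sorted(components.items(), key=lambda kv: -kv[1]):
    print(f"{label}: {p:.2f}")
```

Because the posteriors always sum to 1 over the four categories, they describe a mix of backgrounds rather than a verdict; a feature never seen in training (here "machine learning") contributes equally to every class and so does not move the mix.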
12. Next Steps
Data Collection
• Get more data:
– Other websites
– Indeed
– User input on the web app
Data Analysis / Feature Extraction
• Fine-grained parsing of education
• Experiment with additional features (industry, years of experience)
Model Testing
• Extend the feature set and test more models
• Fuzzy C-means
Data Product
• Add interactive data collection
• Personalized links for skills
• Explanations of similarity results
Close the loop by analyzing job offers and suggesting matching profiles.
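Fuzzy C-means is listed as a model to try next. Where K-means assigns each profile to exactly one cluster, fuzzy C-means gives each profile a degree of membership in every cluster, which suits "hybrid" profiles. Below is a minimal NumPy sketch of the standard update rules (membership-weighted centers, then inverse-distance memberships); the fuzzifier `m=2`, the tolerance, the function name and the toy 2-D points are assumptions, not project settings.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Minimal fuzzy C-means. Returns (centers, memberships), where
    memberships[i, j] is how strongly sample i belongs to cluster j
    (each row sums to 1). m > 1 controls how soft the assignment is."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalized so each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Centers are membership-weighted means, with weights u_ij ** m.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance from every sample to every center; the floor avoids division by zero.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)
        # u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))
        ratio = (dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

# Toy 2-D "profiles": two tight groups plus one point in between.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5], [2.5, 2.5]]
centers, U = fuzzy_c_means(X, n_clusters=2)
print(np.round(U, 2))
```

The in-between point ends up with split memberships instead of being forced into one group, which is the kind of output that could back a more nuanced explanation of similarity results.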
13. Thank you!
Technologies
Web App:
Flask, jQuery, Vega, MongoDB
NMF, HC, RF, DT, NB, K-means models:
scikit-learn
Visualizations:
Vincent, Vega, NetworkX, Gephi
Acknowledgements
yatish27: Ruby LinkedIn public profile web scraper
ozgur: LinkedIn API Python wrapper