SlideShare a Scribd company logo
1 of 17
Download to read offline
PyDriller: Python Framework for
Mining Software Repositories
Davide Spadini, Mauricio Aniche, Alberto Bacchelli
PyDriller: Python Framework for
Mining Software Repositories
Davide Spadini, Mauricio Aniche, Alberto Bacchelli
ishepard @DavideSpadini
What?
Framework to analyse Git (and soon Mercurial)
repositories
Why?
• There are already many frameworks for Git
• Generally, one for each programming language
• Java -> JGit
• Python -> GitPython
• Javascript -> nodegit
• etc.
So, why?
How many commands does Git have?
• > 20?
• > 50?
• > 100?
• > 150?
154!!
PyDriller
• Aim: to ease the extraction of information from Git repositories
• What is supported:
• analysing the history of a project
• retrieving commit information (date, message, authors, etc.)
• retrieving files information (diff, source code)
• What is not supported:
• writing on the repo (git pull, git push, git add, git commit,
etc..)
Demo
Statistics
• Everything is lazy evaluated, so you “pay” what you get.
1. only commit information:
immediate (as git log)
2. commit and file information:
60 commits/sec (1240 commits in 22 seconds)
3. commit, file and metrics information:
4 commits/s (1240 commits in ~5min)
Thank you for your support!
• Some numbers:
1. Downloaded approximatively 4000 times
2. 100 times only last 2 weeks
• Community driven
• University of Zurich, TU Delft and University of Catania teach
PyDriller in their MSR courses
• SIG uses PyDriller in their quality assessments
What’s next?
• A company asked me to implement
RepositoryMining().traverse_files()
• Mercurial support
• Ideas? Talk to me or submit a PR :)
PyDriller
• Source code: https://github.com/ishepard/pydriller
• Doc: https://pydriller.readthedocs.io/en/latest/
• Feel free to leave a star! :)

More Related Content

Similar to PyDriller: Python Framework for Mining Software Repositories

LTから入門するPython開発環境 #PyLadiesTokyo
LTから入門するPython開発環境 #PyLadiesTokyoLTから入門するPython開発環境 #PyLadiesTokyo
LTから入門するPython開発環境 #PyLadiesTokyoHidenori Matsuki
 
OSINT tools for security auditing [FOSDEM edition]
OSINT tools for security auditing [FOSDEM edition] OSINT tools for security auditing [FOSDEM edition]
OSINT tools for security auditing [FOSDEM edition] Jose Manuel Ortega Candel
 
우분투한국커뮤니티 수학스터디결과보고
우분투한국커뮤니티 수학스터디결과보고우분투한국커뮤니티 수학스터디결과보고
우분투한국커뮤니티 수학스터디결과보고용 최
 
Developing Brilliant and Powerful APIs in Ruby & Python
Developing Brilliant and Powerful APIs in Ruby & PythonDeveloping Brilliant and Powerful APIs in Ruby & Python
Developing Brilliant and Powerful APIs in Ruby & PythonSmartBear
 
The Five Stages of Enterprise Jupyter Deployment
The Five Stages of Enterprise Jupyter DeploymentThe Five Stages of Enterprise Jupyter Deployment
The Five Stages of Enterprise Jupyter DeploymentFrederick Reiss
 
Git for folk who like GUIs
Git for folk who like GUIsGit for folk who like GUIs
Git for folk who like GUIsTim Osborn
 
Azure Container Apps
Azure Container AppsAzure Container Apps
Azure Container AppsICS
 
[아이펀팩토리] 2018 데브데이 서버위더스 _04 리눅스 게임 서버 성능 분석
[아이펀팩토리] 2018 데브데이 서버위더스 _04 리눅스 게임 서버 성능 분석[아이펀팩토리] 2018 데브데이 서버위더스 _04 리눅스 게임 서버 성능 분석
[아이펀팩토리] 2018 데브데이 서버위더스 _04 리눅스 게임 서버 성능 분석iFunFactory Inc.
 
Analysing GitHub commits with R
Analysing GitHub commits with RAnalysing GitHub commits with R
Analysing GitHub commits with RBarbara Fusinska
 
PythonSD Test Driven Django Development Workshop
PythonSD Test Driven Django Development WorkshopPythonSD Test Driven Django Development Workshop
PythonSD Test Driven Django Development Workshoppythonsd
 
OpenFest 2012 : Leveraging the public internet
OpenFest 2012 : Leveraging the public internetOpenFest 2012 : Leveraging the public internet
OpenFest 2012 : Leveraging the public internettkisason
 
Open Source Monitoring Tools
Open Source Monitoring ToolsOpen Source Monitoring Tools
Open Source Monitoring Toolsm_richardson
 
Smile Gupta - Hacktoberfest Celebration 2020
Smile Gupta - Hacktoberfest Celebration 2020Smile Gupta - Hacktoberfest Celebration 2020
Smile Gupta - Hacktoberfest Celebration 2020Smile Gupta
 
Apache Geode - The First Six Months
Apache Geode -  The First Six MonthsApache Geode -  The First Six Months
Apache Geode - The First Six MonthsAnthony Baker
 
Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and PythonTravis Oliphant
 
Resumable File Upload API using GridFS and TUS
Resumable File Upload API using GridFS and TUSResumable File Upload API using GridFS and TUS
Resumable File Upload API using GridFS and TUSkhangtoh
 

Similar to PyDriller: Python Framework for Mining Software Repositories (20)

LTから入門するPython開発環境 #PyLadiesTokyo
LTから入門するPython開発環境 #PyLadiesTokyoLTから入門するPython開発環境 #PyLadiesTokyo
LTから入門するPython開発環境 #PyLadiesTokyo
 
OSINT tools for security auditing [FOSDEM edition]
OSINT tools for security auditing [FOSDEM edition] OSINT tools for security auditing [FOSDEM edition]
OSINT tools for security auditing [FOSDEM edition]
 
우분투한국커뮤니티 수학스터디결과보고
우분투한국커뮤니티 수학스터디결과보고우분투한국커뮤니티 수학스터디결과보고
우분투한국커뮤니티 수학스터디결과보고
 
고등수학 스터디 결과발표
고등수학 스터디 결과발표고등수학 스터디 결과발표
고등수학 스터디 결과발표
 
hotdog a TD tool for DD
hotdog a TD tool for DDhotdog a TD tool for DD
hotdog a TD tool for DD
 
Developing Brilliant and Powerful APIs in Ruby & Python
Developing Brilliant and Powerful APIs in Ruby & PythonDeveloping Brilliant and Powerful APIs in Ruby & Python
Developing Brilliant and Powerful APIs in Ruby & Python
 
The Five Stages of Enterprise Jupyter Deployment
The Five Stages of Enterprise Jupyter DeploymentThe Five Stages of Enterprise Jupyter Deployment
The Five Stages of Enterprise Jupyter Deployment
 
Git for folk who like GUIs
Git for folk who like GUIsGit for folk who like GUIs
Git for folk who like GUIs
 
Azure Container Apps
Azure Container AppsAzure Container Apps
Azure Container Apps
 
Azure Container Apps
Azure Container AppsAzure Container Apps
Azure Container Apps
 
[아이펀팩토리] 2018 데브데이 서버위더스 _04 리눅스 게임 서버 성능 분석
[아이펀팩토리] 2018 데브데이 서버위더스 _04 리눅스 게임 서버 성능 분석[아이펀팩토리] 2018 데브데이 서버위더스 _04 리눅스 게임 서버 성능 분석
[아이펀팩토리] 2018 데브데이 서버위더스 _04 리눅스 게임 서버 성능 분석
 
Analysing GitHub commits with R
Analysing GitHub commits with RAnalysing GitHub commits with R
Analysing GitHub commits with R
 
PythonSD Test Driven Django Development Workshop
PythonSD Test Driven Django Development WorkshopPythonSD Test Driven Django Development Workshop
PythonSD Test Driven Django Development Workshop
 
OpenFest 2012 : Leveraging the public internet
OpenFest 2012 : Leveraging the public internetOpenFest 2012 : Leveraging the public internet
OpenFest 2012 : Leveraging the public internet
 
Open Source Monitoring Tools
Open Source Monitoring ToolsOpen Source Monitoring Tools
Open Source Monitoring Tools
 
Github basics
Github basicsGithub basics
Github basics
 
Smile Gupta - Hacktoberfest Celebration 2020
Smile Gupta - Hacktoberfest Celebration 2020Smile Gupta - Hacktoberfest Celebration 2020
Smile Gupta - Hacktoberfest Celebration 2020
 
Apache Geode - The First Six Months
Apache Geode -  The First Six MonthsApache Geode -  The First Six Months
Apache Geode - The First Six Months
 
Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and Python
 
Resumable File Upload API using GridFS and TUS
Resumable File Upload API using GridFS and TUSResumable File Upload API using GridFS and TUS
Resumable File Upload API using GridFS and TUS
 

More from Delft University of Technology

More from Delft University of Technology (7)

Investigating Severity Thresholds for Test Smells
Investigating Severity Thresholds for Test SmellsInvestigating Severity Thresholds for Test Smells
Investigating Severity Thresholds for Test Smells
 
Primers or Reminders? The Effects of Existing Review Comments on Code Review
Primers or Reminders? The Effects of Existing Review Comments on Code ReviewPrimers or Reminders? The Effects of Existing Review Comments on Code Review
Primers or Reminders? The Effects of Existing Review Comments on Code Review
 
Test-Driven Code Review: An Empirical Study
Test-Driven Code Review: An Empirical StudyTest-Driven Code Review: An Empirical Study
Test-Driven Code Review: An Empirical Study
 
Practices and Tools for Better Software Testing
Practices and Tools for  Better Software TestingPractices and Tools for  Better Software Testing
Practices and Tools for Better Software Testing
 
When Testing Meets Code Review: Why and How Developers Review Tests
When Testing Meets Code Review: Why and How Developers Review TestsWhen Testing Meets Code Review: Why and How Developers Review Tests
When Testing Meets Code Review: Why and How Developers Review Tests
 
On The Relation of Test Smells to Software Code Quality
On The Relation of Test Smells to Software Code QualityOn The Relation of Test Smells to Software Code Quality
On The Relation of Test Smells to Software Code Quality
 
To Mock or Not To Mock
To Mock or Not To MockTo Mock or Not To Mock
To Mock or Not To Mock
 

Recently uploaded

Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startQuintin Balsdon
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwaitjaanualu31
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdfKamal Acharya
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"mphochane1998
 
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best ServiceTamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Servicemeghakumariji156
 
kiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadkiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadhamedmustafa094
 
Moment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilMoment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilVinayVitekari
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptDineshKumar4165
 
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLEGEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLEselvakumar948
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueBhangaleSonal
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesMayuraD1
 
Engineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesEngineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesRAJNEESHKUMAR341697
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdfKamal Acharya
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARKOUSTAV SARKAR
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayEpec Engineered Technologies
 
Computer Networks Basics of Network Devices
Computer Networks  Basics of Network DevicesComputer Networks  Basics of Network Devices
Computer Networks Basics of Network DevicesChandrakantDivate1
 
Computer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to ComputersComputer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to ComputersMairaAshraf6
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VDineshKumar4165
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxJuliansyahHarahap1
 

Recently uploaded (20)

Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdf
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
 
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best ServiceTamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
kiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadkiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal load
 
Moment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilMoment Distribution Method For Btech Civil
Moment Distribution Method For Btech Civil
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLEGEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
 
Engineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesEngineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planes
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 
Computer Networks Basics of Network Devices
Computer Networks  Basics of Network DevicesComputer Networks  Basics of Network Devices
Computer Networks Basics of Network Devices
 
Computer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to ComputersComputer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to Computers
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptx
 

PyDriller: Python Framework for Mining Software Repositories

  • 1. PyDriller: Python Framework for Mining Software Repositories Davide Spadini, Mauricio Aniche, Alberto Bacchelli
  • 2. PyDriller: Python Framework for Mining Software Repositories Davide Spadini, Mauricio Aniche, Alberto Bacchelli ishepard @DavideSpadini
  • 4. Framework to analyse Git (and soon Mercurial) repositories
  • 6. • There are already many frameworks for Git • Generally, one for each programming language • Java -> JGit • Python -> GitPython • Javascript -> nodegit • etc.
  • 8.
  • 9.
  • 10.
  • 11. How many commands does Git have? • > 20? • > 50? • > 100? • > 150? 154!!
  • 12. PyDriller • Aim: to ease the extraction of information from Git repositories • What is supported: • analysing the history of a project • retrieving commit information (date, message, authors, etc.) • retrieving files information (diff, source code) • What is not supported: • writing on the repo (git pull, git push, git add, git commit, etc..)
  • 13. Demo
  • 14. Statistics • Everything is lazy evaluated, so you “pay” what you get. 1. only commit information: immediate (as git log) 2. commit and file information: 60 commits/sec (1240 commits in 22 seconds) 3. commit, file and metrics information: 4 commits/s (1240 commits in ~5min)
  • 15. Thank you for your support! • Some numbers: 1. Downloaded approximatively 4000 times 2. 100 times only last 2 weeks • Community driven • University of Zurich, TU Delft and University of Catania teach PyDriller in their MSR courses • SIG uses PyDriller in their quality assessments
  • 16. What’s next? • A company asked me to implement RepositoryMining().traverse_files() • Mercurial support • Ideas? Talk to me or submit a PR :)
  • 17. PyDriller • Source code: https://github.com/ishepard/pydriller • Doc: https://pydriller.readthedocs.io/en/latest/ • Feel free to leave a star! :)