SlideShare a Scribd company logo
1 of 22
Download to read offline
Mining Python
Software
Sarah Mount - @snim2
What do you want to know today?
What do we know about software?
● How to make it correct
● How long it will take to write
● Expected bugs per kloc
Er … yeah.
Health warning...
This is a work in progress,
don’t take the numbers and
charts too seriously just yet...
Options for mining Python software
<?xml version="1.0" encoding="UTF-8"?>
<response>
<status>success</status>
<result>
<project>
<id>1</id>
<name>Subversion</name>
<created_at>2006-10-10T15:51:31Z</created_at>
<updated_at>2007-08-22T17:31:17Z</updated_at>
<homepage_url>http://subversion.tigris.org/</homepage_url>
<download_url>http://subversion.tigris.org/...
</download_url>
<updated_at>2007-07-12T12:21:11Z</updated_at>
<logged_at>2007-07-12T12:18:54Z</logged_at>
<min_month>2001-08-01T00:00:00Z</min_month>
<max_month>2007-07-01T00:00:00Z</max_month>
...
{
"repository":{
"url":"https://github.com/igrigorik/spdy",
"has_downloads":false,
"created_at":"2012/01/19 14:15:34 -0800",
"has_issues":true,
"description":"SPDY is an experiment with protocols for the web",
"forks":10,
"fork":false,
"has_wiki":false,
"homepage":"http://www.igvita.com/2011/04/07/life-beyond-http-11-googles-spdy/",
"size":420,
"private":false,
"name":"spdy",
"owner":"igrigorik",
"open_issues":4,
"watchers":206,
"pushed_at":"2012/01/11 10:38:16 -0700",
"language":"Ruby"
},
"created_at":"2012/02/11 10:38:16 -0700",
"public":true,
"actor":"igrigorik",
"payload":{
"head":"98f44cab69becb274c6f3b9035ef8e0bd7b2b1b7",
"size":1,
...
],
"ref":"refs/heads/master"
},
"url":"https://github.com/igrigorik/spdy/compare/5b74597e88...98f44cab69b",
"type":"PushEvent"
}
Google bigquery interface
/* top 100 repos for Ruby by number of pushes */
SELECT repository_name, count(repository_name) as pushes, repository_description,
repository_url
FROM [githubarchive:github.timeline]
WHERE type="PushEvent"
AND repository_language="Ruby"
AND PARSE_UTC_USEC(created_at) >= PARSE_UTC_USEC('2012-04-01 00:00:00')
GROUP BY repository_name, repository_description, repository_url
ORDER BY pushes DESC
LIMIT 100
Some preliminary work
Code clones
Type 1: Identical code, copy & pasted
Type 2: Identical code modulo names, layout,
comments, etc.
Type 3: Type 2 plus further modifications such
as changes in statements
Type 4: Different code, same semantics
Roy & Cordy (2007)
Sentiment (in comments)
Some ideas for mining projects
Mining ideas
● How do programming idioms develop and
spread?
● How do projects reach a critical mass of
developers and become “popular”?
● Are metrics like cyclomatic complexity, fan
out and Halstead’s complexity measure
useful, or are they all just proportional to
kLOCs?
Thank you.

More Related Content

Similar to Mining python-software-pyconuk13

Lessons Learned from Migrating Legacy Enterprise Applications to Microservices
Lessons Learned from Migrating Legacy Enterprise Applications to MicroservicesLessons Learned from Migrating Legacy Enterprise Applications to Microservices
Lessons Learned from Migrating Legacy Enterprise Applications to MicroservicesVMware Tanzu
 
Scraping with Python for Fun and Profit - PyCon India 2010
Scraping with Python for Fun and Profit - PyCon India 2010Scraping with Python for Fun and Profit - PyCon India 2010
Scraping with Python for Fun and Profit - PyCon India 2010Abhishek Mishra
 
Exadata cell update
Exadata cell updateExadata cell update
Exadata cell updatepat2001
 
Changing Etsy's Architectural Foundations with Continuous Deployment
Changing Etsy's Architectural Foundations with Continuous DeploymentChanging Etsy's Architectural Foundations with Continuous Deployment
Changing Etsy's Architectural Foundations with Continuous DeploymentMatt Graham
 
Web TCard - Speed optimization
Web TCard - Speed optimizationWeb TCard - Speed optimization
Web TCard - Speed optimizationEric Guo
 
Advanced ClickOnce Deployment Techniques by Suthep S - GreatFriends.Biz
Advanced ClickOnce Deployment Techniques by Suthep S - GreatFriends.BizAdvanced ClickOnce Deployment Techniques by Suthep S - GreatFriends.Biz
Advanced ClickOnce Deployment Techniques by Suthep S - GreatFriends.BizSuthep Sangvirotjanaphat
 
Static Code Analysis PHP[tek] 2023
Static Code Analysis PHP[tek] 2023Static Code Analysis PHP[tek] 2023
Static Code Analysis PHP[tek] 2023Scott Keck-Warren
 
Cisco Network Proposal Part 1by Jesse HolmesSubmission d.docx
Cisco Network Proposal Part 1by Jesse HolmesSubmission d.docxCisco Network Proposal Part 1by Jesse HolmesSubmission d.docx
Cisco Network Proposal Part 1by Jesse HolmesSubmission d.docxclarebernice
 
A Fast Intro to Fast Query with ClickHouse, by Robert Hodges
A Fast Intro to Fast Query with ClickHouse, by Robert HodgesA Fast Intro to Fast Query with ClickHouse, by Robert Hodges
A Fast Intro to Fast Query with ClickHouse, by Robert HodgesAltinity Ltd
 
Is your Magento fast enough?
Is your Magento fast enough?Is your Magento fast enough?
Is your Magento fast enough?Giannis Economou
 
Continues Deployment - Tech Talk week
Continues Deployment - Tech Talk weekContinues Deployment - Tech Talk week
Continues Deployment - Tech Talk weekrantav
 
Мы ведь тоже люди. Еретическая лекция про юзабилити инструментов разработчика
Мы ведь тоже люди. Еретическая лекция про юзабилити инструментов разработчикаМы ведь тоже люди. Еретическая лекция про юзабилити инструментов разработчика
Мы ведь тоже люди. Еретическая лекция про юзабилити инструментов разработчикаNikita Prokopov
 
Continuous Delivery with TFS msbuild msdeploy
Continuous Delivery with TFS msbuild msdeployContinuous Delivery with TFS msbuild msdeploy
Continuous Delivery with TFS msbuild msdeployPeter Gfader
 
Continuous Delivery and Automated Operations on k8s with keptn
Continuous Delivery and Automated Operations on k8s with keptnContinuous Delivery and Automated Operations on k8s with keptn
Continuous Delivery and Automated Operations on k8s with keptnAndreas Grabner
 
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...Intel® Software
 
Oracle_Patching_Untold_Story_Final_Part2.pdf
Oracle_Patching_Untold_Story_Final_Part2.pdfOracle_Patching_Untold_Story_Final_Part2.pdf
Oracle_Patching_Untold_Story_Final_Part2.pdfAlex446314
 
Continuous Delivery: The Dirty Details
Continuous Delivery: The Dirty DetailsContinuous Delivery: The Dirty Details
Continuous Delivery: The Dirty DetailsMike Brittain
 
Velocity 2012 - Learning WebOps the Hard Way
Velocity 2012 - Learning WebOps the Hard WayVelocity 2012 - Learning WebOps the Hard Way
Velocity 2012 - Learning WebOps the Hard WayCosimo Streppone
 
Introduction to SaltStack (An Event-Based Configuration Management)
Introduction to SaltStack (An Event-Based Configuration Management)Introduction to SaltStack (An Event-Based Configuration Management)
Introduction to SaltStack (An Event-Based Configuration Management)DevOps Indonesia
 

Similar to Mining python-software-pyconuk13 (20)

Lessons Learned from Migrating Legacy Enterprise Applications to Microservices
Lessons Learned from Migrating Legacy Enterprise Applications to MicroservicesLessons Learned from Migrating Legacy Enterprise Applications to Microservices
Lessons Learned from Migrating Legacy Enterprise Applications to Microservices
 
Scraping with Python for Fun and Profit - PyCon India 2010
Scraping with Python for Fun and Profit - PyCon India 2010Scraping with Python for Fun and Profit - PyCon India 2010
Scraping with Python for Fun and Profit - PyCon India 2010
 
Exadata cell update
Exadata cell updateExadata cell update
Exadata cell update
 
Changing Etsy's Architectural Foundations with Continuous Deployment
Changing Etsy's Architectural Foundations with Continuous DeploymentChanging Etsy's Architectural Foundations with Continuous Deployment
Changing Etsy's Architectural Foundations with Continuous Deployment
 
Web TCard - Speed optimization
Web TCard - Speed optimizationWeb TCard - Speed optimization
Web TCard - Speed optimization
 
Advanced ClickOnce Deployment Techniques by Suthep S - GreatFriends.Biz
Advanced ClickOnce Deployment Techniques by Suthep S - GreatFriends.BizAdvanced ClickOnce Deployment Techniques by Suthep S - GreatFriends.Biz
Advanced ClickOnce Deployment Techniques by Suthep S - GreatFriends.Biz
 
Static Code Analysis PHP[tek] 2023
Static Code Analysis PHP[tek] 2023Static Code Analysis PHP[tek] 2023
Static Code Analysis PHP[tek] 2023
 
Cisco Network Proposal Part 1by Jesse HolmesSubmission d.docx
Cisco Network Proposal Part 1by Jesse HolmesSubmission d.docxCisco Network Proposal Part 1by Jesse HolmesSubmission d.docx
Cisco Network Proposal Part 1by Jesse HolmesSubmission d.docx
 
Shell_Rec
Shell_RecShell_Rec
Shell_Rec
 
A Fast Intro to Fast Query with ClickHouse, by Robert Hodges
A Fast Intro to Fast Query with ClickHouse, by Robert HodgesA Fast Intro to Fast Query with ClickHouse, by Robert Hodges
A Fast Intro to Fast Query with ClickHouse, by Robert Hodges
 
Is your Magento fast enough?
Is your Magento fast enough?Is your Magento fast enough?
Is your Magento fast enough?
 
Continues Deployment - Tech Talk week
Continues Deployment - Tech Talk weekContinues Deployment - Tech Talk week
Continues Deployment - Tech Talk week
 
Мы ведь тоже люди. Еретическая лекция про юзабилити инструментов разработчика
Мы ведь тоже люди. Еретическая лекция про юзабилити инструментов разработчикаМы ведь тоже люди. Еретическая лекция про юзабилити инструментов разработчика
Мы ведь тоже люди. Еретическая лекция про юзабилити инструментов разработчика
 
Continuous Delivery with TFS msbuild msdeploy
Continuous Delivery with TFS msbuild msdeployContinuous Delivery with TFS msbuild msdeploy
Continuous Delivery with TFS msbuild msdeploy
 
Continuous Delivery and Automated Operations on k8s with keptn
Continuous Delivery and Automated Operations on k8s with keptnContinuous Delivery and Automated Operations on k8s with keptn
Continuous Delivery and Automated Operations on k8s with keptn
 
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
 
Oracle_Patching_Untold_Story_Final_Part2.pdf
Oracle_Patching_Untold_Story_Final_Part2.pdfOracle_Patching_Untold_Story_Final_Part2.pdf
Oracle_Patching_Untold_Story_Final_Part2.pdf
 
Continuous Delivery: The Dirty Details
Continuous Delivery: The Dirty DetailsContinuous Delivery: The Dirty Details
Continuous Delivery: The Dirty Details
 
Velocity 2012 - Learning WebOps the Hard Way
Velocity 2012 - Learning WebOps the Hard WayVelocity 2012 - Learning WebOps the Hard Way
Velocity 2012 - Learning WebOps the Hard Way
 
Introduction to SaltStack (An Event-Based Configuration Management)
Introduction to SaltStack (An Event-Based Configuration Management)Introduction to SaltStack (An Event-Based Configuration Management)
Introduction to SaltStack (An Event-Based Configuration Management)
 

Recently uploaded

Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 

Recently uploaded (20)

Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 

Mining python-software-pyconuk13