Linguistic diversity in
open-source development


 Bogdan Vasilescu
 Alexander Serebrenik
 Mark van den Brand
Motivation


      [Figure: developers on a project, each "speaking" one or more languages: C, C++, HTML, Lisp, XML, Java, Python, Unix shell, …
       Speech bubbles: I "speak" Java / I "speak" Java and Python / I "speak" Python]



/ Mathematics and Computer Science                                      23-4-2012   PAGE 1
Motivation

If a developer leaves the project, what is the risk of not finding
replacement developers that speak Python?

     • No risk: plenty of other Python developers to choose from
     • What about now?




Linguistic diversity

      • Greenberg (1956)
           • compare geographic regions
           • probability that two random individuals do not speak the
             same language




Linguistic diversity

    Probability that two random individuals do not speak the same language


                • Simple model
                     • everyone speaks exactly one language
                     • languages are independent


                \[ A = 1 - \sum_{\ell \in L} p_\ell^2, \qquad p_\ell = \frac{|S_\ell|}{|P|} \]
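
The simple model can be sketched in a few lines of Python. This is a minimal illustration, assuming each individual is recorded with exactly one language; the function name `diversity_a` and the sample community are hypothetical, not taken from the slides.

```python
from collections import Counter


def diversity_a(speakers):
    """Simple-model diversity: 1 minus the sum of squared language shares.

    `speakers` holds one language per individual (hypothetical input format).
    """
    total = len(speakers)
    counts = Counter(speakers)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Illustrative community: two Java speakers, one Python speaker, one C speaker.
community = ["Java", "Java", "Python", "C"]
print(diversity_a(community))
```

A community in which everyone speaks the same language gets diversity 0; the value grows as speakers spread over more languages.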




Linguistic diversity

    Probability that two random individuals do not speak the same language


                • Related-languages model
                     • everyone speaks exactly one language
                     • languages are similar

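                \[ B = 1 - \sum_{\ell, m \in L} p_\ell \, p_m \, sim(\ell, m), \qquad p_\ell = \frac{|S_\ell|}{|P|} \]
                \[ 0 \le sim(\ell, m) \le 1, \qquad sim(\ell, \ell) = 1 \]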
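
A rough Python sketch of the related-languages model follows. The similarity value 0.5 between C and C++ is made up for illustration, and treating similarity as symmetric is an assumption of this sketch; the names `diversity_b`, `PAIR_SIM`, `sim` and `identity` are hypothetical.

```python
def diversity_b(shares, sim):
    """Related-languages diversity: 1 - sum over ordered pairs of p_l * p_m * sim(l, m)."""
    return 1.0 - sum(
        shares[l] * shares[m] * sim(l, m) for l in shares for m in shares
    )


# Hypothetical similarity table; the value is illustrative, not measured.
PAIR_SIM = {frozenset({"C", "C++"}): 0.5}


def sim(l, m):
    """Similarity in [0, 1] with sim(l, l) = 1 (symmetry is an assumption here)."""
    if l == m:
        return 1.0
    return PAIR_SIM.get(frozenset({l, m}), 0.0)


def identity(l, m):
    """No similarity between distinct languages."""
    return 1.0 if l == m else 0.0


shares = {"C": 0.5, "C++": 0.5}
b_related = diversity_b(shares, sim)         # similar languages lower the diversity
b_unrelated = diversity_b(shares, identity)  # falls back to the simple model
print(b_related, b_unrelated)
```

With the identity similarity the result coincides with the simple model, which is a quick sanity check on any implementation.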
Linguistic diversity

    Probability that two random individuals do not speak the same language


                • Polyglot related-languages model
                     • everyone speaks at least one language
                     • languages are similar
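                \[ F = 1 - \sum_{s, t \in \mathcal{P}(L)} p_s \, p_t \, \frac{\sum_{\ell \in s,\, m \in t} sim(\ell, m)}{|s| \cdot |t|}, \qquad p_s = \frac{|X_s|}{|P|} \]

                \[ L = \{A, B, C\}, \qquad \mathcal{P}(L) = \{A, B, C, AB, AC, BC, ABC\} \]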
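
The polyglot model can be sketched the same way, with shares attached to sets of languages instead of single languages. This is only an illustration: the input format (a dict keyed by `frozenset`), the name `diversity_f` and the sample shares are assumptions, and the identity similarity is used to keep the arithmetic easy to follow.

```python
from itertools import product


def diversity_f(shares, sim):
    """Polyglot diversity: 1 - sum over pairs of language sets (s, t) of
    p_s * p_t * (average similarity between the languages of s and of t)."""
    total = 0.0
    for s, t in product(shares, repeat=2):
        avg_sim = sum(sim(l, m) for l in s for m in t) / (len(s) * len(t))
        total += shares[s] * shares[t] * avg_sim
    return 1.0 - total


def identity(l, m):
    """Illustrative similarity: 1 for the same language, 0 otherwise."""
    return 1.0 if l == m else 0.0


# Hypothetical community: half speak only Java, half speak Java and Python.
shares = {
    frozenset({"Java"}): 0.5,
    frozenset({"Java", "Python"}): 0.5,
}
f_value = diversity_f(shares, identity)
print(f_value)
```

When every set contains exactly one language, the inner average is just sim(l, m), so the sketch reduces to the related-languages model.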

Our risk measure

    • Probability that two random individuals do not speak the
      same language
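                \[ F = 1 - \sum_{s, t \in \mathcal{P}(L)} p_s \, p_t \, \frac{\sum_{\ell \in s,\, m \in t} sim(\ell, m)}{|s| \cdot |t|} \]

    • Risk of not finding developers that "speak" $\ell$

                \[ risk(\ell) = 1 - \sum_{s \in \mathcal{P}(L)} p_s \, \max_{k \in s} sim(k \to \ell) \]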
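
As a hedged sketch, the risk for a language can be computed as one minus the share-weighted best similarity that each group of developers has towards that language. The directed similarity table below is invented for illustration, and setting the similarity of a language to itself to 1 follows the constraint stated for the related-languages model; `risk`, `DIRECTED_SIM` and `sim_to` are hypothetical names.

```python
def risk(target, shares, sim_to):
    """Risk of not finding developers for `target`:
    1 - sum over language sets s of p_s * max over k in s of sim_to(k, target)."""
    return 1.0 - sum(
        p * max(sim_to(k, target) for k in s) for s, p in shares.items()
    )


# Hypothetical directed similarities (k -> target); values are illustrative only.
DIRECTED_SIM = {("Java", "Python"): 0.5}


def sim_to(k, target):
    """Similarity of knowing k towards target; a language is fully similar to itself."""
    if k == target:
        return 1.0
    return DIRECTED_SIM.get((k, target), 0.0)


# Hypothetical community, keyed by the set of languages each group speaks.
shares = {
    frozenset({"Java"}): 0.5,
    frozenset({"Java", "Python"}): 0.25,
    frozenset({"C"}): 0.25,
}
risk_python = risk("Python", shares, sim_to)
risk_c = risk("C", shares, sim_to)
print(risk_python, risk_c)
```

In this toy community Python ends up less risky than C even though both are spoken by a quarter of the developers, because Java speakers count as partial Python speakers.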




StackOverflow.com




User tags




Similarity measure

      •    Reverend Gonzo: Java, C, C++, C#, Python,…
      •    Alexander Serebrenik: Prolog, SQL, C++,…
      •    Bogdan Vasilescu: Python,…
      •    Jon Skeet: C#, Java, ASP.net, XML,…
      •    … > 400,000

      • Association rule mining:
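           • “C => Java” [Venn diagram: users tagged C and users tagged Java; nBoth is the overlap, nLeft the users tagged C]
             \[ sim(k \to \ell) = conf(k \Rightarrow \ell) = \frac{nBoth}{nLeft} \]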
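
The confidence of a rule such as “C => Java” can be sketched as follows: count the users whose tags include the left-hand language, and the fraction of those who also have the right-hand language. The user names and tag sets below are hypothetical toy data, not StackOverflow data, and returning 0.0 when nobody has the left-hand tag is an assumption of this sketch.

```python
def confidence(users, left, right):
    """conf(left => right) = nBoth / nLeft over the users' tag sets."""
    n_left = sum(1 for tags in users.values() if left in tags)
    n_both = sum(1 for tags in users.values() if left in tags and right in tags)
    # Assumption: an unused left-hand tag gives similarity 0 instead of an error.
    return n_both / n_left if n_left else 0.0


# Hypothetical users and tags, for illustration only.
users = {
    "u1": {"C", "Java"},
    "u2": {"C", "C++"},
    "u3": {"Java", "Python"},
    "u4": {"C", "Java", "Python"},
    "u5": {"Java"},
}
c_to_java = confidence(users, "C", "Java")   # 2 of the 3 C users also have Java
java_to_c = confidence(users, "Java", "C")   # 2 of the 4 Java users also have C
print(c_to_java, java_to_c)
```

The two directions differ, which is why the similarity derived from confidence is directional.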


Similarity measure - results




     • Assembly posts: 44
     • Assembly + Java developers: > 1000
     ⇒ When in need of Java developers, ask Assembly guys


Case study - Emacs

  • 1985-2012: C, Emacs Lisp, C++, Java, Lisp, Python, M4, … (26)

  [Figure annotations: exotic languages; high/low risk]




Case study - Emacs

  • C: spoken by half of the community + similar to other languages ⇒ low risk
  • Python: spoken very sporadically + similar to other languages ⇒ low risk




Conclusions

                                        What is the risk of not finding developers
                                        that speak Python?

   • Risk measure
                \[ risk(\ell) = 1 - \sum_{s \in \mathcal{P}(L)} p_s \, \max_{k \in s} sim(k \to \ell) \]
   • Similarity measure (StackOverflow)
      • “C => Java”
                \[ sim(k \to \ell) = conf(k \Rightarrow \ell) = \frac{nBoth}{nLeft} \]

                                     Low risk                       Depends on similarity





IPA Spring Days 2012
