The large body of existing OSS artifacts provides abundant material for understanding how code is reused in the open source universe: in particular, which pieces of code are reused most, under what circumstances people reuse code, and so forth. Understanding this process could help with legacy software maintenance and with exploring best practices in software development. Targeting the change history data of thousands of open source projects, we try to answer the following questions. First, how is code reused by other projects? Second, how are code files organized within a project, and how does this organization change over time? Answering these questions requires overcoming several technical difficulties. For example, because there are many kinds of version control systems (VCSs), it is hard to devise a uniform model that can represent the evolution of the code files stored in them. Each VCS may also have its own data format, so extracting data from them is a major challenge. Furthermore, analyzing the version iterations and reuse information of about a billion code files with current algorithms and hardware platforms is another challenge.
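One way to picture the "uniform model" mentioned above is a single record type that every VCS-specific extractor maps into. The sketch below is an illustration only: the `FileVersion` fields and the two input record shapes are assumptions made for this example and are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileVersion:
    """One version of one file, independent of its VCS (hypothetical schema)."""
    project: str
    vcs: str            # e.g. "git" or "svn"
    path: str
    revision: str       # commit hash or revision number, stored as text
    timestamp: int      # seconds since the Unix epoch
    content_hash: str   # hash of the file content at this revision


def from_git_like(project: str, record: dict) -> FileVersion:
    # Assumed input shape: {"commit", "file", "time", "blob"}
    return FileVersion(project, "git", record["file"], record["commit"],
                       record["time"], record["blob"])


def from_svn_like(project: str, record: dict) -> FileVersion:
    # Assumed input shape: {"rev", "path", "time", "md5"}
    return FileVersion(project, "svn", record["path"], str(record["rev"]),
                       record["time"], record["md5"])


def history(versions: list, project: str, path: str) -> list:
    """Return the versions of one file, ordered by time."""
    selected = [v for v in versions if v.project == project and v.path == path]
    return sorted(selected, key=lambda v: v.timestamp)
```

Once every extractor emits the same record type, questions about file evolution can be asked without knowing which VCS the data came from.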
Introduction of Trustie Software Repository & Passion-Lab Data Center, OW2con'12, Paris
1. Introduction of Trustie Software Repository & Passion-Lab Data Center
Meng Li & Minghui Zhou
Peking University, Beijing, China
Software Institute, School of Electronics Engineering and
Computer Science, Peking University
Key Laboratory of High Confidence Software Technologies,
Ministry of Education
12/4/12
2. Agenda
• Introduction of TSR
• Trustworthiness Evaluation in TSR
• Passion-Lab Data Center
3. Introduction of TSR
• Background
– Software reuse aims to improve software development via reusing software resources (SR).
– Typical SR: components, Web services, tools, architectures, etc.
– Software resource repository is the infrastructure that provides the SR management mechanism to support software reuse.
– Such as publishing, retrieving, classification, storage, feedback, evaluation, etc.
[Figure: a Software Resource Repository holding code, APIs, JARs and Web services]
4. Introduction of TSR
• About TSR
– Trustie Software Resource Repository (TSR) provides mechanisms to describe, collect, evaluate, classify and manage the trustworthiness of SRs, in support of trustworthy software development.
– TSR is part of Trustie Project (Major 863 project, China)
– Trustie Project: http://www.trustie.net/
– TSR: http://tsr.trustie.net/
– TSR is developed by Peking University
5. Introduction of TSR
• Functionalities of TSR
– Resource management
– Publishing/editing/downloading/deleting resources
– Retrieving resources
– Recommending resources
– User management
– Registration, sign in/out, etc.
– Feedback
– Rating or comment
– Quality template-based feedbacks
– Tag management
– Trustworthiness evaluation
– Statistics
8. Introduction of TSR
• Application of TSR
– TSR has been deployed in several Software Incubators & Companies throughout China.
– Installation for OW2 in Grenoble, France (in Nov. 2012).
9. Agenda
• Introduction of TSR
• Trustworthiness Evaluation in TSR
• Passion-Lab Data Center
11. Trustworthiness Evaluation in TSR
• Trustworthiness Evaluation Methods
– Feedback-Based Trustworthiness Evaluation (FBTE)
– Internet-Based Trustworthiness Evaluation (IBTE)
12. Feedback-Based Trustworthiness Evaluation (FBTE)
• Rationale behind FBTE:
– When I want to find and reuse a trustworthy SR, the most straightforward way is to choose SRs that other users have identified as trustworthy.
– In other words, users’ feedback is an important kind of trustworthiness evidence.
– Moreover, if the user is trustworthy (e.g. an expert), his or her feedback is more likely to be trustworthy.
• We collect users’ feedback in TSR, and provide FBTE to help users find and reuse SRs.
13. Feedback-Based Trustworthiness Evaluation (FBTE)
• FBTE is based on the feedback functionality in TSR
– Simple feedback: rating & comment
– Template-based feedback: detailed feedback (stability, security, portability, etc.)
14. Feedback-Based Trustworthiness Evaluation (FBTE)
• FBTE Procedure:
a) The TSR admin assigns trustworthiness evaluation experts;
b) Experts evaluate the trustworthiness of an SR by providing template-based feedback;
c) The TSR admin confirms the feedback, and the latest overall rating is taken as the SR’s trustworthiness level.
• FBTE is straightforward, yet widely used in software repositories/portals.
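The three-step procedure above can be sketched in a few lines. This is a minimal illustration only: the class and field names are hypothetical, and the overall rating is computed here as the plain mean of the template scores, which is an assumption, since the slides do not say how TSR derives it.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Optional


@dataclass
class TemplateFeedback:
    expert: str
    scores: dict            # e.g. {"stability": 4, "security": 5, "portability": 3}
    confirmed: bool = False

    def overall_rating(self) -> float:
        # Assumption: overall rating = mean of the per-attribute scores.
        return mean(self.scores.values())


@dataclass
class SoftwareResource:
    name: str
    feedbacks: list = field(default_factory=list)   # kept in submission order


class Repository:
    def __init__(self) -> None:
        self.experts = set()

    def assign_expert(self, user: str) -> None:
        # Step a: the admin assigns an evaluation expert.
        self.experts.add(user)

    def submit(self, sr: SoftwareResource, fb: TemplateFeedback) -> None:
        # Step b: only assigned experts may submit template-based feedback.
        if fb.expert not in self.experts:
            raise PermissionError(f"{fb.expert} is not an assigned expert")
        sr.feedbacks.append(fb)

    def confirm(self, fb: TemplateFeedback) -> None:
        # Step c (first half): the admin confirms a feedback entry.
        fb.confirmed = True

    def trust_level(self, sr: SoftwareResource) -> Optional[float]:
        # Step c (second half): the latest confirmed overall rating wins.
        confirmed = [fb for fb in sr.feedbacks if fb.confirmed]
        return confirmed[-1].overall_rating() if confirmed else None
```

Taking only the latest confirmed rating mirrors the slide's wording; averaging over all confirmed feedback would be an equally plausible design, but it is not what the procedure describes.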
15. Internet-Based Trustworthiness Evaluation (IBTE)
• Background
– SRs are diversifying
– From closed, static, code
– To open, dynamic, service
– With the development of Web 2.0, there is more and more information on the Internet
– Feedback on different websites
• E.g. comments and ratings on Seekda, Service-finder
• We try to collect trustworthiness evidence from the Internet and use it to evaluate the trustworthiness of SRs in TSR.
16. Internet-Based Trustworthiness Evaluation (IBTE)
• Framework Overview
[Figure: IBTE framework. Recoverable labels: TSR; SR List; Collecting SR Related Info from Internet; SR Information; Method of Information Collection; Trustworthiness evaluation system; Trustworthy Evidence Extraction; Method of Evidence Extraction; Item of Evidence; Trustworthiness Evidence Storage; Trustworthiness Evaluation Model; Method of Trustworthiness Evaluation; Trustworthy Level]
18. Internet-Based Trustworthiness Evaluation (IBTE)
• Collecting trustworthy evidences from the Internet
– Objective evidence (QoS)
– Implement a QoS monitor that invokes Web services on a regular basis.
– Over 2 million records
– Subjective evidence (Reputation)
– Calculated from users’ feedback, including ratings and comments
– For comments, we first apply sentiment analysis

NO.  Comment                                                            Result
1    It works perfectly.                                                Positive (+1)
2    Cool service!                                                      Positive (+1)
3    No data returned                                                   Negative (-1)
4    Gave a wrong result ("false") on a valid email address. Useless.   Negative (-1)
5    It’s not working.                                                  Negative (-1)
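To make the comment-scoring step concrete, here is a toy sketch that reproduces the polarities in the table above. The slides do not name the sentiment analysis method used, so the keyword lexicons below are a stand-in invented for this illustration, and taking the reputation as the mean polarity is likewise an assumption.

```python
import re

# Toy lexicons, invented for this illustration only.
POSITIVE = {"perfectly", "cool", "great", "good"}
NEGATIVE = {"no", "not", "wrong", "useless", "false"}


def polarity(comment: str) -> int:
    """Return +1, -1 or 0 by counting lexicon hits; negative words win ties."""
    words = re.findall(r"[a-z']+", comment.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if neg > 0 and neg >= pos:
        return -1
    if pos > 0:
        return 1
    return 0


def reputation(comments: list) -> float:
    """Assumed aggregate: the mean polarity over all comments."""
    scores = [polarity(c) for c in comments]
    return sum(scores) / len(scores) if scores else 0.0


comments = [
    "It works perfectly.",
    "Cool service!",
    "No data returned",
    'Gave a wrong result ("false") on a valid email address. Useless.',
    "It's not working.",
]
scores = [polarity(c) for c in comments]
```

A real system would replace the lexicons with a trained sentiment classifier; the aggregation step would stay the same shape.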
20. Agenda
• Introduction of TSR
• Trustworthiness Evaluation in TSR
• Passion-Lab Data Center
21. What are the best practices in OSS?
• Research questions:
– How do people reuse code over time?
– Which code is reused most often?
– What attributes does it have?
• Build infrastructure to review code reuse across the OSS universe
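As a rough illustration of the second question, the sketch below ranks file contents by the number of distinct projects they appear in. It assumes that "reuse" means byte-identical file content shared between projects, identified by a SHA-1 hash; this definition and the `(project, path, content)` input shape are assumptions for the example, not details given in the slides.

```python
import hashlib
from collections import defaultdict


def content_key(data: bytes) -> str:
    """Identify a file by the SHA-1 hash of its bytes."""
    return hashlib.sha1(data).hexdigest()


def reuse_counts(files) -> dict:
    """files: iterable of (project, path, content_bytes) tuples.

    Returns {content_hash: set_of_projects} for content seen in 2+ projects.
    """
    projects_by_hash = defaultdict(set)
    for project, _path, data in files:
        projects_by_hash[content_key(data)].add(project)
    return {h: p for h, p in projects_by_hash.items() if len(p) > 1}


def most_reused(files, top: int = 10) -> list:
    """Return (content_hash, project_count) pairs, most widely shared first."""
    reused = reuse_counts(files)
    ranked = sorted(reused.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(h, len(p)) for h, p in ranked[:top]]
```

Exact hashing only catches verbatim copies; modified copies would need a similarity measure, which is beyond this sketch.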
23. Infrastructure
• We continuously track various commercial and open source projects.
• This “universal” repository records data from:
– Version control
– Issue tracking
– Email archives
– ……
24. Hardware and data levels
• Machines
– DELL R910 (4U), 64 GB RAM, 16 cores (X7550)
– DELL MD3200, 12 × 2 TB SAS
– DELL R710 × 4, 64 GB RAM
• Data Levels
– Level0: raw data
– Level1: filtered data
– Level2-n: customized data
25. What could we do with this data?
• To enable a better ecosystem, user experience, and practices … …
– Understand the past, predict the future
http://passion-lab.org
A cloud for OSS best practices based on issue tracking data, version control data and …
26. Related Websites
• Trustie Software Repository
http://tsr.trustie.net
• CoWS Web Services Search Engine
http://www.cowebservices.com
• Trustworthiness Evaluation
http://pingji.trustie.net
• API Example
http://www.apiexample.com
• Passion-Lab
All the systems are
http://passion-lab.org
• TSR @ OW2 Open Source
http://tsr.ow2.org/