Open Source CAQDAS: What Is in the Box and What Is Missing
1. Computer-Aided Qualitative Research Europe
7 & 8 Oct 2010, Lisbon
For more information about our events, please visit:
http://www.merlien.org
2. Open Source CAQDAS: What is in the Box and What is Missing
CAQR Europe 2010, Lisboa, PT
Francisco Freitas | Junior Researcher | francisco.freitas@ipleiria.pt
3. I. The Open Source Movement
• 1983: Richard Stallman and the announcement of the
GNU Project (“GNU’s Not Unix”)
• Aim: [to develop] ‘a sufficient body of free software
[...] to get along without any software that is not free.’
• ‘The word free in free software pertains to freedom,
not price.’ (freedom to copy, freedom to re-use…)
• GNU: clear social and political objectives since the
beginning, not simply a technical ‘apparatus’
4. I. The Open Source Movement
• Writing a complete operating system is a vast task that
implies a kernel, compilers, editors, text formatters, mail
software, etc.
• 1985: the Free Software Foundation (FSF) is founded to support
the development of GNU
• 1991/92: with the Linux kernel unveiled, all the major components
are assembled: GNU/Linux, finally a free operating system!
• 1998: Open Source Initiative (OSI) is jointly founded by Eric
Raymond and Bruce Perens
5. I. The Open Source Movement
• The ‘open source’ label was coined in 1998, partly ‘to dump
the moralizing and confrontational attitude that had been
associated with free software’
• Main principles: production by peers, through exchange and
collaboration, with a final product and source code usually
available to all at no cost
• Open source: more than just ensuring access to the source
code
• Open source is very successful in some fields (e.g. server appliances,
database management, GIS)
6. Open Source Definition (by OSI)
– Free Redistribution
– Source Code
– Derived Works
– Integrity of The Author's Source Code
– No Discrimination Against Persons or Groups
– No Discrimination Against Fields of Endeavor
– Distribution of License
– License Must Not Be Specific to a Product
– License Must Not Restrict Other Software
– License Must Be Technology-Neutral
8. Free and Open Source Software (FOSS)
• Different philosophies, different cultures, different distribution
methodologies, similar development models…
• FOSS: Inclusive concept to cover both free software and open
source software
– Free Software and the philosophical freedoms
– Open Source and the advantages of the peer-to-peer
development model (‘Wikinomics’)
• Liberally licensing to grant the right to use, study, change and
improve through the availability of its source code
• Benefits to all the involved players, avoiding biased conceptions
9. Licensing under FOSS
• The need to protect users and developers
(regulation)
• Licenses created by the FSF, approved by OSI
• An economic dimension of its own:
FOSS is not anti-capitalistic!
10. Licensing under FOSS
• The most important licenses:
– GPL: GNU General Public License
• Copyleft: characteristics of the original software are carried over
to every derived product and every new version
• Protects the authors since every distribution is marked
• Forbids the inclusion of GPL-licensed libraries in proprietary software
– LGPL: GNU Lesser General Public License
• More appropriate for software libraries (subroutines,
lines of code or classes that add functionality to programs)
• Libraries are important for sharing functionality
between applications
• Under the LGPL, libraries may be included in proprietary software
11. Proprietary Software
• Closed development (Copyright)
• Restricted licensing (no modifications, no further distribution, no
reverse engineering)
• Software usage under certain conditions
• The source code is locked
• Users are not able to adapt the software to their needs
• Restrictions enforced by legal and/or technical means
12. Why FOSS?
• Financial advantages
– Avoids licensing fees and the required administrative support
– Lower hardware requirements
– Technical support on demand
– Better technological adaptation to different economic contexts
– Equal opportunities
• Usage advantages
– Licensing flexibility (e.g. number of installations, users or local clients)
– Can be fully tested for free
– Decisions not influenced by a particular offer, but instead considering the best
options to solve any need (e.g. a bundle of applications)
– No commercial monopolies, no lock-in to a particular brand
– New mentality: peering and cooperation
– Free data access (e.g. corporations and individuals using the same file formats)
13. Why FOSS?
• Technological advantages
– Free standards (interoperability and improved access to
data)
– Adaptability
– Bugs and other issues may be solved faster
– A support community (linking developers and end-users)
– Proven reliability and quality (e.g. mission-critical usages)
– Improved knowledge of the information output
– Local parameterization and the development of software
tools
14. II. CAQDAS (Fielding & Lee, 1995)
• Main features to handle the data (Lewins &
Silver, 2007):
– Content searching tools
– Linking tools
– Coding tools
– Query tools
– Writing and annotation tools
– Mapping or networking tools
16. Proprietary CAQDAS Packages
• ‘Theory-building support tools’ (di Gregorio & Davidson, 2008)
• Extremely complete set of features
• User-friendly interfaces
• Substantial differences between the offers available (Lewins &
Silver, 2009)
• Specific CAQDAS packages to meet particular methodologies
(Koenig, 2004; Weitzman and Miles, 1995)
• Good value for money?!
18. • Runs on Windows and Linux
• Code and retrieve package
• Simple user interface
• Limited functionality
• Latest version: 1.0.1 (2006!)
• Imports plain text and PDFs
• Memoing
• Tree structure to organize categories
• Searches using Boolean queries
• Exports to HTML and CSV formats
• Single project file (*.qdp)
• Simple coding statistics
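The code-and-retrieve idea with Boolean queries listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Segment` class, the `retrieve` helper and the toy data are hypothetical and do not reflect how any particular package stores its codings. Segments are matched exactly here; real packages usually also handle overlapping passages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A coded passage: document name plus character offsets (hypothetical model)."""
    document: str
    start: int
    end: int


def retrieve(codings: dict[str, set[Segment]], code: str) -> set[Segment]:
    """Return every segment tagged with `code` (empty set if the code is unused)."""
    return codings.get(code, set())


# Toy codings: category name -> segments tagged with it.
codings = {
    "trust": {Segment("interview1.txt", 0, 40), Segment("interview2.txt", 10, 55)},
    "risk": {Segment("interview1.txt", 0, 40), Segment("interview3.txt", 5, 30)},
}

trust = retrieve(codings, "trust")
risk = retrieve(codings, "risk")

both = trust & risk        # trust AND risk
either = trust | risk      # trust OR risk
only_trust = trust - risk  # trust AND NOT risk

# Simple coding statistics: number of segments per code.
counts = {code: len(segments) for code, segments in codings.items()}
```

Using plain sets keeps the three Boolean operators down to one operator each, which is probably close to the minimum a code-and-retrieve tool needs.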
19. • Cloud computing platform
• Efficiently code raw text data sets
• Annotate coding with shared memos
• Manage team coding permissions via the Web
• Create unlimited collaborator sub-accounts
• Assign multiple coders to specific tasks
• Easily measure inter-rater reliability
• Adjudicate valid & invalid coder decisions
• Report validity by dataset, code or coder
• Export coding in RTF, CSV or XML format
• Import Plain Text, HTML, CAT XML, merged ATLAS.ti Coding
• Archive or share completed projects
• No multi-stream support
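To make "inter-rater reliability" concrete, here is a minimal sketch that computes Cohen's kappa for two coders. The slide does not say which statistic the platform reports, so the choice of kappa, the function name `cohens_kappa` and the sample decisions are all assumptions made for illustration.

```python
from collections import Counter


def cohens_kappa(coder_a: list[str], coder_b: list[str]) -> float:
    """Chance-corrected agreement between two coders over the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same, non-empty set of items")
    n = len(coder_a)
    # Share of items on which the two coders chose the same label.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's label frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical decisions by two coders on six text units.
coder_a = ["valid", "valid", "invalid", "valid", "invalid", "invalid"]
coder_b = ["valid", "invalid", "invalid", "valid", "invalid", "valid"]
kappa = cohens_kappa(coder_a, coder_b)
```

In this toy data the coders agree on four of six units while chance alone would give three, so kappa comes out at one third.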
20. • For Windows and Mac OS
• Multi-stream analysis package
• ‘Inexpensive, not cheap’
• Multi-user collaboration
• Data management features
• Reporting
• Data mining and hypothesis testing
• Search tools with boolean operators
• Visual tools
• Complete set of output options
• Advanced analysis options
• Memoing capabilities
• Current version: 2.42 (20100902)
21. • For Mac OS and Linux
• Text retriever/content analysis package
• Designed for ethnographic and discourse research
• PDF coding and analysis support
• Image and Video coding and analysis support
• Multi-user support using MySQL as a server
• XML file formats
• Complex search of information
• Multiple coding possibilities
• Latest version: 4.00b4 (20100825)
• Mac version has full support for transcription
• Hierarchical coding system
22. • R integration (an add-on for qualitative
analysis in a statistical software
package)
• For Windows, Linux, Mac
• Import documents from plain text or
on-the-fly
• Support for non-English documents
• File editing after coding
• Memos of documents, codes, coding,
project, files and more
• Single-file (*.rqda) format, which is
basically an SQLite database
• Facilitator helps to categorize files and
codes
• File attributes (content analysis)
• Write and organize field work journals
• Boolean operations and, or, not for
codings, files or cases coded by codes
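The combination of a single-file SQLite project and Boolean and/or/not retrieval can be sketched with Python's built-in `sqlite3` module. The table layout below (`files`, `codes`, `codings`) and the `query` helper are hypothetical and are not the package's real schema; an in-memory database stands in for the project file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a real project would pass a file path instead
conn.executescript(
    """
    CREATE TABLE files   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE codes   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE codings (
        file_id   INTEGER NOT NULL REFERENCES files(id),
        code_id   INTEGER NOT NULL REFERENCES codes(id),
        start_pos INTEGER NOT NULL,
        end_pos   INTEGER NOT NULL
    );
    """
)
conn.executemany(
    "INSERT INTO files VALUES (?, ?)",
    [(1, "diary.txt"), (2, "interview.txt"), (3, "notes.txt")],
)
conn.executemany("INSERT INTO codes VALUES (?, ?)", [(1, "trust"), (2, "risk")])
conn.executemany(
    "INSERT INTO codings VALUES (?, ?, ?, ?)",
    [(1, 1, 0, 20), (1, 2, 30, 60), (2, 1, 5, 25), (3, 2, 0, 15)],
)

# Names of the files that contain at least one coding for a given code.
FILES_WITH_CODE = """
    SELECT f.name FROM files f
    JOIN codings c ON c.file_id = f.id
    JOIN codes k ON k.id = c.code_id
    WHERE k.name = ?
"""


def query(operator: str, left: str, right: str) -> list[str]:
    """Combine two 'files coded by' sets: INTERSECT (and), UNION (or), EXCEPT (not)."""
    if operator not in {"INTERSECT", "UNION", "EXCEPT"}:
        raise ValueError(f"unsupported operator: {operator}")
    sql = f"{FILES_WITH_CODE} {operator} {FILES_WITH_CODE} ORDER BY 1"
    return [row[0] for row in conn.execute(sql, (left, right))]


files_and = query("INTERSECT", "trust", "risk")  # coded by trust AND risk
files_or = query("UNION", "trust", "risk")       # coded by trust OR risk
files_not = query("EXCEPT", "trust", "risk")     # coded by trust but NOT risk
```

Because the whole project lives in one database, the Boolean operators map directly onto SQL's compound SELECT operators, and the same file can be inspected with any SQLite client.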
23. • Digital Records for e-Social Science (DReSS)
• Work in progress...
• Runs on Windows and Mac platforms
• Digital records:
• Traditional qualitative data
• System logs
• Time
• Concordancer engine (data interrogation across different streams and
time-based synchronization)
• Multi-stream support
• Multiple formats import
• Transcription tools
• Memoing and annotation
• Hierarchical or non-hierarchical coding scheme
• Innovative features
• Heterogeneous data, interaction analysis, non-verbal communication
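Time-based synchronization across streams can be illustrated with a small sketch: a keyword hit in one stream is shown together with whatever happened in the other streams at about the same moment. The slide does not describe how the concordancer engine works internally, so the `Event` class, the `around` helper, the two-second window and the sample records are assumptions.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    time: float   # seconds from the start of the recording
    stream: str   # e.g. "transcript" or "system_log"
    text: str


def around(events: list[Event], centre: float, window: float) -> list[Event]:
    """Return events whose time lies in [centre - window, centre + window].

    `events` must already be sorted by time.
    """
    times = [e.time for e in events]
    lo = bisect_left(times, centre - window)
    hi = bisect_right(times, centre + window)
    return events[lo:hi]


# Hypothetical digital record mixing a transcript with system logs.
events = sorted(
    [
        Event(12.0, "transcript", "I can't find the save button"),
        Event(12.5, "system_log", "menu_opened: File"),
        Event(14.0, "system_log", "menu_closed: File"),
        Event(40.0, "transcript", "OK, that worked"),
        Event(39.2, "system_log", "file_saved"),
    ],
    key=lambda e: e.time,
)

# Concordance-style lookup: every transcript hit for a keyword, with the
# events from all streams that occurred within two seconds of it.
hits = [e for e in events if e.stream == "transcript" and "save" in e.text]
context = {hit.text: around(events, hit.time, 2.0) for hit in hits}
```

Sorting all streams onto one timeline and searching it with `bisect` keeps each lookup cheap, which matters once system logs contribute thousands of entries.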
24. IV. Thinking Outside The Box…
• The FOSS CAQDAS project is still missing (e.g. a single package,
containing a very complete set of features)
• An extensive network to bring together researchers, practitioners,
trainees, developers, programmers…
• …and simultaneously to improve the existing links
• Not necessarily free, but open source at least!
• Established companies may redesign their business models (e.g.
to grant access to specific parts of the source code and foster the
development of this branch of technology)
25. Ethical Concerns
• Do FOSS CAQDAS packages raise new
ethical issues?
– Probably not: the analysis and the dataset remain
the key elements, not the piece of software
itself…
– Unless we start thinking about Web 2.0 devices
(e.g. metadata, paradata)
26. • An example of another academia-oriented information tool
• Software to index, organize, reference and share research
papers
• Both a desktop application and a website
• A fruitful example of a collaborative project
• A step forward through read-only formats (dynamic
usage)
• Hosted by a private company (closed-source)
• Multiplatform (Windows, Mac, Linux)
• Web 2.0 device:
– Collective data from users available (Pros & Cons)
– Open API [Application Programming Interface]