AmCAT is a document management and analysis platform for the social sciences and humanities. It allows users to input, manage, and analyze documents through features like keyword analysis, linguistic processing, and manual annotation. AmCAT is open source, uses open standards, and aims to provide open access to data. It is built using Django and separates business logic from presentation, with all data and methods exposed through a REST API for accessibility from scripts.
Using Django for a scientific document analysis (web) application
1. AmCAT3
Using Django for a scientific document analysis website: Tastypie, unit tests, R, open platforms and open questions
Wouter van Atteveldt (VU Amsterdam)
2. AmCAT
What is AmCAT?
Design considerations
Open data and the publication cycle
Tables, Tastypie, and R
Unit tests
3. What is AmCAT?
Document management and analysis
Aimed at social sciences and humanities
Input: scraping, uploading
Management: projects, selections
Analyses: keyword analysis, linguistic processing (lemmatizing etc.), manual annotation
Open source, open standards, open access
9. Design Choices
Default Django: web site backed by a database
AmCAT: database with a web front end
Data should be accessible from outside
ORM should be usable without web site code
DB should be final authentication/authorisation
10. Design Choices
Separate 'apps' for business, presentation
Custom authentication middleware and user management
save() and update() with using=
database-specific code for creating users
We don't actually like this too much...
All data and methods (should be) exposed through web service API
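A minimal sketch of the business/presentation split described on this slide, written in plain Python so that it runs without Django. The names `count_keyword`, `keyword_view` and `DOCS` are hypothetical and are not taken from the AmCAT code base; the point is only that the business function knows nothing about the web, and the view does nothing but convert input and output.

```python
import json


# Business layer: plain Python, no web framework imports.
def count_keyword(documents, keyword):
    """Return {document id: number of case-insensitive keyword hits}."""
    needle = keyword.lower()
    return {doc_id: text.lower().split().count(needle)
            for doc_id, text in documents.items()}


# Presentation layer: a thin wrapper that only converts input and output.
def keyword_view(query, documents):
    """Hypothetical view: takes parsed query parameters, returns a JSON string."""
    hits = count_keyword(documents, query["keyword"])
    return json.dumps({"keyword": query["keyword"], "hits": hits}, sort_keys=True)


# Hypothetical sample data, standing in for rows fetched through the ORM.
DOCS = {1: "Open data and open standards", 2: "Closed formats"}
```

Because `count_keyword` is an ordinary function, a script, a web service and the web site can all call it directly.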
11. Open data and Publication Cycle
[Architecture diagram; components shown:]
AmCAT Navigator (web site)
REST API (web service)
ORM (Django)
Relational DB
SPARQL end point
External scripts (Python, R, ...)
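As an illustration of the "external scripts" component, the sketch below shows how a Python script might read rows from such a REST API. The host name, the `api/v1/` path, the `article` resource and the `project` filter are all assumptions made for the example, and the response layout (a JSON object with `meta` and `objects` keys) is assumed to follow Tastypie's usual list format.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://amcat.example.org/api/v1/"  # hypothetical host and path


def build_url(resource, **filters):
    """Build the URL of a list endpoint, asking for JSON output."""
    params = dict(filters, format="json")
    return BASE_URL + resource + "/?" + urlencode(sorted(params.items()))


def rows_from_payload(payload):
    """Extract the rows from a Tastypie-style list response (assumed layout)."""
    return json.loads(payload)["objects"]


def fetch_rows(resource, **filters):
    """Fetch and decode one page of results; needs a running server."""
    with urlopen(build_url(resource, **filters)) as response:
        return rows_from_payload(response.read().decode("utf-8"))


# Offline demonstration with a hand-written sample payload.
SAMPLE = '{"meta": {"total_count": 1}, "objects": [{"id": 7, "headline": "Example"}]}'
```

Only `fetch_rows` touches the network; the URL building and the parsing can be checked offline, for instance with `rows_from_payload(SAMPLE)`.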
12. Open access publication cycle
[Diagram of the publication cycle, three stages:]
Source: DANS/AmCAT3, (linked) data, web service
Analysis: R, MATLAB, ... (e.g. Sweave + LaTeX)
Publication: PDF + hyperlinks
Labels on the connections: 'data link' from site; Structured data?; Links back to
13. Tastypie + DataTables
Django model-based REST API
jQuery DataTables with AJAX call
The good news:
It works
Unified point of entry for tables in website and scripts
The bad news:
Tastypie code horribly redundant
(Unless we're doing it wrong!)
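One general way to reduce this kind of per-model repetition is to generate the resource classes with a small factory instead of writing each one by hand. The sketch below shows the idea in plain Python. It is not the Tastypie API and not necessarily what AmCAT does: `Resource`, `resource_for`, `Article` and `Project` are hypothetical stand-ins.

```python
class Resource:
    """Stand-in for a REST resource base class; NOT the Tastypie API."""
    fields = ()
    resource_name = None

    def serialize(self, obj):
        """Turn one model instance into a plain dict of the exposed fields."""
        return {name: getattr(obj, name) for name in self.fields}


def resource_for(model, fields):
    """Create a Resource subclass for a model class instead of writing it by hand."""
    name = model.__name__
    return type(name + "Resource", (Resource,),
                {"fields": tuple(fields), "resource_name": name.lower()})


class Article:  # hypothetical model
    def __init__(self, id, headline):
        self.id, self.headline = id, headline


class Project:  # hypothetical model
    def __init__(self, id, name):
        self.id, self.name = id, name


# One line per model replaces one hand-written class per model.
RESOURCES = [resource_for(Article, ["id", "headline"]),
             resource_for(Project, ["id", "name"])]
```

The trade-off is that generated classes are harder to customise per model, so a factory like this suits resources that differ only in model and field list.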
14. Unit tests
Web pages tough to test well
Move as much code as possible from presentation to business layer
Trivial views need less testing
Regular Python modules easy to test
Our choices:
Put all unit tests in the 'target' module
Put more complicated integration tests in tests/ package
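The first choice, keeping unit tests in the module they test, might look like the sketch below. The function `token_counts` and the helper `run_tests` are hypothetical examples, not AmCAT code; the test case uses only the standard `unittest` module.

```python
import unittest


def token_counts(tokens):
    """Count how often each lower-cased token occurs."""
    counts = {}
    for token in tokens:
        key = token.lower()
        counts[key] = counts.get(key, 0) + 1
    return counts


class TestTokenCounts(unittest.TestCase):
    """Unit tests kept next to the code they exercise."""

    def test_counts_are_case_insensitive(self):
        self.assertEqual(token_counts(["Open", "open", "data"]),
                         {"open": 2, "data": 1})

    def test_empty_input(self):
        self.assertEqual(token_counts([]), {})


def run_tests():
    """Run this module's tests and report whether they all passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestTokenCounts)
    return unittest.TextTestRunner(verbosity=0).run(suite).wasSuccessful()
```

Since the business function has no web dependencies, the tests need no test client, request objects or database fixtures.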
15. Bonus slide: Plugins
Django (model)forms as interface description for plugins
Plugins callable from web site, as web service, and from CLI
Single point of entry for actions
(relation with REST data modification?)
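The plugin idea can be sketched without Django as follows. A plain `fields` dictionary stands in for the Django form that describes the plugin's interface, and the same description drives both a dictionary-based call (as a web service might make) and a command-line call. `Plugin`, `CountKeyword` and all method names are hypothetical.

```python
import argparse


class Plugin:
    """Base class: subclasses declare their options once, in `fields`."""
    fields = {}  # option name -> converter, standing in for a Django form

    def clean(self, raw):
        """Validate and convert raw input, whatever its origin."""
        missing = [name for name in self.fields if name not in raw]
        if missing:
            raise ValueError("missing options: " + ", ".join(missing))
        return {name: convert(raw[name]) for name, convert in self.fields.items()}

    def run(self, **options):
        raise NotImplementedError

    def call(self, raw):
        """Single point of entry, e.g. for a web service handing in a dict."""
        return self.run(**self.clean(raw))

    def call_from_cli(self, argv):
        """Build a command-line parser from the same field description."""
        parser = argparse.ArgumentParser(prog=type(self).__name__)
        for name in self.fields:
            parser.add_argument("--" + name, required=True)
        return self.call(vars(parser.parse_args(argv)))


class CountKeyword(Plugin):  # hypothetical plugin
    fields = {"text": str, "keyword": str, "minimum": int}

    def run(self, text, keyword, minimum):
        """Return the number of hits, or 0 if it is below the minimum."""
        hits = text.lower().split().count(keyword.lower())
        return hits if hits >= minimum else 0
```

Both entry points end in `call`, so validation and conversion happen in one place regardless of where the request came from.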
Editor's notes
By open standards and open access I mean that the data are accessible to 'all clients', not only to 'AmCAT' or Python scripts, by means of open standards such as SQL, RDF, HTTP, XML, JSON, etc. That way a researcher can write their own script in R, Perl, etc. that communicates with AmCAT.
Where a codebook can also be called a taxonomy / ontology / etc...