Looking for a database where user profiles and image galleries are equally at home? One that comes with built-in full-text search, fine-grained access control, flexible schemas, versioning and many more advanced features? Take a look at Apache Jackrabbit, the Java-based content repository that combines the best parts of file systems and databases. This introductory presentation covers Apache Jackrabbit and its hierarchical content model, and shows how it can be used as a powerful foundation for modern content-based applications.
2. Outline
• Repository model
• Property and node types
• Sessions and namespaces
• References and versioning
• Search and observation
• Access control
• Persistence and clustering
• Deployment and configuration
• Questions?
7. Node structure
Property name    Type     Value
jcr:primaryType  Name     nt:unstructured
jcr:mixinTypes   Name[]   mix:referenceable
jcr:uuid         String   c6d27a10-bf23-11e3-b…
title            String   My new node
author           String   Jukka Zitting
Child nodes
foo, bar, baz[1], baz[2]
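The child list above contains same-name siblings, which are told apart by a 1-based index in square brackets. As a minimal sketch of that naming rule, the following stdlib-only helper splits a path segment into its name and index. `PathSegment` is a hypothetical class written for illustration, not part of the JCR or Jackrabbit API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the JCR API: splits a path segment such as
// "baz[2]" into its name and its 1-based same-name-sibling index.
final class PathSegment {
    private static final Pattern SEGMENT =
            Pattern.compile("([^\\[\\]/]+)(?:\\[(\\d+)\\])?");

    final String name;
    final int index;

    private PathSegment(String name, int index) {
        this.name = name;
        this.index = index;
    }

    static PathSegment parse(String segment) {
        Matcher m = SEGMENT.matcher(segment);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a valid path segment: " + segment);
        }
        // A missing index is treated as [1], so "foo" and "foo[1]" are equivalent.
        int index = m.group(2) == null ? 1 : Integer.parseInt(m.group(2));
        if (index < 1) {
            throw new IllegalArgumentException("Index must be >= 1: " + segment);
        }
        return new PathSegment(m.group(1), index);
    }

    @Override
    public String toString() {
        return name + "[" + index + "]";
    }

    public static void main(String[] args) {
        for (String s : new String[] {"foo", "bar", "baz[1]", "baz[2]"}) {
            System.out.println(s + " -> " + parse(s));
        }
    }
}
```

Treating a missing index as `[1]` keeps lookups uniform: code that addresses children never needs to know whether a sibling with the same name exists.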
9. Common property types
Property type          Used for                     Examples
String                 Short to medium-sized text   “foo”, “This paragraph…”
Binary                 Binary data and long text    PNG, PDF, “This book…”
Name                   Node and property names      “nt:folder”, “content”
Path                   Node and property paths      “/jcr:system”, “/etc/map”
Boolean, Long, Double  Scalar data                  true, 0, -2846, 3.14, NaN
Date                   ISO 8601 timestamp           2014-04-08T12:00:00.000Z
Reference              Graph structures             c6d27a10-bf23-11e3-b…
10. Multi-valued properties
• Zero or more values
• Practical limit of around 10-100k values, depending on the size of the values
• All values must be of the same type
• Duplicates allowed
• No “null” values
  • null entries are automatically removed
• Order is preserved
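The rules above can be sketched as a small normalization step. This is a stdlib-only model, assuming that mixed types are rejected with an exception. `MultiValue.normalize` is a hypothetical helper, not the JCR API, and the exception type is chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not the JCR API: normalizes the values of a
// multi-valued property according to the rules listed above.
final class MultiValue {
    static List<Object> normalize(Object... values) {
        List<Object> result = new ArrayList<>();
        Class<?> type = null;
        for (Object value : values) {
            if (value == null) {
                continue; // null entries are dropped, not stored
            }
            if (type == null) {
                type = value.getClass(); // the first non-null value fixes the type
            } else if (!type.equals(value.getClass())) {
                throw new IllegalArgumentException(
                        "Mixed value types: " + type.getSimpleName()
                        + " and " + value.getClass().getSimpleName());
            }
            result.add(value); // order and duplicates are kept
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(normalize("a", null, "b", "a")); // prints [a, b, a]
        System.out.println(normalize());                    // prints []
    }
}
```

An empty result is a valid multi-valued property with zero values, which is different from a property that does not exist.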
16. Session
• All content access goes through a session
• Sessions are created with an authenticated login() call
• Session-based authorization of reads, writes and other operations
• Tracking of transient changes
• Atomic save()
• Not thread-safe!
  • for concurrent operations, use multiple sessions
17. Namespaces
• The repository has a set of prefix -> URI namespace mappings
  jcr: http://www.jcp.org/jcr/1.0
  nt: http://www.jcp.org/jcr/nt/1.0
  mix: http://www.jcp.org/jcr/mix/1.0
  xml: http://www.w3.org/XML/1998/namespace
  etc.
• Used to prevent naming conflicts between different clients
• Each session can override (non-default) mappings locally
  • designed for cases like XML imports, etc.
  • in practice seldom used, and often not recommended
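To illustrate how prefixes relate to URIs, here is a minimal stdlib-only sketch that expands a prefixed name into the `{uri}localName` form and lets a session-local copy remap a prefix. `NamespaceResolver` and its methods are hypothetical names, and the `my` namespace with its `http://example.com/my/1.0` URI is an assumption made up for the example. The real JCR API handles this through its own namespace registry and session methods.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model, not the JCR API: resolves prefixed names against
// repository-wide mappings, with optional session-local prefix overrides.
final class NamespaceResolver {
    private final Map<String, String> prefixToUri = new HashMap<>();

    static NamespaceResolver withDefaults() {
        NamespaceResolver r = new NamespaceResolver();
        // Mappings copied from the list above.
        r.register("jcr", "http://www.jcp.org/jcr/1.0");
        r.register("nt", "http://www.jcp.org/jcr/nt/1.0");
        r.register("mix", "http://www.jcp.org/jcr/mix/1.0");
        r.register("xml", "http://www.w3.org/XML/1998/namespace");
        return r;
    }

    void register(String prefix, String uri) {
        prefixToUri.put(prefix, uri);
    }

    // A session starts from the repository mappings and changes only its own copy.
    NamespaceResolver sessionCopy() {
        NamespaceResolver copy = new NamespaceResolver();
        copy.prefixToUri.putAll(prefixToUri);
        return copy;
    }

    // Makes an already registered URI reachable under a different prefix.
    void remap(String newPrefix, String uri) {
        if (!prefixToUri.containsValue(uri)) {
            throw new IllegalArgumentException("Unknown namespace URI: " + uri);
        }
        String existing = prefixToUri.get(newPrefix);
        if (existing != null && !existing.equals(uri)) {
            throw new IllegalArgumentException("Prefix already in use: " + newPrefix);
        }
        prefixToUri.values().remove(uri); // drop the old prefix for this URI
        prefixToUri.put(newPrefix, uri);
    }

    String expand(String name) {
        int colon = name.indexOf(':');
        if (colon < 0) {
            return name; // no prefix: returned unchanged in this sketch
        }
        String uri = prefixToUri.get(name.substring(0, colon));
        if (uri == null) {
            throw new IllegalArgumentException("Unknown prefix in: " + name);
        }
        return "{" + uri + "}" + name.substring(colon + 1);
    }

    public static void main(String[] args) {
        NamespaceResolver repository = withDefaults();
        repository.register("my", "http://example.com/my/1.0"); // hypothetical namespace
        NamespaceResolver session = repository.sessionCopy();
        session.remap("app", "http://example.com/my/1.0");
        System.out.println(repository.expand("my:title"));
        System.out.println(session.expand("app:title"));
    }
}
```

Both calls in `main` resolve to the same expanded name, which is the point of the mapping: the URI identifies the namespace, and the prefix is only a local shorthand.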
20. References, cont.
• hard references
  • enforced integrity; target cannot be removed
  • least flexibility; think twice before using
• weak references
  • remain valid across moves/renames
• paths, names, URLs, etc.
  • no backreferences
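The integrity rule for hard references can be sketched with a toy in-memory model: a node that is still the target of a hard reference cannot be removed. `ReferenceModel` is a hypothetical class that uses plain string identifiers, not the JCR API. Weak references and path strings are simply not tracked here, which is why they would never block a removal in this model.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model, not the JCR API: tracks hard references between nodes
// and refuses to remove a node that is still a hard-reference target.
final class ReferenceModel {
    private final Set<String> nodes = new HashSet<>();
    // target identifier -> identifiers of nodes holding a hard reference to it
    private final Map<String, Set<String>> hardReferrers = new HashMap<>();

    void addNode(String id) {
        nodes.add(id);
    }

    boolean exists(String id) {
        return nodes.contains(id);
    }

    void addHardReference(String sourceId, String targetId) {
        if (!nodes.contains(sourceId) || !nodes.contains(targetId)) {
            throw new IllegalArgumentException("Both nodes must exist");
        }
        hardReferrers.computeIfAbsent(targetId, k -> new HashSet<>()).add(sourceId);
    }

    void removeNode(String id) {
        Set<String> referrers = hardReferrers.get(id);
        if (referrers != null && !referrers.isEmpty()) {
            throw new IllegalStateException(id + " is still referenced by " + referrers);
        }
        nodes.remove(id);
        hardReferrers.remove(id);
        // Outgoing hard references of the removed node disappear with it.
        for (Set<String> sources : hardReferrers.values()) {
            sources.remove(id);
        }
    }

    public static void main(String[] args) {
        ReferenceModel model = new ReferenceModel();
        model.addNode("article");
        model.addNode("author");
        model.addHardReference("article", "author");
        try {
            model.removeNode("author");
        } catch (IllegalStateException e) {
            System.out.println("Blocked: " + e.getMessage());
        }
        model.removeNode("article"); // removes the only referrer
        model.removeNode("author");  // now allowed
        System.out.println("author exists: " + model.exists("author"));
    }
}
```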
22. Versioning, cont.
• To make a node versionable, add the mix:versionable mixin
  • scope of “versionability” is determined by the on-parent-version (OPV) settings of the node types
• A checkin freezes a piece of content and makes a copy of it in the version history
• A checkout unfreezes the content and allows it to be modified
• A restore goes back in time to a previously checked-in version
• A merge combines changes from another workspace with those made in this workspace
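The checkin, checkout and restore cycle described above can be sketched with a toy in-memory model. `VersionableNode` is a hypothetical class, not the JCR API. It numbers versions 1, 2, 3 for simplicity, ignores merge and OPV entirely, and assumes that a restored node stays frozen until it is checked out again.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not the JCR API: a single node whose property state
// can be frozen into a version history and restored later.
final class VersionableNode {
    private Map<String, String> properties = new HashMap<>();
    private final List<Map<String, String>> history = new ArrayList<>();
    private boolean checkedOut = true;

    void setProperty(String name, String value) {
        if (!checkedOut) {
            throw new IllegalStateException("Node is checked in; check it out first");
        }
        properties.put(name, value);
    }

    String getProperty(String name) {
        return properties.get(name);
    }

    // Freezes the node and stores a copy of its state; returns the version number.
    int checkin() {
        history.add(new HashMap<>(properties));
        checkedOut = false;
        return history.size();
    }

    // Unfreezes the node so that it can be modified again.
    void checkout() {
        checkedOut = true;
    }

    // Replaces the current state with a copy of a previously checked-in version.
    void restore(int version) {
        properties = new HashMap<>(history.get(version - 1));
        checkedOut = false; // assumption of this sketch: restored content stays frozen
    }

    public static void main(String[] args) {
        VersionableNode node = new VersionableNode();
        node.setProperty("title", "First draft");
        int v1 = node.checkin();
        node.checkout();
        node.setProperty("title", "Second draft");
        node.checkin();
        node.restore(v1);
        System.out.println(node.getProperty("title")); // prints First draft
    }
}
```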
24. Search examples
// find all PDF files within this workspace, most recent first
SELECT * FROM [nt:file]
WHERE [jcr:mimeType] = 'application/pdf'
ORDER BY [jcr:lastModified] DESC
// find all content about Christmas within my blog
/jcr:root/sites/myblog//*[jcr:contains(., 'Christmas')]
25. Search
• By default all content is indexed
  • Configurable per repository
• Support for full-text search
  • Binaries are also indexed, with automatic text extraction
• Full access control of search results
• However:
  • Limited join support/performance
  • No facets or aggregate queries
26. Observation
• An observation listener can choose to receive events
  • on changes of specified types
  • on changes at or below a specified path
  • on changes at nodes with specified identifiers
  • on changes at nodes of specified types
• The events are delivered in asynchronous callbacks
  • Remember the non-thread-safety of sessions!
• Often used to maintain a cache of expensive-to-compute data
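Two of the selection criteria above, event type and "at or below a specified path", can be sketched as a simple filter. `EventFilter` is a hypothetical class, not the JCR API, and its bit-flag values are illustrative. The real event type constants are defined by `javax.jcr.observation.Event`. Identifier and node type filters, as well as asynchronous delivery, are left out of this sketch.

```java
// Hypothetical model, not the JCR API: decides whether a change event matches
// a listener's registration (a set of event types plus a path scope).
final class EventFilter {
    // Illustrative bit flags, so several types can be combined with '|'.
    static final int NODE_ADDED = 1;
    static final int NODE_REMOVED = 1 << 1;
    static final int PROPERTY_CHANGED = 1 << 2;

    private final int eventTypes;
    private final String basePath;

    EventFilter(int eventTypes, String basePath) {
        this.eventTypes = eventTypes;
        this.basePath = basePath;
    }

    boolean matches(int eventType, String path) {
        if ((eventTypes & eventType) == 0) {
            return false; // the listener did not ask for this event type
        }
        if (basePath.equals("/") || path.equals(basePath)) {
            return true; // the root covers everything; an exact match is "at" the path
        }
        // "Below" means a real descendant, so /content must not match /contentx.
        return path.startsWith(basePath + "/");
    }

    public static void main(String[] args) {
        EventFilter filter = new EventFilter(NODE_ADDED | PROPERTY_CHANGED, "/content");
        System.out.println(filter.matches(NODE_ADDED, "/content/blog"));   // prints true
        System.out.println(filter.matches(NODE_REMOVED, "/content/blog")); // prints false
        System.out.println(filter.matches(NODE_ADDED, "/contentx"));       // prints false
    }
}
```

Narrow filters matter for the cache use case mentioned above: the fewer events a listener receives, the less work its callback has to do on a thread that must not share a session with the rest of the application.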
28. Access control
• Fine-grained, ACL-based access control
• Applies to all content accesses
  • Writes
  • Reads
  • Search
  • Observation
  • etc.
• Support for custom privileges
  • e.g. an “execute” privilege
36. Deployment packages
• jackrabbit-webapp
  • basic web interface (still no content browser/editor)
  • exposes the repository through JNDI, WebDAV, RMI
• jackrabbit-standalone
  • runnable jar
  • jackrabbit-webapp plus embedded Jetty
  • basic tooling: backup/migration, CLI, etc.
• jackrabbit-jca
  • designed for full J2EE environments
  • support for managed transactions
37. Embedded deployment
• jackrabbit-core plus all dependencies
  • Maven recommended
  • slf4j used for logging
• Full control over the repository
• Extra work to make the repository externally manageable
38. Repository configuration
• repository.xml
  • main repository configuration file
  • security, clustering, data store, /jcr:system, etc.
• workspace.xml
  • configuration of each workspace
  • persistence manager, search index, etc.
  • automatically created based on template in repository.xml
• indexing_configuration.xml
  • optional, customizes the search index
• see http://jackrabbit.apache.org/jackrabbit-configuration.html